\item If they didn't match in the end, why do you think that might be?
\item Did the author's results match your own?
\item Did the author's results and conclusion match your own?
\item If so, does this increase your confidence in your own assessment?
\item If not, which result are you more confident in?
\item How well did the author use the evaluation techniques covered in this
\item Did they use an appropriate method to select the number of runs
per experiment?
\item When reporting results, did they indicate which index of
central tendency (e.g., mean, median, or mode) they used and, if
appropriate, justify that choice?
\item Did they present some measure of variability (e.g., error bars in
plots, confidence intervals, or reported variance or standard deviation)?
If so, did they justify the one they used?
\item If mean or median results were similar between the alternatives,
did the author use appropriate techniques (e.g., examining confidence
intervals for the means) to show whether the differences were
statistically significant?
\item Did you notice any of the common mistakes we've discussed (e.g., the
ones in Chapter~2 of the book)?
