@@ -25,9 +25,14 @@
 of the statistics; eg. sample mean
 \2 \textit{When might your samples not be independent of each other?}
 \3 This is key because a lot of statistical tests require iid variables
+\3 Throughput and latency
+\3 Arrival times between events
+\3 Two properties of an event (eg. read/write and latency)
 \2 You can multiply together probs. when independent, have to start using
 conditional probabilities when not
+\3 Example: Sampling with replacement, sampling w/o replacement
 \2 \textit{Why do we consider our measurements random variables?}
+\3 They are affected by underlying random processes
 \2 CDF vs. PDF vs. PMF
 \2 \textit{How do we calculate the probability that a value will be within a range?}
 \3 Integral (CDF) at point $b$, minus integral at point $a$
@@ -37,6 +42,7 @@
 \2 Probability must be in range 0 to 1
 \2 Independence
 \2 Adding: mostly used for mutually exclusive events in the same trial
+\3 eg. prob. of a write is prob. of insert plus prob. of update
 \2 Multiplication: Used to calculate probability across multiple trials
 \2 Sampling w/ replacement vs. w/o replacement: relationship to independence
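The range rule, the addition rule and the with/without-replacement example in the notes above can be checked numerically. This is a minimal sketch, not course material. It assumes an exponential model for inter-arrival times, and the rate of 2 is made up. The insert/update mix and the 3-read/2-write pool are hypothetical, and so are the helper names `exp_cdf`, `prob_in_range` and `both_writes`.

```python
import math
from fractions import Fraction


def exp_cdf(x: float, rate: float) -> float:
    """CDF of an exponential distribution, assumed here only as an example
    model for inter-arrival times: F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def prob_in_range(cdf, a: float, b: float) -> float:
    """P(a < X <= b) = F(b) - F(a) for any CDF F."""
    return cdf(b) - cdf(a)


def both_writes(reads: int, writes: int, replace: bool) -> Fraction:
    """Probability that two requests drawn from a pool are both writes
    (hypothetical helper)."""
    total = reads + writes
    first = Fraction(writes, total)
    if replace:
        # With replacement the draws are independent: just multiply.
        second = Fraction(writes, total)
    else:
        # Without replacement the second draw depends on the first,
        # so use the conditional probability P(2nd write | 1st write).
        second = Fraction(writes - 1, total - 1)
    return first * second


if __name__ == "__main__":
    # Hypothetical rate of 2 arrivals per unit time.
    p = prob_in_range(lambda x: exp_cdf(x, rate=2.0), 0.5, 1.0)
    print(f"P(0.5 < X <= 1.0) = {p:.4f}")  # 0.2325

    # Addition rule for mutually exclusive events (hypothetical mix).
    p_insert, p_update = Fraction(1, 10), Fraction(3, 10)
    print("P(write) =", p_insert + p_update)  # 2/5

    print("with replacement:   ", both_writes(3, 2, replace=True))   # 4/25
    print("without replacement:", both_writes(3, 2, replace=False))  # 1/10
```

The two draws differ only in the second factor. Dropping replacement turns a plain product of marginals into a product with a conditional probability.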
@@ -44,10 +50,9 @@
 \2 ``The value you can expect to get''
 \2 AKA the mean
 \2 PDF / PMF is balanced on the expected value
 \2 Variance (sigma squared) is the expected deviation from the mean
-\\
 (squared)
 \2 $E[X]E[Y]$ is expected value of $X$ times expected value of $Y$
-\2 $E[XY]$ is expected value of $X * Y$
+\2 $E[XY]$ is expected value of $X * Y$ (computed from the joint distribution)
 \2 Linearity of expectation
 \1 Mean, median, mode
@@ -57,7 +62,7 @@
 \2 Mistakes with means: large range, skewness, multiplying when not
 independent, ratio with different bases
-\1 Covariance
+\1 Covariance XXX
 \2 Measures whether two random variables vary together
 \2 Sign shows the tendency (together, or opposite)
 \2 Joint probability distribution: eg. A given B
@@ -79,24 +84,19 @@
 \2 Don't look at skewness
 \2 Can only multiply means if independent
 \2 \textit{When to use arithmetic vs. geometric vs. harmonic mean}
-\3 Total is of interest (eg. time), product is of interest (eg.
-speedup)
+\1 Means of ratios
+\2 Case 1: Sums of numerators and of denominators both have physical meanings: use the ratio of the sums
+\3 eg. sum of CPU busy times over sum of experiment durations
+\2 Case 1a: Arithmetic mean can be used if bases are constant
+\2 Case 1b: Harmonic mean can be used if numerators are constant
+\2 Case 2: If the data are ``expected'' to satisfy $a_i = cb_i$, can estimate $c$ by taking the geometric mean of the ratios $a_i / b_i$
 \1 Picking index of dispersion
 \2 Range (when bounded)
+\3 Use a variance-based metric with the mean and a percentile-based metric with the median
 \2 Var or stddev (stddev is in the same units as the data) --- see also CoV
 \2 Percentiles - 10 and 90, or 5 and 95 (want a sense of how long
 things will take in the extreme case)
-\2 SIQR (semi interquanile range): middle 50% / 2: very outlier-robust
+\2 SIQR (semi-interquartile range): middle 50\% / 2: very outlier-robust
 \2 Mean absolute dev (use least)
-\1 Quantile-quantile plots
+\1 Quantile-quantile plots XXX
 \2 For each quantile: plot pairs of what the theoretical distribution
 should be, and what the empirical (sample) distribution actually is
 \2 $x$-axis: theoretical distribution
@@ -106,10 +106,8 @@
 \2 Heavy tail / light tail
 \1 For next time
-\2 HW \#5 due tonight
-\2 Reach Chapter 13 on comparing systems
-\2 HW \#6 posted
-\3 There is a part that you need to do \emph{before} class
+\2 Read Chapter 13 on comparing systems
+\2 HW \#5 due Friday
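This sketch shows why means can only be multiplied under independence, and how the means-of-ratios cases above behave. Everything in it is hypothetical: the function names and all data values are made up for illustration. `covariance` divides by n (the population form) rather than n - 1. The code uses only Python's standard `statistics` module and needs Python 3.8+ for `fmean` and `geometric_mean`.

```python
import statistics


def covariance(xs, ys):
    """Population covariance Cov(X, Y) = E[XY] - E[X]E[Y] over paired samples.

    Cov is 0 exactly when E[XY] = E[X]E[Y]. Independence guarantees this,
    but zero covariance on its own does not imply independence.
    """
    e_xy = statistics.fmean(x * y for x, y in zip(xs, ys))
    return e_xy - statistics.fmean(xs) * statistics.fmean(ys)


def ratio_of_sums(nums, dens):
    """Case 1: sums of numerators and of denominators are both meaningful."""
    return sum(nums) / sum(dens)


def arithmetic_mean_of_ratios(nums, dens):
    """Case 1a: matches ratio_of_sums when every base (denominator) is equal."""
    return statistics.fmean(n / d for n, d in zip(nums, dens))


def harmonic_mean_of_ratios(nums, dens):
    """Case 1b: matches ratio_of_sums when every numerator is equal."""
    return statistics.harmonic_mean([n / d for n, d in zip(nums, dens)])


def geometric_mean_of_ratios(nums, dens):
    """Case 2: estimate c when each a_i is expected to be about c * b_i."""
    return statistics.geometric_mean([n / d for n, d in zip(nums, dens)])


if __name__ == "__main__":
    # Hypothetical paired samples where Y grows with X, so E[XY] != E[X]E[Y].
    print("Cov:", covariance([1, 2, 3], [2, 4, 6]))

    # Case 1a (hypothetical): CPU busy seconds over equal 10 s experiments.
    busy, duration = [2, 3, 5], [10, 10, 10]
    print(ratio_of_sums(busy, duration), arithmetic_mean_of_ratios(busy, duration))

    # Case 1b (hypothetical): 100 operations per run, varying run times (ops/s).
    ops, seconds = [100, 100, 100], [2, 4, 5]
    print(ratio_of_sums(ops, seconds), harmonic_mean_of_ratios(ops, seconds))

    # Case 2 (hypothetical): a_i roughly proportional to b_i.
    print(geometric_mean_of_ratios([2.2, 3.8, 8.4], [1, 2, 4]))
```

With constant bases, the arithmetic mean of the ratios equals the ratio of sums. With constant numerators, the harmonic mean does. When neither condition holds, the two generally disagree, which is the "ratio with different bases" mistake listed in the notes.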

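The indices of dispersion and the Q-Q plot pairing from the notes can also be sketched with the standard library. This illustration rests on several assumptions:

* `dispersion_summary` and `normal_qq_points` are hypothetical helpers.
* Percentiles come from `statistics.quantiles` with its default "exclusive" method. Other interpolation methods give slightly different cut points.
* The Q-Q sketch compares against a standard normal using the (i - 0.5)/n plotting position, which is one common convention among several.

```python
import statistics


def dispersion_summary(data):
    """Return several indices of dispersion for one sample (hypothetical helper).

    Assumes at least two observations and a nonzero mean (for CoV).
    """
    xs = sorted(data)
    mean = statistics.fmean(xs)
    stdev = statistics.stdev(xs)                # sample std dev, same units as data
    q1, _, q3 = statistics.quantiles(xs, n=4)   # quartile cut points
    deciles = statistics.quantiles(xs, n=10)    # 10th ... 90th percentiles
    return {
        "range": xs[-1] - xs[0],
        "variance": statistics.variance(xs),     # units squared
        "stdev": stdev,
        "cov": stdev / mean,                     # coefficient of variation (unitless)
        "p10_p90": (deciles[0], deciles[-1]),
        "siqr": (q3 - q1) / 2,                   # semi-interquartile range
        "mean_abs_dev": statistics.fmean(abs(x - mean) for x in xs),
    }


def normal_qq_points(data):
    """(theoretical, sample) pairs for a normal Q-Q plot (hypothetical helper).

    x: standard-normal quantile at plotting position (i - 0.5) / n
    y: i-th smallest observation
    Roughly linear points suggest the data are close to normal.
    """
    xs = sorted(data)
    n = len(xs)
    unit_normal = statistics.NormalDist()
    return [(unit_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


if __name__ == "__main__":
    # Hypothetical samples: 1..20, and the same data with the max replaced by 1000.
    clean = list(range(1, 21))
    outlier = list(range(1, 20)) + [1000]
    for name, sample in (("clean", clean), ("outlier", outlier)):
        s = dispersion_summary(sample)
        print(name, "stdev=%.2f" % s["stdev"], "siqr=%.2f" % s["siqr"])
    for theoretical, observed in normal_qq_points(clean)[:3]:
        print("%.3f %d" % (theoretical, observed))
```

Take the values 1 to 20 and replace the largest with 1000. The standard deviation grows more than tenfold, but the SIQR stays at 5.25. This is why the notes call SIQR very outlier-robust.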