Commit 2a25ef9a authored by Robert Ricci

Lecture notes for Lecture 11

parent 9a54cb61
DOCUMENTS= lecturenotes
include ../../Makerules
\documentclass[12pt]{article}
\usepackage[no-math]{fontspec}
\usepackage{sectsty}
\usepackage[margin=1.25in]{geometry}
\usepackage{outlines}
\setmainfont[Numbers=OldStyle,Ligatures=TeX]{Equity Text A}
\setmonofont{Inconsolata}
\newfontfamily\titlefont[Numbers=OldStyle,Ligatures=TeX]{Equity Caps A}
\allsectionsfont{\titlefont}
\title{CS6963 Lecture \#11}
\author{Robert Ricci}
\date{February 13, 2014}
\begin{document}
\maketitle
\begin{outline}
\1 From last time
\1 Big idea for the day: all statements we make from evaluations are probabilistic
\1 Quick refresher - sample vs. population
\2 Parameters of prob distribution vs. statistics of the sample
\1 We measure a sample mean, but it is really just an estimate of the population
mean
\2 We can compute a confidence interval: a range that contains the true mean
with confidence level $100(1-\alpha)\%$, where $\alpha$ is the significance level
\2 The book's explanation of how to get a confidence interval
\3 Take multiple samples (multiple trials per sample), compute the mean of each, treat those means as a sample set, and compute a confidence interval from it
\2 Again, iid comes up, and this is why you need to be careful in experiment design
\3 \textit{When might you not meet identically distributed criteria?}
\2 Standard error of the mean, $s/\sqrt{n}$: not to be confused with the standard deviation or STDERR
\1 Confidence interval for sample mean
\2 Lower: $\overline{x} - \frac{z_{1-\alpha/2}s}{\sqrt{n}}$
\2 Upper: $\overline{x} + \frac{z_{1-\alpha/2}s}{\sqrt{n}}$
\2 $\overline{x}$ is sample mean
\2 $s$ is sample stddev
\2 $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$-quantile of the unit normal distribution ($\mu = 0$ and $\sigma = 1$); note that you pick $\alpha$
\2 $n$ is the sample size
\2 \textit{So, what does this tell us?}
\3 We are $100(1-\alpha)\%$ confident that the population mean lies between the lower and upper bounds
\2 \textit{What do we need to apply this result?}
\3 iid sample
\3 Large samples (30 or greater)
\3 Or the sample itself is normally distributed
\2 \textit{When is it not worth computing this?}
\3 When the means are extremely far apart
\2 \textit{When is it important?}
\3 When the means are close enough that they may lie within each other's
confidence intervals
\2 Testing for mean of particular value - does it lie within the CI?
\2 \textit{When might you want your mean to be the same as another mean?}
\3 Showing insignificant overhead
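\2 \textit{Worked example, as a sketch with made-up numbers (not measured data):}
\3 Assume $n = 36$ runs, $\overline{x} = 100$ ms, $s = 12$ ms, and pick $\alpha = 0.05$, so $z_{1-\alpha/2} = z_{0.975} \approx 1.96$
\3 Half-width: $\frac{z_{1-\alpha/2}s}{\sqrt{n}} = \frac{1.96 \times 12}{\sqrt{36}} = 3.92$ ms
\3 95\% CI: $(100 - 3.92,\ 100 + 3.92) = (96.08,\ 103.92)$ ms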
\1 Showing significance: paired samples (e.g., same benchmarks)
\2 Take samples for two systems under the same workload
\2 Compute statistics of the difference
\2 Compute CI of mean of the difference
\2 If the CI contains zero, the systems are not statistically different: the
data do not let us reject the hypothesis ``the two systems are the same''
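\2 \textit{Worked example, a sketch with hypothetical numbers:}
\3 Assume $n = 36$ benchmarks run on both systems, with per-benchmark differences $d_i = a_i - b_i$ having mean $\overline{d} = 1.5$ ms and stddev $s_d = 6$ ms
\3 95\% CI of the mean difference: $\overline{d} \pm \frac{1.96\, s_d}{\sqrt{n}} = 1.5 \pm \frac{1.96 \times 6}{6} = 1.5 \pm 1.96 = (-0.46,\ 3.46)$ ms
\3 The interval contains zero, so at this confidence level these numbers would not show a significant difference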
\1 Showing significance: t-test (e.g., truly random samples)
\2 Best to leave the implementation of this up to someone else
\2 Degrees of freedom: the number of independent pieces of data that go into
the model, i.e. the number of samples minus the number of quantities estimated from them
\2 e.g., R provides this as the built-in \texttt{t.test} function
\2 Fun fact: t-test invented as a way of measuring the quality of beer
(Guinness Stout)
\1 Showing significance: visual check
\2 Draw both confidence intervals and means
\2 If CIs don't overlap, one is clearly better
\2 If the CIs overlap and each mean falls inside the CI of the other: effectively
the same
\2 If the CIs overlap but it is not true that each mean falls inside the CI of
the other, a t-test is required
\1 Picking CIs
\2 As discussed before, degree of confidence has to do with the gain/loss of
being outside the range
\2 Reiterate plane example, you don't want to fly on a plane built with
only 99\% confidence intervals
\1 The value of hypothesis testing
\2 State your goal, test whether or not you achieved it
\2 ``On Bullshit''
\2 e.g., a good thesis statement is a testable hypothesis
\1 Proportions
\2 Similar, but for categorical outcomes (range not domain)
\2 What proportion of the population consists of category X?
\2 Sample proportion: $p = \frac{n_1}{n}$, where $n_1$ of the $n$ observations fall in category X
\2 CI for the proportion: $p \pm z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$
\2 $np > 10$ required
\2 \textit{Why is this symmetric?}
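\2 \textit{Worked example, a sketch with hypothetical numbers:}
\3 Assume $n = 1000$ requests, of which $n_1 = 100$ fail, so $p = 0.1$ and $np = 100 > 10$
\3 95\% half-width: $1.96\sqrt{\frac{0.1 \times 0.9}{1000}} \approx 1.96 \times 0.0095 \approx 0.019$
\3 95\% CI: roughly $(0.081,\ 0.119)$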
\1 Picking a sample size
\2 Sample size being too big is rarely a problem
\2 It's just that it can take too much time to get that many samples
\2 All dependent on the variance, which is intuitive
\2 For a mean, to get accuracy of $\pm r\%$: $n = \left(\frac{100zs}{r\overline{x}}\right)^2$
\2 For a proportion, to get a CI half-width of $r$: $n = z^2\frac{p(1-p)}{r^2}$
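\2 For comparing two systems, the upper edge of the lower CI must be below the lower edge of the upper CI
\2 Each CI is $\overline{x} \pm z \frac{s}{\sqrt{n}}$
\2 Leave $n$ unbound, write the upper edge of one CI (plus version) and the lower
edge of the other (minus version), relate them with the appropriate
comparison operator, and solve for $n$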
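\2 \textit{Worked examples, sketches with hypothetical numbers, assuming $r$ is the desired accuracy in percent for the mean and the desired CI half-width for the proportion:}
\3 Mean: suppose a pilot run gives $\overline{x} = 20$ and $s = 5$; for $\pm 5\%$ at 95\% confidence ($z = 1.96$), $n = \left(\frac{100 \times 1.96 \times 5}{5 \times 20}\right)^2 = 9.8^2 = 96.04$, so round up to 97 samples
\3 Proportion: suppose a pilot estimate of $p = 0.5$; for half-width $r = 0.05$ at 95\% confidence, $n = 1.96^2 \times \frac{0.5 \times 0.5}{0.05^2} = 3.8416 \times 100 = 384.16$, so round up to 385 samples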
\1 For next time
\2 Bring your laptop
\2 Sign up for GENI account
\2 Read GENI paper posted on Canvas
\end{outline}
\end{document}