Commit e0afb25b authored by Robert Ricci

Finish Chapter 14 contents, still need lab stuff

parent b9642f00
......@@ -57,6 +57,49 @@
\2 In the end, gives us $b_1 = \frac{s^2_{xy}}{s_x^2}$
\3 Covariance of $x$ and $y$ divided by variance of $x$
\3 $\frac{\sum{xy} - n \overline{x} \overline{y}}{\sum{x^2} - n(\overline{x})^2}$
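A minimal sketch of the slope formula above, in plain Python. The function name `fit_line` and the sample data are hypothetical, chosen only for illustration. The intercept $b_0 = \overline{y} - b_1 \overline{x}$ is the standard least-squares result and does not appear in the lines above.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b1 = (sum(xy) - n * x_bar * y_bar) / (sum(x^2) - n * x_bar^2)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    # Standard intercept formula (assumption: not shown in the notes above)
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```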
\1 SS*
\2 SSE = Sum of squared errors
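\2 SST = total sum of squares (TSS): sum of squared differences from the mean, $\sum{(y - \overline{y})^2}$
\2 SS0 = $n \overline{y}^2$: square of $\overline{y}$, added $n$ times
\2 SSY = sum of squares of all $y$: $\sum{y^2}$, so SST = SSY - SS0
\2 SSR = variation explained by regression: SST - SSE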
\1 Point of above: we can compare two measures of variation: the sum of
squared differences from the mean (SST), and the sum of squared errors left after the regression (SSE)
\2 $R^2 = \frac{SSR}{SST}$
\2 The ratio is the fraction of the variation that was explained by the regression - close to 1 is good (1 is max possible)
\2 If the regression sucks, SSR will be close to 0
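The SS* quantities and $R^2$ can be sketched as follows. The function name `sums_of_squares` and the data are hypothetical. The coefficients `b0` and `b1` are passed in, so any fitted line can be checked.

```python
def sums_of_squares(xs, ys, b0, b1):
    """Compute SSY, SS0, SST, SSE, SSR and R^2 for the line y = b0 + b1 * x."""
    n = len(ys)
    y_bar = sum(ys) / n
    ssy = sum(y * y for y in ys)   # sum of squares of all y
    ss0 = n * y_bar ** 2           # square of the mean, n times
    sst = ssy - ss0                # total variation around the mean
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ssr = sst - sse                # variation explained by the regression
    return {"SSY": ssy, "SS0": ss0, "SST": sst,
            "SSE": sse, "SSR": ssr, "R2": ssr / sst}


# Hypothetical data: a perfect fit explains all of the variation
xs, ys = [1, 2, 3, 4, 5], [3, 5, 7, 9, 11]
good = sums_of_squares(xs, ys, b0=1.0, b1=2.0)
# A flat line at the mean explains none of it, so SSR is 0
flat = sums_of_squares(xs, ys, b0=7.0, b1=0.0)
```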
\1 Remember, our error terms and $b$s are random variables
\2 We can calculate stddev, etc. on them
\2 Variance is $s_e^2 = \frac{SSE}{n-2}$ - MSE, mean squared error
\2 Confidence intervals, too
\2 \textit{What do confidence intervals tell us in this case?}
\3 A: How close our estimate is likely to be to the true slope
\3 For example: How sure are we that two slopes are actually different?
\2 \textit{When would we want to show that the confidence interval for $b_1$ includes zero?}
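A sketch of a confidence interval for $b_1$, under stated assumptions. The standard error $s_{b_1} = s_e / \sqrt{\sum{x^2} - n\overline{x}^2}$ is the usual textbook formula and is not given in the notes above. The $t$ quantile is passed in as `t_value` (looked up in a table for $n-2$ degrees of freedom), since the Python standard library has no $t$ distribution. The function name and data are hypothetical.

```python
import math


def slope_confidence_interval(xs, ys, t_value):
    """Return (low, high) for b1, given a t quantile for n - 2 degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))   # stddev of errors, sqrt(MSE)
    s_b1 = s_e / math.sqrt(sxx)      # assumed standard-error formula for b1
    return b1 - t_value * s_b1, b1 + t_value * s_b1


# Hypothetical data; 2.132 is the 0.95 quantile of t with 4 degrees of freedom,
# which gives a 90% two-sided interval for n = 6
xs = [1, 2, 3, 4, 5, 6]
low, high = slope_confidence_interval(xs, [2.1, 3.9, 6.2, 7.8, 10.1, 11.9], 2.132)
slope_may_be_zero = low <= 0.0 <= high
```

If the interval includes zero, the data do not let us rule out that $y$ has no linear dependence on $x$.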
\1 Confidence intervals for predictions
\2 Confidence intervals tightest near middle of sample
\2 If we go far out, our confidence is low, which makes intuitive sense
\2 $s_e \big(\frac{1}{m} + \frac{1}{n} + \frac{(x_p - \overline{x})^2}{\sum{x^2} - n(\overline{x})^2}\big)^\frac{1}{2}$
\2 $s_e$ is stddev of error
\2 $m$ is how many predictions (future observations) we are averaging over
\2 $x_p$ is the value of $x$ at which we are predicting
\2 $x_p - \overline{x}$ is capturing difference from center of sample
\2 \textit{Why is it smaller for more $m$?}
\3 The $\frac{1}{m}$ term shrinks: the mean of more observations has lower variance (assuming normally distributed errors)
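The standard deviation of a prediction can be sketched as below, to show how it grows away from $\overline{x}$ and shrinks with $m$. The function name `prediction_stddev` and the numbers are hypothetical. `s_e` is assumed to have been computed already as $\sqrt{SSE/(n-2)}$.

```python
import math


def prediction_stddev(xs, s_e, x_p, m):
    """Stddev of the mean of m future observations predicted at x = x_p."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    # s_e * (1/m + 1/n + (x_p - x_bar)^2 / (sum(x^2) - n * x_bar^2)) ** 0.5
    return s_e * math.sqrt(1 / m + 1 / n + (x_p - x_bar) ** 2 / sxx)


# Hypothetical sample and error stddev
xs = [1, 2, 3, 4, 5, 6]
near = prediction_stddev(xs, s_e=0.5, x_p=3.5, m=1)    # at the center of the sample
far = prediction_stddev(xs, s_e=0.5, x_p=20.0, m=1)    # far outside the sample
many = prediction_stddev(xs, s_e=0.5, x_p=3.5, m=10)   # mean of 10 observations
```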
\1 Residuals
\2 AKA error values
\2 We can expect several things from them if our assumptions about regressions are correct
\2 They will not show trends: \textit{why would a trend be a problem?}
\3 Tells us that an assumption has been violated
\3 If not randomly distributed for different $x$, tells us there is a systematic error at high or low values - error and predictor not independent
\2 Q-Q plot of error distribution vs. normal distribution
\2 Want the spread (stddev) of the residuals to be constant across the range of $x$
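A sketch of the points behind a Q-Q plot of residuals, using only the standard library. The function name `qq_points` is hypothetical, and the plotting positions $(i - 0.5)/n$ are one common convention assumed here, not something the notes specify. If the errors are roughly normal, the returned points fall close to a straight line.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each sorted residual with the matching standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    normal = NormalDist()  # standard normal: mean 0, stddev 1
    # Assumed plotting positions (i - 0.5) / n for i = 1..n
    quantiles = [normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Hypothetical residuals from some fitted line
points = qq_points([0.3, -0.1, 0.0, -0.4, 0.2])
```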
\1 For next time