\documentclass[12pt]{article}

\input{../../texstuff/fonts.sty}
\input{../../texstuff/notepaper.sty}
\usepackage{outlines}

\title{CS6963 Lecture \#16}
\author{Robert Ricci}
\date{March 12, 2014}

\begin{document}

\maketitle

\begin{outline}

\1 Today: How well does your data fit a line?
    \2 Talk about linear regression in detail, then look at some more complicated models in R
    \2 Eyeballing is just not rigorous enough

\1 Basic model: $y_i = b_0 + b_1x_i + e_i$
    \2 $y_i$ is the measured response (the value we want to predict)
    \2 $b_0$ is the y-intercept
    \2 $b_1$ is the slope
    \2 $x_i$ is the predictor
    \2 $e_i$ is the error
    \2 \textit{Which of these are random variables?}
        \3 A: All but $x_i$: the $b$s are estimated from random variables, and $e$ is the difference between random variables
        \3 So, we can compute statistics on them
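As a concrete illustration, a minimal R sketch of this model (the values of $b_0$, $b_1$, and the normal error distribution below are invented):
\begin{verbatim}
set.seed(1)
x <- 1:50                          # predictor
e <- rnorm(50, mean = 0, sd = 2)   # random error term (assumed normal)
y <- 3 + 0.5 * x + e               # true b0 = 3, b1 = 0.5
fit <- lm(y ~ x)                   # estimate b0 and b1 from the data
coef(fit)                          # estimates should land near 3 and 0.5
\end{verbatim}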

\1 Two criteria for getting $b$s
    \2 Zero total error
    \2 Minimize SSE (sum of squared errors)
    \2 Example of why one criterion is not enough: with two points, there are
        infinitely many lines with zero total error
    \2 Squared errors are always positive, so minimizing SSE alone could
        still consistently overshoot or undershoot

\1 Deriving $b_0$ is easy
    \2 Solve for $e_i$: $e_i = y_i - (b_0 + b_1 x_i)$
    \2 Take the mean over all $i$: $\overline{e} = \overline{y} - b_0 - b_1 \overline{x}$
    \2 Set mean error to 0 to get $b_0 = \overline{y} - b_1 \overline{x}$
    \2 Now we just need $b_1$
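A quick sanity check of this result, continuing the R sketch above: since $b_0 = \overline{y} - b_1 \overline{x}$, the fitted line has to pass through $(\overline{x}, \overline{y})$:
\begin{verbatim}
b <- coef(fit)
b[1] + b[2] * mean(x)   # the line evaluated at the mean of x ...
mean(y)                 # ... equals the mean of y
\end{verbatim}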

\1 Deriving $b_1$ is harder
    \2 SSE = sum of errors squared over all $i$
    \2 We want a minimum value for this
    \2 It's a function with one local minimum
    \2 So we can differentiate and look for zero
    \2 Substituting $b_0 = \overline{y} - b_1\overline{x}$ gives $SSE = (n-1)(s_y^2 - 2b_1s^2_{xy} + b_1^2s_x^2)$; then take the derivative
    \2 $s^2_{xy}$ is the covariance of $x$ and $y$ (see p. 181)
    \2 In the end, gives us $b_1 = \frac{s^2_{xy}}{s_x^2}$
        \3 Covariance of $x$ and $y$ divided by variance of $x$
    \3 $\frac{\sum{xy} - n \overline{x} \overline{y}}{\sum{x^2} - n(\overline{x})^2}$
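Filling in the elided differentiation step, in the notation above:
\[
\frac{d}{db_1}\left(s_y^2 - 2b_1s^2_{xy} + b_1^2s_x^2\right)
  = -2s^2_{xy} + 2b_1s_x^2 = 0
\quad\Rightarrow\quad
b_1 = \frac{s^2_{xy}}{s_x^2}
\]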
    
\1 SS*
    \2 SSE = Sum of squared errors
    \2 SST = total sum of squares (TSS): sum of squared differences from the mean
    \2 SS0 = $n\overline{y}^2$: the squared mean, counted $n$ times
    \2 SSY = $\sum{y^2}$: sum of the squared $y$s, so SST = SSY - SS0
    \2 SSR = variation explained by the regression: SST - SSE
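Restated as one chain of identities (nothing new, just the definitions above):
\[
SSY = \sum{y^2}, \qquad SS0 = n\overline{y}^2, \qquad
SST = SSY - SS0 = SSR + SSE
\]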

\1 Point of above: we can split the total variation (SST, squared differences
    from the mean) into a part explained by the regression (SSR) and a part
    left as error (SSE)
    \2 $R^2 = \frac{SSR}{SST}$
    \2 The ratio is the fraction of variation explained by the regression: close to 1 is good (1 is the max possible)
    \2 If the regression sucks, SSR will be close to 0
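Continuing the R sketch above, the fitted model reports this ratio directly:
\begin{verbatim}
summary(fit)$r.squared   # R^2 = SSR/SST; near 1 means a good fit
\end{verbatim}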

\1 Remember, our error terms and $b$s are random variables
    \2 We can calculate stddev, etc. on them
    \2 Variance of the errors is $s_e^2 = \frac{SSE}{n-2}$, the MSE (mean squared error)
    \2 Confidence intervals, too
    \2 \textit{What do confidence intervals tell us in this case?}
        \3 A: Our confidence in how close to the true slope our estimate is
        \3 For example: How sure are we that two slopes are actually different
    \2 \textit{When would we want to show that the confidence interval for $b_1$ includes zero?}
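A sketch of pulling these out of the R fit from earlier:
\begin{verbatim}
confint(fit, level = 0.95)   # 95% confidence intervals for b0 and b1
# If the interval for the slope includes zero, we cannot conclude
# (at this confidence level) that x predicts y at all
\end{verbatim}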

\1 Residuals
    \2 AKA error values
    \2 We can expect several things from them if our assumptions about regressions are correct
    \2 They will not show trends: \textit{why would a trend be a problem?}
        \3 Tells us that an assumption has been violated
        \3 If not randomly distributed for different $x$, tells us there is a systematic error at high or low values - error and predictor not independent
    \2 Q-Q plot of error distribution vs. normal distribution
    \2 Want the spread of stddev to be constant across range
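The corresponding residual checks in R, continuing the fit above:
\begin{verbatim}
plot(fitted(fit), resid(fit))   # want no trend and constant spread
qqnorm(resid(fit))              # Q-Q plot vs. a normal distribution
qqline(resid(fit))              # points near this line = roughly normal
\end{verbatim}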

\1 Switch to R
    \2 Show example of linear fitting (good fit)
    \2 Show example of linear fitting (bad fit)
    \2 Show example of polynomial fit (intercept and 3 coefficients)
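One possible version of these demos (all three data sets below are invented for illustration):
\begin{verbatim}
x    <- 1:50
good <- 3 + 0.5 * x + rnorm(50, sd = 2)    # linear plus small noise
bad  <- sin(x / 8) + rnorm(50, sd = 0.2)   # not linear at all
summary(lm(good ~ x))           # good fit: R^2 near 1
summary(lm(bad ~ x))            # bad fit: R^2 near 0
summary(lm(bad ~ poly(x, 3)))   # cubic: intercept + 3 coefficients
\end{verbatim}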

\1 For next time
    \2 I won't be here the week after spring break
    \2 papers3 due Tuesday of spring break week
    \2 On Thursday, we will have some guest students talk about the
        paper-writing process
    \2 lab2 now due Friday after spring break
        \3 I want some more from you now, so be sure to update your fork
        \3 Mainly, I want to know how you will improve the graph you
            are reproducing, and to actually look a bit at the code you
            find

\end{outline}

\end{document}