2. Locally Weighted Linear Regression


<h4><strong>Locally Weighted Linear Regression</strong></h4> <em>Locally weighted linear regression is a <strong>non-parametric learning algorithm</strong>: the amount of data we must keep around to represent the hypothesis h(x) grows linearly with the size of the training set <strong>m</strong>, so memory requirements increase with the training set.</em> It gives us an algorithm that fits curved data without hand-picking features: <p id="GjNObiy"><img class="alignnone wp-image-1490 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561da7d8cde48.png" alt="" width="339" height="224" /></p> <ol> <li>Look at the data in a small neighborhood around the point you're interested in</li> <li>Build a local hypothesis just for that section and use it to predict in that area<img class="alignnone size-full wp-image-1491 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561da80428dcd.png" alt="" /></li> <li>Given a query point x where we want to make a prediction, fit <img class="alignnone size-full wp-image-1492 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561da92cb9a5d.png" alt="" />, where <img class="alignnone size-full wp-image-1493 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561da9512a40c.png" alt="" /></li> <li>The weights depend on the particular query point x: if |x(i) − x| is small, then w(i) is close to 1; if |x(i) − x| is large, then w(i) is small (close to 0)</li> <li>So how do we determine the appropriate values of θ? We fit θ by weighted least squares, so the training examples closest to the query point receive the highest weight</li> <li><strong>Bandwidth Parameter</strong>: This weight function is chosen because we want a bell-shaped curve that peaks close to x and falls off quickly away from it <img class="alignnone wp-image-1495 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561dab16796dc.png" alt="" width="301" height="166" /> <p id="OvuubaZ"><img class="alignnone size-full wp-image-1498 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561dabe8ca075.png" alt="" /> The bandwidth parameter controls the shape of the curve (fat vs. thin)</p> </li> </ol> Regular normal equation: <img class="latex" title="\theta = (X^{T}X)^{-1}X^{T}y " alt="\theta = (X^{T}X)^{-1}X^{T}y " /> Weighted normal equation: <img class="latex" title="\theta = (X^{T}WX)^{-1}X^{T}Wy " alt="\theta = (X^{T}WX)^{-1}X^{T}Wy " />, where W is the diagonal matrix of weights w(i). <strong>Probabilistic interpretation of the data</strong> Assume <img class="alignnone size-full wp-image-1509 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db7c4415fb.png" alt="" />, where <img class="alignnone size-full wp-image-1510 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db7d28ec31.png" alt="" /> is an error term that captures unmodeled effects or random noise. The density of <img class="alignnone size-full wp-image-1510 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db7d28ec31.png" alt="" /> is given by <p id="FYbyXdL"><img class="alignnone size-full wp-image-1511 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db8363fd14.png" alt="" /></p> This implies that <p id="RPYkSwL"><img class="alignnone size-full wp-image-1512 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db848c1694.png" alt="" /></p> That is, the distribution of y(i) given x(i) is <img class="alignnone size-full wp-image-1514 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db8a96d9ac.png" alt="" /> <h4><strong>Likelihood function</strong></h4> Given the design matrix X, which contains all the <img class="alignnone size-full wp-image-1517 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db8e05e669.png" alt="" />, the likelihood of the parameters is <img class="alignnone size-full wp-image-1515 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db8d486e93.png" alt="" /> <p id="qbBgpds"><img class="alignnone size-full wp-image-1518 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db904736e3.png" alt="" /></p> <h4><strong>Maximum likelihood estimation</strong></h4> We choose θ so as to make the observed data as probable as possible; equivalently, we maximize the log likelihood ℓ(θ): <p id="kIxcTGi"><img class="alignnone size-full wp-image-1519 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561db9cd0639f.png" alt="" /></p> Maximizing <img class="alignnone size-full wp-image-1592 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561f061a5e715.png" alt="" /> is the same as minimizing <img class="alignnone size-full wp-image-1593 " src="http://theroadchimp.com/wp-content/uploads/sites/3/2015/10/img_561f0ae9478fb.png" alt="" />, which is exactly the least-squares cost function J(θ).
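The per-query fit described above (Gaussian weights, then the weighted normal equation θ = (XᵀWX)⁻¹XᵀWy) can be sketched in NumPy. This is a minimal illustration, not the notes' own code: the function name, the Gaussian weighting over the full input vector, and the default bandwidth τ are assumptions made for the example.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=1.0):
    """Locally weighted linear regression prediction at one query point.

    X       : (m, n) design matrix (include a column of ones for the intercept)
    y       : (m,) target values
    x_query : (n,) query point in the same feature layout as rows of X
    tau     : bandwidth -- controls how quickly the weights fall off
    """
    # Bell-shaped weights: close to 1 near the query point, near 0 far away.
    diffs = X - x_query
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equation: theta = (X^T W X)^{-1} X^T W y,
    # solved as a linear system instead of forming the inverse explicitly.
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

# Example: data that is exactly linear (y = 3 + 2x), so the local fit
# recovers the global line and the prediction at x = 1.5 is 6.0.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
pred = lwr_predict(X, y, np.array([1.0, 1.5]), tau=0.8)
```

Note that the whole fit is redone for every query point, which is the practical face of the non-parametric property mentioned at the top: the full training set must stay in memory at prediction time.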
