**Locally Weighted Linear Regression**

*Locally weighted linear regression is a non-parametric learning algorithm: to represent the hypothesis $h_\theta(x)$ we must keep the entire training set around, so the amount of memory required grows linearly with the training set size m.*

We want an algorithm that makes it easy to fit curved lines:

- Look at the data in a small neighborhood of the point you’re interested in
- Fit a local hypothesis to just that section and use it to predict in that region
- Given a location x where we want to make a prediction, fit θ to minimize

$$\sum_{i=1}^{m} w^{(i)} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$$

where the weights are

$$w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)$$

- The weights depend on the particular point x at which we’re trying to make the prediction

- if $|x^{(i)} - x|$ is small, then $w^{(i)}$ is close to 1
- if $|x^{(i)} - x|$ is large, then $w^{(i)}$ is small (close to 0)

So how do we determine the appropriate values of θ?

We pick the θ that most closely fits the training examples nearest the query point, since those examples carry the highest weights.

**Bandwidth parameter** τ: the exponential is chosen because we want a bell-shaped curve that peaks close to x and then falls off quickly away from it. τ controls the shape of the bell (fat vs. thin): a small τ gives a narrow bell that only considers very nearby examples, while a large τ averages over a wider neighborhood.
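The procedure can be sketched in a few lines of NumPy (a minimal illustration; the function name, test data, and bandwidth value are assumptions, not from the notes). For each query point we compute the bell-shaped weights and then solve the resulting weighted least-squares problem in closed form:

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Predict at a single query point with locally weighted linear regression.

    Weights w_i = exp(-||x_i - x_query||^2 / (2 * tau^2)) form a bell-shaped
    curve peaking at the query point; tau is the bandwidth parameter.
    theta is the closed-form solution of the weighted least-squares problem.
    """
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

# Illustrative data: a noiseless line, which LWR recovers exactly anywhere
x = np.linspace(0, 5, 50)
X = np.column_stack([np.ones_like(x), x])   # intercept column + feature
y = 1.0 + 2.0 * x
pred = lwr_predict(X, y, np.array([1.0, 1.5]))
```

Note that θ is re-fit for every query point, which is exactly why the whole training set must be kept in memory.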

Regular normal equation (unweighted linear regression):

$$\theta = (X^T X)^{-1} X^T \vec{y}$$

Weighted normal equation (with $W = \mathrm{diag}(w^{(1)}, \ldots, w^{(m)})$):

$$\theta = (X^T W X)^{-1} X^T W \vec{y}$$
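As a quick numerical sketch of the regular normal equation (the data and parameter values here are illustrative assumptions), we can recover the parameters of a noisy line in closed form:

```python
import numpy as np

# Synthetic data from y = 1 + 2x + Gaussian noise (illustrative values)
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 200)

# Regular normal equation: theta = (X^T X)^{-1} X^T y.
# Solving the linear system is preferred to forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

With 200 low-noise samples, `theta` lands very close to the true `[1.0, 2.0]`.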

**Probabilistic interpretation of data**

Assume the targets and inputs are related by

$$y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$$

where $\epsilon^{(i)}$ is an error term which captures unmodeled effects or random noise.

Assume further that $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$, so the density of the $\epsilon^{(i)}$ is given by

$$p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\epsilon^{(i)})^2}{2\sigma^2}\right)$$

This implies that

$$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$

i.e. the distribution of $y^{(i)}$ given $x^{(i)}$ is

$$y^{(i)} \mid x^{(i)}; \theta \sim \mathcal{N}(\theta^T x^{(i)}, \sigma^2)$$
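This claim is easy to check numerically (all values below are illustrative assumptions): drawing many noise terms for one fixed input x, the resulting y’s have mean $\theta^T x$ and standard deviation σ.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([1.0, 2.0])
x = np.array([1.0, 3.0])   # one fixed input, including the intercept term
sigma = 0.5

# Draw many y's for the same x: y = theta^T x + eps, with eps ~ N(0, sigma^2)
eps = rng.normal(0.0, sigma, 100_000)
ys = theta @ x + eps

# Empirically, y | x is Gaussian with mean theta^T x = 7.0 and std sigma = 0.5
mean, std = ys.mean(), ys.std()
```

The sample mean and standard deviation match $\theta^T x$ and σ up to sampling error.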

**Likelihood function**

Given the design matrix X, which contains all the training inputs $x^{(i)}$, and the target vector $\vec{y}$, the likelihood of the data as a function of θ is

$$L(\theta) = p(\vec{y} \mid X; \theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$$

**Maximum likelihood estimation**

We should choose θ so as to make the data as high probability as possible, i.e. choose θ to maximize L(θ).

Since the logarithm is monotonically increasing, we can instead maximize the log likelihood $\ell(\theta) = \log L(\theta)$:

$$\ell(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$$

Since the first term does not depend on θ, maximizing $\ell(\theta)$ is the same as minimizing

$$\frac{1}{2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2,$$

which is exactly the least-squares cost function $J(\theta)$. Under these Gaussian assumptions, least-squares regression is therefore maximum likelihood estimation of θ.
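This equivalence can be verified numerically (a sketch on illustrative synthetic data): the θ that solves the least-squares problem also attains a higher log likelihood than any perturbed θ.

```python
import numpy as np

# Synthetic data: y = 1 + 2x + Gaussian noise (illustrative values)
rng = np.random.default_rng(2)
x = rng.uniform(0, 5, 200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, 200)
sigma = 0.3

def log_likelihood(theta):
    # l(theta) = m*log(1/(sqrt(2*pi)*sigma)) - (1/(2*sigma^2)) * sum residual^2
    m = len(y)
    resid = y - X @ theta
    return m * np.log(1 / (np.sqrt(2 * np.pi) * sigma)) - resid @ resid / (2 * sigma ** 2)

def J(theta):
    # Least-squares cost: J(theta) = (1/2) * sum residual^2
    resid = y - X @ theta
    return 0.5 * (y - X @ theta) @ (y - X @ theta)

theta_ls = np.linalg.solve(X.T @ X, X.T @ y)   # minimizes J in closed form
theta_other = theta_ls + np.array([0.1, -0.05])  # any perturbed parameters
```

Any perturbation away from the least-squares solution simultaneously raises J(θ) and lowers ℓ(θ), since ℓ(θ) is a decreasing affine function of J(θ).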