2. Locally Weighted Linear Regression

Locally weighted linear regression is a non-parametric learning algorithm: the amount of information needed to represent the hypothesis h_θ(x) grows linearly with the size m of the training set, because the training set itself must be kept around to make predictions. Memory use therefore grows with the training set.

The goal is an algorithm that makes it easy to fit curved lines:

  1. Look at the data in a small neighborhood of the point you’re interested in
  2. Build a local hypothesis just for that neighborhood and use it to predict there
  3. Given a location x where we want to make a prediction, fit θ to minimize
    Σ_{i=1}^{m} w^{(i)} (y^{(i)} − θ^{T}x^{(i)})^2, where w^{(i)} = exp(−(x^{(i)} − x)^2 / (2τ^2))
  4. The weights depend on the particular point x at which we’re trying to evaluate h_θ(x)
    if |x^{(i)} − x| is small, then w^{(i)} is close to 1
    if |x^{(i)} − x| is large, then w^{(i)} is small (close to 0)
  5. So how do we determine the appropriate values of θ?
    We pick the θ that minimizes the weighted squared error, so the training examples closest to the query point (those with the largest weights) influence the fit the most
  6. Bandwidth parameter τ: the weight function is chosen because we want a bell-shaped curve that peaks at x and then falls off quickly away from it

     τ controls the shape of the curve (fat vs. thin): a larger τ gives a wider bell, so more distant training points still receive noticeable weight
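
To make the effect of the bandwidth concrete, here is a minimal sketch of a bell-shaped weight function, assuming the Gaussian-shaped form w = exp(−|x_i − x|^2 / (2τ^2)). The function name `gaussian_weights` and the toy data are illustrative assumptions, not part of the notes.

```python
import numpy as np

def gaussian_weights(X_train, x_query, tau):
    """Bell-shaped weights: close to 1 near x_query, close to 0 far from it.

    X_train: (m, n) array of training inputs (hypothetical toy data below).
    x_query: (n,) point at which we want to predict.
    tau:     bandwidth; larger values give a wider ("fatter") bell.
    """
    sq_dist = np.sum((X_train - x_query) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * tau ** 2))

# Toy 1-D training inputs and a query point (assumed values for illustration).
X_train = np.array([[0.0], [1.0], [2.0], [5.0]])
x_query = np.array([1.0])

w_narrow = gaussian_weights(X_train, x_query, tau=0.5)  # thin bell
w_wide = gaussian_weights(X_train, x_query, tau=5.0)    # fat bell
print(w_narrow)
print(w_wide)
```

With τ = 0.5 the training point at distance 4 from the query gets essentially zero weight, while with τ = 5 it still gets a weight of roughly 0.73.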

Regular normal equation: θ = (X^{T}X)^{-1}X^{T}y
Weighted normal equation: θ = (X^{T}WX)^{-1}X^{T}Wy, where W is the diagonal matrix of weights with W_{ii} = w^{(i)}
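
The weighted normal equation can be sketched in NumPy as follows. This is an illustrative sketch rather than a reference implementation: `lwr_predict`, the sine-curve data, and the choice τ = 0.3 are assumptions made for the example, and the weights are assumed to be Gaussian-shaped. Note that θ is re-solved for every query point, which is why the training set has to be kept.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau):
    """Predict at x_query by solving the weighted normal equation.

    X:       (m, n) design matrix whose first column is all ones (intercept).
    y:       (m,) targets.
    x_query: (n,) query point in the same layout as a row of X.
    tau:     bandwidth of the bell-shaped weights.
    """
    # The intercept column contributes 0 to the distance (1 - 1 = 0).
    sq_dist = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * tau ** 2))
    # X.T * w scales column i of X.T by w[i], i.e. X.T @ diag(w),
    # without building the m x m diagonal matrix.
    XtW = X.T * w
    theta = np.linalg.solve(XtW @ X, XtW @ y)
    return x_query @ theta

# Curved toy data (assumed for illustration): y = sin(x) on [0, 6].
x = np.linspace(0.0, 6.0, 200)
X = np.column_stack([np.ones_like(x), x])
y = np.sin(x)

x_query = np.array([1.0, 1.5])
pred = lwr_predict(X, y, x_query, tau=0.3)
print(pred, np.sin(1.5))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common numerical choice. The local fit tracks the curve near the query point, where a single global line would not.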

Probabilistic interpretation of data

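Assume the targets and inputs are related by y^{(i)} = θ^{T}x^{(i)} + ε^{(i)},

where ε^{(i)} is an error term which captures unmodeled effects or random noise. Assuming the ε^{(i)} are independent and identically distributed Gaussians with mean zero and variance σ^2, the density of ε^{(i)} is given by

p(ε^{(i)}) = (1 / (√(2π)σ)) exp(−(ε^{(i)})^2 / (2σ^2))

This implies that

p(y^{(i)} | x^{(i)}; θ) = (1 / (√(2π)σ)) exp(−(y^{(i)} − θ^{T}x^{(i)})^2 / (2σ^2)),

where p(y^{(i)} | x^{(i)}; θ) is the distribution of y^{(i)} given x^{(i)} and parameterized by θ, that is, y^{(i)} | x^{(i)}; θ ~ N(θ^{T}x^{(i)}, σ^2).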

Likelihood function

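Given the design matrix X, which contains all the x^{(i)}, and θ, the likelihood function is L(θ) = L(θ; X, y) = p(y | X; θ). With independent Gaussian noise of variance σ^2 on each example, it factors as

L(θ) = ∏_{i=1}^{m} p(y^{(i)} | x^{(i)}; θ) = ∏_{i=1}^{m} (1 / (√(2π)σ)) exp(−(y^{(i)} − θ^{T}x^{(i)})^2 / (2σ^2))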

Maximum likelihood estimation

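We should choose θ so as to make the observed data as probable as possible, that is, to maximize the likelihood L(θ).

Equivalently, we can maximize the log likelihood l(θ), which for Gaussian noise with variance σ^2 is

l(θ) = log L(θ) = m log(1 / (√(2π)σ)) − (1/σ^2) · (1/2) Σ_{i=1}^{m} (y^{(i)} − θ^{T}x^{(i)})^2

Maximizing l(θ) is the same as minimizing (1/2) Σ_{i=1}^{m} (y^{(i)} − θ^{T}x^{(i)})^2, which is the cost function J(θ).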
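
The link between the log likelihood and the least-squares cost can be checked numerically. The sketch below assumes Gaussian noise with a known standard deviation σ and uses synthetic data; the names `cost` and `log_likelihood`, the seed, and the parameter values are all illustrative assumptions. It checks that l(θ) equals a constant minus J(θ)/σ^2, so the least-squares solution also maximizes the likelihood.

```python
import numpy as np

def cost(theta, X, y):
    """Least-squares cost J(theta) = 1/2 * sum of squared residuals."""
    r = y - X @ theta
    return 0.5 * np.sum(r ** 2)

def log_likelihood(theta, X, y, sigma):
    """Sum of Gaussian log densities of the residuals y - X @ theta."""
    r = y - X @ theta
    log_density = -0.5 * np.log(2.0 * np.pi * sigma ** 2) - r ** 2 / (2.0 * sigma ** 2)
    return np.sum(log_density)

# Synthetic data (assumed for illustration): linear model plus Gaussian noise.
rng = np.random.default_rng(0)
m, sigma = 50, 0.5
X = np.column_stack([np.ones(m), rng.uniform(-1.0, 1.0, m)])
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + rng.normal(0.0, sigma, m)

# Minimizer of J via the normal equation.
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Any other theta has a larger J and therefore a smaller log likelihood.
theta_other = theta_ls + np.array([0.1, -0.2])
const = m * np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma))

print(log_likelihood(theta_other, X, y, sigma))
print(const - cost(theta_other, X, y) / sigma ** 2)
```

The two printed values agree up to floating-point error. Because σ only enters through the constant and the positive factor 1/σ^2, the maximizing θ does not depend on σ.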