I'm 俊逸寒风, a blogger at 靠谱客. This article, which I put together recently, mainly covers CS229 Lecture Note(1): Linear Regression. I found it quite good and am sharing it here in the hope that it can serve as a reference.

Overview

1. LMS Algorithm

  • The Ordinary Least Squares Regression Model:

    $$h_\theta(x) = \theta^T x$$

  • Cost Function:

    $$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

  • Gradient Descent Algorithm:

    $$\theta := \theta - \alpha \nabla_\theta J(\theta)$$

  • LMS (least mean squares) update rule (also called the Widrow-Hoff learning rule):

    $$\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$

  • Batch Gradient Descent vs. Stochastic Gradient Descent (pseudocode below; a runnable NumPy sketch follows this list):

    
    # Batch gradient descent (BGD): each update sums over all m examples
    Repeat until convergence {
        theta := theta + alpha * sum_{i=1..m} (y_i - h_theta(x_i)) * x_i
    }

    # Stochastic gradient descent (SGD): update after each single example
    Loop {
        for i = 1 to m {
            theta := theta + alpha * (y_i - h_theta(x_i)) * x_i
        }
    }
  • Normal Equation Solution:

    $$\theta = (X^T X)^{-1} X^T Y$$
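
For concreteness, here is a minimal NumPy sketch of both gradient-descent variants and of the closed-form solution (the synthetic data, learning rate, and iteration counts are illustrative choices of mine, not from the notes):

    import numpy as np

    def batch_gradient_descent(X, y, alpha=0.01, n_iters=1000):
        """BGD: each step uses the gradient summed over all m examples."""
        theta = np.zeros(X.shape[1])
        for _ in range(n_iters):
            errors = y - X @ theta            # residuals y_i - h_theta(x_i)
            theta += alpha * X.T @ errors     # sum_i (y_i - h_i) * x_i
        return theta

    def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=50):
        """SGD: update theta after each single training example."""
        theta = np.zeros(X.shape[1])
        for _ in range(n_epochs):
            for i in range(X.shape[0]):
                theta += alpha * (y[i] - X[i] @ theta) * X[i]
        return theta

    # Synthetic data: y = 1 + 2x + noise, with an intercept column x_0 = 1
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=100)
    X = np.column_stack([np.ones_like(x), x])
    y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(100)

    print(batch_gradient_descent(X, y))       # ~ [1, 2]
    print(stochastic_gradient_descent(X, y))  # ~ [1, 2]
    # Normal equation: solve() avoids forming the explicit inverse of X^T X
    print(np.linalg.solve(X.T @ X, X.T @ y))  # ~ [1, 2]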

2. Probabilistic Interpretation

  • Predictive Probability Assumption: a Gaussian Distribution

    $$p(y \mid x; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y - \theta^T x)^2}{2\sigma^2} \right), \qquad \text{i.e.}\; y \mid x; \theta \sim \mathcal{N}(\theta^T x,\, \sigma^2)$$

  • Likelihood Function of θ: the probability of the observed data y (under the i.i.d. assumption)

    $$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)$$

  • Maximum Likelihood Method: choose θ to maximize L(θ), or equivalently the log-likelihood ℓ(θ):

    $$\ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2$$

    $$\theta = \arg\max_\theta \ell(\theta) = \arg\min_\theta \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2 = \arg\min_\theta J(\theta)$$

The least-squares regression model therefore corresponds to maximum likelihood estimation of θ under a Gaussian noise assumption on the data.
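
As a quick numerical check of this correspondence, one can minimize the Gaussian negative log-likelihood directly and confirm that it recovers the least-squares solution (a minimal sketch; the data, the fixed σ, and the use of scipy.optimize.minimize are choices of mine):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=100)
    X = np.column_stack([np.ones_like(x), x])
    y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(100)

    def neg_log_likelihood(theta, sigma=0.1):
        """-log L(theta) for y|x ~ N(theta^T x, sigma^2), constants dropped."""
        residuals = y - X @ theta
        return 0.5 * np.sum(residuals ** 2) / sigma ** 2

    theta_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x
    theta_lsq = np.linalg.lstsq(X, y, rcond=None)[0]
    print(theta_mle)  # the two estimates agree up to numerical tolerance
    print(theta_lsq)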

3. Locally Weighted Linear Regression

  • Motivation: avoid having to choose the feature set by hand (a poor choice of features leads to underfitting or overfitting)

  • Parametric vs. non-parametric learning algorithms: LWR is non-parametric, since the entire training set must be kept around to make each prediction

  • LWR algorithm (a runnable sketch follows this list):
    To make a prediction at a query point x:

    1. Fit θ to minimize

       $$\sum_i w^{(i)} \left( y^{(i)} - \theta^T x^{(i)} \right)^2, \quad \text{where } w^{(i)} = \exp\!\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right)$$

    2. Output $\theta^T x$

    Hence the (errors on) training examples close to the query point x receive a much higher weight in determining θ (local linearity).
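
Below is a minimal NumPy sketch of LWR (the function name, bandwidth value, and toy data are mine; note that θ is re-fit from scratch at every query point, which is exactly what makes the method non-parametric):

    import numpy as np

    def lwr_predict(X, y, x_query, tau=0.5):
        """Predict at one query point by solving the weighted normal
        equations X^T W X theta = X^T W y, with Gaussian weights
        w_i = exp(-||x_i - x||^2 / (2 tau^2))."""
        w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
        W = np.diag(w)
        theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        return x_query @ theta

    # Toy nonlinear data: a single global linear fit underfits, but LWR
    # tracks the local trend without any hand-engineered features
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 2 * np.pi, 100))
    X = np.column_stack([np.ones_like(x), x])   # intercept + raw input
    y = np.sin(x) + 0.1 * rng.standard_normal(100)
    x_q = np.array([1.0, np.pi / 2])            # query point, with intercept
    print(lwr_predict(X, y, x_q))               # ~ sin(pi/2) = 1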

Final Words

That is the complete content of CS229 Lecture Note(1): Linear Regression as collected and organized by 俊逸寒风. I hope this article helps you with the development problems you encounter around this topic.
