Overview
1. LMS Algorithm
The Ordinary Least Squares Regression Model:
$$h_\theta(x) = \theta^T x$$

Cost Function:
$$J(\theta) = \frac{1}{2}\sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Gradient Descent Algorithm:
$$\theta := \theta - \alpha \frac{\partial}{\partial\theta} J(\theta)$$

LMS (least mean squares) update rule (also called the Widrow-Hoff learning rule):
$$\theta_j := \theta_j + \alpha \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$$

Batch Gradient Descent vs. Stochastic Gradient Descent
```
# BGD
Repeat until convergence {
    theta = theta + alpha * sum_i((y_i - h_i) * x_i)
}

# SGD
Loop {
    for i = 1 to m {
        theta = theta + alpha * (y_i - h_i) * x_i
    }
}
```
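As a concrete illustration, here is a minimal NumPy sketch of both update rules; the toy data, learning rates, and iteration counts are illustrative assumptions, not from the notes:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.005, n_iters=2000):
    """BGD: each update uses the gradient summed over all m examples."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residual = y - X @ theta               # (y_i - h_i) for every example
        theta = theta + alpha * X.T @ residual
    return theta

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=50):
    """SGD: update theta after looking at each single example."""
    theta = np.zeros(X.shape[1])
    m = X.shape[0]
    for _ in range(n_epochs):
        for i in range(m):
            residual = y[i] - X[i] @ theta
            theta = theta + alpha * residual * X[i]
    return theta

# Illustrative data: y ≈ 1 + 2*x plus small Gaussian noise (assumed example)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 100)
print(batch_gradient_descent(X, y))        # both should approach [1.0, 2.0]
print(stochastic_gradient_descent(X, y))
```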
Normal Equation Solution:
$$\theta = (X^TX)^{-1}X^TY$$
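A minimal NumPy sketch of the closed-form solution; the design matrix and targets below are an assumed toy example, and `np.linalg.solve` is used instead of an explicit inverse for numerical stability:

```python
import numpy as np

# Assumed toy design matrix X (intercept column included) and targets y
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # ~ [intercept, slope]
```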
2. Probabilistic Interpretation
Predictive Probability Assumption: a Gaussian Distribution
$$p(y|x;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y-\theta^Tx)^2}{2\sigma^2}\right), \qquad y\,|\,x;\theta \sim \mathcal{N}(\theta^Tx,\ \sigma^2)$$

Likelihood Function of $\theta$: the probability of the observed data $y$ (under the i.i.d. assumption)
$$L(\theta) = \prod_{i=1}^m p(y^{(i)}|x^{(i)};\theta) = \prod_{i=1}^m \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}\right)$$

Maximum Likelihood Method: choose $\theta$ to maximize $L(\theta)$, or equivalently the log-likelihood $\ell(\theta)$:
$$\ell(\theta) = \log L(\theta) = m\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^m \left(y^{(i)}-\theta^Tx^{(i)}\right)^2$$

$$\theta = \arg\max_\theta \ell(\theta) = \arg\min_\theta \frac{1}{2}\sum_{i=1}^m \left(y^{(i)}-\theta^Tx^{(i)}\right)^2$$
Hence the least-squares regression model corresponds to the maximum-likelihood estimate of $\theta$ under a Gaussian noise assumption on the data.
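To make the correspondence concrete, here is a small sketch (using SciPy; the toy data and fixed `sigma` are illustrative assumptions) that maximizes the Gaussian log-likelihood numerically and recovers the same $\theta$ as the normal-equation / least-squares solution:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy data: y = 1 + 2*x + Gaussian noise
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(60), rng.uniform(-1, 1, 60)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.3, 60)
sigma = 0.3

def neg_log_likelihood(theta):
    # -log L(theta) under the Gaussian noise model (constant terms kept for clarity)
    resid = y - X @ theta
    return 0.5 * np.sum(resid ** 2) / sigma ** 2 + len(y) * np.log(np.sqrt(2 * np.pi) * sigma)

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)      # least-squares / normal equation

print(theta_mle, theta_ls)   # the two estimates agree (up to optimizer tolerance)
```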
3. Locally Weighted Linear Regression
Motivation: sidestep the problem of feature selection (a poor choice of features leads to underfitting or overfitting)
Parametric vs. Non-parametric learning algorithms (LWR is non-parametric: the entire training set must be kept around to make predictions, rather than a fixed set of parameters)
LWR algorithm:
When querying a certain point $x$:

- Fit $\theta$ to minimize $\sum_i w^{(i)}\left(y^{(i)}-\theta^Tx^{(i)}\right)^2$, where $w^{(i)} = \exp\left(-\frac{(x^{(i)}-x)^2}{2\tau^2}\right)$
- Output $\theta^Tx$

Hence, the (errors on) training examples close to the query point $x$ are given a much higher weight in determining $\theta$ (local linearity).
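A minimal NumPy sketch of a single LWR query on an assumed toy 1-D dataset; the data, bandwidth `tau`, and helper name `lwr_predict` are illustrative assumptions:

```python
import numpy as np

def lwr_predict(x_query, X, y, tau=0.5):
    """Locally weighted linear regression prediction at one query point.

    X is an (m, n) design matrix (intercept column included), y an (m,)
    target vector; tau is the bandwidth controlling how fast weights decay.
    """
    # Gaussian weights: examples near x_query dominate the weighted fit
    diffs = X - x_query
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equation: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

# Illustrative data: a noisy sine curve (assumed example, not from the notes)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 80))
X = np.column_stack([np.ones_like(x), x])
y = np.sin(x) + rng.normal(0, 0.1, 80)

print(lwr_predict(np.array([1.0, 3.0]), X, y, tau=0.3))   # ~ sin(3.0)
```

Note that a fresh local fit of $\theta$ is performed for every query point, which is why the full training set has to be retained.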