Overview
In linear regression problems, we can use a method called the normal equation to fit the parameters.
Suppose we have a training set like this:
$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$
where:
$$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix}$$
and the label vector:
$$y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
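As a concrete illustration, here is a minimal NumPy sketch (with made-up data) of how $X$ and $y$ above could be assembled, assuming the usual convention that $x_0^{(i)} = 1$ so that $\theta_0$ acts as the intercept:

```python
import numpy as np

# Made-up data: m training examples, each with n raw features.
m, n = 5, 2
rng = np.random.default_rng(0)
features = rng.normal(size=(m, n))            # x_1^{(i)}, ..., x_n^{(i)}

# Each row of X is (x^{(i)})^T; prepend the column x_0 = 1 for the intercept.
X = np.column_stack([np.ones(m), features])   # shape (m, n + 1)
y = rng.normal(size=m)                        # labels y^{(1)}, ..., y^{(m)}
```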
We want to fit the parameters
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$
so that the cost function
$$J = \|X\theta - y\|^2$$
attains its global minimum. That is:
$$\theta = \mathop{\arg\min}_{\theta} \|X\theta - y\|^2 = (X^TX)^{-1}X^Ty$$
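Before proving this, we can sanity-check the closed-form expression numerically. The following sketch (synthetic data assumed) compares it against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])   # x_0 = 1 column
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.normal(size=m)

# Closed-form solution theta = (X^T X)^{-1} X^T y.
theta_closed = np.linalg.inv(X.T @ X) @ X.T @ y

# Reference: NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_closed, theta_lstsq))   # True (up to numerical error)
```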
Let’s prove it.
We take the partial derivative of $J$ with respect to each parameter and set it to zero. For $\theta_j$, we find that:
$$\frac{\partial J}{\partial \theta_j} = 2\sum_{i=1}^{m} \left( (x^{(i)})^T\theta - y^{(i)} \right) x_j^{(i)} = 0$$
Dividing by 2 and rewriting this equation in matrix form, we find that:
$$\begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix} X\theta = \begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix} y$$
Note that $\begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix}$ is simply the $j$-th row of $X^T$, so stacking all $n+1$ equations (for $j = 0, 1, \dots, n$) gives:
$$X^TX\theta = X^Ty$$
$$\theta = (X^TX)^{-1}X^Ty$$
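In code, a minimal sketch of this result (synthetic data assumed) solves the linear system $X^TX\theta = X^Ty$ directly, which is generally preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 100, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])   # x_0 = 1 column
true_theta = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ true_theta + 0.05 * rng.normal(size=m)                # noisy labels

# Solve X^T X theta = X^T y instead of inverting X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # close to true_theta when X^T X is invertible
```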
Next we introduce regularization, which means we change the cost function $J$ to:
$$J = \|X\theta - y\|^2 + \lambda \sum_{j=1}^{n} \theta_j^2$$
where $\lambda$ is a constant called the regularization parameter.
As before, we compute the partial derivative with respect to each $\theta_j$. Note that the partial derivative with respect to $\theta_0$ is unchanged, since $\theta_0$ is not regularized.
$$\frac{\partial J}{\partial \theta_j} = 2\sum_{i=1}^{m} \left( (x^{(i)})^T\theta - y^{(i)} \right) x_j^{(i)} + 2\lambda\theta_j = 0 \quad (\text{for } j > 0)$$

Dividing by 2 and rewriting in matrix form as before:
$$\begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix} X\theta + \lambda\theta_j = \begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix} y$$
$$\lambda\theta_j = \lambda e_j^T\theta$$
where $e_j$ is the unit vector whose $j$-th element is 1 and all other elements are 0.
Substituting this and stacking all $n+1$ equations (for $j = 0, 1, \dots, n$), we find:
$$(X^TX + \lambda L)\theta = X^Ty$$
$$\theta = (X^TX + \lambda L)^{-1}X^Ty$$
where $L = \mathrm{diag}(0, 1, 1, \dots, 1)$.
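A minimal sketch of the regularized solution (synthetic data and a hypothetical value of $\lambda$ assumed), where the matrix $L$ leaves $\theta_0$ unpenalized:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 100, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])   # x_0 = 1 column
y = rng.normal(size=m)
lam = 0.1                                   # regularization parameter lambda

# L = diag(0, 1, 1, ..., 1): the intercept theta_0 is not regularized.
L = np.eye(n + 1)
L[0, 0] = 0.0

# Solve (X^T X + lambda * L) theta = X^T y.
theta = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(theta)
```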