Overview
In this blog, I will summarise the theory and implementation of Linear Regression.
The base materials I used are the CS229 Lecture Notes Part 1 and Coursera's Machine Learning course, Lectures 2 to 3.
If you haven't read those materials yet, I suggest reading them first.
Linear Regression is the basic problem we care about when we start studying Supervised Learning.
Since there are lots of formulas and figures, I don't have time to make screenshots of all of them. Maybe I am just too lazy.
This is my first blog written in English. Why? Because it is nearly impossible to study deep learning only in Chinese, and I believe it is much more efficient to use English to describe all the related things.
OK. This blog may be a bit chaotic, but it follows my thinking style.
OK!
Linear Regression
The simplest case is Linear Regression with one variable.
We assume a hypothesis h(x) that approximates y, where the data set is stored as a matrix X. We want to find the optimal thetas that make h(x) fit the data, so that we can use it for prediction.
We already have a training set, and we use this data to solve the problem.
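For reference, the hypothesis in the one-variable case is the usual linear one (standard notation from the lectures, which I otherwise don't copy here):

h_\theta(x) = \theta_0 + \theta_1 x

and the goal is to choose the thetas so that h_\theta(x^{(i)}) is close to y^{(i)} on the training set.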
Step 1: Cost Function LMS (least mean squares)
Step 2: Gradient descent to find the thetas.
repeat until convergence
The key problem is computing the partial derivatives!
That is easy for Linear Regression, but far more complicated for Neural Networks, where Back Propagation is one approach.
I don't want to copy the formula here.
Batch gradient descent (use the whole training set per iteration).
Stochastic gradient descent (use one example, or a small subset, per iteration; much faster per step, and in practice it still gets close to a local optimum).
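Still, a compact restatement may help (standard forms, written with the Coursera 1/m convention so they match the Matlab code further down):

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

For stochastic gradient descent, the sum over all m examples is replaced by the single current example.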
Since this is ultimately an equation-solving problem, we can instead set all the partial derivatives to 0 and solve for theta directly.
Therefore the normal equations are another approach.
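In closed form (the standard result, which matches the normalEqn code later in this post):

\theta = (X^{\top} X)^{-1} X^{\top} y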
Why might the least-squares cost function J be a reasonable choice?
There is a probabilistic interpretation: if the errors are assumed to be i.i.d. Gaussian, then maximising the likelihood of the data is equivalent to minimising J.
Underfitting and overfitting depend on the features and parameters we choose!
Locally weighted linear regression:
Basic idea: give different training examples different weights.
Sometimes the prediction should be driven mainly by the nearby (or most recent) data, especially in time series. The closer, the more important!
How to set the weights? That still needs some thought!
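A common choice (from the CS229 notes; \tau is a bandwidth parameter that controls how fast the weight falls off with distance):

w^{(i)} = \exp\!\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right)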
OK.
Next, Linear Regression with multiple variables!
Approach: vectorize h(x) so that the multi-variable case can be handled just like the one-variable problem.
(The corresponding figure is from Andrew Ng's Lecture 4 slides.)
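The trick the slide shows, restated here in the standard notation (x_0 = 1 is the intercept feature):

h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^{\top} x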
Feature scaling and mean normalization: bring all features onto a similar scale so that gradient descent converges faster (a small Matlab sketch follows after the next note).
Learning rate: if alpha is too small, convergence is slow; if alpha is too large, the cost function may not decrease on every iteration and may not converge at all.
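Here is a minimal feature-normalization sketch in Matlab (my own version; the course assignment ships a similar featureNormalize.m, and the names here are just my choice):

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Subtract the mean of each feature and divide by its
%   standard deviation, so all features end up on a comparable scale.
mu = mean(X);                      % 1 x n row vector of feature means
sigma = std(X);                    % 1 x n row vector of standard deviations
X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
end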
OK.
The next topic is the implementation of Linear Regression in Matlab.
We use Programming Assignment 1 of the ML course as the example.
1. How to compute the cost function?
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
%J = 0;
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
% You should set J to the cost.
predictions = X*theta;
sqrErrors = (predictions - y).^2;
J = 1/(2*m)*sum(sqrErrors);
% =========================================================================
end
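A quick sanity check with toy numbers of my own (just to show the calling convention; the first column of X is the intercept term):

% Toy data: y = 1 + 2*x exactly, so the cost at theta = [1; 2] should be 0.
X = [1 1; 1 2; 1 3];            % m = 3 examples, first column is all ones
y = [3; 5; 7];
J = computeCost(X, y, [1; 2])   % returns 0
J = computeCost(X, y, [0; 0])   % returns 83/6, roughly 13.83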
2. How to compute gradient descent?
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
% theta = GRADIENTDESENT(X, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
% theta.
%
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCost) and gradient here.
%
theta = theta - alpha/m*X'*(X*theta - y);
% ============================================================
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
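After running it, plotting J_history is an easy way to check that alpha is reasonable (a small sketch; 0.01 and 1500 are just the values the assignment uses for the one-variable data):

[theta, J_history] = gradientDescent(X, y, zeros(size(X, 2), 1), 0.01, 1500);
plot(1:numel(J_history), J_history, '-b');
xlabel('Iteration'); ylabel('Cost J');   % J should decrease on every iteration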
3. How to solve the normal equation?
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
% NORMALEQN(X,y) computes the closed-form solution to linear
% regression using the normal equations.
%theta = zeros(size(X, 2), 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
% to linear regression and put the result in theta.
%
% ---------------------- Sample Solution ----------------------
theta = pinv(X'*X)*X'*y;
% -------------------------------------------------------------
% ============================================================
end
That's it!
Remember that X is a matrix, and its first column should be all ones (the intercept term)!
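Putting the three functions together, a minimal end-to-end sketch (file and variable names are just illustrative; the assignment's ex1data1.txt works the same way):

data = load('ex1data1.txt');                 % two columns: feature, target
X = [ones(size(data, 1), 1), data(:, 1)];    % add the intercept column
y = data(:, 2);
theta_gd = gradientDescent(X, y, zeros(2, 1), 0.01, 1500);  % iterative
theta_ne = normalEqn(X, y);                                 % closed form
% The two estimates should end up close to each other.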