Overview
Linear Regression with one variable
- Introduction
- Code
- Plotting the Data
- Gradient Descent
Introduction
In this part of the exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities, and you have data for profits and populations from those cities. You would like to use this data to help you select which city to expand to next. The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss.
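The file itself is plain comma-separated text with no header row. A quick sketch to confirm the format (the printed lines are inferred from the head() preview shown later, so treat them as expected rather than guaranteed formatting):
# Peek at the first few raw lines of the file; each line is "population,profit"
with open('ex1data1.txt') as f:
    for _ in range(3):
        print(f.readline().rstrip())
# Expected output (inferred from the head() preview below):
# 6.1101,17.592
# 5.5277,9.1302
# 8.5186,13.662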
Code
Plotting the Data
Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). (Many other problems that you will encounter in real life are multi-dimensional and can’t be plotted on a 2-d plot.)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
path = 'ex1data1.txt'
data = pd.read_csv(path,header=None,names=['Population','Profit'])
data.head()
# print(data.head())
Population Profit
0 6.1101 17.5920
1 5.5277 9.1302
2 8.5186 13.6620
3 7.0032 11.8540
4 5.8598 6.8233
data.describe()
# print(data.describe())
Population Profit
count 97.000000 97.000000
mean 8.159800 5.839135
std 3.869884 5.510262
min 5.026900 -2.680700
25% 5.707700 1.986900
50% 6.589400 4.562300
75% 8.578100 7.046700
max 22.203000 24.147000
data.plot(kind='scatter',x='Population',y='Profit',figsize=(12,8))
# plt.show()
Figure 1: Scatter plot of training data
Gradient Descent
In this part, you will fit the linear regression parameters θ to our dataset using gradient descent.
- Update Equations
The objective of linear regression is to minimize the cost function
$$J(\theta_0,\theta_1)=\dfrac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i-y_i\right)^2=\dfrac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
where the hypothesis $h_\theta(x)$ is given by the linear model

$$h_\theta(x)=\theta^T x=\theta_0+\theta_1 x_1$$
Recall that the parameters of your model are the $\theta_j$ values. These are the values you will adjust to minimize the cost $J(\theta)$. One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update
$$\theta_j := \theta_j - \alpha\dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

(simultaneously update $\theta_j$ for all $j$).
With each step of gradient descent, your parameters $\theta_j$ come closer to the optimal values that achieve the lowest cost $J(\theta)$.
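To make the simultaneous update concrete, here is a minimal one-step sketch on made-up toy data (three points, not the food-truck dataset):
import numpy as np
# X already carries the column of ones; theta = (theta_0, theta_1)
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([1.0, 2.0, 3.0])
theta_toy = np.zeros(2)
alpha_toy, m = 0.1, len(y_toy)
error = X_toy @ theta_toy - y_toy          # h_theta(x) - y for every example
grad = (X_toy.T @ error) / m               # one component per theta_j
theta_toy = theta_toy - alpha_toy * grad   # both parameters updated together
print(theta_toy)                           # [0.2        0.46666667]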
Note: We store each example as a row in the X matrix in Python. To take into account the intercept term ($\theta_0$), we add an additional first column to X and set it to all ones. This allows us to treat $\theta_0$ as simply another 'feature'.
- Implementation
In the following lines, we add another dimension to our data to accommodate the $\theta_0$ intercept term. We also initialize the parameters to 0 and the learning rate alpha to 0.01.
- Computing the cost $J(\theta)$
As you perform gradient descent to minimize the cost function $J(\theta)$, it is helpful to monitor the convergence by computing the cost. In this section, you will implement a function to calculate $J(\theta)$ so you can check the convergence of your gradient descent implementation.
def computeCost(X, y, theta):
    # Vectorized cost: $$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))
# Append a column of ones to the training set so we can use a vectorized
# solution to compute the cost and gradients
data.insert(0,'Ones',1)
# Set X (training data) and y (target variable)
cols = data.shape[1]
X = data.iloc[:,0:cols-1]
y = data.iloc[:,cols-1:cols]
X.head()
y.head()
# print(X.head())
# print(y.head())
# The cost function works on numpy matrices, so convert X and y
# Initialize X, y, and theta
X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix(np.array([0,0]))
'''
Check the matrix dimensions of theta, X, and y, and the cost with the initial theta of zeros:
print(theta)
print(X.shape, theta.shape, y.shape)
print(computeCost(X, y, theta))
'''
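As a quick sanity check: the original exercise handout says the cost for θ = (0, 0) on this dataset should come out to about 32.07.
print(computeCost(X, y, theta))  # expect roughly 32.07 for theta = [0, 0]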
# Output of the earlier X.head() call (before the matrix conversion):
Ones Population
0 1 6.1101
1 1 5.5277
2 1 8.5186
3 1 7.0032
4 1 5.8598
# Output of the earlier y.head() call:
Profit
0 17.5920
1 9.1302
2 13.6620
3 11.8540
4 6.8233
- Batch gradient descent
$$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$$
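Carrying out the differentiation (the derivative of the squared error brings down a factor of $x_j^{(i)}$ and cancels the 2 in $\frac{1}{2m}$) recovers the update rule stated earlier:

$$\frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$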
def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))   # buffer for the simultaneously updated parameters
    parameters = int(theta.ravel().shape[1])  # number of parameters (here 2)
    cost = np.zeros(iters)                    # cost recorded at every iteration
    for i in range(iters):
        error = (X * theta.T) - y             # residuals h_theta(x) - y, shape (m, 1)
        for j in range(parameters):
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))
        theta = temp
        cost[i] = computeCost(X, y, theta)
    return theta, cost
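As a side note, the inner loop over parameters can be collapsed into a single matrix product. A sketch of an equivalent fully vectorized variant, under the same np.matrix conventions (not part of the original exercise):
def gradientDescentVectorized(X, y, theta, alpha, iters):
    # Same algorithm; the gradient for all parameters at once is (1/m) * X^T (X theta^T - y)
    theta = theta.copy()
    cost = np.zeros(iters)
    for i in range(iters):
        error = (X * theta.T) - y                        # (m, 1) residuals
        theta = theta - (alpha / len(X)) * (X.T * error).T
        cost[i] = computeCost(X, y, theta)
    return theta, cost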
- Debugging
# Initialize the learning rate alpha and the number of iterations to run
alpha = 0.01
iters = 1000
g,cost = gradientDescent(X,y,theta,alpha,iters)
# print(g)
computeCost(X,y,g)
# print(computeCost(X,y,g))
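If scikit-learn happens to be installed, an optional cross-check (an extra step, not part of the exercise) is to fit the same line with its LinearRegression and compare against g:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(data[['Population']], data['Profit'])
print(model.intercept_, model.coef_)  # should land close to g[0,0] and g[0,1]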
# Plot the fitted linear model together with the data
x = np.linspace(data.Population.min(),data.Population.max(),100)
f = g[0,0]+(g[0,1]*x)
fig,ax = plt.subplots(figsize=(12,8))
ax.plot(x,f,'r',label='Prediction')
ax.scatter(data.Population,data.Profit,label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()
fig,ax =plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters),cost,'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()
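Since this problem also has a closed-form solution, the normal equation $\theta=(X^T X)^{-1}X^T y$ gives an independent check on the gradient descent result (a sketch, not in the original exercise):
theta_ne = np.linalg.solve(X.T * X, X.T * y)  # solve (X^T X) theta = X^T y
print(theta_ne.ravel())  # should be close to g once gradient descent has converged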