Early stopping with softmax batch gradient descent (BGD), implemented by hand

Overview

Environment

  • scikit-learn==0.21.3
  • python==3.7
  • numpy==1.16.4
  • jupyter

Dataset

We use the sklearn iris dataset. It loads as a dictionary-like object whose keys are ['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']; data holds four features per instance.

from sklearn import datasets
iris = datasets.load_iris()
print(list(iris.keys()))
print(iris["data"][:3])
print(iris["data"].shape)
# label
iris["target_names"]
iris["target"]
['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]]
(150, 4)
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
array([0, 0, 0, 0..., 1, 1, 1...2, 2, 2])

Train / validation / test split

import numpy as np

# use two feature columns: petal length and petal width
X = iris["data"][:, (2, 3)]
y = iris["target"]
# add the bias term for every instance (x0 = 1)
X_with_bias = np.c_[np.ones([len(X), 1]), X]
print(X_with_bias[:3])
[[1.  1.4 0.2]
 [1.  1.4 0.2]
 [1.  1.3 0.2]]
from numpy import ndarray

np.random.seed(1000)

def split_test_train(data_array: ndarray, label_array: ndarray, test_ratio=0.2, validation_ratio=0.2):
    total_size = len(data_array)
    test_size = int(total_size * test_ratio)
    validation_size = int(total_size * validation_ratio)
    train_size = total_size - test_size - validation_size
    rnd_indices = np.random.permutation(total_size)
    X_train = data_array[rnd_indices[:train_size]]
    y_train = label_array[rnd_indices[:train_size]]
    X_valid = data_array[rnd_indices[train_size:-test_size]]
    y_valid = label_array[rnd_indices[train_size:-test_size]]
    X_test = data_array[rnd_indices[-test_size:]]
    y_test = label_array[rnd_indices[-test_size:]]
    return X_train, y_train, X_valid, y_valid, X_test, y_test

X_train, y_train, X_valid, y_valid, X_test, y_test = split_test_train(X_with_bias, y)
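With the default ratios, the 150 instances should split into 90 training, 30 validation, and 30 test rows; a quick sanity check (not part of the original post):

print(X_train.shape, X_valid.shape, X_test.shape)
# expected: (90, 3) (30, 3) (30, 3)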

Convert the class labels into one-hot target vectors

For a given instance x, we first compute a score s_k(x) for each class k, then apply the softmax function to those scores to estimate the probability of each class. To compare these predicted probabilities against the labels with a cross-entropy loss, the labels are converted to one-hot vectors.
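As a minimal illustration of the scoring step (the parameter matrix below is a hypothetical placeholder, not the Theta learned by the training loop further down):

import numpy as np

x = np.array([1., 1.4, 0.2])          # one instance, bias term included
theta_example = np.zeros((3, 3))      # placeholder parameters: 3 inputs x 3 classes
scores = x.dot(theta_example)         # s_k(x), one score per class k
probs = np.exp(scores) / np.sum(np.exp(scores))   # softmax over the scores
print(probs)                          # with all-zero parameters each class gets 1/3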

# convert the class labels into one-hot vectors
# 3 is the number of classes; an equivalent shortcut would be:
# y_train22 = np.eye(3)[y]
def to_one_hot(y):
    n_classes = y.max() + 1
    m = len(y)
    Y_one_hot = np.zeros((m, n_classes))
    Y_one_hot[np.arange(m), y] = 1
    return Y_one_hot

Y_train_one_hot = to_one_hot(y_train)
Y_valid_one_hot = to_one_hot(y_valid)
Y_test_one_hot = to_one_hot(y_test)
print(y_train[:3])
print(Y_train_one_hot[:3])

Output

array([0, 1, 2])
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
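The commented-out np.eye shortcut builds the same matrix; a quick check (not in the original post):

print(np.allclose(to_one_hot(y_train), np.eye(3)[y_train]))   # True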

Softmax BGD implementation

Softmax formula:

$\sigma\left(\mathbf{s}(\mathbf{x})\right)_k = \dfrac{\exp\left(s_k(\mathbf{x})\right)}{\sum\limits_{j=1}^{K}\exp\left(s_j(\mathbf{x})\right)}$

where $s_k(\mathbf{x})$ is the score of class k for the instance, i.e. the dot product of the transposed parameter vector of class k with the instance x:

$s_k(\mathbf{x}) = \theta_k^T \cdot \mathbf{x}$

def softmax(logits):
    exps = np.exp(logits)
    exp_sums = np.sum(exps, axis=1, keepdims=True)
    return exps / exp_sums
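This version exponentiates the raw logits directly, which can overflow for large scores. A numerically safer, mathematically equivalent variant (an optional sketch, not part of the original post) subtracts the per-row maximum first:

def softmax_stable(logits):
    # subtracting the row-wise max leaves the result unchanged:
    # exp(s - c) / sum(exp(s - c)) == exp(s) / sum(exp(s))
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)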

Cost function

Cross-entropy cost:

$J(\mathbf{\Theta}) = -\dfrac{1}{m}\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right)$

Cross-entropy gradient vector for class k:

$\nabla_{\mathbf{\theta}^{(k)}}\, J(\mathbf{\Theta}) = \dfrac{1}{m}\sum\limits_{i=1}^{m}\left(\hat{p}_k^{(i)} - y_k^{(i)}\right)\mathbf{x}^{(i)}$
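Before running the full training loop, the analytic gradient above can be sanity-checked against a finite-difference estimate. This is a minimal sketch, not part of the original post; it defines its own alpha/epsilon and a random Theta so it can run on the training split prepared earlier:

def regularized_loss(theta, X, Y_one_hot, alpha=0.1, eps=1e-7):
    # cross-entropy plus l2 penalty on the non-bias rows
    p = softmax(X.dot(theta))
    xentropy = -np.mean(np.sum(Y_one_hot * np.log(p + eps), axis=1))
    return xentropy + alpha * 1/2 * np.sum(np.square(theta[1:]))

def analytic_gradient(theta, X, Y_one_hot, alpha=0.1):
    # the gradient formula above, plus the gradient of the l2 penalty
    p = softmax(X.dot(theta))
    grad = 1 / len(X) * X.T.dot(p - Y_one_hot)
    return grad + np.r_[np.zeros([1, theta.shape[1]]), alpha * theta[1:]]

theta_check = np.random.randn(X_train.shape[1], 3)
numeric = np.zeros_like(theta_check)
h = 1e-5
for i in range(theta_check.shape[0]):
    for j in range(theta_check.shape[1]):
        plus, minus = theta_check.copy(), theta_check.copy()
        plus[i, j] += h
        minus[i, j] -= h
        numeric[i, j] = (regularized_loss(plus, X_train, Y_train_one_hot)
                         - regularized_loss(minus, X_train, Y_train_one_hot)) / (2 * h)

# the two estimates should agree to several decimal places
print(np.max(np.abs(numeric - analytic_gradient(theta_check, X_train, Y_train_one_hot))))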

# number of input features = 3, including the bias term
n_inputs = X_train.shape[1]
# number of output classes = 3
n_outputs = len(np.unique(y_train))
# learning rate
eta = 0.1
# number of iterations
n_iterations = 5001
m = len(X_train)
# tiny constant added inside log() to avoid log(0)
epsilon = 1e-7
# l2 regularization hyperparameter
alpha = 0.1
best_loss = np.infty
Theta = np.random.randn(n_inputs, n_outputs)

for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    # predicted probabilities
    Y_proba = softmax(logits)
    xentropy_loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))
    # l2 regularization: half the squared norm of the weights; the first row is the bias term, so exclude it
    l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
    loss = xentropy_loss + alpha * l2_loss
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    # gradient step
    Theta = Theta - eta * gradients

    # evaluate the loss on the validation set
    logits = X_valid.dot(Theta)
    Y_proba = softmax(logits)
    xentropy_loss = -np.mean(np.sum(Y_valid_one_hot * np.log(Y_proba + epsilon), axis=1))
    l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
    loss = xentropy_loss + alpha * l2_loss
    if iteration % 500 == 0:
        print(iteration, loss)
    if loss < best_loss:
        best_loss = loss
    else:
        # early stopping: the validation loss is higher than in the previous
        # iteration, so stop training to prevent overfitting
        print(iteration - 1, best_loss)
        print(iteration, loss, "early stopping!")
        break
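One possible refinement, not in the original code: the loop stops only after the validation loss has already gone up, so Theta has taken one extra gradient step. A sketch that also snapshots the best parameters and rolls back to them (it reuses the hyperparameters defined above):

Theta = np.random.randn(n_inputs, n_outputs)
best_loss = np.infty
best_theta = Theta.copy()
for iteration in range(n_iterations):
    # same gradient step as above
    Y_proba = softmax(X_train.dot(Theta))
    gradients = 1/m * X_train.T.dot(Y_proba - Y_train_one_hot) \
                + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    Theta = Theta - eta * gradients
    # same validation loss as above
    Y_proba_valid = softmax(X_valid.dot(Theta))
    val_loss = (-np.mean(np.sum(Y_valid_one_hot * np.log(Y_proba_valid + epsilon), axis=1))
                + alpha * 1/2 * np.sum(np.square(Theta[1:])))
    if val_loss < best_loss:
        best_loss = val_loss
        best_theta = Theta.copy()       # snapshot the best parameters so far
    else:
        Theta = best_theta              # roll back to the best parameters, then stop
        print(iteration, val_loss, "early stopping!")
        break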

Compute the accuracy on the test set

logits = X_test.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_test)
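To inspect the result, one can print the score and map the predicted indices back to species names via the dataset's target_names (a small follow-up, not in the original post):

print(accuracy_score)
# predicted species names for the first five test instances
print(iris["target_names"][y_predict][:5])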
