Early stopping with softmax batch gradient descent (BGD), implemented by hand

Overview

Environment

  • scikit-learn==0.21.3
  • python==3.7
  • numpy==1.16.4
  • jupyter

Dataset

We use the sklearn iris dataset. It loads as a dictionary-like object whose keys are ['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']; data holds four features per instance.

from sklearn import datasets
iris = datasets.load_iris()
print(list(iris.keys()))
print(iris["data"][:3])
print(iris["data"].shape)
# label
iris["target_names"]
iris["target"]
['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]]
(150, 4)
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
array([0, 0, 0, 0..., 1, 1, 1...2, 2, 2])

Train / validation / test split

import numpy as np

# use two feature columns: petal length and petal width
X = iris["data"][:, (2, 3)]
y = iris["target"]
# add the bias term for every instance (x0 = 1)
X_with_bias = np.c_[np.ones([len(X), 1]), X]
print(X_with_bias[:3])
[[1.  1.4 0.2]
 [1.  1.4 0.2]
 [1.  1.3 0.2]]
from numpy import ndarray

np.random.seed(1000)

def split_test_train(data_array: ndarray, label_array: ndarray, test_ratio=0.2, validation_ratio=0.2):
    total_size = len(data_array)
    test_size = int(total_size * test_ratio)
    validation_size = int(total_size * validation_ratio)
    train_size = total_size - test_size - validation_size
    rnd_indices = np.random.permutation(total_size)
    X_train = data_array[rnd_indices[:train_size]]
    y_train = label_array[rnd_indices[:train_size]]
    X_valid = data_array[rnd_indices[train_size:-test_size]]
    y_valid = label_array[rnd_indices[train_size:-test_size]]
    X_test = data_array[rnd_indices[-test_size:]]
    y_test = label_array[rnd_indices[-test_size:]]
    return X_train, y_train, X_valid, y_valid, X_test, y_test

X_train, y_train, X_valid, y_valid, X_test, y_test = split_test_train(X_with_bias, y)
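With the default ratios, the 150 instances should split into 90 training, 30 validation, and 30 test rows; a quick sanity check (not part of the original post):

print(X_train.shape, X_valid.shape, X_test.shape)
# expected: (90, 3) (30, 3) (30, 3)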

Convert the class labels into one-hot target vectors

For a given instance x, we first compute a score s_k(x) for each class k, then apply the softmax function to those scores to estimate the probability of each class. To compare these predicted probabilities against the labels with a cross-entropy loss, the labels are converted to one-hot vectors.
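As a minimal illustration of the scoring step (the parameter matrix below is a hypothetical placeholder, not the Theta learned by the training loop further down):

import numpy as np

x = np.array([1., 1.4, 0.2])          # one instance, bias term included
theta_example = np.zeros((3, 3))      # placeholder parameters: 3 inputs x 3 classes
scores = x.dot(theta_example)         # s_k(x), one score per class k
probs = np.exp(scores) / np.sum(np.exp(scores))   # softmax over the scores
print(probs)                          # with all-zero parameters each class gets 1/3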

# convert the class labels into one-hot vectors
# 3 is the number of classes; an equivalent shortcut would be:
# y_train22 = np.eye(3)[y]
def to_one_hot(y):
    n_classes = y.max() + 1
    m = len(y)
    Y_one_hot = np.zeros((m, n_classes))
    Y_one_hot[np.arange(m), y] = 1
    return Y_one_hot

Y_train_one_hot = to_one_hot(y_train)
Y_valid_one_hot = to_one_hot(y_valid)
Y_test_one_hot = to_one_hot(y_test)
print(y_train[:3])
print(Y_train_one_hot[:3])

Output

array([0, 1, 2])
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
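The commented-out np.eye shortcut builds the same matrix; a quick check (not in the original post):

print(np.allclose(to_one_hot(y_train), np.eye(3)[y_train]))   # True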

Softmax BGD implementation

Softmax formula:

$\sigma\left(\mathbf{s}(\mathbf{x})\right)_k = \dfrac{\exp\left(s_k(\mathbf{x})\right)}{\sum\limits_{j=1}^{K}\exp\left(s_j(\mathbf{x})\right)}$

where $s_k(\mathbf{x})$ is the score of class k for the instance, i.e. the dot product of the transposed parameter vector of class k with the instance x:

$s_k(\mathbf{x}) = \theta_k^T \cdot \mathbf{x}$

def softmax(logits):
    exps = np.exp(logits)
    exp_sums = np.sum(exps, axis=1, keepdims=True)
    return exps / exp_sums
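This version exponentiates the raw logits directly, which can overflow for large scores. A numerically safer, mathematically equivalent variant (an optional sketch, not part of the original post) subtracts the per-row maximum first:

def softmax_stable(logits):
    # subtracting the row-wise max leaves the result unchanged:
    # exp(s - c) / sum(exp(s - c)) == exp(s) / sum(exp(s))
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)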

Cost function

Cross-entropy cost:

$J(\mathbf{\Theta}) = -\dfrac{1}{m}\sum\limits_{i=1}^{m}\sum\limits_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right)$

Cross-entropy gradient vector for class k:

$\nabla_{\mathbf{\theta}^{(k)}}\, J(\mathbf{\Theta}) = \dfrac{1}{m}\sum\limits_{i=1}^{m}\left(\hat{p}_k^{(i)} - y_k^{(i)}\right)\mathbf{x}^{(i)}$
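Before running the full training loop, the analytic gradient above can be sanity-checked against a finite-difference estimate. This is a minimal sketch, not part of the original post; it defines its own alpha/epsilon and a random Theta so it can run on the training split prepared earlier:

def regularized_loss(theta, X, Y_one_hot, alpha=0.1, eps=1e-7):
    # cross-entropy plus l2 penalty on the non-bias rows
    p = softmax(X.dot(theta))
    xentropy = -np.mean(np.sum(Y_one_hot * np.log(p + eps), axis=1))
    return xentropy + alpha * 1/2 * np.sum(np.square(theta[1:]))

def analytic_gradient(theta, X, Y_one_hot, alpha=0.1):
    # the gradient formula above, plus the gradient of the l2 penalty
    p = softmax(X.dot(theta))
    grad = 1 / len(X) * X.T.dot(p - Y_one_hot)
    return grad + np.r_[np.zeros([1, theta.shape[1]]), alpha * theta[1:]]

theta_check = np.random.randn(X_train.shape[1], 3)
numeric = np.zeros_like(theta_check)
h = 1e-5
for i in range(theta_check.shape[0]):
    for j in range(theta_check.shape[1]):
        plus, minus = theta_check.copy(), theta_check.copy()
        plus[i, j] += h
        minus[i, j] -= h
        numeric[i, j] = (regularized_loss(plus, X_train, Y_train_one_hot)
                         - regularized_loss(minus, X_train, Y_train_one_hot)) / (2 * h)

# the two estimates should agree to several decimal places
print(np.max(np.abs(numeric - analytic_gradient(theta_check, X_train, Y_train_one_hot))))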

# number of input features = 3, including the bias term
n_inputs = X_train.shape[1]
# number of output classes = 3
n_outputs = len(np.unique(y_train))
# learning rate
eta = 0.1
# number of iterations
n_iterations = 5001
m = len(X_train)
# tiny constant added inside log() to avoid log(0)
epsilon = 1e-7
# l2 regularization hyperparameter
alpha = 0.1
best_loss = np.infty
Theta = np.random.randn(n_inputs, n_outputs)

for iteration in range(n_iterations):
    logits = X_train.dot(Theta)
    # predicted probabilities
    Y_proba = softmax(logits)
    xentropy_loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))
    # l2 regularization: half the squared norm of the weights; the first row is the bias term, so exclude it
    l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
    loss = xentropy_loss + alpha * l2_loss
    error = Y_proba - Y_train_one_hot
    gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    # gradient step
    Theta = Theta - eta * gradients

    # evaluate the loss on the validation set
    logits = X_valid.dot(Theta)
    Y_proba = softmax(logits)
    xentropy_loss = -np.mean(np.sum(Y_valid_one_hot * np.log(Y_proba + epsilon), axis=1))
    l2_loss = 1/2 * np.sum(np.square(Theta[1:]))
    loss = xentropy_loss + alpha * l2_loss
    if iteration % 500 == 0:
        print(iteration, loss)
    if loss < best_loss:
        best_loss = loss
    else:
        # early stopping: the validation loss is higher than in the previous
        # iteration, so stop training to prevent overfitting
        print(iteration - 1, best_loss)
        print(iteration, loss, "early stopping!")
        break
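One possible refinement, not in the original code: the loop stops only after the validation loss has already gone up, so Theta has taken one extra gradient step. A sketch that also snapshots the best parameters and rolls back to them (it reuses the hyperparameters defined above):

Theta = np.random.randn(n_inputs, n_outputs)
best_loss = np.infty
best_theta = Theta.copy()
for iteration in range(n_iterations):
    # same gradient step as above
    Y_proba = softmax(X_train.dot(Theta))
    gradients = 1/m * X_train.T.dot(Y_proba - Y_train_one_hot) \
                + np.r_[np.zeros([1, n_outputs]), alpha * Theta[1:]]
    Theta = Theta - eta * gradients
    # same validation loss as above
    Y_proba_valid = softmax(X_valid.dot(Theta))
    val_loss = (-np.mean(np.sum(Y_valid_one_hot * np.log(Y_proba_valid + epsilon), axis=1))
                + alpha * 1/2 * np.sum(np.square(Theta[1:])))
    if val_loss < best_loss:
        best_loss = val_loss
        best_theta = Theta.copy()       # snapshot the best parameters so far
    else:
        Theta = best_theta              # roll back to the best parameters, then stop
        print(iteration, val_loss, "early stopping!")
        break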

Compute the accuracy on the test set

logits = X_test.dot(Theta)
Y_proba = softmax(logits)
y_predict = np.argmax(Y_proba, axis=1)
accuracy_score = np.mean(y_predict == y_test)
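To inspect the result, one can print the score and map the predicted indices back to species names via the dataset's target_names (a small follow-up, not in the original post):

print(accuracy_score)
# predicted species names for the first five test instances
print(iris["target_names"][y_predict][:5])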
