Overview
Experiments with the optimizers commonly used in Keras
- Preface
- Data and network architecture
- Optimizer: SGD
- Optimizer: Adagrad
- Optimizer: Adam
- Optimizer: RMSprop
- Optimizer: Adadelta
- Optimizer: Adamax
- Optimizer: Nadam
- Afterword
Preface
Up to now I had only ever used the SGD optimizer, as in the example code, without knowing what the alternatives offer or how fast they are. So here I list the optimizers commonly used in Keras and compare how well they work.
The experiments cover SGD, Adagrad, Adam, RMSprop, Adadelta, Adamax and Nadam.
This post is aimed at beginners only; I won't go into any of the formulas.
Data and network architecture
To make the comparison fair, the experiments follow the control-variable approach (familiar if you studied physics): everything is kept fixed except the optimizer.
Data: the MNIST dataset bundled with Keras
Network architecture: a basic LeNet-style network
Keras version:
Python version:
Unified code (note: there is more than one way to build a network in Keras; this is just one of them; for other approaches see the article "Keras添加网络层的N种方法"):
import keras
from keras.datasets import mnist
from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense, Activation, Input
from keras.models import Sequential, Model
from keras.losses import categorical_crossentropy
from keras.utils import to_categorical
from keras.optimizers import SGD, Adagrad, Adam, RMSprop, Adadelta, Adamax, Nadam
import numpy as np

# Load MNIST, add a channel dimension (28, 28, 1) and one-hot encode the labels
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.expand_dims(X_train, 3)
X_test = np.expand_dims(X_test, 3)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

def net(input_shape, output_class):
    # Basic LeNet-style network: two conv/pool blocks followed by two dense layers
    zi = Input(input_shape)
    z = Conv2D(20, (3, 3))(zi)
    z = MaxPool2D((2, 2))(z)
    z = Activation('tanh')(z)
    z = Conv2D(30, (3, 3))(z)
    z = MaxPool2D((2, 2))(z)
    z = Activation('tanh')(z)
    z = Dropout(0.5)(z)
    z = Flatten()(z)
    z = Dense(1000)(z)
    z = Dropout(0.5)(z)
    z = Dense(output_class)(z)
    zo = Activation('softmax')(z)
    model = Model(inputs=zi, outputs=zo)
    # Change the optimizer and its parameters here for each experiment
    model.compile(SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
                  categorical_crossentropy, ['accuracy'])
    return model

model = net(X_train.shape[1:], y_train.shape[1])
model.fit(X_train, y_train, batch_size=1000, epochs=2, validation_data=(X_test, y_test))
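Since only the compile line changes between experiments, the whole comparison can also be driven by a loop. The sketch below is not from the original post: build_net is a hypothetical variant of net() that takes the optimizer as an argument, and it assumes the Keras 2.3-style API (learning_rate argument names and the val_accuracy history key seen in the training logs below).

from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense, Activation, Input
from keras.models import Model
from keras.losses import categorical_crossentropy
from keras.optimizers import SGD, Adagrad, Adam, RMSprop, Adadelta, Adamax, Nadam

def build_net(input_shape, output_class, optimizer):
    # Same LeNet-style layers as net() above, but the optimizer is passed in
    zi = Input(input_shape)
    z = Conv2D(20, (3, 3))(zi)
    z = MaxPool2D((2, 2))(z)
    z = Activation('tanh')(z)
    z = Conv2D(30, (3, 3))(z)
    z = MaxPool2D((2, 2))(z)
    z = Activation('tanh')(z)
    z = Dropout(0.5)(z)
    z = Flatten()(z)
    z = Dense(1000)(z)
    z = Dropout(0.5)(z)
    z = Dense(output_class)(z)
    zo = Activation('softmax')(z)
    model = Model(inputs=zi, outputs=zo)
    model.compile(optimizer, categorical_crossentropy, ['accuracy'])
    return model

# Factories rather than shared instances, so every run gets a fresh optimizer state
optimizer_factories = {
    'SGD (defaults)':          lambda: SGD(),
    'SGD momentum + Nesterov': lambda: SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
    'Adagrad':                 lambda: Adagrad(),
    'Adam':                    lambda: Adam(),
    'RMSprop':                 lambda: RMSprop(),
    'Adadelta':                lambda: Adadelta(),
    'Adamax':                  lambda: Adamax(),
    'Nadam':                   lambda: Nadam(),
}

for name, make_opt in optimizer_factories.items():
    m = build_net(X_train.shape[1:], y_train.shape[1], make_opt())
    hist = m.fit(X_train, y_train, batch_size=1000, epochs=2,
                 validation_data=(X_test, y_test), verbose=0)
    print(name, 'val_accuracy:', hist.history['val_accuracy'][-1])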
Optimizer: SGD
- SGD with default parameters: after two epochs of fit the train accuracy is 0.6777 and the test accuracy 0.8754; after two more epochs train reaches 0.8293 and test 0.9181.
Epoch 1/2
60000/60000 [==============================] - 35s 583us/step - loss: 1.9640 - accuracy: 0.3472 - val_loss: 0.9992 - val_accuracy: 0.7992
Epoch 2/2
60000/60000 [==============================] - 40s 667us/step - loss: 1.0139 - accuracy: 0.6777 - val_loss: 0.5757 - val_accuracy: 0.8754
Epoch 1/2
60000/60000 [==============================] - 35s 589us/step - loss: 0.7026 - accuracy: 0.7824 - val_loss: 0.4138 - val_accuracy: 0.9007
Epoch 2/2
60000/60000 [==============================] - 39s 648us/step - loss: 0.5542 - accuracy: 0.8293 - val_loss: 0.3311 - val_accuracy: 0.9181
- SGD with momentum=0.9 and nesterov=True: after two epochs the train accuracy is 0.9107 and the test accuracy 0.9573.
Epoch 1/2
60000/60000 [==============================] - 36s 606us/step - loss: 0.8629 - accuracy: 0.7203 - val_loss: 0.2172 - val_accuracy: 0.9370
Epoch 2/2
60000/60000 [==============================] - 37s 623us/step - loss: 0.2851 - accuracy: 0.9107 - val_loss: 0.1457 - val_accuracy: 0.9573
Conclusion: SGD does train slowly, but as long as it does not get stuck at a saddle point, training for a few more epochs will still get it to converge.
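For reference, the two SGD runs above correspond to changing the marked compile line inside net() roughly as follows (a sketch, assuming the Keras 2.3-style argument names):

model.compile(SGD(), categorical_crossentropy, ['accuracy'])         # all defaults: learning_rate=0.01, momentum=0.0
model.compile(SGD(learning_rate=0.01, momentum=0.9, nesterov=True),  # momentum 0.9 plus Nesterov acceleration
              categorical_crossentropy, ['accuracy'])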
Optimizer: Adagrad
- Adagrad with default values: after two epochs the train accuracy is 0.9127 and the test accuracy 0.9569.
Epoch 1/2
60000/60000 [==============================] - 37s 617us/step - loss: 1.7784 - accuracy: 0.8039 - val_loss: 0.1961 - val_accuracy: 0.9422
Epoch 2/2
60000/60000 [==============================] - 38s 631us/step - loss: 0.2839 - accuracy: 0.9127 - val_loss: 0.1501 - val_accuracy: 0.9569
Conclusion: Adagrad has hardly anything to tune. There is only learning_rate; leave everything else alone.
In [27]: Adagrad??
Init signature: Adagrad(learning_rate=0.01, **kwargs)
Source:
class Adagrad(Optimizer):
    """Adagrad optimizer.

    Adagrad is an optimizer with parameter-specific learning rates,
    which are adapted relative to how frequently a parameter gets
    updated during training. The more updates a parameter receives,
    the smaller the learning rate.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        learning_rate: float >= 0. Initial learning rate.

    # References
        - [Adaptive Subgradient Methods for Online Learning and Stochastic
          Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
    """
Optimizer: Adam
- Adam with default values: after two epochs the train accuracy is 0.9348 and the test accuracy 0.9691; in fact the test accuracy already reaches 0.9548 after the first epoch.
Epoch 1/2
60000/60000 [==============================] - 36s 599us/step - loss: 0.6294 - accuracy: 0.8043 - val_loss: 0.1628 - val_accuracy: 0.9548
Epoch 2/2
60000/60000 [==============================] - 46s 764us/step - loss: 0.2073 - accuracy: 0.9348 - val_loss: 0.1057 - val_accuracy: 0.9691
- Adam with learning_rate=0.01 and everything else at the defaults: the test accuracy reaches 0.9592 after one epoch and 0.9725 after two.
Epoch 1/2
60000/60000 [==============================] - 36s 593us/step - loss: 2.3847 - accuracy: 0.7794 - val_loss: 0.1803 - val_accuracy: 0.9592
Epoch 2/2
60000/60000 [==============================] - 38s 629us/step - loss: 0.2549 - accuracy: 0.9329 - val_loss: 0.0984 - val_accuracy: 0.9725
- Adam with learning_rate=0.01 and amsgrad=True: essentially the same result as above.
Epoch 1/2
60000/60000 [==============================] - 37s 621us/step - loss: 4.3511 - accuracy: 0.7465 - val_loss: 0.2714 - val_accuracy: 0.9545
Epoch 2/2
60000/60000 [==============================] - 44s 733us/step - loss: 0.3458 - accuracy: 0.9337 - val_loss: 0.1064 - val_accuracy: 0.9719
Conclusion: this optimizer converges very quickly.
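The three Adam configurations above correspond to compile lines roughly like these (a sketch, Keras 2.3-style argument names assumed):

model.compile(Adam(), categorical_crossentropy, ['accuracy'])                    # defaults, learning_rate=0.001
model.compile(Adam(learning_rate=0.01), categorical_crossentropy, ['accuracy'])  # larger learning rate
model.compile(Adam(learning_rate=0.01, amsgrad=True),                            # AMSGrad variant of Adam
              categorical_crossentropy, ['accuracy'])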
Optimizer: RMSprop
- RMSprop with default values: after two epochs the train accuracy is 0.9333 and the test accuracy 0.9673; the test accuracy already reaches 0.9579 after the first epoch.
Epoch 1/2
60000/60000 [==============================] - 36s 594us/step - loss: 0.7515 - accuracy: 0.8130 - val_loss: 0.1454 - val_accuracy: 0.9579
Epoch 2/2
60000/60000 [==============================] - 37s 625us/step - loss: 0.2154 - accuracy: 0.9333 - val_loss: 0.1007 - val_accuracy: 0.9673
- RMSprop with learning_rate=0.01: it takes two epochs for the test accuracy to reach 0.9563.
Epoch 1/2
60000/60000 [==============================] - 36s 594us/step - loss: 21.2518 - accuracy: 0.6875 - val_loss: 8.6292 - val_accuracy: 0.5309
Epoch 2/2
60000/60000 [==============================] - 42s 699us/step - loss: 6.1957 - accuracy: 0.8132 - val_loss: 0.8259 - val_accuracy: 0.9563
- RMSprop with the default learning_rate and rho=0.5: after two epochs the train accuracy is 0.9364 and the test accuracy 0.9674; the test accuracy already reaches 0.9569 after the first epoch.
Epoch 1/2
60000/60000 [==============================] - 35s 590us/step - loss: 0.5887 - accuracy: 0.8204 - val_loss: 0.1453 - val_accuracy: 0.9569
Epoch 2/2
60000/60000 [==============================] - 38s 627us/step - loss: 0.2050 - accuracy: 0.9364 - val_loss: 0.1077 - val_accuracy: 0.9674
Conclusion: its default parameters already work very well.
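The three RMSprop configurations above, sketched as compile lines (Keras 2.3-style argument names assumed):

model.compile(RMSprop(), categorical_crossentropy, ['accuracy'])                    # defaults: learning_rate=0.001, rho=0.9
model.compile(RMSprop(learning_rate=0.01), categorical_crossentropy, ['accuracy'])  # larger learning rate (unstable at first here)
model.compile(RMSprop(rho=0.5), categorical_crossentropy, ['accuracy'])             # default learning rate, smaller rho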
Optimizer: Adadelta
- Adadelta with default values: after two epochs the train accuracy is 0.9351 and the test accuracy 0.9694; the test accuracy already reaches 0.9566 after the first epoch.
Epoch 1/2
60000/60000 [==============================] - 37s 618us/step - loss: 0.6233 - accuracy: 0.8176 - val_loss: 0.1516 - val_accuracy: 0.9566
Epoch 2/2
60000/60000 [==============================] - 38s 639us/step - loss: 0.2077 - accuracy: 0.9351 - val_loss: 0.1007 - val_accuracy: 0.9694
Conclusion: its default parameters already work very well.
Optimizer: Adamax
- Adamax with default values: after two epochs the train accuracy is 0.9251 and the test accuracy 0.9638; the test accuracy already reaches 0.9488 after the first epoch.
Epoch 1/2
60000/60000 [==============================] - 36s 600us/step - loss: 0.6072 - accuracy: 0.8148 - val_loss: 0.1794 - val_accuracy: 0.9488
Epoch 2/2
60000/60000 [==============================] - 37s 619us/step - loss: 0.2410 - accuracy: 0.9251 - val_loss: 0.1234 - val_accuracy: 0.9638
- Adamax with learning_rate=0.01: the test accuracy reaches 0.9482 after one epoch and 0.9616 after two.
Epoch 1/2
60000/60000 [==============================] - 36s 598us/step - loss: 1.9103 - accuracy: 0.7729 - val_loss: 0.2179 - val_accuracy: 0.9482
Epoch 2/2
60000/60000 [==============================] - 36s 606us/step - loss: 0.3143 - accuracy: 0.9196 - val_loss: 0.1345 - val_accuracy: 0.9616
Conclusion: it converges quickly as well.
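The two Adamax configurations above, sketched as compile lines (Keras 2.3-style argument names assumed):

model.compile(Adamax(), categorical_crossentropy, ['accuracy'])                    # defaults, learning_rate=0.002
model.compile(Adamax(learning_rate=0.01), categorical_crossentropy, ['accuracy'])  # larger learning rate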
Optimizer: Nadam
- Nadam with default values: the test accuracy reaches 0.9621 after one epoch and 0.9692 after two.
Epoch 1/2
60000/60000 [==============================] - 36s 606us/step - loss: 0.5319 - accuracy: 0.8454 - val_loss: 0.1266 - val_accuracy: 0.9621
Epoch 2/2
60000/60000 [==============================] - 39s 651us/step - loss: 0.1791 - accuracy: 0.9439 - val_loss: 0.0949 - val_accuracy: 0.9692
Conclusion: the default parameters work well.
Afterword
If I build a neural network again, I won't just blindly reach for plain SGD any more...