Overview
Abstract: Parts of this article are adapted from the linked reference. Training uses the Kaggle cats-vs-dogs dataset: 20,000 training images (10,000 cats and 10,000 dogs) and a 5,000-image validation set (2,500 of each).
Dataset link
Link: https://pan.baidu.com/s/1uTl_ErqP_KxYH4M5feZOaQ
Extraction code: 6666
Table of Contents
- 1. Learning Representations
- 2. Network Model
- 3. Source Code
- Summary
1. Learning Representations
Before 2012, image features were computed mechanically. In fact, designing a new set of feature functions, improving the results, and writing up the paper was the fashion of the day. SIFT [Lowe, 2004], SURF [Bay et al., 2006], HOG (histograms of oriented gradients) [Dalal & Triggs, 2005], bags of visual words, and similar feature-extraction methods dominated the field.
Another group of researchers, including Yann LeCun, Geoff Hinton, Yoshua Bengio, Andrew Ng, Shun-ichi Amari, and Juergen Schmidhuber, had a different idea: they believed that features themselves should be learned. Moreover, they argued that, given reasonably complex models, features should be composed of multiple jointly learned neural network layers, each with learnable parameters. In computer vision, the lowest layers might detect edges, colors, and textures. Indeed, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton proposed a new convolutional neural network variant, AlexNet, which achieved a sensational result in the 2012 ImageNet challenge. AlexNet is named after Alex Krizhevsky, the first author of the paper [Krizhevsky et al., 2012].
Interestingly, in the lowest layers of the network, the model learned feature extractors that resemble traditional filters. Figure 1, reproduced from the AlexNet paper [Krizhevsky et al., 2012], shows these low-level image features.
Higher layers of AlexNet build on these low-level representations to capture larger structures such as eyes, noses, and blades of grass, and still higher layers can detect whole objects such as people, airplanes, dogs, or frisbees. The final hidden neurons learn a comprehensive representation of the image that makes data from different categories easy to separate. Although a persistent group of researchers had long pursued such hierarchical representations of visual data, for a long time these attempts produced no breakthrough. That breakthrough came in 2012 with AlexNet, which first demonstrated that learned features could surpass hand-designed ones, upending the state of computer vision research. AlexNet used an eight-layer convolutional neural network and won the 2012 ImageNet image recognition challenge by a wide margin.
2. Network Model
AlexNet and LeNet share a very similar design philosophy, but there are also significant differences. First, AlexNet is much deeper than the comparatively small LeNet-5: it consists of eight layers, five convolutional layers, two fully connected hidden layers, and one fully connected output layer. Second, AlexNet uses ReLU rather than sigmoid as its activation function.
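The ReLU choice matters for optimization: sigmoid's gradient peaks at 0.25 and collapses toward zero once its input moves away from the origin, while ReLU's gradient stays exactly 1 for any positive input. A minimal pure-Python sketch of the two derivatives (stdlib only, no PyTorch, purely for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x)); maximum value is 0.25 at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def d_relu(x):
    # derivative of max(0, x): 0 for negative inputs, 1 for positive inputs
    return 1.0 if x > 0 else 0.0

for x in (0.0, 4.0, 8.0):
    print(f"x={x}: sigmoid grad={d_sigmoid(x):.6f}, relu grad={d_relu(x)}")
```

At x = 8 the sigmoid gradient is already about 3e-4, which is why deep sigmoid networks are prone to vanishing gradients that ReLU avoids.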
3. Source Code
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset,DataLoader
import math
import numpy as np
from PIL import Image
import os
import torchvision
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter
def AlexNet():
    net = nn.Sequential(
        # 11x11 convolution
        nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        # 5x5 convolution
        nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        # three consecutive 3x3 convolutions
        nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2),
        # flatten
        nn.Flatten(),
        # fully connected layers; fewer units than the original paper to ease training
        nn.Linear(6400, 512), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(512, 64), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(64, 2)
    )
    return net
class CatsAndDogs(Dataset):
    def __init__(self, root, transforms=None, size=(224, 224)):
        # initialization
        self.images = [os.path.join(root, item) for item in os.listdir(root)]
        self.transforms = transforms
        self.size = size

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # resizing is needed because images in the same DataLoader batch must share a size
        image = Image.open(self.images[idx])
        image = self.transforms(image)
        # the path format is "K:\imageData\dogAndCat\train\dog.9983.jpg"
        label = os.path.basename(self.images[idx]).split(".")[0]
        if label == "cat":
            label = 0
        if label == "dog":
            label = 1
        return image, label
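The `__getitem__` method above derives the label from the filename, which follows the `<class>.<index>.jpg` pattern of the Kaggle set. A quick stdlib-only sanity check of that parsing, using `ntpath` so the Windows-style example path also splits correctly on other platforms:

```python
import ntpath

# Filenames follow the "<class>.<index>.jpg" pattern used by the Kaggle set
path = r"K:\imageData\dogAndCat\train\dog.9983.jpg"
label = ntpath.basename(path).split(".")[0]
label_id = {"cat": 0, "dog": 1}[label]
print(label, label_id)  # -> dog 1
```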
def train(model, optimizer, loss_fn, train_loader, validLoader, epoches=30, device=torch.device("cpu"), logdir="./log"):
    train_batches = 0
    train_loss_list = []
    valid_loss_list = []
    valid_accuracy_list = []
    epoch_list = []
    writer = SummaryWriter(logdir)
    for epoch in range(epoches):
        training_loss = 0.0
        valid_loss = 0.0
        model.train()
        for batch in train_loader:
            train_batches += 1
            optimizer.zero_grad()
            inputs, targets = batch
            inputs = inputs.to(device)
            targets = targets.to(device)
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
            loss.backward()
            optimizer.step()
            training_loss += loss.data.item() * inputs.size(0)
            writer.add_scalar("loss/batch_loss", loss.data.item(), train_batches)
        training_loss /= len(train_loader.dataset)
        model.eval()
        num_correct = 0
        num_examples = 0
        with torch.no_grad():
            for batch in validLoader:
                inputs, targets = batch
                inputs = inputs.to(device)
                outputs = model(inputs)
                targets = targets.to(device)
                loss = loss_fn(outputs, targets)
                valid_loss += loss.data.item() * inputs.size(0)
                correct = torch.eq(torch.max(F.softmax(outputs, dim=1), dim=1)[1], targets)
                num_correct += torch.sum(correct).item()
                num_examples += correct.shape[0]
        valid_loss /= len(validLoader.dataset)
        valid_accuracy = num_correct / num_examples
        print('Epoch: {}/{}, Training Loss: {:.5f}, Validation Loss: {:.5f}, accuracy = {:.5f}'
              .format(epoch, epoches, training_loss, valid_loss, valid_accuracy))
        writer.add_scalar("loss/epoches_loss", training_loss, epoch)
        writer.add_scalar("loss/accuracy", valid_accuracy, epoch)
        writer.add_scalars("loss/train_valid", {"trainLoss": training_loss, "accuracy": valid_accuracy}, epoch)
        train_loss_list.append(training_loss)
        valid_loss_list.append(valid_loss)
        valid_accuracy_list.append(valid_accuracy)
        epoch_list.append(epoch)
    return train_loss_list, valid_loss_list, valid_accuracy_list, epoch_list
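The validation accuracy in `train` compares the argmax of the softmax output with the targets; since softmax is monotonic, taking the argmax of the raw logits gives the same predictions. A stdlib-only sketch of that batch-accuracy computation on hypothetical logits:

```python
def batch_accuracy(logits, targets):
    # argmax over the class dimension; softmax is monotonic, so it can be skipped
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

logits = [[2.1, -0.3], [0.2, 1.7], [-1.0, 0.5]]  # per-image scores: cat=0, dog=1
print(batch_accuracy(logits, [0, 1, 0]))  # 2 of 3 predictions correct
```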
def get_parameter_number(net):
    total_num = sum(p.numel() for p in net.parameters())
    trainable_num = sum(p.numel() for p in net.parameters() if p.requires_grad)
    return {'Total parameters': total_num, 'Trainable parameters': trainable_num}
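The total `get_parameter_number` reports (7,057,474, printed in the training log below) can be checked by hand: a Conv2d layer has `c_out * (c_in * k * k)` weights plus `c_out` biases, and a Linear layer has `n_in * n_out` weights plus `n_out` biases. A stdlib-only tally over the layers defined in `AlexNet` above:

```python
def conv2d_params(c_in, c_out, k):
    return c_out * (c_in * k * k) + c_out  # weights + biases

def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

total = (conv2d_params(3, 96, 11)      # 34,944
         + conv2d_params(96, 256, 5)   # 614,656
         + conv2d_params(256, 384, 3)  # 885,120
         + conv2d_params(384, 384, 3)  # 1,327,488
         + conv2d_params(384, 256, 3)  # 884,992
         + linear_params(6400, 512)    # 3,277,312
         + linear_params(512, 64)      # 32,832
         + linear_params(64, 2))       # 130
print(total)  # -> 7057474
```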
def visualize(train_loss, val_loss, val_acc, path="./train_valid.png"):
    train_loss = np.array(train_loss)
    val_loss = np.array(val_loss)
    val_acc = np.array(val_acc)
    plt.grid(True)
    plt.xlabel("epoch")
    plt.ylabel("value")
    plt.title("train_loss and valid_acc")
    plt.plot(np.arange(len(val_acc)), val_acc, label=r"valid_acc", c="g")
    plt.plot(np.arange(len(train_loss)), train_loss, label=r"train_loss", c="r")
    plt.legend()  # show the curve labels
    plt.savefig(path)
    plt.show()
Check that the model is built correctly:
net = AlexNet()
x = torch.randn(1, 3, 224, 224)
for layer in net:
    x = layer(x)
    print(layer.__class__.__name__, 'Output shape:\t', x.shape)
Conv2d Output shape: torch.Size([1, 96, 54, 54])
ReLU Output shape: torch.Size([1, 96, 54, 54])
MaxPool2d Output shape: torch.Size([1, 96, 26, 26])
Conv2d Output shape: torch.Size([1, 256, 26, 26])
ReLU Output shape: torch.Size([1, 256, 26, 26])
MaxPool2d Output shape: torch.Size([1, 256, 12, 12])
Conv2d Output shape: torch.Size([1, 384, 12, 12])
ReLU Output shape: torch.Size([1, 384, 12, 12])
Conv2d Output shape: torch.Size([1, 384, 12, 12])
ReLU Output shape: torch.Size([1, 384, 12, 12])
Conv2d Output shape: torch.Size([1, 256, 12, 12])
ReLU Output shape: torch.Size([1, 256, 12, 12])
MaxPool2d Output shape: torch.Size([1, 256, 5, 5])
Flatten Output shape: torch.Size([1, 6400])
Linear Output shape: torch.Size([1, 512])
ReLU Output shape: torch.Size([1, 512])
Dropout Output shape: torch.Size([1, 512])
Linear Output shape: torch.Size([1, 64])
ReLU Output shape: torch.Size([1, 64])
Dropout Output shape: torch.Size([1, 64])
Linear Output shape: torch.Size([1, 2])
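The spatial sizes in the printout follow the standard convolution/pooling size formula `floor((n + 2p - k) / s) + 1`. A short stdlib-only check that reproduces them:

```python
def out_size(n, k, s=1, p=0):
    # output spatial size of a conv/pool layer: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

n = 224
n = out_size(n, 11, s=4, p=1)  # 11x11 conv, stride 4 -> 54
n = out_size(n, 3, s=2)        # maxpool             -> 26
n = out_size(n, 5, p=2)        # 5x5 conv, pad 2     -> 26
n = out_size(n, 3, s=2)        # maxpool             -> 12
n = out_size(n, 3, p=1)        # 3x3 convs keep 12
n = out_size(n, 3, s=2)        # maxpool             -> 5
print(256 * n * n)             # flattened features  -> 6400
```

This is where the `nn.Linear(6400, 512)` input size comes from: 256 channels times a 5x5 feature map.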
net
Sequential(
(0): Conv2d(3, 96, kernel_size=(11, 11), stride=(4, 4), padding=(1, 1))
(1): ReLU()
(2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(4): ReLU()
(5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU()
(8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(9): ReLU()
(10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU()
(12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
(13): Flatten()
(14): Linear(in_features=6400, out_features=512, bias=True)
(15): ReLU()
(16): Dropout(p=0.5, inplace=False)
(17): Linear(in_features=512, out_features=64, bias=True)
(18): ReLU()
(19): Dropout(p=0.5, inplace=False)
(20): Linear(in_features=64, out_features=2, bias=True)
)
if __name__ == "__main__":
    epoches = 25
    modelPath = r"D:\classifier\model\dogsAndCats_AlexNet.pt"
    trainingResultPath = r"D:\classifier\model\dogsAndCats_AlexNet.png"
    logdir = "./catsAndDogs_AlexNet/log"  # tensorboard logdir
    model = AlexNet()
    img_transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.ToTensor(),
    ])
    trainset = CatsAndDogs(r"D:\classifier\imageData\catsAndDogs\train", transforms=img_transforms)
    validset = CatsAndDogs(r"D:\classifier\imageData\catsAndDogs\val", transforms=img_transforms)
    trainLoader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=0)
    validLoader = DataLoader(validset, batch_size=128, shuffle=True, num_workers=0)
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print("run in cuda")
    else:
        device = torch.device("cpu")
        print("run in cpu")
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.0005)
    loss_fn = torch.nn.CrossEntropyLoss()
    print(get_parameter_number(model))
    train_loss_list, valid_loss_list, valid_accuracy_list, epoch_list = \
        train(model, optimizer, loss_fn, trainLoader, validLoader, epoches, device, logdir)
    torch.save(model, modelPath)
    visualize(train_loss_list, valid_loss_list, valid_accuracy_list, trainingResultPath)
run in cuda
{'Total parameters': 7057474, 'Trainable parameters': 7057474}
Epoch: 0/25, Training Loss: 0.68914, Validation Loss: 0.67966, accuracy = 0.61220
Epoch: 1/25, Training Loss: 0.69001, Validation Loss: 0.68665, accuracy = 0.50000
Epoch: 2/25, Training Loss: 0.68481, Validation Loss: 0.67704, accuracy = 0.55940
Epoch: 3/25, Training Loss: 0.67508, Validation Loss: 0.66865, accuracy = 0.57240
Epoch: 4/25, Training Loss: 0.64222, Validation Loss: 0.60939, accuracy = 0.67520
Epoch: 5/25, Training Loss: 0.59012, Validation Loss: 0.53298, accuracy = 0.73220
Epoch: 6/25, Training Loss: 0.51666, Validation Loss: 0.47126, accuracy = 0.77320
Epoch: 7/25, Training Loss: 0.44834, Validation Loss: 0.41834, accuracy = 0.80760
Epoch: 8/25, Training Loss: 0.40024, Validation Loss: 0.38949, accuracy = 0.82140
Epoch: 9/25, Training Loss: 0.35366, Validation Loss: 0.46253, accuracy = 0.78220
Epoch: 10/25, Training Loss: 0.31696, Validation Loss: 0.37098, accuracy = 0.83320
Epoch: 11/25, Training Loss: 0.27756, Validation Loss: 0.32717, accuracy = 0.85800
Epoch: 12/25, Training Loss: 0.24526, Validation Loss: 0.35002, accuracy = 0.85100
Epoch: 13/25, Training Loss: 0.20707, Validation Loss: 0.39739, accuracy = 0.83860
Epoch: 14/25, Training Loss: 0.17929, Validation Loss: 0.37975, accuracy = 0.85600
Epoch: 15/25, Training Loss: 0.14151, Validation Loss: 0.39280, accuracy = 0.86200
Epoch: 16/25, Training Loss: 0.12865, Validation Loss: 0.51913, accuracy = 0.85640
Epoch: 17/25, Training Loss: 0.12045, Validation Loss: 0.44457, accuracy = 0.86560
Epoch: 18/25, Training Loss: 0.08484, Validation Loss: 0.46240, accuracy = 0.86580
Epoch: 19/25, Training Loss: 0.05874, Validation Loss: 0.50794, accuracy = 0.86540
Epoch: 20/25, Training Loss: 0.05012, Validation Loss: 0.58512, accuracy = 0.86980
Epoch: 21/25, Training Loss: 0.06486, Validation Loss: 0.55290, accuracy = 0.87020
Epoch: 22/25, Training Loss: 0.04856, Validation Loss: 0.62414, accuracy = 0.86820
Epoch: 23/25, Training Loss: 0.04241, Validation Loss: 0.57700, accuracy = 0.86040
Epoch: 24/25, Training Loss: 0.03803, Validation Loss: 0.73861, accuracy = 0.86560
- TensorBoard visualization of the training process (screenshot omitted)
Summary
The number of neurons in AlexNet's final fully connected layers was reduced to make training easier. The final validation accuracy is roughly 86%, with 7,057,474 trainable parameters.