Overview
Pytorch Notebook
Written in English for convenience, since these notes are edited with emacs-org.
Table of Contents
- tensor
- create
- cloning
- operation
- in-place operations
- transpose (permute)
- about size and indexing
- add
- with numpy
- cuda
- autograd
- track and gradient computing
- function
- backward()
- torch.no_grad()
- neural network
- structured construction
- layers (no order)
- forward propagate structure (ordered)
- sequential construction
- data load
- torchvision
- optimizer
- train
- gpu support
- loss function
- train
- about step()s
- optimizer.step(self, closure = None)
- scheduler.step()
- model I/O
- method 1 (recommended)
- method 2
- evaluate
- models
- attributes
- pretrained models
- torchvision.models
- sundry
- problem shooting
- pytorch is deep learning’s numpy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as Data
import torch.optim as optim
import numpy as np
tensor
create
- uninitialized tensor: x = torch.empty(5, 3)
- random tensor: x = torch.rand(5, 3)
- zeros: x = torch.zeros(5, 3)
- define dtype: x = torch.zeros(5, 3, dtype = torch.long)
- from known data: x = torch.tensor([5.5, 3])
cloning
to reuse an existing tensor’s properties.
- new_* methods: x = x.new_ones(5, 3, dtype = torch.double)  # 64-bit
- copy the size: x = torch.randn_like(x, dtype = torch.float)  # 32-bit
operation
in-place operations
append ‘_’ to the method name.
e.g. y.add_(x)  ->  y += x
x.t_()  ->  transposes x in place
transpose (permute)
x = x.permute(1, 2, 0)
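a quick shape check (a minimal sketch; the sizes are only illustrative):
x = torch.randn(2, 3, 4)
y = x.permute(1, 2, 0)   # reorder the dimensions
print(y.size())          # torch.Size([3, 4, 2])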
about size and indexing
- get size: x.size() or x.size(dim)
- resize:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)   # the '-1' dimension's size is inferred from the other dims
# use .item() to get a one-element tensor as a python number
x = torch.randn(1)
num = x.item()
add
# simply
x + y
torch.add(x, y)
# write the result into an existing tensor
result = torch.empty(5, 3)
torch.add(x, y, out = result)
# in-place (+=)
y.add_(x)
with numpy
the numpy array and the torch tensor
share the same memory location,
so changing one changes the other.
- torch.from_numpy(npdata)
- torchdata.numpy()
npdata = np.arange(6).reshape(2, 3)
np2torch = torch.from_numpy(npdata)
'''
tensor([[0, 1, 2],
        [3, 4, 5]], dtype=torch.int32)
'''
torch2np = np2torch.numpy()
cuda
if torch.cuda.is_available():
    device = torch.device('cuda')
    # directly create on GPU
    y = torch.ones_like(x, device = device)
    # copy to GPU
    x = x.to(device)
    # or x.to('cuda')
    z = x + y
    # tensor([0.1034], device='cuda:0')
    z = z.to('cpu')   # .to returns a new tensor; assign it to keep the CPU copy
autograd
track and gradient computing
- set sometensor.requires_grad = True to keep track of all the computations on it (enables training).
- call .backward() to compute all the gradients.
- gradients accumulate into the .grad attribute.
- stop tracking: .detach().
- prevent tracking: wrap the code in a with torch.no_grad(): block.
function
for a tensor created by an operation,
tensor.grad_fn refers to the function that created it;
for a user-created tensor, .grad_fn is None.
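a minimal sketch of the above (values are only illustrative):
x = torch.ones(2, 2, requires_grad = True)
y = x + 2
print(y.grad_fn)     # an AddBackward object, since y was created by an operation
z = (y * y).mean()   # a scalar
z.backward()         # compute the gradients
print(x.grad)        # dz/dx, here every entry is 1.5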
backward()
for a non-scalar tensor, backward(gradient) needs a gradient argument:
a tensor of matching shape (see the sketch below).
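for example (a sketch; the gradient vector here is arbitrary):
x = torch.randn(3, requires_grad = True)
y = x * 2                              # non-scalar output
v = torch.tensor([0.1, 1.0, 0.0001])   # gradient tensor with y's shape
y.backward(v)                          # backward of (y * v).sum()
print(x.grad)                          # tensor([0.2000, 2.0000, 0.0002])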
torch.no_grad()
wrap evaluation code in with torch.no_grad():
when testing the model.
neural network
the typical learning procedure:
- define the network, define the learnable params.
- iterate over a dataset of inputs.
- process the input through the network.
- compute the loss.
- back-propagate.
- update the params.
(weight = weight - learning_rate * gradient)
structured construction
layers (no order)
import torch.nn as nn
define in net_class’s __init__()
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
forward propagate structure (ordered)
import torch.nn.functional as F
define in net_class’s forward()
class LeNet(nn.Module):
    def __init__(self):
        ...   # layers as defined above

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool(x)
        # write it compactly with a nested call
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        # each sample is flattened to a row vector
        # -1 is for the batch size
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
sequential construction
net = nn.Sequential(
    nn.Linear(2, 10),
    nn.ReLU(),   # note: this ReLU is a class (a layer), not the functional F.relu
    nn.Linear(10, 2)
)
data load
transforms.ToTensor() <-> transforms.ToPILImage()
import torch.utils.data as Data
mydataset = Data.TensorDataset(x, y)   # (data tensor, target tensor)
mydataloader = Data.DataLoader(
    dataset = mydataset,
    batch_size = BATCH_SIZE,
    shuffle = True,
    num_workers = 2
)
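iterating over the loader then looks like this (a sketch, assuming x and y are tensors with the same first dimension):
for epoch in range(3):
    for step, (batch_x, batch_y) in enumerate(mydataloader):
        pass   # feed batch_x / batch_y to the network here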
torchvision
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root = './data', train = True,
                                        download = True, transform = transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = BATCH_SIZE,
                                          shuffle = True, num_workers = 0)
transforms.RandomResizedCrop((height, width))
optimizer
import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr = 0.001, momentum = 0.9)
train
gpu support
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
net = Net()
net.to(device)
'''...'''
for epoch in range(epochs):
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
loss function
import torch.nn as nn
criterion = nn.CrossEntropyLoss()
train
for epoch in range(2):
    trainingloss = 0.0
    for i, data in enumerate(trainloader, 0):
        # for gpu support
        inputs, labels = data[0].to(device), data[1].to(device)
        # clear the gradient buffer
        optimizer.zero_grad()
        # forward
        outputs = net(inputs)
        # loss computing
        loss = criterion(outputs, labels)
        # back propagate
        loss.backward()
        # update weights
        optimizer.step()
        # accumulate the running loss
        trainingloss += loss.item()
about step()s
optimizer.step(self, closure = None)
usually called every mini-batch to update the weights.
closure (callable, optional): a closure that reevaluates the model,
runs back-propagation, and returns the loss.
if closure isn't passed, backward() must be called before optimizer.step().
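a sketch of the closure form with optim.LBFGS (the surrounding names net, inputs, labels, criterion are illustrative):
optimizer = optim.LBFGS(net.parameters(), lr = 0.1)

def closure():
    optimizer.zero_grad()
    output = net(inputs)
    loss = criterion(output, labels)
    loss.backward()
    return loss

optimizer.step(closure)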
scheduler.step()
usually called once per epoch to adjust the learning rate.
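a sketch with torch.optim.lr_scheduler.StepLR (step_size and gamma are illustrative):
from torch.optim import lr_scheduler
scheduler = lr_scheduler.StepLR(optimizer, step_size = 10, gamma = 0.1)
for epoch in range(epochs):
    # ... train one epoch (optimizer.step() every mini-batch) ...
    scheduler.step()   # multiply the lr by 0.1 every 10 epochs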
model I/O
method 1 (recommended)
only saves the weights, not the structure;
the network must be reconstructed before loading when evaluating.
PATH = './example-model.pth'
# save
torch.save(net.state_dict(), PATH)
# load
net = Net()# reconstruct the network
net.load_state_dict(torch.load(PATH))
method 2
saves the whole model, but is fragile when the code is refactored or the model is moved elsewhere.
PATH = './example-model.pth'
# save
torch.save(net, PATH)
# load
net = torch.load(PATH)
evaluate
class Net(nn.Module): ...   # same structure as in training
net = Net()
net.load_state_dict(torch.load(PATH))
# evaluate
class_correct = list(0. for i in range(10))   # 10-class example
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):   # 4 = test batch size
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
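printing the per-class accuracy afterwards (a sketch; classes is assumed to be the list of the 10 class names):
for i in range(10):
    print('accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))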
models
attributes
- modules() -> iterates over all the modules in a network (see the sketch below).
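a quick sketch using the LeNet defined above:
net = LeNet()
for m in net.modules():
    print(type(m).__name__)   # LeNet, Conv2d, MaxPool2d, Conv2d, Linear, Linear, Linear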
pretrained models
torchvision.models
import torchvision.models as models
import torchvision.transforms as transforms
vgg16 = models.vgg16(pretrained = True).eval()
# all the models use the same normalization
normalization = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
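a sketch of running the pretrained model on a single image (the file name and the 224 x 224 resize are example choices):
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalization,
])
img = Image.open('example.jpg')
batch = preprocess(img).unsqueeze(0)   # add the batch dimension
with torch.no_grad():
    scores = vgg16(batch)
pred = torch.argmax(scores, dim = 1)   # predicted class index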
sundry
- normalization: x = (x - mean) / std centers and rescales the data,
  which generally improves classification performance.
- torch.max(input, dim) -> (Tensor, LongTensor):
  torch.max(a, 0) returns each column's max value and its row index;
  torch.max(a, 1) returns each row's max value and its column index.
  (see the sketch after this list)
- torch.nn.functional.softmax(input, dim) -> Tensor:
  softmax(a, 0) rescales a so that every column sums to 1;
  softmax(a, 1) rescales a so that every row sums to 1.
- Tensor.squeeze(): removes the size-1 dimensions from the tensor.
  t = torch.Tensor([[1], [2], [3]])
  t.squeeze()   # tensor([1., 2., 3.])
- torch.bmm(batch1, batch2, out = None) -> Tensor: batch matrix multiplication;
  say batch1.size() = [2, 3, 4] and batch2.size() = [2, 4, 5],
  then the result's size() is [2, 3, 5].
- torch.unsqueeze(input, dim) -> Tensor: returns a new tensor with a dimension
  of size one inserted at the specified position.
  The new tensor shares the same underlying data with the input.
  - positive dim: ranges from 0 to input.dim().
  - negative dim: counts backward from the end.
- prediction first, label second: when calling loss functions, pass the prediction
  and then the label, in that order.
- labels are LongTensor (64-bit) by default.
- paddings
  - nn.ReflectionPad1d(padding) ~ nn.ReflectionPad3d(padding): pad with the
    reflection of the input boundary.
    padding as a single number: pad every side by the same length;
    padding as a tuple: (left_padding, right_padding), etc.
  - nn.ReplicationPad1d(padding) ~ nn.ReplicationPad3d(padding): pad with copies
    of the boundary values.
  - nn.ConstantPad1d(padding, value) ~ nn.ConstantPad3d(padding, value): pad every
    side with the same constant value.
  - F.pad(input, pad, mode = 'constant', value = 0)
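a small sketch for the torch.max / softmax / bmm / unsqueeze items above (values are illustrative):
a = torch.tensor([[1., 5.], [4., 2.]])
values, indices = torch.max(a, 1)   # per-row max: values = [5., 4.], indices = [1, 0]
p = F.softmax(a, dim = 1)           # every row of p sums to 1
row = torch.tensor([1., 2., 3.])
col = row.unsqueeze(1)              # shape [3] -> [3, 1]
b1 = torch.randn(2, 3, 4)
b2 = torch.randn(2, 4, 5)
print(torch.bmm(b1, b2).size())     # torch.Size([2, 3, 5])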
problem shooting
- BrokenPipe Error: encountered on Windows when downloading a dataset:
  set num_workers to 0.
- TypeError: 'module' object is not callable: maybe a capitalization problem,
  e.g. it should be datasets.MNIST, not datasets.mnist.
- Adding a softmax layer to the CIFAR10 LeNet makes training slower.
- "trying to backward multiple times without retain_graph=True": check whether
  the shapes of the tensors passed to mse_loss mismatch.