新手教程
- 60分钟快速入门Pytorch
- Pytorch是什么?
- Autograd:自动分化
- 神经网络
- 训练分类器
60分钟快速入门Pytorch
- 张量
- Autograd求导
- nn 包
- 多GPU示例
PyTorch使用Torch模型
- 学习PyTorch与实例
学习PyTorch与实例
学习PyTorch与实例
本教程通过自包含的例子介绍了PyTorch的基本概念 。
其核心在于PyTorch提供了两个主要功能:
- 一个n维Tensor,类似于numpy,但可以在GPU上运行
- 建立和训练神经网络的自动区分
我们将使用完全连接的ReLU网络作为我们的运行示例。网络将具有单个隐藏层,并且将通过最小化网络输出和真实输出之间的欧氏距离来梯度下降来训练随机数据。
注意
您可以浏览本页末尾的各个示例 。
目录
张量
预热:numpy
在介绍PyTorch之前,我们先用numpy实现网络。
Numpy提供了一个n维数组对象,以及许多用于操作这些数组的函数。Numpy是科学计算的通用框架; 它不知道计算图,深度学习或渐变。然而,我们可以通过手动实现使用numpy操作的向前和向后传递网络来轻松地使用numpy来将两层网络适配为随机数据:
# -*- coding: utf-8 -*-
import numpy as np
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
learning_rate = 1e-6
for t in range(500):
# Forward pass: compute predicted y
h = x.dot(w1)
h_relu = np.maximum(h, 0)
y_pred = h_relu.dot(w2)
# Compute and print loss
loss = np.square(y_pred - y).sum()
print(t, loss)
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h_relu = grad_y_pred.dot(w2.T)
grad_h = grad_h_relu.copy()
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)
# Update weights
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2
PyTorch:Tensors
Numpy是一个很好的框架,但它不能利用GPU加速其数值计算。对于现代深层神经网络,GPU通常提供50倍或更大的加速,所以不幸的是,numpy对于现代深层学习来说是不够的。
在这里我们介绍最基本的PyTorch概念:Tensor。PyTorch Tensor在概念上与numpy数组相同:Tensor是一个n维数组,PyTorch提供许多功能来操作这些Tensors。像数字阵列一样,PyTorch Tensors对于深度学习或计算图形或梯度知之甚少,它们是科学计算的通用工具。
然而,不同于numpy,PyTorch Tensors可以利用GPU加速其数字计算。要在GPU上运行PyTorch Tensor,您只需将其转换为新的数据类型即可。
在这里,我们使用PyTorch Tensors将两层网络拟合为随机数据。像上面的numpy示例一样,我们需要手动实现通过网络的向前和向后传递:
# -*- coding: utf-8 -*-
import torch
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = torch.randn(N, D_in).type(dtype)
y = torch.randn(N, D_out).type(dtype)
# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)
learning_rate = 1e-6
for t in range(500):
# Forward pass: compute predicted y
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
# Compute and print loss
loss = (y_pred - y).pow(2).sum()
print(t, loss)
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0
grad_w1 = x.t().mm(grad_h)
# Update weights using gradient descent
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2
Autograd
PyTorch:变量和自动格式
在上面的例子中,我们必须手动实现神经网络的向前和向后遍。手动实施后向传递对于小型双层网络来说不是一件大事,但是对于大型复杂网络来说,可以很快地获得很多毛病。
幸运的是,我们可以使用自动差分 来自动计算神经网络中的向后传递。PyTorch中的 autograd包提供了这个功能。使用自动格式时,网络的正向传递将定义一个 计算图 ; 图中的节点将是Tensors,边缘将是从输入Tensors生成输出Tensors的函数。通过此图反向传播,您可以轻松地计算渐变。
这听起来很复杂,在实践中使用起来很简单。我们将PyTorch Tensors包装在可变对象中; 变量表示计算图中的节点。如果x是变量,则x.data是Tensor,并且x.grad是另一个保存x相对于某个标量值的渐变的变量 。
PyTorch变量与PyTorch Tensors具有相同的API(几乎)您可以在Tensor上执行的任何操作也适用于变量; 区别在于使用变量定义计算图,允许您自动计算渐变。
这里我们使用PyTorch变量和自动调整来实现我们的两层网络; 现在我们不再需要手动实现向后通过网络:
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
learning_rate = 1e-6
for t in range(500):
# Forward pass: compute predicted y using operations on Variables; these
# are exactly the same operations we used to compute the forward pass using
# Tensors, but we do not need to keep references to intermediate values since
# we are not implementing the backward pass by hand.
y_pred = x.mm(w1).clamp(min=0).mm(w2)
# Compute and print loss using operations on Variables.
# Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
# (1,); loss.data[0] is a scalar value holding the loss.
loss = (y_pred - y).pow(2).sum()
print(t, loss.data[0])
# Use autograd to compute the backward pass. This call will compute the
# gradient of loss with respect to all Variables with requires_grad=True.
# After this call w1.grad and w2.grad will be Variables holding the gradient
# of the loss with respect to w1 and w2 respectively.
loss.backward()
# Update weights using gradient descent; w1.data and w2.data are Tensors,
# w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
# Tensors.
w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data
# Manually zero the gradients after updating weights
w1.grad.data.zero_()
w2.grad.data.zero_()
PyTorch:定义新的自动调整功能
在引擎盖下,每个原始的自动调整算子实际上是两个功能,在Tensors上操作。的正向函数从输入张量计算输出张量。该向后功能接收输出张量的梯度相对于一些标量值,并计算输入张量的梯度相对于该相同标量值。
在PyTorch我们可以很容易地通过定义的一个子类中定义我们自己autograd操作torch.autograd.Function和实施forward 和backward功能。然后,我们可以通过构造一个实例并将其称为函数,通过传递包含输入数据的变量来使用我们的新的自动格式运算符。
在这个例子中,我们定义了我们自己的自定义autograd函数来执行ReLU非线性,并用它来实现我们的两层网络:
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
class MyReLU(torch.autograd.Function):
"""
We can implement our own custom autograd Functions by subclassing
torch.autograd.Function and implementing the forward and backward passes
which operate on Tensors.
"""
def forward(self, input):
"""
In the forward pass we receive a Tensor containing the input and return a
Tensor containing the output. You can cache arbitrary Tensors for use in the
backward pass using the save_for_backward method.
"""
self.save_for_backward(input)
return input.clamp(min=0)
def backward(self, grad_output):
"""
In the backward pass we receive a Tensor containing the gradient of the loss
with respect to the output, and we need to compute the gradient of the loss
with respect to the input.
"""
input, = self.saved_tensors
grad_input = grad_output.clone()
grad_input[input < 0] = 0
return grad_input
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
learning_rate = 1e-6
for t in range(500):
# Construct an instance of our MyReLU class to use in our network
relu = MyReLU()
# Forward pass: compute predicted y using operations on Variables; we compute
# ReLU using our custom autograd operation.
y_pred = relu(x.mm(w1)).mm(w2)
# Compute and print loss
loss = (y_pred - y).pow(2).sum()
print(t, loss.data[0])
# Use autograd to compute the backward pass.
loss.backward()
# Update weights using gradient descent
w1.data -= learning_rate * w1.grad.data
w2.data -= learning_rate * w2.grad.data
# Manually zero the gradients after updating weights
w1.grad.data.zero_()
w2.grad.data.zero_()
TensorFlow:静态图
PyTorch autograd看起来很像TensorFlow:在两个框架中,我们定义一个计算图,并使用自动差分来计算梯度。两者之间的最大区别在于TensorFlow的计算图是静态的,PyTorch使用 动态计算图。
在TensorFlow中,我们定义了一次计算图,然后一遍又一遍地执行相同的图,可能会向图中输入不同的输入数据。在PyTorch中,每个前进传递定义了一个新的计算图。
静态图是很好的,因为您可以优化前面的图形; 例如,一个框架可能决定融合一些图形操作来提高效率,或者提出一种在许多GPU或许多机器上分发图形的策略。如果您一遍又一遍地重复使用相同的图表,那么这个潜在的昂贵的前期优化可以被分摊,因为同一个图表一遍又一遍地重新运行。
静态和动态图不同的一个方面是控制流程。对于某些型号,我们可能希望为每个数据点执行不同的计算; 例如,对于每个数据点,可以展开针对不同数量的时间步长的经常性网络; 这个展开可以被实现为循环。使用静态图形,循环构造需要是图形的一部分; 由于这个原因,TensorFlow提供了诸如tf.scan嵌入循环的操作符。使用动态图表,情况更简单:由于我们为每个示例动态构建图形,因此我们可以使用正常的命令式流程控制来执行每个输入不同的计算。
为了与上面的PyTorch autograd示例进行对比,我们使用TensorFlow来简化两层网络:
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np
# First we set up the computational graph:
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))
# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))
# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)
# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)
# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])
# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)
# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
# Run the graph once to initialize the Variables w1 and w2.
sess.run(tf.global_variables_initializer())
# Create numpy arrays holding the actual data for the inputs x and targets
# y
x_value = np.random.randn(N, D_in)
y_value = np.random.randn(N, D_out)
for _ in range(500):
# Execute the graph many times. Each time it executes we want to bind
# x_value to x and y_value to y, specified with the feed_dict argument.
# Each time we execute the graph we want to compute the values for loss,
# new_w1, and new_w2; the values of these Tensors are returned as numpy
# arrays.
loss_value, _, _ = sess.run([loss, new_w1, new_w2],
feed_dict={x: x_value, y: y_value})
print(loss_value)
nn模块
PyTorch:nn
计算图和自动格式是定义复杂运算符并自动获取衍生物的非常强大的范例; 然而对于大型神经网络,原始自动格式可能有点太低级别。
当构建神经网络时,我们经常认为将计算安排成层,其中一些具有学习参数 ,这将在学习过程中进行优化。
在TensorFlow中,像Keras, TensorFlow-Slim和TFLearn这样的软件包 可以为构建神经网络的原始计算图提供更高层次的抽象。
在PyTorch中,该nn软件包具有相同的用途。该nn 软件包定义了一组模块,它们大致相当于神经网络层。模块接收输入变量并计算输出变量,但也可以保存内部状态,例如包含可学习参数的变量。该nn软件包还定义了训练神经网络时通常使用的一组有用的损失函数。
在这个例子中,我们使用nn包来实现我们的两层网络:
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Variables for its weight and bias.
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(size_average=False)
learning_rate = 1e-4
for t in range(500):
# Forward pass: compute predicted y by passing x to the model. Module objects
# override the __call__ operator so you can call them like functions. When
# doing so you pass a Variable of input data to the Module and it produces
# a Variable of output data.
y_pred = model(x)
# Compute and print loss. We pass Variables containing the predicted and true
# values of y, and the loss function returns a Variable containing the
# loss.
loss = loss_fn(y_pred, y)
print(t, loss.data[0])
# Zero the gradients before running the backward pass.
model.zero_grad()
# Backward pass: compute gradient of the loss with respect to all the learnable
# parameters of the model. Internally, the parameters of each Module are stored
# in Variables with requires_grad=True, so this call will compute gradients for
# all learnable parameters in the model.
loss.backward()
# Update the weights using gradient descent. Each parameter is a Variable, so
# we can access its data and gradients like we did before.
for param in model.parameters():
param.data -= learning_rate * param.grad.data
PyTorch:optim
到目前为止,我们已经通过手动变更包含.data可学习参数的变量的成员来更新我们的模型的权重。这对于简单的优化算法(如随机梯度下降)来说并不是一个巨大的负担,但实际上我们经常使用像AdaGrad,RMSProp,Adam等等更为优秀的优化器来训练神经网络。
optimPyTorch中的软件包提取了优化算法的思想,并提供了常用的优化算法的实现。
在这个例子中,我们将使用nn包来定义我们的模型,但是我们将使用optim包提供的Adam算法优化模型:
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)
# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(size_average=False)
# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algoriths. The first argument to the Adam constructor tells the
# optimizer which Variables it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
# Forward pass: compute predicted y by passing x to the model.
y_pred = model(x)
# Compute and print loss.
loss = loss_fn(y_pred, y)
print(t, loss.data[0])
# Before the backward pass, use the optimizer object to zero all of the
# gradients for the variables it will update (which are the learnable weights
# of the model)
optimizer.zero_grad()
# Backward pass: compute gradient of the loss with respect to model
# parameters
loss.backward()
# Calling the step function on an Optimizer makes an update to its
# parameters
optimizer.step()
PyTorch:自定义nn模块
有时您会想要指定比现有模块序列更复杂的模型; 对于这些情况,您可以通过子类化定义自己的模块,nn.Module并定义forward接收输入变量的值,并使用其他模块或变量上的其他自动格式化操作生成输出变量。
在这个例子中,我们将两层网络实现为一个自定义的模块子类:
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable
class TwoLayerNet(torch.nn.Module):
def __init__(self, D_in, H, D_out):
"""
In the constructor we instantiate two nn.Linear modules and assign them as
member variables.
"""
super(TwoLayerNet, self).__init__()
self.linear1 = torch.nn.Linear(D_in, H)
self.linear2 = torch.nn.Linear(H, D_out)
def forward(self, x):
"""
In the forward function we accept a Variable of input data and we must return
a Variable of output data. We can use Modules defined in the constructor as
well as arbitrary operators on Variables.
"""
h_relu = self.linear1(x).clamp(min=0)
y_pred = self.linear2(h_relu)
return y_pred
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)
# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
# Forward pass: Compute predicted y by passing x to the model
y_pred = model(x)
# Compute and print loss
loss = criterion(y_pred, y)
print(t, loss.data[0])
# Zero gradients, perform a backward pass, and update the weights.
optimizer.zero_grad()
loss.backward()
optimizer.step()
PyTorch:控制流+重量共享
作为动态图和权重共享的一个例子,我们实现了一个非常奇怪的模型:一个完全连接的ReLU网络,每个正向通道选择1到4之间的随机数,并使用许多隐藏层,重复使用相同权重多次计算最内层的隐藏层。
对于这个模型,可以使用普通的Python流控制来实现循环,并且通过在定义正向传递时简单地重复使用相同的模块,可以在最内层之间实现权重共享。
我们可以轻松地将此模型实现为模块子类:
# -*- coding: utf-8 -*-
import random
import torch
from torch.autograd import Variable
class DynamicNet(torch.nn.Module):
def __init__(self, D_in, H, D_out):
"""
In the constructor we construct three nn.Linear instances that we will use
in the forward pass.
"""
super(DynamicNet, self).__init__()
self.input_linear = torch.nn.Linear(D_in, H)
self.middle_linear = torch.nn.Linear(H, H)
self.output_linear = torch.nn.Linear(H, D_out)
def forward(self, x):
"""
For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
and reuse the middle_linear Module that many times to compute hidden layer
representations.
Since each forward pass builds a dynamic computation graph, we can use normal
Python control-flow operators like loops or conditional statements when
defining the forward pass of the model.
Here we also see that it is perfectly safe to reuse the same Module many
times when defining a computational graph. This is a big improvement from Lua
Torch, where each Module could be used only once.
"""
h_relu = self.input_linear(x).clamp(min=0)
for _ in range(random.randint(0, 3)):
h_relu = self.middle_linear(h_relu).clamp(min=0)
y_pred = self.output_linear(h_relu)
return y_pred
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)
# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)
# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
# Forward pass: Compute predicted y by passing x to the model
y_pred = model(x)
# Compute and print loss
loss = criterion(y_pred, y)
print(t, loss.data[0])
# Zero gradients, perform a backward pass, and update the weights.
optimizer.zero_grad()
loss.backward()
optimizer.step()
例子
您可以在这里浏览上面的例子。