
The example in this article comes from the official PyTorch site. Original link: Learning PyTorch with Examples (PyTorch Tutorials 1.9.1+cu102 documentation)

We will use a problem of fitting y=sin(x) with a third order polynomial as our running example. The network will have four parameters, and will be trained with gradient descent to fit random data by minimizing the Euclidean distance between the network output and the true output.

In other words, the task is to fit the curve of sin(x) with a third-order polynomial. The polynomial has four parameters, and gradient descent is used to find their optimal values.

NumPy

Although this is an introduction to PyTorch, it helps to first understand the code using NumPy:

import math
import numpy as np

# Create the input x and the target y
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# Randomly initialize the four weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

# Set the learning rate
learning_rate = 1e-6

for t in range(2000):
    # Forward pass: compute the prediction y = a + b x + c x^2 + d x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute the loss, here loss = sum((y_pred - y)^2); the version I learned also divides by m or 2m
    loss = np.square(y_pred - y).sum()

    # Print the loss every 100 iterations; the choice of which iterations to print is arbitrary
    if t % 100 == 99:
        print(t, loss)

    # Backprop: compute the gradients of the loss with respect to a, b, c, d.
    # Each gradient is the partial derivative with respect to that parameter; see the derivation below.
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update the weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')
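As a sanity check on the gradient descent result, the same cubic can be fitted in closed form. The sketch below is not part of the original tutorial; it assumes `np.polyfit` as the reference least-squares solver, and the names `a_ls`, `b_ls`, `c_ls`, `d_ls` are made up for illustration.

```python
import math
import numpy as np

x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# np.polyfit returns coefficients from the highest degree down: [d, c, b, a]
d_ls, c_ls, b_ls, a_ls = np.polyfit(x, y, 3)

print(f'Least squares: y = {a_ls} + {b_ls} x + {c_ls} x^2 + {d_ls} x^3')
```

Because sin(x) is odd and the sample points are symmetric around zero, the even coefficients `a_ls` and `c_ls` come out essentially zero. After 2000 steps the gradient descent loop should be moving toward these values, although it may not have fully converged yet.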

For the derivative part:

$loss = \sum_{i=1}^{2000}(y_{pred,i}-y_i)^2=\sum_{i=1}^{2000}(a+bx_i+cx_i^2+dx_i^3-y_i)^2$

Take the partial derivatives of the formula above using the chain rule:

$\frac{\partial loss}{\partial a} = \sum_{i=1}^{2000}\frac{\partial (y_{pred,i}-y_i)^2}{\partial (y_{pred,i}-y_i)} \cdot \frac{\partial (a+bx_i+cx_i^2+dx_i^3-y_i)}{\partial a} = \sum_{i=1}^{2000}2(y_{pred,i}-y_i)$

$\frac{\partial loss}{\partial b} = \sum_{i=1}^{2000}\frac{\partial (y_{pred,i}-y_i)^2}{\partial (y_{pred,i}-y_i)} \cdot \frac{\partial (a+bx_i+cx_i^2+dx_i^3-y_i)}{\partial b} = \sum_{i=1}^{2000}2x_i(y_{pred,i}-y_i)$

Similarly:

$\frac{\partial loss}{\partial c} = \sum_{i=1}^{2000}2x_i^2(y_{pred,i}-y_i)$

$\frac{\partial loss}{\partial d} = \sum_{i=1}^{2000}2x_i^3(y_{pred,i}-y_i)$
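The derivation can be checked numerically. The sketch below compares the analytic gradients with central finite differences at an arbitrary test point; the helper names `loss_fn`, `analytic_grads` and `numeric_grads`, the test point and the step size `eps` are all assumptions made for illustration.

```python
import math
import numpy as np

x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)


def loss_fn(a, b, c, d):
    """Sum of squared errors for y_pred = a + b x + c x^2 + d x^3."""
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    return np.square(y_pred - y).sum()


def analytic_grads(a, b, c, d):
    """Gradients from the formulas derived above."""
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    g = 2.0 * (y_pred - y)
    return np.array([g.sum(), (g * x).sum(), (g * x ** 2).sum(), (g * x ** 3).sum()])


def numeric_grads(params, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    params = np.asarray(params, dtype=float)
    grads = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        grads[i] = (loss_fn(*(params + step)) - loss_fn(*(params - step))) / (2 * eps)
    return grads


params = [0.1, -0.2, 0.3, -0.4]  # arbitrary test point, not a trained result
analytic = analytic_grads(*params)
numeric = numeric_grads(params)
print(analytic)
print(numeric)
```

The two printed vectors should agree up to floating point rounding, which is a quick way to catch a sign or exponent mistake in a hand-written backward pass.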

PyTorch

So, since all of this can be written with NumPy, why use PyTorch?

Numpy provides an n-dimensional array object, and many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it does not know anything about computation graphs, or deep learning, or gradients.

Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy won't be enough for modern deep learning.

NumPy provides an object for working with n-dimensional arrays and is a general framework for scientific computing. However, NumPy knows nothing about computation graphs, deep learning, or gradients, and it cannot use a GPU to accelerate computation, so on its own it is not suited to modern deep learning. That is why we use PyTorch.
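A common way to take advantage of a GPU without breaking CPU-only machines is to pick the device at runtime. This is a minimal sketch that assumes the first CUDA device (`cuda:0`) is the one you want; the small tensor is only there to show where the data ends up.

```python
import torch

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Tensors created with device=... live on that device from the start.
x = torch.linspace(-1.0, 1.0, 5, device=device)
print(x.device)
```

The rest of the code stays the same on either device, as long as every tensor involved in an operation lives on the same one.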

import torch
import math


dtype = torch.float
device = torch.device("cpu")
# Uncomment the line below to run on the GPU
# device = torch.device("cuda:0")

# Create the input and target tensors
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize the parameters
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute the prediction
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop: compute the gradients by hand, as in the NumPy version
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update the weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d


print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

autograd

PyTorch: Tensors and autograd (auto-gradient)

In the above examples, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but can quickly get very hairy for large complex networks.

Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network will define a computational graph; nodes in the graph will be Tensors, and edges will be functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.

This sounds complicated, but it is pretty simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor that has x.requires_grad=True then x.grad is another Tensor holding the gradient of x with respect to some scalar value.

Compared with NumPy, besides the advantages mentioned above, we also no longer have to write the backward pass by hand, because PyTorch's autograd computes it automatically. When we use autograd, the forward pass defines a computational graph whose nodes are tensors and whose edges are functions that produce output tensors from input tensors. Backpropagating through this graph gives us the gradients.
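The graph can be inspected directly. In this small sketch (the tensors `a`, `b`, `c` here are throwaway examples, unrelated to the polynomial weights), every tensor produced by an operation carries a `grad_fn` that points at the function that created it, while tensors created by the user are leaves with no `grad_fn`.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)  # leaf tensor created by the user
b = a * 3                                  # produced by a multiplication
c = b + 1                                  # produced by an addition

print(a.is_leaf, a.grad_fn)  # leaf: grad_fn is None
print(b.is_leaf, b.grad_fn)  # non-leaf: grad_fn is the backward node of the multiplication
print(c.is_leaf, c.grad_fn)  # non-leaf: grad_fn is the backward node of the addition
```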

Although this sounds complicated, it is very simple to use. Each tensor is a node in the computational graph. If x is a tensor with x.requires_grad=True, then after the backward pass x.grad is a tensor holding the gradient of some scalar (typically the loss) with respect to x.
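A minimal illustration of `requires_grad` and `.grad`, using a made-up scalar function z = x^2 + 2x rather than the polynomial model from this article:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

z = x ** 2 + 2 * x  # forward pass builds the graph
z.backward()        # backward pass fills in x.grad

# dz/dx = 2x + 2, which is 8 at x = 3
print(x.grad)  # tensor(8.)
```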

import torch
import math

dtype = torch.float
device = torch.device("cpu")
# Uncomment the line below to run on the GPU
# device = torch.device("cuda:0")

# Create Tensors to hold input and outputs.
# By default requires_grad=False, which means we do not need gradients
# with respect to these tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize the parameters. requires_grad=True means we want
# gradients with respect to these tensors during the backward pass.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Now loss is a 0-dimensional Tensor (shape ())
    # loss.item() gets the scalar value held in loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call computes the gradient
    # of the loss with respect to all tensors with requires_grad=True.
    # The results are stored in a.grad, b.grad, c.grad and d.grad.
    loss.backward()

    # Manually update the weights.
    # Wrap in torch.no_grad() because the weights have requires_grad=True,
    # but we do not want autograd to track the update operations.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually clear the gradients after updating the weights.
        # backward() accumulates into .grad, so the previous gradients
        # must be reset before the next backward call.
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
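The reason the gradients are reset at the end of every iteration is that `backward()` adds to `.grad` instead of overwriting it. A small sketch with a made-up scalar weight `w` shows the effect:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()   # d(3w)/dw = 3

(3 * w).backward()      # no reset: the new gradient is added to the old one
second = w.grad.item()  # 3 + 3 = 6

w.grad = None           # reset, as in the training loop above
(3 * w).backward()
third = w.grad.item()   # back to 3

print(first, second, third)  # 3.0 6.0 3.0
```

Without the reset, each update step in the training loop would use the sum of all gradients seen so far rather than the gradient of the current loss.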

Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.

In PyTorch we can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use our new autograd operator by constructing an instance and calling it like a function, passing Tensors containing input data.

Each primitive autograd operator is really two functions that operate on tensors:

Forward pass: compute the output tensors from the input tensors.
Backward pass: receive the gradient of some scalar with respect to the output tensors and compute the gradient of that same scalar with respect to the input tensors.

In PyTorch we can subclass torch.autograd.Function and implement our own forward and backward passes. The new autograd operator is then called like a function through its apply method, as in the code below.

Previously our prediction model was $y=a+bx+cx^2+dx^3$. Now we change it to $y=a+bP_3(c+dx)$, where $P_3(x)=\frac{1}{2}\left(5x^3-3x\right)$ is the Legendre polynomial of degree three. We now implement a custom autograd function for this new model:
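The backward function in the code below relies on the derivative of $P_3$, which follows directly from its definition:

$P_3'(x)=\frac{d}{dx}\left[\frac{1}{2}\left(5x^3-3x\right)\right]=\frac{1}{2}\left(15x^2-3\right)=\frac{3}{2}\left(5x^2-1\right)$

By the chain rule, the gradient of the loss with respect to the input of $P_3$ is the incoming gradient multiplied by this derivative:

$\frac{\partial loss}{\partial x}=\frac{\partial loss}{\partial P_3(x)}\cdot\frac{3}{2}\left(5x^2-1\right)$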

# -*- coding: utf-8 -*-
import torch
import math


class LegendrePolynomial3(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive an input tensor and return an output tensor.
        ctx is a context object that can be used to stash information for the backward pass.
        Tensors needed later can be cached with the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)


dtype = torch.float
device = torch.device("cpu")

# Create tensors to hold the input and output
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)


# Initialize the weights
# y = a + b * P3(c + d * x), so we need four weights a, b, c, d
# These weights need to start reasonably close to the correct result to ensure convergence
# (? I have a question here: how would you know the correct answer)
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)

learning_rate = 5e-6
for t in range(2000):
    # To use our custom function we call its apply method; alias it as P3
    P3 = LegendrePolynomial3.apply

    y_pred = a + b * P3(c + d * x)

    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum()
    if t % 500 == 0:
        print(t, loss.item())

    # Backprop
    loss.backward()

    # Update the weights
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually clear the gradients: backward() accumulates into .grad,
        # so they must be reset before the next backward call
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)')
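A hand-written backward function is easy to get wrong, so it can be worth comparing it against numerical gradients. The sketch below uses `torch.autograd.gradcheck` for this. It repeats the class definition so that it runs on its own; the test input and the `eps` and `atol` values are arbitrary choices for illustration.

```python
import torch


class LegendrePolynomial3(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)


# gradcheck compares analytic and finite-difference gradients; it expects
# double precision inputs that have requires_grad=True.
test_input = torch.linspace(-1.0, 1.0, 10, dtype=torch.double, requires_grad=True)

ok = torch.autograd.gradcheck(LegendrePolynomial3.apply, (test_input,), eps=1e-6, atol=1e-4)
print(ok)
```

`gradcheck` returns True when the two gradients agree within tolerance and raises an error otherwise, so a wrong coefficient in `backward` would show up immediately.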