
PyTorch Tutorial 6.1: Layers and Modules



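When we first introduced neural networks, we focused on linear models with a single output. Here, the entire model consists of just a single neuron. Note that a single neuron (i) takes some set of inputs; (ii) generates a corresponding scalar output; and (iii) has a set of associated parameters that can be updated to optimize some objective function of interest. Then, once we started thinking about networks with multiple outputs, we leveraged vectorized arithmetic to characterize an entire layer of neurons. Just like individual neurons, layers (i) take a set of inputs, (ii) generate corresponding outputs, and (iii) are described by a set of tunable parameters. When we worked through softmax regression, a single layer was itself the model. However, even when we subsequently introduced MLPs, we could still think of the model as retaining this same basic structure.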

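Interestingly, for MLPs, both the entire model and its constituent layers share this structure. The entire model takes in raw inputs (the features), generates outputs (the predictions), and possesses parameters (the combined parameters from all constituent layers). Likewise, each individual layer ingests inputs (supplied by the previous layer), generates outputs (the inputs to the subsequent layer), and possesses a set of tunable parameters that are updated according to the signal that flows backwards from the subsequent layer.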
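As a minimal sketch of this shared structure, the snippet below looks at one fully connected layer in PyTorch: it takes inputs, produces outputs, and owns tunable parameters. The sizes (20 input features, 256 outputs, a minibatch of 2) are illustrative assumptions chosen to match the MLP used later in this section.

```python
import torch
from torch import nn

# One fully connected layer: 20 input features, 256 outputs (assumed sizes).
layer = nn.Linear(20, 256)

X = torch.rand(2, 20)   # (i) a minibatch of 2 inputs
Y = layer(X)            # (ii) the corresponding outputs

# (iii) the tunable parameters: a weight matrix and a bias vector
param_shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}

print(tuple(Y.shape))   # (2, 256)
print(param_shapes)     # {'weight': (256, 20), 'bias': (256,)}
```

A whole model exposes the same three things; its parameters are simply the union of the parameters of its layers.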

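While you might think that neurons, layers, and models give us enough abstractions to go about our business, it turns out that we often find it convenient to speak about components that are larger than an individual layer but smaller than the entire model. For example, the ResNet-152 architecture, which is wildly popular in computer vision, possesses hundreds of layers. These layers consist of repeating patterns of groups of layers. Implementing such a network one layer at a time can grow tedious. This concern is not just hypothetical: such design patterns are common in practice. The ResNet architecture mentioned above won the 2015 ImageNet and COCO computer vision competitions for both recognition and detection (He et al., 2016) and remains a go-to architecture for many vision tasks. Similar architectures in which layers are arranged in various repeating patterns are now ubiquitous in other domains, including natural language processing and speech.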

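To implement these complex networks, we introduce the concept of a neural network module. A module could describe a single layer, a component consisting of multiple layers, or the entire model itself! One benefit of working with the module abstraction is that modules can be combined into larger artifacts, often recursively. This is illustrated in Fig. 6.1.1. By defining code to generate modules of arbitrary complexity on demand, we can write surprisingly compact code and still implement complex neural networks.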

https://file.elecfans.com/web2/M00/A9/C7/poYBAGR9NP2AcRNaAAJd7roQfBs959.svg

Fig. 6.1.1 Multiple layers are combined into modules, forming repeating patterns of larger models.
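To make the idea of generating modules on demand concrete, here is a small sketch in PyTorch. The helpers `block` and `make_net`, and all layer sizes, are hypothetical names and values introduced only for illustration; they are not part of any library.

```python
import torch
from torch import nn

def block(num_in, num_hidden, num_out):
    # Hypothetical helper: a reusable group of layers (itself a module).
    return nn.Sequential(
        nn.Linear(num_in, num_hidden), nn.ReLU(),
        nn.Linear(num_hidden, num_out), nn.ReLU())

def make_net(num_blocks, num_in=20):
    # Hypothetical helper: stack the same pattern num_blocks times,
    # then finish with a 10-unit output layer.
    net = nn.Sequential()
    for i in range(num_blocks):
        net.add_module(f'block{i}', block(num_in, 64, 32))
        num_in = 32  # the next block consumes the previous block's output
    net.add_module('out', nn.Linear(32, 10))
    return net

net = make_net(3)
out_shape = tuple(net(torch.rand(2, 20)).shape)

# Modules nest recursively: count every Linear layer inside the model.
num_linear = sum(isinstance(m, nn.Linear) for m in net.modules())
print(out_shape, num_linear)  # (2, 10) 7
```

Changing the single argument to `make_net` changes the depth of the network without any per-layer code.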

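From a programming standpoint, a module is represented by a class. Any subclass of it must define a forward propagation method that transforms its input into output and must store any necessary parameters. Note that some modules do not require any parameters at all. Finally, a module must possess a backpropagation method, for purposes of calculating gradients. Fortunately, due to some behind-the-scenes magic supplied by automatic differentiation (introduced in Section 2.5), when defining our own module we only need to worry about parameters and the forward propagation method.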
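The following sketch illustrates both points for PyTorch: a module with no parameters at all, and gradients obtained without writing any backpropagation code. The class name `CenteredLayer` and the input values are assumptions made up for this example.

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    """Illustrative parameter-free module: subtracts the mean of its input."""
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

layer = CenteredLayer()
num_params = len(list(layer.parameters()))  # no parameters are stored

# Only forward is defined; autograd derives the backward pass.
X = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
loss = (layer(X) ** 2).sum()
loss.backward()

print(num_params)  # 0
print(X.grad)      # tensor([-3., -1.,  1.,  3.])
```

The gradient equals `2 * (X - X.mean())`, because the deviations from the mean sum to zero and the extra term from differentiating the mean vanishes.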

# PyTorch
import torch
from torch import nn
from torch.nn import functional as F

# MXNet
from mxnet import np, npx
from mxnet.gluon import nn

npx.set_np()

# JAX
from typing import List
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l

# TensorFlow
import tensorflow as tf

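To begin, we revisit the code that we used to implement MLPs (Section 5.1). The following code generates a network with one fully connected hidden layer with 256 units and ReLU activation, followed by a fully connected output layer with 10 units (no activation function).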

# PyTorch
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

X = torch.rand(2, 20)
net(X).shape

torch.Size([2, 10])

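In this example, we constructed our model by instantiating an nn.Sequential, with layers in the order that they should be executed passed as arguments. In short, nn.Sequential defines a special kind of Module, the class that presents a module in PyTorch. It maintains an ordered list of constituent Modules. Note that each of the two fully connected layers is an instance of the Linear class, which is itself a subclass of Module. The forward propagation (forward) method is also remarkably simple: it chains each module in the list together, passing the output of each as input to the next. Note that until now, we have been invoking our models via the construction net(X) to obtain their outputs. This is actually just shorthand for net.__call__(X).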
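The chaining behavior can be checked by hand. The sketch below, which assumes explicit input sizes (nn.Linear instead of nn.LazyLinear) to keep it self-contained, feeds the input through each constituent module in turn and compares the result with calling the network directly.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)

# Chain the constituent modules manually: each output feeds the next module.
out = X
for module in net:
    out = module(out)

matches_call = torch.allclose(out, net(X))
matches_dunder = torch.allclose(net(X), net.__call__(X))
print(matches_call, matches_dunder)  # True True
```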

# MXNet
net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize()

X = np.random.uniform(size=(2, 20))
net(X).shape

(2, 10)

In this example, we constructed our model by instantiating an nn.Sequential, assigning the returned object to the net variable. Next, we repeatedly call its add method, appending layers in the order that they should be executed. In short, nn.Sequential defines a special kind of Block, the class that presents a module in Gluon. It maintains an ordered list of constituent Blocks. The add method simply facilitates the addition of each successive Block to the list. Note that each layer is an instance of the Dense class which is itself a subclass of Block. The forward propagation (forward) method is also remarkably simple: it chains each Block in the list together, passing the output of each as input to the next. Note that until now, we have been invoking our models via the construction net(X) to obtain their outputs. This is actually just shorthand for net.forward(X), a slick Python trick achieved via the Block class’s __call__ method.

# JAX
net = nn.Sequential([nn.Dense(256), nn.relu, nn.Dense(10)])

# get_key is a d2l saved function returning jax.random.PRNGKey(random_seed)
X = jax.random.uniform(d2l.get_key(), (2, 20))
params = net.init(d2l.get_key(), X)
net.apply(params, X).shape

(2, 10)

# TensorFlow
net = tf.keras.models.Sequential([
  tf.keras.layers.Dense(256, activation=tf.nn.relu),
  tf.keras.layers.Dense(10),
])

X = tf.random.uniform((2, 20))
net(X).shape

TensorShape([2, 10])

In this example, we constructed our model by instantiating a keras.models.Sequential, with layers in the order that they should be executed passed as arguments. In short, Sequential defines a special kind of keras.Model, the class that presents a module in Keras. It maintains an ordered list of constituent layers. Note that each of the two fully connected layers is an instance of the Dense class, which is a subclass of keras.layers.Layer, the base class that keras.Model itself extends. The forward propagation (call) method is also remarkably simple: it chains each layer in the list together, passing the output of each as input to the next. Note that until now, we have been invoking our models via the construction net(X) to obtain their outputs. This is actually just shorthand for net.call(X), a slick Python trick achieved via the module class's __call__ method.

6.1.1. A Custom Module

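Perhaps the easiest way to develop intuition about how a module works is to implement one ourselves. Before we implement our own custom module, we briefly summarize the basic functionality that each module must provide: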

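1. Ingest input data as arguments to its forward propagation method.
2. Generate an output by having the forward propagation method return a value. Note that the output may have a different shape from the input. For example, the first fully connected layer in our model above ingests an input of arbitrary dimension but returns an output of dimension 256.
3. Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation method. Typically this happens automatically.
4. Store and provide access to those parameters necessary for executing the forward propagation computation.
5. Initialize model parameters as needed.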
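The five responsibilities listed above can be observed on a single built-in layer. This is a sketch for PyTorch only, using nn.LazyLinear so that on-demand initialization is visible; the variable names and the minibatch size are illustrative assumptions.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

layer = nn.LazyLinear(256)
# Before any data is seen, the weight has no shape yet.
lazy_before = isinstance(layer.weight, UninitializedParameter)

X = torch.rand(2, 20, requires_grad=True)
Y = layer(X)                # ingest input, return output, initialize on demand
out_shape = tuple(Y.shape)  # the output shape differs from the input shape

Y.sum().backward()          # gradients are computed automatically
input_grad_shape = tuple(X.grad.shape)

# Parameters are stored on the module and accessible by name.
weight_shape = tuple(layer.weight.shape)
weight_grad_shape = tuple(layer.weight.grad.shape)

print(lazy_before, out_shape, input_grad_shape, weight_shape)
```

The input dimension of 20 is inferred from the first minibatch, after which the layer behaves like an ordinary nn.Linear.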

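In the following snippet, we code up a module from scratch corresponding to an MLP with one hidden layer with 256 hidden units, and a 10-dimensional output layer. Note that the MLP class below inherits the class that represents a module. We will heavily rely on the parent class's methods, supplying only our own constructor (the __init__ method in Python) and the forward propagation method.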

# PyTorch
class MLP(nn.Module):
  def __init__(self):
    # Call the constructor of the parent class nn.Module to perform
    # the necessary initialization
    super().__init__()
    self.hidden = nn.LazyLinear(256)
    self.out = nn.LazyLinear(10)

  # Define the forward propagation of the model, that is, how to return the
  # required model output based on the input X
  def forward(self, X):
    return self.out(F.relu(self.hidden(X)))

# MXNet
class MLP(nn.Block):
  def __init__(self):
    # Call the constructor of the MLP parent class nn.Block to perform
    # the necessary initialization
    super().__init__()
    self.hidden = nn.Dense(256, activation='relu')
    self.out = nn.Dense(10)

  # Define the forward propagation of the model, that is, how to return the
  # required model output based on the input X
  def forward(self, X):
    return self.out(self.hidden(X))

# JAX
class MLP(nn.Module):
  def setup(self):
    # Define the layers
    self.hidden = nn.Dense(256)
    self.out = nn.Dense(10)

  # Define the forward propagation of the model, that is, how to return the
  # required model output based on the input X
  def __call__(self, X):
    return self.out(nn.relu(self.hidden(X)))

# TensorFlow
class MLP(tf.keras.Model):
  def __init__(self):
    # Call the constructor of the parent class tf.keras.Model to perform
    # the necessary initialization
    super().__init__()
    self.hidden = tf.keras.layers.Dense(units=256, activation=tf.nn.relu)
    self.out = tf.keras.layers.Dense(units=10)

  # Define the forward propagation of the model, that is, how to return the
  # required model output based on the input X
  def call(self, X):
    return self.out(self.hidden(X))

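Let's first focus on the forward propagation method. Note that it takes X as input, calculates the hidden representation with the activation function applied, and outputs its logits. In this MLP implementation, both layers are instance variables. To see why this is reasonable, imagine instantiating two MLPs, net1 and net2, and training them on different data. Naturally, we would expect them to represent two different learned models.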
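A short sketch of why instance variables matter, for PyTorch: two instances of the same class hold separate parameters, so updating one leaves the other untouched. For self-containment this version assumes a fixed input size of 20 (nn.Linear rather than nn.LazyLinear), and the single hand-written gradient step with step size 0.1 merely stands in for real training.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)  # assumed input size of 20
        self.out = nn.Linear(256, 10)

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net1, net2 = MLP(), MLP()
shares_weights = net1.hidden.weight is net2.hidden.weight

# Snapshot net2, then take one gradient step on net1 only.
net2_before = net2.hidden.weight.detach().clone()
net1_before = net1.hidden.weight.detach().clone()
net1(torch.rand(2, 20)).sum().backward()
with torch.no_grad():
    for p in net1.parameters():
        p -= 0.1 * p.grad

net1_changed = not torch.equal(net1_before, net1.hidden.weight)
net2_unchanged = torch.equal(net2_before, net2.hidden.weight)
print(shares_weights, net1_changed, net2_unchanged)
```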

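We instantiate the MLP's layers in the constructor and subsequently invoke these layers on each call to the forward propagation method. Note a few key details. First, our customized __init__ method invokes the parent class's __init__ method via super().__init__(), sparing us the pain of restating boilerplate code applicable to most modules. We then instantiate our two fully connected layers, assigning them to self.hidden and self.out. Note that unless we implement a new layer, we need not worry about the backpropagation method or parameter initialization. The system will generate these methods automatically. Let's try this out.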
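The role of super().__init__() and of assigning layers to attributes can be seen directly in PyTorch. In this sketch the class `BrokenMLP` is a deliberately faulty, hypothetical example, and explicit input sizes are again an assumption made for self-containment.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()                # sets up the module's bookkeeping
        self.hidden = nn.Linear(20, 256)  # assignment registers the submodule
        self.out = nn.Linear(256, 10)

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = MLP()
child_names = [name for name, _ in net.named_children()]
param_names = [name for name, _ in net.named_parameters()]
print(child_names)  # ['hidden', 'out']
print(param_names)  # ['hidden.weight', 'hidden.bias', 'out.weight', 'out.bias']

class BrokenMLP(nn.Module):
    def __init__(self):
        # Deliberately skips super().__init__() to show what it provides.
        self.hidden = nn.Linear(20, 256)

try:
    BrokenMLP()
    failed = False
except AttributeError:
    # PyTorch refuses to register a submodule before the parent is initialized.
    failed = True
print(failed)  # True
```

Because the layers are registered automatically, an optimizer given `net.parameters()` sees all four tensors without any extra code.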