
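Once we have chosen an architecture and set our hyperparameters, we proceed to the training loop, where our goal is to find parameter values that minimize our loss function. After training, we will need these parameters in order to make future predictions. Additionally, we will sometimes wish to extract the parameters, whether to reuse them in some other context, to save our model to disk so that it may be executed in other software, or to examine them in the hope of gaining scientific understanding.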
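As a rough illustration of saving parameters and reusing them elsewhere, the following sketch stores a model's state dictionary and loads it into a fresh copy of the same architecture. The layer sizes and the in-memory `io.BytesIO` buffer (a stand-in for a file on disk) are assumptions made for this example only.

```python
import io

import torch
from torch import nn

# A small MLP with explicit input sizes, so its parameters exist immediately.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Serialize only the parameters (the state dict), not the Python objects.
buffer = io.BytesIO()  # assumed stand-in for a file on disk
torch.save(net.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture and load the stored parameters into it.
clone = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
clone.load_state_dict(torch.load(buffer))

# Both networks should now compute the same function.
X = torch.rand(2, 4)
same_output = torch.equal(net(X), clone(X))
print(same_output)
```

Saving the state dictionary rather than the whole module keeps the stored file independent of the class definition, at the cost of having to rebuild the architecture before loading.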

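Most of the time, we will be able to ignore the nitty-gritty details of how parameters are declared and manipulated, relying on deep learning frameworks to do the heavy lifting. However, when we move away from stacked architectures with standard layers, we will sometimes need to get into the weeds of declaring and manipulating parameters. In this section, we cover the following: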

Accessing parameters for debugging, diagnostics, and visualizations.

Sharing parameters across different model components.

import torch
from torch import nn

from mxnet import init, np, npx
from mxnet.gluon import nn

npx.set_np()

import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l

No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

import tensorflow as tf

We start by focusing on an MLP with one hidden layer.

net = nn.Sequential(nn.LazyLinear(8),
          nn.ReLU(),
          nn.LazyLinear(1))

X = torch.rand(size=(2, 4))
net(X).shape

torch.Size([2, 1])

net = nn.Sequential()
net.add(nn.Dense(8, activation='relu'))
net.add(nn.Dense(1))
net.initialize() # Use the default initialization method

X = np.random.uniform(size=(2, 4))
net(X).shape

(2, 1)

net = nn.Sequential([nn.Dense(8), nn.relu, nn.Dense(1)])

X = jax.random.uniform(d2l.get_key(), (2, 4))
params = net.init(d2l.get_key(), X)
net.apply(params, X).shape

(2, 1)

net = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(4, activation=tf.nn.relu),
  tf.keras.layers.Dense(1),
])

X = tf.random.uniform((2, 4))
net(X).shape

TensorShape([2, 1])

6.2.1. Parameter Access

Let's start with how to access parameters from the models that you already know.

When a model is defined via the Sequential class, we can first access any layer by indexing into the model as though it were a list. Each layer's parameters are conveniently located in its attributes.

Flax and JAX decouple the model and the parameters as you might have observed in the models defined previously. When a model is defined via the Sequential class, we first need to initialize the network to generate the parameters dictionary. We can access any layer’s parameters through the keys of this dictionary.

We can inspect the parameters of the second fully connected layer as follows.

net[2].state_dict()

OrderedDict([('weight',
       tensor([[-0.2523, 0.2104, 0.2189, -0.0395, -0.0590, 0.3360, -0.0205, -0.1507]])),
       ('bias', tensor([0.0694]))])

net[1].params

dense1_ (
 Parameter dense1_weight (shape=(1, 8), dtype=float32)
 Parameter dense1_bias (shape=(1,), dtype=float32)
)

params['params']['layers_2']

FrozenDict({
  kernel: Array([[-0.20739523],
      [ 0.16546965],
      [-0.03713543],
      [-0.04860032],
      [-0.2102929 ],
      [ 0.163712 ],
      [ 0.27240783],
      [-0.4046879 ]], dtype=float32),
  bias: Array([0.], dtype=float32),
})

net.layers[2].weights

[,
 ]

We can see that this fully connected layer contains two parameters, corresponding to that layer's weights and biases, respectively.
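The PyTorch indexing described above can be sketched as follows. This version uses `nn.Linear` with explicit input sizes instead of the lazy layers, which is an assumption made so that the parameters exist without a forward pass.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Index 1 is the ReLU, which owns no parameters, so the second fully
# connected layer sits at index 2.
last = net[2]
shapes = {name: tuple(t.shape) for name, t in last.state_dict().items()}
print(shapes)  # {'weight': (1, 8), 'bias': (1,)}

relu_params = list(net[1].parameters())
print(len(relu_params))  # 0
```

Note that PyTorch stores the weight of a linear layer with shape (outputs, inputs), which is why the weight here is 1 by 8.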

6.2.1.1. Targeted Parameters

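Note that each parameter is represented as an instance of the parameter class. To do anything useful with the parameters, we first need to access the underlying numerical values. There are several ways to do this. Some are simpler while others are more general. The following code extracts the bias from the second neural network layer, which returns a parameter class instance, and further accesses that parameter's value.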

type(net[2].bias), net[2].bias.data

(torch.nn.parameter.Parameter, tensor([0.0694]))

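Parameters are complex objects, containing values, gradients, and additional information. That is why we need to request the value explicitly.

In addition to the value, each parameter also allows us to access the gradient. Because we have not invoked backpropagation for this network yet, it is in its initial state.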

net[2].weight.grad is None

True
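To make the "initial state" concrete, here is a minimal PyTorch sketch of how a parameter's gradient goes from `None` to a tensor once backpropagation has run. The network sizes and the use of `sum()` as a stand-in loss are assumptions for illustration.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
bias = net[2].bias

# A Parameter is a Tensor subclass that is registered with its module.
is_tensor = isinstance(bias, torch.Tensor)

# No backward pass has happened yet, so no gradient is stored.
grad_before = bias.grad

# Sum the outputs as a stand-in scalar loss and backpropagate.
X = torch.rand(2, 4)
net(X).sum().backward()

# The output bias is added once per example, so with two examples the
# gradient of the summed output with respect to the bias is 2.
grad_after = bias.grad
print(grad_before, grad_after)

# Values can be overwritten in place without recording the operation.
with torch.no_grad():
    bias.fill_(0.5)
print(bias.data)
```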

type(net[1].bias), net[1].bias.data()

(mxnet.gluon.parameter.Parameter, array([0.]))

Parameters are complex objects, containing values, gradients, and additional information. That is why we need to request the value explicitly.

In addition to the value, each parameter also allows us to access the gradient. Because we have not invoked backpropagation for this network yet, it is in its initial state.

net[1].weight.grad()

array([[0., 0., 0., 0., 0., 0., 0., 0.]])

bias = params['params']['layers_2']['bias']
type(bias), bias

(jaxlib.xla_extension.Array, Array([0.], dtype=float32))

Unlike the other frameworks, JAX does not keep track of gradients on the neural network parameters; the parameters and the network are decoupled. Instead, the user expresses the computation as a Python function and applies the grad transformation to obtain its gradients.

type(net.layers[2].weights[1]), tf.convert_to_tensor(net.layers[2].weights[1])

(tensorflow.python.ops.resource_variable_ops.ResourceVariable,
 )

6.2.1.2. All Parameters at Once

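When we need to perform operations on all parameters, accessing them one by one can grow tedious. The situation can grow especially unwieldy when we work with more complex modules (e.g., nested modules), since we would need to recurse through the entire tree to extract each submodule's parameters. Below we demonstrate accessing the parameters of all layers.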

[(name, param.shape) for name, param in net.named_parameters()]

[('0.weight', torch.Size([8, 4])),
 ('0.bias', torch.Size([8])),
 ('2.weight', torch.Size([1, 8])),
 ('2.bias', torch.Size([1]))]
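For nested modules, `named_parameters()` already performs the recursive traversal, and the dotted names reflect the path through the module tree. The sketch below assumes a hypothetical helper `block()` and arbitrary layer sizes chosen for illustration.

```python
import torch
from torch import nn

def block():
    # Hypothetical helper: a small two-layer sub-network.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

# Two nested blocks followed by an output layer.
net = nn.Sequential(block(), block(), nn.Linear(4, 1))

# Names encode the position in the tree, e.g. '1.2.weight' is the weight
# of the layer at index 2 inside the block at index 1.
names = [name for name, _ in net.named_parameters()]
print(names)

# Total number of scalar parameters across the whole tree:
# each block has (4*8 + 8) + (8*4 + 4) = 76, the output layer has 4 + 1 = 5.
total = sum(p.numel() for p in net.parameters())
print(total)  # 157
```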

net.collect_params()

sequential0_ (
 Parameter dense0_weight (shape=(8, 4), dtype=float32)
 Parameter dense0_bias (shape=(8,), dtype=float32)
 Parameter dense1_weight (shape=(1, 8), dtype=float32)
 Parameter dense1_bias (shape=(1,), dtype=float32)
)

jax.tree_util.tree_map(lambda x: x.shape, params)

FrozenDict({
  params: {
    layers_0: {
      bias: (8,),
      kernel: (4, 8),
    },
    layers_2: {
      bias: (1,),
      kernel: (8, 1),
    },
  },
})

net.get_weights()

[array([[-0.42006454, 0.6094975 , -0.30087888, 0.42557293],
    [-0.26464057, -0.5518195 , 0.5476741 , 0.31728595],
    [-0.5571538 , -0.33794886, -0.05885679, 0.05435681],
    [ 0.28541476, 0.8276871 , -0.7665834 , 0.5791599 ]],
    dtype=float32),
 array([0., 0., 0., 0.], dtype=float32),
 array([[-0.52124995],
    [-0.22314149],
    [ 0.20780373],
    [ 0.6839919 ]], dtype=float32),
 array([0.], dtype=float32)]

6.2.2. Tied Parameters

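Often, we want to share parameters across multiple layers. Let's see how to do this elegantly. In the following we allocate a fully connected layer and then use its parameters specifically to set those of another layer. Here we need to run the forward propagation net(X) before accessing the parameters, because lazy layers only create their weights on the first forward pass.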

# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.LazyLinear(8)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
          shared, nn.ReLU(),
          shared, nn.ReLU(),
          nn.LazyLinear(1))

net(X)
# Check whether the parameters are the same
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[2].weight.data[0] == net[4].weight.data[0])

tensor([True, True, True, True, True, True, True, True])
tensor([True, True, True, True, True, True, True, True])
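Beyond comparing values, object identity can be checked directly. The following sketch (using `nn.Linear` with explicit sizes, an assumption that avoids the initial forward pass) also shows that `parameters()` reports a shared parameter only once.

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

# Positions 2 and 4 hold the very same module, hence the same Parameter.
same_object = net[2].weight is net[4].weight
print(same_object)  # True

# Shared parameters are yielded once: three distinct linear layers,
# each with a weight and a bias.
n_unique = len(list(net.parameters()))
print(n_unique)  # 6
```

This deduplication matters in practice: an optimizer built from `net.parameters()` sees the shared weight once and updates it once per step.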

net = nn.Sequential()
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.Dense(8, activation='relu')
net.add(nn.Dense(8, activation='relu'),
    shared,
    nn.Dense(8, activation='relu', params=shared.params),
    nn.Dense(10))
net.initialize()

X = np.random.uniform(size=(2, 20))

net(X)
# Check whether the parameters are the same
print(net[1].weight.data()[0] == net[2].weight.data()[0])
net[1].weight.data()[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[1].weight.data()[0] == net[2].weight.data()[0])

[ True True True True True True True True]
[ True True True True True True True True]

# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.Dense(8)
net = nn.Sequential([nn.Dense(8), nn.relu,
           shared, nn.relu,
           shared, nn.relu,
           nn.Dense(1)])

params = net.init(jax.random.PRNGKey(d2l.get_seed()), X)

# Check whether the parameters are different
print(len(params['params']) == 3)

True

# tf.keras behaves a bit differently. It removes the duplicate layer
# automatically
shared = tf.keras.layers.Dense(4, activation=tf.nn.relu)
net = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  shared,
  shared,
  tf.keras.layers.Dense(1),
])

net(X)
# Check whether the parameters are different
print(len(net.layers) == 3)

True

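This example shows that the parameters of the second and third layers are tied. They are not just equal; they are represented by the very same tensor. Thus, if we change one of the parameters, the other one changes, too.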

You might wonder, when parameters are tied what happens to the gradients? Since the model parameters contain gradients, the gradients of the second hidden layer and the third hidden layer are added together during backpropagation.
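The summation of gradients can be checked by hand on a tiny case. In this sketch, a single-weight linear layer without bias is applied twice, so the output is w * (w * x); the values w = 3 and x = 2 are arbitrary choices for illustration.

```python
import torch
from torch import nn

# One scalar weight w, used in two consecutive layers.
shared = nn.Linear(1, 1, bias=False)
with torch.no_grad():
    shared.weight.fill_(3.0)

x = torch.tensor([[2.0]])
y = shared(shared(x))  # y = w * (w * x) = w^2 * x = 18
y.sum().backward()

# Each use of w contributes w * x = 6 to the gradient, and the two
# contributions are summed: dy/dw = 2 * w * x = 12.
grad = shared.weight.grad.item()
print(grad)  # 12.0
```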

6.2.3. Summary

We have several ways of accessing and tying model parameters.

6.2.4. Exercises

Use the NestMLP model defined in Section 6.1 and access the parameters of the various layers.

Construct an MLP containing a shared parameter layer and train it. During the training process, observe the model parameters and gradients of each layer.

Why is sharing parameters a good idea?
