Module

Python, Torch, Daily life, Share · 2024-03-17

MODULE

CLASS torch.nn.Module(*args, **kwargs) [SOURCE] (base class for layers and models)

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)  # nested child Modules
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
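
For reference, a minimal usage sketch of the Model defined above (the 28×28 input size is just an illustrative choice):

import torch

model = Model()
x = torch.randn(1, 1, 28, 28)   # one single-channel 28x28 image
out = model(x)                  # calling the module invokes forward() via Module.__call__
print(out.shape)                # torch.Size([1, 20, 20, 20]) after two 5x5 convolutions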

func

  • add_module(name, module)[SOURCE] (adds a child module to the current module; nested submodules are reached with dotted attribute access)

    • Add a child module to the current module.

      The module can be accessed as an attribute using the given name.

    • Parameters

      • name (str) – name of the child module. The child module can be accessed from this module using the given name
      • module (Module) – child module to be added to the module.
  • apply(fn)[SOURCE] (takes a function and applies it recursively to every submodule and to the module itself)

    • Apply fn recursively to every submodule (as returned by .children()) as well as self.

      Typical use includes initializing the parameters of a model (see also torch.nn.init).

    • Parameters

      fn (Module -> None) – function to be applied to each submodule

      Returns

      self

      Return type

      Module

    • Example:

      • @torch.no_grad()  # decorator: no gradient tracking is needed here
        def init_weights(m):
            print(m)
            if type(m) == nn.Linear:  # only initialize nn.Linear submodules
                m.weight.fill_(1.0)  # fill the weight parameter with 1.0
                print(m.weight)
        net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
        net.apply(init_weights)  # applied to every submodule, then to net itself
        # first submodule -- the output of print(m)
        Linear(in_features=2, out_features=2, bias=True)
        Parameter containing:  # the weight tensor after filling
        tensor([[1., 1.],
                [1., 1.]], requires_grad=True)
        # likewise, the second submodule
        Linear(in_features=2, out_features=2, bias=True)
        Parameter containing:
        tensor([[1., 1.],
                [1., 1.]], requires_grad=True)
        Sequential(
          (0): Linear(in_features=2, out_features=2, bias=True)
          (1): Linear(in_features=2, out_features=2, bias=True)
        )
  • bfloat16()[SOURCE]

    • Casts all floating point parameters and buffers to bfloat16 datatype (an in-place conversion of the module's floating-point parameters and buffers).
  • buffers(recurse=True)[SOURCE]

    • Return an iterator over module buffers. (Buffers live alongside parameters: parameters take part in the forward pass and gradient-descent training, while buffers hold running statistics.)
    • for buf in model.buffers():
          print(type(buf), buf.size())
      '''
      <class 'torch.Tensor'> (20L,)
      <class 'torch.Tensor'> (20L, 1L, 5L, 5L)
      '''
  • children()[SOURCE]

    • Return an iterator over immediate children modules.
  • cpu()[SOURCE]

    • Move all model parameters and buffers to the CPU.
  • cuda(device=None)[SOURCE]

    • Move all model parameters and buffers to the GPU.

      This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.

  • eval()[SOURCE]

    • Set the module in evaluation mode.

      This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

      This is equivalent with self.train(False).

      See Locally disabling gradient computation for a comparison between .eval() and several similar mechanisms that may be confused with it.
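
    • Example (a minimal sketch; the Dropout layer and the input are illustrative):

      • import torch
        import torch.nn as nn

        drop = nn.Dropout(p=0.5)
        x = torch.ones(1, 4)
        drop.train()       # training mode: elements are randomly zeroed, the rest rescaled by 1/(1-p)
        print(drop(x))     # e.g. tensor([[2., 0., 2., 0.]]) -- random
        drop.eval()        # evaluation mode: Dropout becomes the identity
        print(drop(x))     # tensor([[1., 1., 1., 1.]])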

  • get_parameter(target)[SOURCE] (looks up a parameter of the current model by its fully-qualified string name)
  • load_state_dict(state_dict, strict=True, assign=False)[SOURCE]

    • Copy parameters and buffers from state_dict into this module and its descendants.

      If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

  • requires_grad_(requires_grad=True)[SOURCE] (toggles whether the module's parameters require gradient updates; see the freezing sketch after this list)

    • # Additional information
      EPOCH = 5
      PATH = "model.pt"
      LOSS = 0.4
      
      torch.save({
                  'epoch': EPOCH,  # which epoch training has reached
                  'model_state_dict': net.state_dict(),  # all parameters & buffers, used for testing/inference
                  'optimizer_state_dict': optimizer.state_dict(),  # optimizer state
                  'loss': LOSS,  # current loss
                  }, PATH)
    • model = Net()
      optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
      
      checkpoint = torch.load(PATH)
      model.load_state_dict(checkpoint['model_state_dict'])
      optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
      epoch = checkpoint['epoch']
      loss = checkpoint['loss']
      
      model.eval()
      # - or -
      model.train()
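
As noted under requires_grad_ above, here is a minimal freezing sketch (the names backbone/head and the layer sizes are illustrative, not from the original post):

import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
head = nn.Linear(8, 2)

backbone.requires_grad_(False)   # freeze: every parameter now has requires_grad=False
print(any(p.requires_grad for p in backbone.parameters()))   # False

# only the head's parameters are handed to the optimizer and get updated
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)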

MODULE SOURCE CODE

Module.__init__ sets up the internal bookkeeping dictionaries that every registration method below relies on:

super().__setattr__('training', True)
super().__setattr__('_parameters', OrderedDict())
super().__setattr__('_buffers', OrderedDict())
super().__setattr__('_non_persistent_buffers_set', set())
super().__setattr__('_backward_pre_hooks', OrderedDict())
super().__setattr__('_backward_hooks', OrderedDict())
super().__setattr__('_is_full_backward_hook', None)
super().__setattr__('_forward_hooks', OrderedDict())
super().__setattr__('_forward_hooks_with_kwargs', OrderedDict())
super().__setattr__('_forward_hooks_always_called', OrderedDict())
super().__setattr__('_forward_pre_hooks', OrderedDict())
super().__setattr__('_forward_pre_hooks_with_kwargs', OrderedDict())
super().__setattr__('_state_dict_hooks', OrderedDict())
super().__setattr__('_state_dict_pre_hooks', OrderedDict())
super().__setattr__('_load_state_dict_pre_hooks', OrderedDict())
super().__setattr__('_load_state_dict_post_hooks', OrderedDict())
super().__setattr__('_modules', OrderedDict())
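
These OrderedDicts are what ordinary attribute assignment feeds into: Module.__setattr__ routes Parameter, Module, and registered-buffer assignments into _parameters, _modules, and _buffers respectively. A small sketch (the class and names are illustrative):

import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()                              # creates the OrderedDicts shown above
        self.fc = nn.Linear(3, 3)                       # ends up in self._modules['fc']
        self.scale = nn.Parameter(torch.ones(1))        # ends up in self._parameters['scale']
        self.register_buffer('step', torch.zeros(1))    # ends up in self._buffers['step']

t = Tiny()
print(list(t._modules), list(t._parameters), list(t._buffers))
# ['fc'] ['scale'] ['step']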

register_buffer

# name, tensor value, and whether it should be persisted; adds a buffer variable to the current module, with `persistent` controlling whether the buffer is kept in the state_dict
def register_buffer(self, name: str, tensor: Optional[Tensor], persistent: bool = True) -> None:
    r"""Adds a buffer to the module.

    This is typically used to register a buffer that should not to be
    considered a model parameter. For example, BatchNorm's ``running_mean``
    is not a parameter, but is part of the module's state. Buffers, by
    default, are persistent and will be saved alongside parameters. This
    behavior can be changed by setting :attr:`persistent` to ``False``. The
    only difference between a persistent buffer and a non-persistent buffer
    is that the latter will not be a part of this module's
    :attr:`state_dict`.

    Buffers can be accessed as attributes using given names.

    Args:
        name (str): name of the buffer. The buffer can be accessed
            from this module using the given name
        tensor (Tensor or None): buffer to be registered. If ``None``, then operations
            that run on buffers, such as :attr:`cuda`, are ignored. If ``None``,
            the buffer is **not** included in the module's :attr:`state_dict`.
        persistent (bool): whether the buffer is part of this module's
            :attr:`state_dict`.

    Example::

        >>> # xdoctest: +SKIP("undefined vars")
        >>> # inside a hand-written BatchNorm's __init__, register the running statistics
        >>> self.register_buffer('running_mean', torch.zeros(num_features))      # mean
        >>> self.register_buffer('running_variance', torch.ones(num_features))   # variance

    """
    if persistent is False and isinstance(self, torch.jit.ScriptModule):
        raise RuntimeError("ScriptModule does not support non-persistent buffers")

    if '_buffers' not in self.__dict__:
        raise AttributeError(
            "cannot assign buffer before Module.__init__() call")
    elif not isinstance(name, str):
        raise TypeError(f"buffer name should be a string. Got {torch.typename(name)}")
    elif '.' in name:
        raise KeyError("buffer name can't contain \".\"")
    elif name == '':
        raise KeyError("buffer name can't be empty string \"\"")
    elif hasattr(self, name) and name not in self._buffers:
        raise KeyError(f"attribute '{name}' already exists")
    elif tensor is not None and not isinstance(tensor, torch.Tensor):
        raise TypeError(f"cannot assign '{torch.typename(tensor)}' object to buffer '{name}' "
                        "(torch Tensor or None required)"
                        )
    else:
        for hook in _global_buffer_registration_hooks.values():
            output = hook(self, name, tensor)
            if output is not None:
                tensor = output
        self._buffers[name] = tensor
        if persistent:
            self._non_persistent_buffers_set.discard(name)
        else:
            self._non_persistent_buffers_set.add(name)
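
A small sketch of the persistent flag (the module and buffer names are illustrative): a non-persistent buffer still moves with the module and shows up in named_buffers(), but never appears in the state_dict.

import torch
import torch.nn as nn

class Stats(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('running_mean', torch.zeros(4))               # saved in state_dict
        self.register_buffer('scratch', torch.zeros(4), persistent=False)  # excluded from state_dict

m = Stats()
print(list(m.state_dict().keys()))         # ['running_mean']
print([n for n, _ in m.named_buffers()])   # ['running_mean', 'scratch'] -- both are still buffers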

register_parameter

# [Parameter] is a subclass of Tensor
def register_parameter(self, name: str, param: Optional[Parameter]) -> None:
    r"""Adds a parameter to the module.

    The parameter can be accessed as an attribute using given name.

    Args:
        name (str): name of the parameter. The parameter can be accessed
            from this module using the given name
        param (Parameter or None): parameter to be added to the module. If
            ``None``, then operations that run on parameters, such as :attr:`cuda`,
            are ignored. If ``None``, the parameter is **not** included in the
            module's :attr:`state_dict`.
    """
    if '_parameters' not in self.__dict__:
        raise AttributeError(
            "cannot assign parameter before Module.__init__() call")

    elif not isinstance(name, str):
        raise TypeError(f"parameter name should be a string. Got {torch.typename(name)}")
    elif '.' in name:
        raise KeyError("parameter name can't contain \".\"")
    elif name == '':
        raise KeyError("parameter name can't be empty string \"\"")
    elif hasattr(self, name) and name not in self._parameters:
        raise KeyError(f"attribute '{name}' already exists")

    if param is None:
        self._parameters[name] = None
    elif not isinstance(param, Parameter):
        raise TypeError(f"cannot assign '{torch.typename(param)}' object to parameter '{name}' "
                        "(torch.nn.Parameter or None required)"
                        )
    elif param.grad_fn:
        raise ValueError(
            f"Cannot assign non-leaf Tensor to parameter '{name}'. Model "
            f"parameters must be created explicitly. To express '{name}' "
            "as a function of another Tensor, compute the value in "
            "the forward() method.")
    else:
        for hook in _global_parameter_registration_hooks.values():
            output = hook(self, name, param)
            if output is not None:
                param = output
        self._parameters[name] = param  # finally store it in the _parameters dict: key => name, value => param

parameter

CLASS torch.nn.parameter.Parameter(data=None, requires_grad=True) [SOURCE] (data is the wrapped tensor; requires_grad controls whether the parameter participates in gradient computation. Parameters added inside a model must be created as Parameter, not as a plain Tensor.)

A kind of Tensor that is to be considered a module parameter.

Parameters are Tensor subclasses, that have a very special property when used with Module s - when they’re assigned as Module attributes they are automatically added to the list of its parameters, and will appear e.g. in parameters() iterator. Assigning a Tensor doesn’t have such effect. This is because one might want to cache some temporary state, like last hidden state of the RNN, in the model. If there was no such class as Parameter, these temporaries would get registered too.

Parameters

  • data (Tensor) – parameter tensor.
  • requires_grad (bool, optional) – if the parameter requires gradient. Note that the torch.no_grad() context does NOT affect the default behavior of Parameter creation–the Parameter will still have requires_grad=True in no_grad mode. See Locally disabling gradient computation for more details. Default: True

Example

python - Correct way to register a parameter for model in Pytorch - Stack Overflow

import torch
import torch.nn as nn

class GaussianModel(nn.Module):  # custom model

    def __init__(self):
        super(GaussianModel, self).__init__()
        # manually register a parameter; 'mean' is automatically added to the module's parameter dict
        self.register_parameter('mean', nn.Parameter(torch.zeros(1),
                                                     requires_grad=True))
        
        self.pdf = torch.distributions.Normal(self.state_dict()['mean'],
                                              torch.tensor([1.0]))
    def forward(self, x):
        return -self.pdf.log_prob(x)

model = GaussianModel()

add_module

# add a child module to the current module
def add_module(self, name: str, module: Optional['Module']) -> None:
    r"""Adds a child module to the current module.

    The module can be accessed as an attribute using the given name.

    Args:
        name (str): name of the child module. The child module can be
            accessed from this module using the given name
        module (Module): child module to be added to the module.
    """
    if not isinstance(module, Module) and module is not None:
        raise TypeError(f"{torch.typename(module)} is not a Module subclass")
    elif not isinstance(name, str):
        raise TypeError(f"module name should be a string. Got {torch.typename(name)}")
    elif hasattr(self, name) and name not in self._modules:
        raise KeyError(f"attribute '{name}' already exists")
    elif '.' in name:
        raise KeyError(f"module name can't contain \".\", got: {name}")
    elif name == '':
        raise KeyError("module name can't be empty string \"\"")
    for hook in _global_module_registration_hooks.values():
        output = hook(self, name, module)
        if output is not None:
            module = output
    self._modules[name] = module
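
This registration path is what makes dynamically named layers possible; a minimal sketch (the class name and sizes are illustrative):

import torch
import torch.nn as nn

class Stack(nn.Module):
    def __init__(self, depth=3):
        super().__init__()
        for i in range(depth):
            # equivalent to self.fc0 = nn.Linear(4, 4) etc., but with computed names
            self.add_module(f"fc{i}", nn.Linear(4, 4))

    def forward(self, x):
        for layer in self.children():
            x = layer(x)
        return x

net = Stack()
print(net.fc0)                         # the child is reachable as a regular attribute
print(net(torch.randn(2, 4)).shape)    # torch.Size([2, 4])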

register_module

# registers a module; simply delegates to add_module
def register_module(self, name: str, module: Optional['Module']) -> None:
    r"""Alias for :func:`add_module`."""
    self.add_module(name, module)

get_submodule

# fetch a submodule from the current module by a dotted path string
def get_submodule(self, target: str) -> "Module":
    """
    Returns the submodule given by ``target`` if it exists,
    otherwise throws an error.

    For example, let's say you have an ``nn.Module`` ``A`` that
    looks like this:

    .. code-block:: text

        A(  # A nests B; B nests C and linear; C nests conv. To reach C, use A.net_b.net_c (a recursive, tree-like structure)
            (net_b): Module(
                (net_c): Module(
                    (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
                )
                (linear): Linear(in_features=100, out_features=200, bias=True)
            )
        )

    (The diagram shows an ``nn.Module`` ``A``. ``A`` has a nested
    submodule ``net_b``, which itself has two submodules ``net_c``
    and ``linear``. ``net_c`` then has a submodule ``conv``.)

    To check whether or not we have the ``linear`` submodule, we
    would call ``get_submodule("net_b.linear")``. To check whether
    we have the ``conv`` submodule, we would call
    ``get_submodule("net_b.net_c.conv")``.

    The runtime of ``get_submodule`` is bounded by the degree
    of module nesting in ``target``. A query against
    ``named_modules`` achieves the same result, but it is O(N) in
    the number of transitive modules. So, for a simple check to see
    if some submodule exists, ``get_submodule`` should always be
    used.

    Args:
        target: The fully-qualified string name of the submodule
            to look for. (See above example for how to specify a
            fully-qualified string.)

    Returns:
        torch.nn.Module: The submodule referenced by ``target``

    Raises:
        AttributeError: If the target string references an invalid
            path or resolves to something that is not an
            ``nn.Module``
    """
    if target == "":
        return self

    atoms: List[str] = target.split(".")
    mod: torch.nn.Module = self

    for item in atoms:

        if not hasattr(mod, item):
            raise AttributeError(mod._get_name() + " has no "
                                 "attribute `" + item + "`")

        mod = getattr(mod, item)

        if not isinstance(mod, torch.nn.Module):
            raise AttributeError("`" + item + "` is not "
                                 "an nn.Module")

    return mod
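
A quick sketch that mirrors the nested module A from the docstring above (built by hand; the structure follows the diagram):

import torch.nn as nn

A = nn.Module()
A.net_b = nn.Module()
A.net_b.net_c = nn.Module()
A.net_b.net_c.conv = nn.Conv2d(16, 33, 3, stride=2)
A.net_b.linear = nn.Linear(100, 200)

print(A.get_submodule("net_b.net_c.conv"))   # Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
print(A.get_submodule("net_b.linear"))       # Linear(in_features=100, out_features=200, bias=True)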

get_parameter

# look up a single parameter of the current module from a dotted path string
def get_parameter(self, target: str) -> "Parameter":
    """
    Returns the parameter given by ``target`` if it exists,
    otherwise throws an error.

    See the docstring for ``get_submodule`` for a more detailed
    explanation of this method's functionality as well as how to
    correctly specify ``target``.

    Args:
        target: The fully-qualified string name of the Parameter
            to look for. (See ``get_submodule`` for how to specify a
            fully-qualified string.)

    Returns:
        torch.nn.Parameter: The Parameter referenced by ``target``

    Raises:
        AttributeError: If the target string references an invalid
            path or resolves to something that is not an
            ``nn.Parameter``
    """
    # rpartition(".") splits at the last dot: module_path is everything before it, param_name everything after
    module_path, _, param_name = target.rpartition(".")
    # resolve the owning submodule with get_submodule
    mod: torch.nn.Module = self.get_submodule(module_path)
    # look for param_name on that submodule
    if not hasattr(mod, param_name):
        raise AttributeError(mod._get_name() + " has no attribute `"
                             + param_name + "`")

    param: torch.nn.Parameter = getattr(mod, param_name)
    # verify it really is a torch.nn.Parameter instance
    if not isinstance(param, torch.nn.Parameter):
        raise AttributeError("`" + param_name + "` is not an "
                             "nn.Parameter")

    return param
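
A self-contained sketch of the lookup (the Sequential and its sizes are illustrative):

import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
w = net.get_parameter("0.weight")          # Parameter of the first Linear, shape (4, 4)
b = net.get_parameter("1.bias")            # Parameter of the second Linear, shape (2,)
print(type(w).__name__, tuple(w.shape))    # Parameter (4, 4)
# net.get_parameter("0") raises AttributeError: '0' resolves to a Module, not an nn.Parameter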

get_buffer

# look up a buffer of the current module from a dotted path string; the logic mirrors get_parameter
def get_buffer(self, target: str) -> "Tensor":
    """
    Returns the buffer given by ``target`` if it exists,
    otherwise throws an error.

    See the docstring for ``get_submodule`` for a more detailed
    explanation of this method's functionality as well as how to
    correctly specify ``target``.

    Args:
        target: The fully-qualified string name of the buffer
            to look for. (See ``get_submodule`` for how to specify a
            fully-qualified string.)

    Returns:
        torch.Tensor: The buffer referenced by ``target``

    Raises:
        AttributeError: If the target string references an invalid
            path or resolves to something that is not a
            buffer
    """
    module_path, _, buffer_name = target.rpartition(".")

    mod: torch.nn.Module = self.get_submodule(module_path)
    # a tensor that is only temporary computation state is not module state, hence the extra checks below
    if not hasattr(mod, buffer_name):
        raise AttributeError(mod._get_name() + " has no attribute `"
                             + buffer_name + "`")

    buffer: torch.Tensor = getattr(mod, buffer_name)
    # buffers have no dedicated type, so the only reliable test for a real buffer is membership in the _buffers dict
    if buffer_name not in mod._buffers:
        raise AttributeError("`" + buffer_name + "` is not a buffer")

    return buffer
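
The buffer lookup follows the same pattern; for example, BatchNorm1d registers running_mean / running_var as buffers (a minimal sketch):

import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8))
print(net.get_buffer("1.running_mean").shape)   # torch.Size([8])
print(net.get_buffer("1.running_var").shape)    # torch.Size([8])
# net.get_buffer("1.weight") raises AttributeError: `weight` is a parameter, not a buffer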

_apply

# does three things: 1. applies fn to every submodule; 2. applies fn to every parameter; 3. applies fn to every buffer
def _apply(self, fn, recurse=True):
    if recurse:
        for module in self.children():  # recurse into every child module of the current module
            module._apply(fn)

    def compute_should_use_set_data(tensor, tensor_applied):
        if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
            # If the new tensor has compatible tensor type as the existing tensor,
            # the current behavior is to change the tensor in-place using `.data =`,
            # and the future behavior is to overwrite the existing tensor. However,
            # changing the current behavior is a BC-breaking change, and we want it
            # to happen in future releases. So for now we introduce the
            # `torch.__future__.get_overwrite_module_params_on_conversion()`
            # global flag to let the user control whether they want the future
            # behavior of overwriting the existing tensor or not.
            return not torch.__future__.get_overwrite_module_params_on_conversion()
        else:
            return False

    for key, param in self._parameters.items():  # iterate over all parameters
        if param is None:
            continue
        # Tensors stored in modules are graph leaves, and we don't want to
        # track autograd history of `param_applied`, so we have to use
        # `with torch.no_grad():`
        with torch.no_grad():  # apply fn to the parameter without tracking gradients
            param_applied = fn(param)
        should_use_set_data = compute_should_use_set_data(param, param_applied)
        if should_use_set_data:
            param.data = param_applied
            out_param = param
        else:
            assert isinstance(param, Parameter)
            assert param.is_leaf
            out_param = Parameter(param_applied, param.requires_grad)
            self._parameters[key] = out_param

        if param.grad is not None:
            with torch.no_grad():
                grad_applied = fn(param.grad)
            should_use_set_data = compute_should_use_set_data(param.grad, grad_applied)
            if should_use_set_data:
                assert out_param.grad is not None
                out_param.grad.data = grad_applied
            else:
                assert param.grad.is_leaf
                out_param.grad = grad_applied.requires_grad_(param.grad.requires_grad)

    for key, buf in self._buffers.items():  # apply fn to the buffers as well
        if buf is not None:
            self._buffers[key] = fn(buf)

    return self

apply

# typically invoked when initializing model parameters
def apply(self: T, fn: Callable[['Module'], None]) -> T:
    r"""Applies ``fn`` recursively to every submodule (as returned by ``.children()``)
    as well as self. Typical use includes initializing the parameters of a model
    (see also :ref:`nn-init-doc`).

    Args:
        fn (:class:`Module` -> None): function to be applied to each submodule

    Returns:
        Module: self

    Example::

        >>> @torch.no_grad()
        >>> def init_weights(m):
        >>>     print(m)
        >>>     if type(m) == nn.Linear:
        >>>         m.weight.fill_(1.0)
        >>>         print(m.weight)
        >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
        >>> net.apply(init_weights)
        Linear(in_features=2, out_features=2, bias=True)
        Parameter containing:
        tensor([[1., 1.],
                [1., 1.]], requires_grad=True)
        Linear(in_features=2, out_features=2, bias=True)
        Parameter containing:
        tensor([[1., 1.],
                [1., 1.]], requires_grad=True)
        Sequential(
          (0): Linear(in_features=2, out_features=2, bias=True)
          (1): Linear(in_features=2, out_features=2, bias=True)
        )

    """
    for module in self.children():
        module.apply(fn)
    fn(self)
    return self

cuda

def cuda(self: T, device: Optional[Union[int, device]] = None) -> T:
    r"""Moves all model parameters and buffers to the GPU.

    This also makes associated parameters and buffers different objects. So
    it should be called before constructing optimizer if the module will
    live on GPU while being optimized.

    .. note::
        This method modifies the module in-place.

    Args:
        device (int, optional): if specified, all parameters will be
            copied to that device

    Returns:
        Module: self
    """
    return self._apply(lambda t: t.cuda(device))  # applies .cuda to every parameter and buffer via _apply
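
A short sketch of the ordering caveat from the docstring: move the module to the GPU first, then construct the optimizer, so the optimizer holds references to the GPU copies (assumes a CUDA device is available):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
if torch.cuda.is_available():
    model.cuda()                                            # parameters/buffers are moved via _apply
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)    # built only after the move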

type

# casts all parameters & buffers to the given dtype
def type(self: T, dst_type: Union[dtype, str]) -> T:
    r"""Casts all parameters and buffers to :attr:`dst_type`.

    .. note::
        This method modifies the module in-place.

    Args:
        dst_type (type or string): the desired type

    Returns:
        Module: self
    """
    return self._apply(lambda t: t.type(dst_type))

half

# only floating-point tensors are converted
def half(self: T) -> T:
    r"""Casts all floating point parameters and buffers to ``half`` datatype.

    .. note::
        This method modifies the module in-place.

    Returns:
        Module: self
    """
    return self._apply(lambda t: t.half() if t.is_floating_point() else t)
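
A tiny sketch contrasting half() with type(): half() skips non-floating-point tensors, while type() converts every parameter and buffer (the integer buffer here is illustrative):

import torch
import torch.nn as nn

m = nn.Linear(4, 4)
m.register_buffer('count', torch.zeros(1, dtype=torch.int64))   # an integer buffer

m.half()
print(m.weight.dtype)   # torch.float16
print(m.count.dtype)    # torch.int64 -- untouched, not a floating point tensor

m.type(torch.float64)
print(m.weight.dtype)   # torch.float64
print(m.count.dtype)    # torch.float64 -- type() converts everything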

to_empty

# moves all of the model's parameters & buffers to a device without copying their storage (contents are uninitialized)
def to_empty(self: T, *, device: Union[str, device], recurse: bool = True) -> T:
    r"""Moves the parameters and buffers to the specified device without copying storage.

    Args:
        device (:class:`torch.device`): The desired device of the parameters
            and buffers in this module.
        recurse (bool): Whether parameters and buffers of submodules should
            be recursively moved to the specified device.

    Returns:
        Module: self
    """
    return self._apply(lambda t: torch.empty_like(t, device=device), recurse=recurse)
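
to_empty is typically paired with the meta device: build a large model without allocating real storage, then materialize uninitialized tensors on the target device (a hedged sketch; assumes PyTorch 2.x for the device context manager, and the weights still have to be initialized or loaded afterwards):

import torch
import torch.nn as nn

with torch.device('meta'):
    big = nn.Linear(1024, 1024)        # no real storage is allocated here

big = big.to_empty(device='cpu')       # allocate uninitialized storage on the CPU
print(big.weight.device, big.weight.shape)   # cpu torch.Size([1024, 1024])
# values are garbage until you initialize them or load a state_dict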

This article was written by fmujie and is licensed under Creative Commons Attribution 3.0; it may be freely reproduced and quoted, provided the author is credited and the source is cited.
