R-Drop: Regularized Dropout for Neural Networks
Dropout is a powerful and widely used technique to regularize the training of deep neural networks.
Dropout在训练和推理时存在不一致的问题(集成学习)R 对每个子模型的分布做一个KL散度

import torch
conv_layer = torch.nn.Conv2d(in_channels=2, out_channels=2, kernel_size=3, padding="same")
for i in conv_layer.named_parameters():
print(i)
'''
('weight', Parameter containing:
tensor([[[[-0.1885, 0.1133, -0.0103],
[-0.0462, -0.1434, -0.0569],
[ 0.1969, 0.0827, -0.1245]],
[[ 0.1554, 0.1459, 0.0405],
[-0.0806, 0.1438, -0.1319],
[-0.1495, -0.0960, 0.2193]]],
[[[-0.1748, 0.1517, 0.0040],
[ 0.0097, 0.1565, 0.1850],
[-0.0261, 0.1298, 0.0211]],
[[-0.0964, -0.0133, -0.0378],
[ 0.0617, -0.2037, 0.0011],
[-0.0542, -0.1499, 0.0187]]]], requires_grad=True))
('bias', Parameter containing:
tensor([0.0560, 0.2200], requires_grad=True))
'''
conv_layer.weights.size()
'''
对于输出的第1个通道而言:
对于输入的2个通道都有3×3的卷积核分别在这2个输入通道上做卷积再求和,就得到了,第1个输出通道的特征值。
对于输出的第2个通道而言:
又来了2个卷积核,对输入的2个通道做卷积,得到第2个输出通道的特征值。
torch.Size([2, 2, 3, 3]) 3,3为Kernel_size大小 2,2 是输出×输入
'''
conv_layer.bias.size()
'''
torch.Size([2]) 和输出通道一致
'''
conv_layer = torch.nn.Conv2d(in_channels=2, out_channels=4, kernel_size=3, padding="same")
print(conv_layer.weight.size())
'''
4中每个输出通道,都有2套3×3的矩阵对输入进行滑动卷积,再将2个通道的值加起来再加上bias就构成输出通道的特征值
torch.Size([4, 2, 3, 3])
'''
Conv2d
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
stride
controls the stride for the cross-correlation, a single number or a tuple.padding
controls the amount of padding applied to the input. It can be either a string {‘valid’, ‘same’} or an int / a tuple of ints giving the amount of implicit padding applied on both sides.dilation
controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation
does.groups
controls the connections between inputs and outputs. in_channels
and out_channels
must both be divisible by groups
. For example,
- At groups=1, all inputs are convolved to all outputs.
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated.
- At groups=
in_channels
, each input channel is convolved with its own set of filters (of size out_channelsin_channelsin_channelsout_channels).
The parameters kernel_size
, stride
, padding
, dilation
can either be:
- a single
int
– in which case the same value is used for the height and width dimension - a
tuple
of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
深度和可分离卷积
groups
> 1;且能够被input_channels
和out_channels
整除
同时将两者分为好多组,单独做卷积。
好处:
1、并没有将通道完全的mix
,假设有8个通道,分为2组,前4个通道后4个通道分别做一个卷积组
2、计算量下降
resnet的一个变体网络 RepConv

point-wise convolution
1×1的卷积
卷积基本假设:局部关联性、平移不变性
不考虑周围点,仅考虑自身:只是将每个通道上的值进行加权求和channels mix
depth-wise convolution
groups > 1
不考虑对所有channels进行mix,而是划分部分mix
如何将3×3
卷积、1×1
、x
融合为1
个3×3
卷积?

import torch
import torch.nn as nn
import torch.nn.functional as F
in_channels = 2
out_channels = 2
kernel_size = 3
w = 9
h = 9
x = torch.ones(1, in_channels, w, h)
conv_2d = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
conv_2d_pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, padding="same")
result1 = conv_2d(x) + conv_2d_pointwise(x) + x
pointwise_to_conv_weight = F.pad(conv_2d_pointwise.weight, pad=[1, 1, 1, 1, 0, 0, 0, 0])
conv_2d_for_pointwise = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
conv_2d_for_pointwise.weight = nn.Parameter(pointwise_to_conv_weight)
conv_2d_for_pointwise.bias = conv_2d_pointwise.bias
zeros = torch.unsqueeze(torch.zeros(kernel_size, kernel_size), 0)
stars = torch.unsqueeze(F.pad(torch.ones(1, 1), pad=[1, 1, 1, 1]), 0)
stars_zeros = torch.unsqueeze(torch.cat([stars, zeros], 0), 0)
zeros_stars = torch.unsqueeze(torch.cat([zeros, stars], 0), 0)
identity_to_conv_weight = torch.cat([stars_zeros, zeros_stars], 0)
identity_to_conv_bias = torch.zeros([out_channels])
conv_2d_for_identity = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
conv_2d_for_identity.weight = nn.Parameter(identity_to_conv_weight)
conv_2d_for_identity.bias = nn.Parameter(identity_to_conv_bias)
result2 = conv_2d(x) + conv_2d_for_pointwise(x) + conv_2d_for_identity(x)
print(torch.all(torch.isclose(result1, result2)))
conv_2d_for_fussion = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
conv_2d_for_fussion.weight = nn.Parameter(conv_2d.weight.data + conv_2d_for_pointwise.weight.data + conv_2d_for_identity.weight.data)
conv_2d_for_fussion.bias = nn.Parameter(conv_2d.bias.data + conv_2d_for_pointwise.bias.data + conv_2d_for_identity.bias.data)
result3 = conv_2d_for_fussion(x)
print(torch.all(torch.isclose(result2, result3)))
'''
tensor(True)
tensor(True)
'''
还不快抢沙发