Dive into Deep Learning: Preliminaries

· 2021-08-08 · # Deeplearning # Technical Learning

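Data Manipulation

Operations

For any tensors of the same shape, the common standard arithmetic operators (+, -, *, /, and **) can all be lifted to elementwise operations.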

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x**y  # the ** operator is exponentiation
# Output
(tensor([ 3.,  4.,  6., 10.]),
tensor([-1.,  0.,  2.,  6.]),
tensor([ 2.,  4.,  8., 16.]),
tensor([0.5000, 1.0000, 2.0000, 4.0000]),
tensor([ 1.,  4., 16., 64.]))
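
As a quick sanity check of what "elementwise" means here, the sketch below recomputes `x ** y` one pair of entries at a time with a plain Python loop and compares the result with the tensor operator. The name `manual` is only an illustrative helper, not something from the book.

```python
import torch

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])

# Recompute x ** y one pair of entries at a time.
manual = torch.tensor([float(a) ** float(b) for a, b in zip(x, y)])

# The tensor operator gives the same values as the explicit loop.
same = torch.equal(x ** y, manual)
print(same)
```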

Concatenating multiple tensors: dim=0 joins them along rows, dim=1 along columns

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)
# Output
(tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]]),
tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]]))
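
A small sketch that makes the shape bookkeeping explicit: concatenating along an axis adds up the lengths of that axis and leaves the other axis unchanged. The variable names `rows` and `cols` are only for illustration.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

rows = torch.cat((X, Y), dim=0)  # axis 0 grows: 3 + 3 rows, still 4 columns
cols = torch.cat((X, Y), dim=1)  # axis 1 grows: 4 + 4 columns, still 3 rows
print(rows.shape, cols.shape)
```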

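Broadcasting Mechanism

The broadcasting mechanism works as follows:

First, expand one or both tensors by replicating elements as needed, so that after the transformation the two tensors have the same shape.
Second, perform the elementwise operation on the resulting tensors.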

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
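a + b  # The shapes do not match, so both are broadcast to a larger 3×2 matrix: a is replicated along columns, b along rows, and then they are added elementwise
# Output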
tensor([[0, 1],
        [1, 2],
        [2, 3]])
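
The two steps can be made explicit. The sketch below uses `torch.broadcast_tensors` to perform only the expansion step, then adds the expanded tensors and checks that the result matches the implicit `a + b`. The names `a_big` and `b_big` are illustrative.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# Step 1: expand both operands to the common shape (3, 2).
a_big, b_big = torch.broadcast_tensors(a, b)
print(a_big.shape, b_big.shape)

# Step 2: add elementwise; this matches the implicit broadcast a + b.
same = torch.equal(a_big + b_big, a + b)
print(same)
```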

Linear Algebra

Basic Properties of Tensor Arithmetic

The elementwise multiplication of two matrices is called the Hadamard product (mathematical symbol ⊙):
$\mathbf{A} \odot \mathbf{B} = \begin{bmatrix} a_{11} b_{11} & a_{12} b_{12} & \dots & a_{1n} b_{1n} \\ a_{21} b_{21} & a_{22} b_{22} & \dots & a_{2n} b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} b_{m1} & a_{m2} b_{m2} & \dots & a_{mn} b_{mn} \end{bmatrix}$
```
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone()
A * B
# Output
tensor([[  0.,   1.,   4.,   9.],
        [ 16.,  25.,  36.,  49.],
        [ 64.,  81., 100., 121.],
        [144., 169., 196., 225.],
        [256., 289., 324., 361.]])
```
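
Since `*` is easy to confuse with matrix multiplication, here is a minimal contrast on a smaller 2×3 matrix (chosen only to keep the numbers readable): the Hadamard product keeps the shape of its operands, while `@` contracts the shared axis.

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone()

hadamard = A * B   # elementwise: same shape as A and B, (2, 3)
matmul = A @ B.T   # matrix product: (2, 3) @ (3, 2) -> (2, 2)

print(hadamard)
print(matmul)
```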

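Reduction

By default, calling the sum function reduces a tensor along all of its axes, collapsing it to a scalar. We can also specify the axis along which the tensor is reduced by summation.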

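To reduce dimensionality by summing the elements over all rows (axis 0), specify axis=0. The dimension of axis 0 of the input is therefore lost in the output shape.
```
A_sum_axis0 = A.sum(axis=0)
A_sum_axis0, A_sum_axis0.shape
# output
(tensor([40., 45., 50., 55.]), torch.Size([4]))
```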

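Specifying axis=1 reduces dimensionality by summing the elements over all columns (axis 1). The dimension of axis 1 of the input therefore disappears from the output shape.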

A_sum_axis1 = A.sum(axis=1)
A_sum_axis1, A_sum_axis1.shape
# output
(tensor([ 6., 22., 38., 54., 70.]), torch.Size([5]))
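
Two related checks, sketched under the same 5×4 matrix `A` used above: summing over both axes is equivalent to the default full sum, and a mean along an axis is the sum along that axis divided by the number of entries summed.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

# Reducing over both axes is the same as the default full sum.
full = A.sum(axis=[0, 1])
print(full, A.sum())

# The mean along axis 0 is the axis-0 sum divided by the number of rows.
mean0 = A.mean(axis=0)
print(mean0, A.sum(axis=0) / A.shape[0])
```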

Non-Reduction Sum

```python
sum_A = A.sum(axis=1, keepdims=True)
sum_A, sum_A.shape
(tensor([[ 6.],
         [22.],
         [38.],
         [54.],
         [70.]]), torch.Size([5, 1]))
```
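
One reason to keep the summed axis is that the result can then be broadcast against the original tensor. A minimal sketch, assuming the same 5×4 matrix `A`: dividing `A` by its row sums normalizes each row to sum to 1. The name `normalized` is illustrative.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
sum_A = A.sum(axis=1, keepdims=True)  # shape (5, 1)

# (5, 4) / (5, 1): the row sums are broadcast across the 4 columns.
normalized = A / sum_A
print(normalized.sum(axis=1))  # each row now sums to 1, up to rounding
```

Without `keepdims=True` the row sums would have shape `(5,)`, which does not broadcast against `(5, 4)`.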

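Dot Product

```python
x = torch.arange(4, dtype=torch.float32)  # redefine x; the earlier x was [1., 2., 4., 8.]
y = torch.ones(4, dtype=torch.float32)
x, y, torch.dot(x, y)
# Output
(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]), tensor(6.))
```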
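
The dot product is just an elementwise multiplication followed by a sum, which the following sketch checks directly. It assumes `x = torch.arange(4, dtype=torch.float32)`, consistent with the output shown above.

```python
import torch

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)

# Multiply matching entries, then add them up.
by_hand = torch.sum(x * y)
print(by_hand, torch.dot(x, y))
```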

Matrix-Vector Product

```python
A.shape, x.shape, torch.mv(A, x)
(torch.Size([5, 4]), torch.Size([4]), tensor([ 14.,  38.,  62.,  86., 110.]))
```
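
To see where each entry of `torch.mv(A, x)` comes from, the sketch below computes the dot product of every row of `A` with `x` and stacks the results. It assumes the same `A` (5×4) and `x = torch.arange(4, dtype=torch.float32)` as above; `row_dots` is an illustrative name.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.arange(4, dtype=torch.float32)

# Entry i of A x is the dot product of row i of A with x.
row_dots = torch.stack([torch.dot(row, x) for row in A])
print(row_dots)
print(torch.allclose(row_dots, torch.mv(A, x)))
```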

Matrix-Matrix Multiplication

```python
B = torch.ones(4, 3)
torch.mm(A, B)
tensor([[ 6.,  6.,  6.],
        [22., 22., 22.],
        [38., 38., 38.],
        [54., 54., 54.],
        [70., 70., 70.]])
```
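
Matrix-matrix multiplication can be read as several matrix-vector products side by side. The sketch below rebuilds `torch.mm(A, B)` one column at a time and also checks that the `@` operator gives the same result. The name `cols` is illustrative.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = torch.ones(4, 3)

# Column j of A B is the matrix-vector product of A with column j of B.
cols = torch.stack([torch.mv(A, B[:, j]) for j in range(B.shape[1])], dim=1)
print(cols.shape)
print(torch.allclose(cols, torch.mm(A, B)), torch.allclose(A @ B, torch.mm(A, B)))
```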

Norms

L2 norm
$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$
```
u = torch.tensor([3.0, -4.0])
torch.norm(u) 
# output
tensor(5.)
```
L1 norm
$\|\mathbf{x}\|_1 = \sum_{i=1}^n \left|x_i \right|$
```
torch.abs(u).sum()
# output
tensor(7.)
```
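
As a cross-check of the two formulas, the sketch below computes both norms directly from their definitions and then again with `torch.linalg.norm`, which takes the order through its `ord` argument. The last line computes the Frobenius norm of a matrix, which appears in the gradient rules later; the 4×9 matrix of ones is just a convenient example.

```python
import torch

u = torch.tensor([3.0, -4.0])

l2 = torch.sqrt(torch.sum(u ** 2))  # square, sum, square root
l1 = torch.sum(torch.abs(u))        # sum of absolute values
print(l2, l1)

# The same norms through torch.linalg.norm.
print(torch.linalg.norm(u, ord=2), torch.linalg.norm(u, ord=1))

# For a matrix, the default is the Frobenius norm: sqrt of the sum of squares.
fro = torch.linalg.norm(torch.ones((4, 9)))
print(fro)
```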

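Differentiation

Gradients

Let the input of the function $f:\mathbb{R}^n\rightarrow\mathbb{R}$ be an $n$-dimensional vector $\mathbf{x}=[x_1,x_2,\ldots,x_n]^\top$ and let its output be a scalar. The gradient of $f(\mathbf{x})$ with respect to $\mathbf{x}$ is a vector of $n$ partial derivatives: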

$\nabla_{\mathbf{x}} f(\mathbf{x}) = \bigg[\frac{\partial f(\mathbf{x})}{\partial x_1}, \frac{\partial f(\mathbf{x})}{\partial x_2}, \ldots, \frac{\partial f(\mathbf{x})}{\partial x_n}\bigg]^\top$

Here $\nabla_{\mathbf{x}} f(\mathbf{x})$ is usually replaced by $\nabla f(\mathbf{x})$ when there is no ambiguity.
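
To make the definition concrete, here is a minimal pure-Python sketch that approximates each partial derivative with a central difference. The function `f`, the helper `numerical_gradient`, and the step size `h` are all illustrative assumptions, not part of the book's code.

```python
def f(x):
    # f(x) = x_1^2 + 3 * x_1 * x_2, a scalar function of a 2-vector
    return x[0] ** 2 + 3 * x[0] * x[1]

def numerical_gradient(f, x, h=1e-6):
    # Approximate each partial derivative df/dx_i with a central difference.
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

g = numerical_gradient(f, [1.0, 2.0])
print(g)  # close to the analytic gradient [2*x_1 + 3*x_2, 3*x_1] = [8, 3]
```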
Assuming $\mathbf{x}$ is an $n$-dimensional vector, the following rules are often used when differentiating multivariate functions:

For all $\mathbf{A} \in \mathbb{R}^{m \times n}$, we have $\nabla_{\mathbf{x}} \mathbf{A} \mathbf{x} = \mathbf{A}^\top$
For all $\mathbf{A} \in \mathbb{R}^{n \times m}$, we have $\nabla_{\mathbf{x}} \mathbf{x}^\top \mathbf{A} = \mathbf{A}$
For all $\mathbf{A} \in \mathbb{R}^{n \times n}$, we have $\nabla_{\mathbf{x}} \mathbf{x}^\top \mathbf{A} \mathbf{x} = (\mathbf{A} + \mathbf{A}^\top)\mathbf{x}$
$\nabla_{\mathbf{x}} \|\mathbf{x} \|^2 = \nabla_{\mathbf{x}} \mathbf{x}^\top \mathbf{x} = 2\mathbf{x}$
Similarly, for any matrix $\mathbf{X}$, we have $\nabla_{\mathbf{X}} \|\mathbf{X} \|_F^2 = 2\mathbf{X}$
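
The rules with scalar outputs can be spot-checked numerically with PyTorch's automatic differentiation. The sketch below is only a check on random inputs, not a proof; the sizes, the seed, and the names `ok_quadratic`, `ok_norm`, and `ok_frobenius` are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 3, dtype=torch.float64)
x = torch.randn(3, dtype=torch.float64, requires_grad=True)

# Rule: the gradient of x^T A x is (A + A^T) x.
(x @ A @ x).backward()
ok_quadratic = torch.allclose(x.grad, (A + A.T) @ x.detach())

# Rule: the gradient of ||x||^2 = x^T x is 2 x.
x.grad = None  # clear the gradient accumulated by the previous backward()
(x @ x).backward()
ok_norm = torch.allclose(x.grad, 2 * x.detach())

# Rule: the gradient of ||X||_F^2 is 2 X.
X = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
(X ** 2).sum().backward()
ok_frobenius = torch.allclose(X.grad, 2 * X.detach())

print(ok_quadratic, ok_norm, ok_frobenius)
```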