[NumPy] Study Notes 1

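Table of Contents

I. Overview
II. Basics

1. Creating arrays

(1) Creating arrays from lists
(2) Special arrays

2. Array arithmetic

(1) Basic operations (1-D)
(2) Basic operations (2-D)

3. Mathematical functions
4. Slicing and indexing
5. Shape manipulation

(1) Reshaping
(2) Joining and splitting arrays

6. Extrema (max, min and their indices)
7. Array statistics

III. Advanced (as exercises)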
三、进阶（以题的形式）

I. Overview

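NumPy supports a wide range of operations on high-dimensional arrays and matrices, and it also provides an extensive library of mathematical functions for array computation. Machine learning involves a great deal of array transformation and computation.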

II. Basics

import numpy as np
print(np.__version__)

Output:

1.14.3

1. Creating arrays

NumPy's main object is the multidimensional array, ndarray. In NumPy, dimensions are called axes, and the number of axes is called the rank.

The array below has rank 2: its first axis has length 2 and its second axis has length 3.

[[1., 2., 3.],
 [4., 5., 6.]]
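To check the rank and axis lengths of an array like the one above, its ndim, shape and size attributes can be inspected. A minimal sketch; note that "rank" in this sense means the number of axes (ndim), which is a different concept from the linear-algebra rank returned by np.linalg.matrix_rank (the second array y is an illustrative example, not from the original).

```python
import numpy as np

# The 2x3 example array from the text
x = np.array([[1., 2., 3.],
              [4., 5., 6.]])
print(x.ndim)    # number of axes ("rank" in the sense above): 2
print(x.shape)   # length of each axis: (2, 3)
print(x.size)    # total number of elements: 6

# Hypothetical example: 2 axes, but its rows are linearly dependent
y = np.array([[1., 2.],
              [2., 4.]])
print(y.ndim)                      # 2 axes
print(np.linalg.matrix_rank(y))    # linear-algebra rank: 1
```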

(1) Creating arrays from lists

## 1-D array
print(np.array([1, 2, 3]))
## 2-D array
print(np.array([[1, 2, 3], [4, 5, 6]]))

Output:

[1 2 3]
[[1 2 3]
 [4 5 6]]

(2) Special arrays

zeros(), ones(), arange(), eye(), linspace(), random.rand(), random.randint(), fromfunction()

## 2-D array of all zeros
np.zeros((3, 5))
# array([[0., 0., 0., 0., 0.],
#        [0., 0., 0., 0., 0.],
#        [0., 0., 0., 0., 0.]])


## 3-D array of all ones
np.ones((2, 3, 4))
# array([[[1., 1., 1., 1.],
#         [1., 1., 1., 1.],
#         [1., 1., 1., 1.]],

#        [[1., 1., 1., 1.],
#         [1., 1., 1., 1.],
#         [1., 1., 1., 1.]]])

## 1-D arithmetic sequence
np.arange(5)
# array([0, 1, 2, 3, 4])

## 2-D arithmetic sequence (reshaped to 2x3)
np.arange(6).reshape(2, 3)
# array([[0, 1, 2],
#        [3, 4, 5]])

## Identity matrix (2-D array)
np.eye(3)
# array([[1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.]])

## Evenly spaced 1-D array: six evenly spaced numbers on the interval [1, 10]
np.linspace(1, 10, num=6)
# array([ 1. ,  2.8,  4.6,  6.4,  8.2, 10. ])

## 2-D array of random floats in [0, 1)
np.random.rand(2, 3)
# array([[0.52506848, 0.53357141, 0.53381268],
#        [0.16647792, 0.22610891, 0.62271866]])

## 2-D array of random integers in [0, 5)
np.random.randint(5, size=(2, 3))
# array([[2, 0, 4],
#        [0, 3, 0]])

## Build an array from a function of the indices: shape (3, 5)
np.fromfunction(lambda i, j : i + j, (3, 5))
# array([[0., 1., 2., 3., 4.],
#        [1., 2., 3., 4., 5.],
#        [2., 3., 4., 5., 6.]])
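np.fromfunction does not call the function once per element: it calls it a single time, passing one array of coordinates per axis (floats by default). The sketch below makes that visible; the helper name show_indices is hypothetical.

```python
import numpy as np

calls = []

def show_indices(i, j):
    # i and j are whole (3, 5) arrays of row and column coordinates
    calls.append((i.shape, i.dtype))
    return i + j

r = np.fromfunction(show_indices, (3, 5))
print(calls)      # a single call, with (3, 5) float64 coordinate arrays

# dtype=int makes the coordinates, and therefore the result, integers
ri = np.fromfunction(lambda i, j: i + j, (3, 5), dtype=int)
print(ri.dtype)   # an integer dtype
```

Because the function receives arrays, it must use array-friendly operations; a plain Python `if` on i or j would fail.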

2. Array arithmetic

(1) Basic operations (1-D)

Arithmetic operators are applied element-wise, between corresponding elements.

## Create two 1-D arrays, a and b
a = np.array([10, 20, 30, 40, 50])
b = np.arange(1, 6)
a, b
# (array([10, 20, 30, 40, 50]), array([1, 2, 3, 4, 5]))

print(a + b)
print(a - b)
print(a * b)
print(a / b)
# array([11, 22, 33, 44, 55])
# array([ 9, 18, 27, 36, 45])
# array([ 10,  40,  90, 160, 250])
# array([10., 10., 10., 10., 10.])
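The same element-wise rule covers the other operators, and a scalar operand is broadcast to every element. A short sketch with the same a and b; note that / always produces floats, while // keeps integers.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
b = np.arange(1, 6)

q = a // b        # floor division, stays integer: 10 in every position
sq = b ** 2       # element-wise power: 1, 4, 9, 16, 25
shifted = a + 1   # the scalar 1 is added to every element
mask = a > 25     # element-wise comparison gives a boolean array
print(q, sq, shifted, mask)
print((a / b).dtype)   # float64, even though a and b are integer arrays
```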

(2) Basic operations (2-D)

Key points: matrix multiplication, transpose and inverse.

## Create two 2-D arrays (matrices): A and B
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A, B
# (array([[1, 2],
#         [3, 4]]), array([[5, 6],
#         [7, 8]]))

print(A + B)
print(A - B)
# array([[ 6,  8],
#        [10, 12]])
# array([[-4, -4],
#        [-4, -4]])

## Element-wise multiplication
print(A * B)
# array([[ 5, 12],
#        [21, 32]])

## Matrix multiplication
print(np.dot(A, B))
# array([[19, 22],
#        [43, 50]])

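## If the 2-D arrays are converted to np.matrix objects, * performs matrix multiplication.
## np.mat is an alias of np.asmatrix that newer NumPy releases (2.0+) removed, so np.asmatrix is used here.
print(np.asmatrix(A) * np.asmatrix(B))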
# matrix([[19, 22],
#         [43, 50]])
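For plain ndarrays, Python's @ operator (Python 3.5+ and NumPy 1.10+) performs matrix multiplication without converting to np.matrix. A minimal sketch with the same A and B:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B             # matrix product, same as np.dot(A, B) for 2-D arrays
D = np.matmul(A, B)   # function form of @
E = A * B             # * stays element-wise for ndarrays
print(C)
```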

## Multiply a matrix by a scalar
print(A * 2)
# array([[2, 4],
#        [6, 8]])

## Transpose
print(A.T)
# array([[1, 3],
#        [2, 4]])

## Matrix inverse
print(np.linalg.inv(A))
# array([[-2. ,  1. ],
#        [ 1.5, -0.5]])

3. Mathematical functions

print(a)
# [10 20 30 40 50]

np.sin(a)        # sine (element-wise)
np.exp(a)        # exponential
np.sqrt(a)       # square root
np.power(a, 3)   # cube
# array([-0.54402111,  0.91294525, -0.98803162,  0.74511316, -0.26237485])
# array([2.20264658e+04, 4.85165195e+08, 1.06864746e+13, 2.35385267e+17, 5.18470553e+21])
# array([3.16227766, 4.47213595, 5.47722558, 6.32455532, 7.07106781])
# array([  1000,   8000,  27000,  64000, 125000])

4. Slicing and indexing

## 1-D slicing
a = np.array([1, 2, 3, 4, 5])
a[0], a[-1]
# (1, 5)

print(a[0:2], a[:-1])
# [1 2] [1 2 3 4]

## 2-D slicing
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
## First row and last row
print(a[0], a[-1])
## Second column
print(a[:, 1])
## Rows 2 and 3
print(a[1:3, :])
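The 2-D slicing examples above show no output, so here they are as a runnable sketch. It also demonstrates a property worth knowing: a basic slice is a view that shares memory with the original array, so writing through it changes the original, while .copy() gives an independent array.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first, last = a[0], a[-1]   # [1 2 3] and [7 8 9]
col2 = a[:, 1]              # second column: [2 5 8]
rows23 = a[1:3, :]          # rows 2 and 3: [[4 5 6] [7 8 9]]
print(first, last, col2, rows23, sep="\n")

# A basic slice is a view: assigning through it modifies a
col2[0] = 100
print(a[0, 1])              # 100

# .copy() detaches the data from a
safe = a[:, 1].copy()
safe[0] = -1
print(a[0, 1])              # still 100
```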

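5. Shape manipulation

(1) Reshaping

reshape(): returns the data with a new shape; the original array is not modified.
ravel(): flattens a multidimensional array into a 1-D array.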

a = np.random.random((3, 2))
print(a)
# [[0.32136021 0.28243107]
#  [0.98636992 0.96468233]
#  [0.02745087 0.75451614]]

print(a.shape)         # shape: (3, 2)
print(a.reshape(2,3))  # returns a reshaped (2, 3) array
print(a)               # a itself still has shape (3, 2)
# array([[0.32136021, 0.28243107, 0.98636992],
#        [0.96468233, 0.02745087, 0.75451614]])
# array([[0.32136021, 0.28243107],
#        [0.98636992, 0.96468233],
#        [0.02745087, 0.75451614]])

## Flatten, i.e. stretch into a 1-D array
print(a.ravel()) 
# array([0.32136021, 0.28243107, 0.98636992, 0.96468233, 0.02745087, 0.75451614])
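Two related details, as a sketch: reshape accepts -1 for one dimension and infers it from the array size, and ravel() returns a view when it can (as it can for this contiguous array), whereas flatten() always returns a copy.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)
b = a.reshape(2, -1)   # -1 is inferred as 3, so b has shape (2, 3)

f = a.flatten()        # always an independent copy
r = a.ravel()          # a view here, because a is contiguous
r[0] = 99              # writes through to a[0, 0]
print(a[0, 0], f[0])   # 99 0
```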

(2) Joining and splitting arrays

a = np.random.randint(10, size=(3, 4))
b = np.random.randint(10, size=(3, 4))
a, b
# (array([[1, 1, 4, 0],
#         [2, 2, 4, 3],
#         [1, 8, 4, 3]]), array([[2, 9, 1, 1],
#         [7, 9, 0, 2],
#         [6, 5, 0, 9]]))

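## Stack vertically (the rows of b go below those of a)
print(np.vstack((a, b)))
## Stack horizontally (the columns of b go to the right of those of a)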
print(np.hstack((a, b)))
# array([[1, 1, 4, 0],
#        [2, 2, 4, 3],
#        [1, 8, 4, 3],
#        [2, 9, 1, 1],
#        [7, 9, 0, 2],
#        [6, 5, 0, 9]])
# array([[1, 1, 4, 0, 2, 9, 1, 1],
#        [2, 2, 4, 3, 7, 9, 0, 2],
#        [1, 8, 4, 3, 6, 5, 0, 9]])

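## Split horizontally into 4 equal pieces along axis 1 (columns)
print(np.hsplit(a, 4))
## Split vertically into 3 equal pieces along axis 0 (rows)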
print(np.vsplit(a, 3))
# [array([[1],
#         [2],
#         [1]]), array([[1],
#         [2],
#         [8]]), array([[4],
#         [4],
#         [4]]), array([[0],
#         [3],
#         [3]])]
# [array([[1, 1, 4, 0]]), array([[2, 2, 4, 3]]), array([[1, 8, 4, 3]])]
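For 2-D arrays, vstack and hstack give the same result as np.concatenate along axis 0 and axis 1. hsplit and vsplit require an equal division, whereas np.array_split accepts a section count that does not divide evenly. A sketch with deterministic data instead of random integers:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.arange(12, 24).reshape(3, 4)

v = np.concatenate((a, b), axis=0)   # same result as np.vstack((a, b))
h = np.concatenate((a, b), axis=1)   # same result as np.hstack((a, b))

halves = np.hsplit(a, 2)             # two (3, 2) pieces
# np.hsplit(a, 3) would raise ValueError: 4 columns cannot be split into 3 equal parts
uneven = np.array_split(a, 3, axis=1)
print([p.shape for p in uneven])     # [(3, 2), (3, 1), (3, 1)]
```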

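6. Extrema (max, min and their indices)

axis = 0 works down each column, giving one result per column;
axis = 1 works across each row, giving one result per row.

a = np.array(([1, 4, 3], [6, 2, 9], [4, 7, 2]))

## Maximum of each column
print(np.max(a, axis = 0))
## Minimum of each row
print(np.min(a, axis = 1))
## Index of the maximum in each column
print(np.argmax(a, axis = 0))
## Index of the minimum in each row
print(np.argmin(a, axis = 1))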
# array([6, 7, 9])
# array([1, 2, 2])
# array([1, 2, 1])
# array([0, 1, 2])
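The axis rule can be checked through shapes: a reduction along an axis removes that axis. Without an axis argument, argmax works on the flattened array and returns a flat index, which np.unravel_index converts back to (row, column). A sketch with the same a:

```python
import numpy as np

a = np.array([[1, 4, 3], [6, 2, 9], [4, 7, 2]])

col_sums = a.sum(axis=0)             # one value per column: 11, 13, 14
row_sums = a.sum(axis=1)             # one value per row: 8, 17, 13
kept = a.sum(axis=1, keepdims=True)  # keeps the reduced axis: shape (3, 1)

flat = a.argmax()                    # index into a.ravel(): 5
pos = np.unravel_index(flat, a.shape)
print(flat, pos)                     # the maximum 9 sits at row 1, column 2
```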

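7. Array statistics

## Median of each column
print(np.median(a, axis = 0))
## Arithmetic mean of each row
print(np.mean(a, axis = 1))
## Weighted average of each column (no weights are given, so this equals the plain mean)
print(np.average(a, axis = 0))
## Variance of each row
print(np.var(a, axis = 1))
## Standard deviation of each column
print(np.std(a, axis = 0))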

# array([4., 4., 3.])
# array([2.66666667, 5.66666667, 4.33333333])
# array([3.66666667, 4.33333333, 4.66666667])
# array([1.55555556, 8.22222222, 4.22222222])
# array([2.05480467, 2.05480467, 3.09120617])
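Two caveats about the calls above, as a sketch: np.average only differs from np.mean when a weights argument is passed (the weights below are hypothetical, one per row), and np.var and np.std compute the population statistics (ddof=0) by default; ddof=1 gives the sample versions.

```python
import numpy as np

a = np.array([[1, 4, 3], [6, 2, 9], [4, 7, 2]])

w = np.array([1, 1, 2])                   # hypothetical weights for rows 0, 1, 2
wavg = np.average(a, axis=0, weights=w)   # 3.75, 5.0, 4.0

pop_std = np.std(a, axis=0)               # divides by n (ddof=0), as above
sample_std = np.std(a, axis=0, ddof=1)    # divides by n - 1
print(wavg, pop_std, sample_std)
```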

III. Advanced (as exercises)

1. Create a 5x5 2-D array whose border values are 1 and whose other values are 0

Z = np.ones((5, 5))
Z[1:-1, 1:-1] = 0
print(Z)
# array([[1., 1., 1., 1., 1.],
#        [1., 0., 0., 0., 1.],
#        [1., 0., 0., 0., 1.],
#        [1., 0., 0., 0., 1.],
#        [1., 1., 1., 1., 1.]])

2. Surround a 5x5 2-D array of ones with a border of zeros
→ use np.pad()

Z = np.ones((5, 5))
Z = np.pad(Z, pad_width=1, mode='constant', constant_values = 0)
print(Z)
# array([[0., 0., 0., 0., 0., 0., 0.],
#        [0., 1., 1., 1., 1., 1., 0.],
#        [0., 1., 1., 1., 1., 1., 0.],
#        [0., 1., 1., 1., 1., 1., 0.],
#        [0., 1., 1., 1., 1., 1., 0.],
#        [0., 1., 1., 1., 1., 1., 0.],
#        [0., 0., 0., 0., 0., 0., 0.]])

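3. Create a 5x5 2-D array with the values 1, 2, 3, 4 just below its main diagonal
→ Use np.diag(v, k). k selects the diagonal: k=0 is the main diagonal, k>0 is above it and k<0 is below it. |k| is the distance from the main diagonal and also determines the size of the result, which is a square array with side len(v) + |k|.

Z = np.diag(1 + np.arange(4), k=-1)
print(Z)
# array([[0, 0, 0, 0, 0],
#        [1, 0, 0, 0, 0],
#        [0, 2, 0, 0, 0],
#        [0, 0, 3, 0, 0],
#        [0, 0, 0, 4, 0]])
## For comparison, k=-3 puts the values three diagonals below the main one and gives a 7x7 array
Z = np.diag(1 + np.arange(4), k=-3)
print(Z)
# array([[0, 0, 0, 0, 0, 0, 0],
#        [0, 0, 0, 0, 0, 0, 0],
#        [0, 0, 0, 0, 0, 0, 0],
#        [1, 0, 0, 0, 0, 0, 0],
#        [0, 2, 0, 0, 0, 0, 0],
#        [0, 0, 3, 0, 0, 0, 0],
#        [0, 0, 0, 4, 0, 0, 0]])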
Z = np.diag(1 + np.arange(4), k=0)
print(Z)
# array([[1, 0, 0, 0],
#        [0, 2, 0, 0],
#        [0, 0, 3, 0],
#        [0, 0, 0, 4]])
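np.diag also works in the other direction: given a 2-D array, it extracts the k-th diagonal. A sketch that also checks the size rule (the result of building from a 1-D v is square with side len(v) + |k|):

```python
import numpy as np

v = 1 + np.arange(4)             # [1 2 3 4]

Z = np.diag(v, k=-1)             # 1-D input: build a matrix
print(Z.shape)                   # (5, 5) = len(v) + |k|
print(np.diag(v, k=-3).shape)    # (7, 7)

d = np.diag(Z, k=-1)             # 2-D input: extract the diagonal below the main one
print(d)                         # [1 2 3 4]
```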

4. Create a 10x10 2-D array in which 1s and 0s alternate like a checkerboard (with 0s on the main diagonal)

Z = np.zeros((10, 10), dtype=int)
Z[1::2, ::2] = 1
Z[::2, 1::2] = 1
print(Z)
# array([[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
#        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
#        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
#        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
#        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
#        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
#        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
#        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
#        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
#        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]])

5. Create a 1-D array of the integers 0 to 10 and negate every number in the interval (1, 9]

Z = np.arange(11)
Z[(1 < Z) & (Z <= 9)] *= -1
print(Z)
# [ 0  1 -2 -3 -4 -5 -6 -7 -8 -9 10]
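The parentheses in the mask matter: & binds more tightly than the comparison operators. Python's `and`, or a chained comparison such as 1 < Z <= 9, cannot be used with arrays because it needs a single truth value. A sketch, including np.where as a non-mutating alternative:

```python
import numpy as np

Z = np.arange(11)
mask = (1 < Z) & (Z <= 9)        # element-wise AND of two boolean arrays

try:
    1 < Z <= 9                   # expands to (1 < Z) and (Z <= 9)
except ValueError as err:
    print("ValueError:", err)    # the truth value of an array is ambiguous

negated = np.where(mask, -Z, Z)  # builds a new array; Z itself is unchanged
print(negated)
```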

6. Find the elements common to two 1-D arrays

Z1 = np.random.randint(0, 10, 15)
Z2 = np.random.randint(0, 10, 10)
print("Z1:", Z1)
print("Z2:", Z2)
np.intersect1d(Z1, Z2)
# Z1: [3 8 5 5 1 0 5 2 3 8 3 1 4 0 4]
# Z2: [2 8 4 8 3 7 3 1 5 3]
# array([1, 2, 3, 4, 5, 8])
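np.intersect1d belongs to a family of set routines that return sorted unique values. A sketch using the Z1 and Z2 values printed above, fixed here so the results are reproducible:

```python
import numpy as np

Z1 = np.array([3, 8, 5, 5, 1, 0, 5, 2, 3, 8, 3, 1, 4, 0, 4])
Z2 = np.array([2, 8, 4, 8, 3, 7, 3, 1, 5, 3])

common = np.intersect1d(Z1, Z2)   # in both: 1 2 3 4 5 8
only1 = np.setdiff1d(Z1, Z2)      # in Z1 but not in Z2: 0
only2 = np.setdiff1d(Z2, Z1)      # in Z2 but not in Z1: 7
both = np.union1d(Z1, Z2)         # in either: 0 1 2 3 4 5 7 8
in2 = np.isin(Z1, Z2)             # per-element membership mask for Z1
print(common, only1, only2, both, Z1[~in2])
```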

7. Create a 5x5 2-D array in which every row is 1, 2, 3, 4, 5

Z = np.zeros((5, 5))
Z += np.arange(1, 6)
print(Z)
# [[1. 2. 3. 4. 5.]
#  [1. 2. 3. 4. 5.]
#  [1. 2. 3. 4. 5.]
#  [1. 2. 3. 4. 5.]
#  [1. 2. 3. 4. 5.]]
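This works through broadcasting: the shape (5,) row is treated as (1, 5) and repeated down all five rows. Reshaping the same values into a (5, 1) column repeats them across the columns instead. A sketch:

```python
import numpy as np

row = np.arange(1, 6)           # shape (5,)
Z = np.zeros((5, 5)) + row      # (5, 5) + (5,): every row is 1..5

col = row.reshape(5, 1)         # shape (5, 1)
Zc = np.zeros((5, 5)) + col     # (5, 5) + (5, 1): every column is 1..5

T = np.tile(row, (5, 1))        # explicit repetition, same values as Z
print(Z, Zc, sep="\n")
```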

8. Date arithmetic

yesterday = np.datetime64('today', 'D') - np.timedelta64(1, 'D')
today = np.datetime64('today', 'D')
tomorrow = np.datetime64('today', 'D') + np.timedelta64(1, 'D')
print('Yesterday: ', yesterday)
print('Today: ', today)
print('Tomorrow: ', tomorrow)
# Yesterday:  2019-01-02
# Today:  2019-01-03
# Tomorrow:  2019-01-04
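datetime64 values also work with np.arange and with subtraction, which is handy for date ranges and day counts. A sketch with fixed, arbitrary example dates:

```python
import numpy as np

# Every day of the first week of 2019; the stop date is excluded, as with range()
days = np.arange('2019-01-01', '2019-01-08', dtype='datetime64[D]')
print(days)

# Subtracting two dates gives a timedelta64
gap = np.datetime64('2019-03-01') - np.datetime64('2019-02-01')
print(gap)                 # 28 days
print(gap.astype(int))     # 28
```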

9. Extract the integer part of a random array

Z = np.random.uniform(0, 10, 10)
print("Original: ", Z)

print("Method 1:", Z - Z % 1)      # subtract the fractional part
print("Method 2:", np.floor(Z))    # floor: the largest integer <= each element
print("Method 3:", np.ceil(Z) - 1) # ceiling minus 1 (wrong for elements that are already whole numbers)
print("Method 4:", Z.astype(int))  # type conversion (truncates toward zero)
print("Method 5:", np.trunc(Z))    # truncate toward zero
# Original:  [5.56601634 4.77164344 3.08836714 3.30325796 6.15169633 9.79808824 4.38637838 6.39368379 9.62265927 2.32704462]
# Method 1: [5. 4. 3. 3. 6. 9. 4. 6. 9. 2.]
# Method 2: [5. 4. 3. 3. 6. 9. 4. 6. 9. 2.]
# Method 3: [5. 4. 3. 3. 6. 9. 4. 6. 9. 2.]
# Method 4: [5 4 3 3 6 9 4 6 9 2]
# Method 5: [5. 4. 3. 3. 6. 9. 4. 6. 9. 2.]
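The five methods agree above only because every value is positive and not a whole number. A sketch with negative and whole numbers shows where they diverge: floor rounds down, trunc and astype(int) round toward zero, and ceil(x) - 1 is wrong for whole numbers. np.modf, not used in the original, splits off the fractional and integral parts directly.

```python
import numpy as np

x = np.array([-2.5, -1.0, 0.5, 3.0])

m1 = x - x % 1            # % takes the sign of the divisor, so this equals floor
m2 = np.floor(x)          # round down: -3, -1, 0, 3
m3 = np.ceil(x) - 1       # wrong for -1.0 and 3.0: -3, -2, 0, 2
m4 = x.astype(int)        # toward zero: -2, -1, 0, 3
m5 = np.trunc(x)          # toward zero: -2, -1, 0, 3
frac, whole = np.modf(x)  # whole equals trunc(x); frac keeps the sign of x
print(m1, m2, m3, m4, m5, whole, sep="\n")
```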

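10. Create an evenly spaced 1-D array of length 5 whose values lie between 0 and 1, excluding 0 and 1

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Returns num evenly spaced numbers over the interval [start, stop]. endpoint=True includes stop; endpoint=False excludes it.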

Z1 = np.linspace(0,1,6,endpoint=False)[1:]
Z2 = np.linspace(0, 1, 7)[1:-1]
print(Z1)
print(Z2)
# array([0.16666667, 0.33333333, 0.5       , 0.66666667, 0.83333333])
# array([0.16666667, 0.33333333, 0.5       , 0.66666667, 0.83333333])
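Passing retstep=True makes np.linspace also return the spacing, which confirms that both approaches above use a step of 1/6. A sketch:

```python
import numpy as np

vals1, step1 = np.linspace(0, 1, 6, endpoint=False, retstep=True)
vals2, step2 = np.linspace(0, 1, 7, retstep=True)

print(step1, step2)                          # both 1/6
print(np.allclose(vals1[1:], vals2[1:-1]))   # True
```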

11. Create a 3x3 2-D array and sort each column in ascending order

Sort within each column: axis = 0

Z = np.array([[7, 4, 3], [3, 1, 2], [4, 2, 6]])
Z.sort(axis = 0)   # sorts in place
Z
# array([[3, 1, 2],
#        [4, 2, 3],
#        [7, 4, 6]])
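For comparison, axis=1 sorts each row instead, and np.sort returns a sorted copy rather than sorting in place like Z.sort(). NumPy's sort has no descending flag; reversing the sorted result is one common approach. A sketch:

```python
import numpy as np

Z = np.array([[7, 4, 3], [3, 1, 2], [4, 2, 6]])

by_col = np.sort(Z, axis=0)          # each column sorted independently
by_row = np.sort(Z, axis=1)          # each row sorted independently
desc_col = np.sort(Z, axis=0)[::-1]  # columns in descending order
print(by_col, by_row, desc_col, sep="\n")
```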

12. Print the minimum and maximum values of several NumPy scalar types

np.iinfo(dtype): machine limits for an integer dtype
np.finfo(dtype): machine limits for a floating-point dtype

for dtype in [np.int8, np.int32, np.int64]:
    print("The minimum value of {}: ".format(dtype), np.iinfo(dtype).min)
    print("The maximum value of {}: ".format(dtype), np.iinfo(dtype).max)
for dtype in [np.float32, np.float64]:
    print("The minimum value of {}: ".format(dtype), np.finfo(dtype).min)
    print("The maximum value of {}: ".format(dtype), np.finfo(dtype).max)
# The minimum value of <class 'numpy.int8'>:  -128
# The maximum value of <class 'numpy.int8'>:  127
# The minimum value of <class 'numpy.int32'>:  -2147483648
# The maximum value of <class 'numpy.int32'>:  2147483647
# The minimum value of <class 'numpy.int64'>:  -9223372036854775808
# The maximum value of <class 'numpy.int64'>:  9223372036854775807
# The minimum value of <class 'numpy.float32'>:  -3.4028235e+38
# The maximum value of <class 'numpy.float32'>:  3.4028235e+38
# The minimum value of <class 'numpy.float64'>:  -1.7976931348623157e+308
# The maximum value of <class 'numpy.float64'>:  1.7976931348623157e+308
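These limits matter because array arithmetic on fixed-width integers wraps around silently instead of raising an error. For floats, np.finfo also reports eps, the gap between 1.0 and the next representable value. A sketch:

```python
import numpy as np

x = np.array([127], dtype=np.int8)   # already at np.iinfo(np.int8).max
y = x + np.int8(1)                   # wraps around to the minimum
print(y, y.dtype)                    # [-128] int8

print(np.finfo(np.float32).eps)      # about 1.19e-07, i.e. 2**-23
print(np.finfo(np.float64).eps)      # about 2.22e-16, i.e. 2**-52
```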

13. Convert float32 to an integer type: astype()

Z = np.arange(10, dtype=np.float32)
print(Z)

Z = Z.astype(np.int32, copy=False)
Z
# [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
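Two details of astype, as a sketch: converting floats to integers truncates toward zero (np.rint can round first, using round-half-to-even), and copy=False only avoids a copy when no conversion is needed, so the float32 to int32 conversion above still creates a new array.

```python
import numpy as np

z = np.array([1.7, -1.7, 2.5])
print(z.astype(np.int32))              # truncated toward zero: 1, -1, 2
print(np.rint(z).astype(np.int32))     # rounded first: 2, -2, 2

a = np.arange(3, dtype=np.int32)
same = a.astype(np.int32, copy=False)  # dtype already matches: no copy is made
print(same is a)                       # True
```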

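14. Find the number in a random 1-D array that is closest to a given value (0.5)

Z.flat: a flat (1-D) iterator over the array that can be indexed with the flat index returned by argmin().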

Z = np.random.uniform(0, 1, 20)
z = 0.5
m = Z.flat[np.abs(Z-z).argmin()]
m
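The original shows no result, so here is a deterministic sketch. For a 1-D array, Z.flat[i] and Z[i] are the same; .flat matters for multidimensional arrays, because argmin() without an axis returns an index into the flattened array.

```python
import numpy as np

Z = np.array([0.1, 0.45, 0.9, 0.52, 0.3])
z = 0.5
i = np.abs(Z - z).argmin()   # 0.52 is 0.02 away, closer than 0.45 (0.05 away)
print(Z[i])                  # 0.52

M = Z[:4].reshape(2, 2)      # the same idea on a 2-D array
j = np.abs(M - z).argmin()   # flat index 3
print(M.flat[j])             # 0.52
```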

15. Indices of the non-zero elements: np.nonzero()

Z = np.nonzero([1,2,0,3,5,2,0])
Z
# (array([0, 1, 3, 4, 5]),)

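16. Randomly place p ones in a given 5x5 2-D array

np.random.choice(range(5 * 5), p, replace=False)
- picks p distinct values from range(5 * 5), i.e. p distinct flat positions.
np.put(Z, pos, value): sets the elements of Z at the flat indices pos to value.

p = 3
Z = np.zeros((5, 5))
np.put(Z, np.random.choice(range(5 * 5), p, replace=False), 1)
Z
# a 5x5 array of zeros with exactly p = 3 ones at random positions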
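Since NumPy 1.17 (newer than the 1.14.3 used in these notes), the Generator returned by np.random.default_rng is the recommended source of random numbers, and seeding it makes the placement reproducible. A sketch in which the seed 0 is arbitrary; np.unravel_index translates the flat positions into (row, column) pairs.

```python
import numpy as np

rng = np.random.default_rng(0)                   # seeded Generator (NumPy >= 1.17)
p = 3
Z = np.zeros((5, 5))
pos = rng.choice(Z.size, size=p, replace=False)  # p distinct flat indices
np.put(Z, pos, 1)

rows, cols = np.unravel_index(pos, Z.shape)      # 2-D coordinates of the ones
print(Z)
print(list(zip(rows, cols)))
```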

17. Subtract the mean of each row from a random 3x3 2-D array

X = np.random.rand(3, 3)
print(X)

Y = X - X.mean(axis=1, keepdims = True)
Y
# [[0.03511832 0.90273634 0.13177977]
#  [0.60108545 0.63640746 0.50655992]
#  [0.80052123 0.94236654 0.33120157]]
# array([[-0.32142649,  0.54619153, -0.22476504],
#        [ 0.0197345 ,  0.05505652, -0.07479102],
#        [ 0.10915812,  0.25100343, -0.36016154]])
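keepdims=True is what makes the subtraction row-wise: the means then have shape (3, 1) and broadcast across the columns. Without it, the (3,) vector of row means is broadcast along each row, as if the values were per-column means, which silently gives a different result. A sketch with fixed values:

```python
import numpy as np

X = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
m = X.mean(axis=1)                        # row means, shape (3,): 2, 5, 8

good = X - X.mean(axis=1, keepdims=True)  # (3, 3) - (3, 1)
also_good = X - m[:, np.newaxis]          # the same, via an explicit new axis
bad = X - m                               # (3, 3) - (3,): subtracts m[j] from column j
print(good, bad, sep="\n")
```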

18. Get the diagonal of a matrix product

A = np.random.uniform(0, 1, (3, 3))
B = np.random.uniform(0, 1, (3, 3))

print(np.dot(A, B))
## Method 1: slower (computes the full product, then keeps only the diagonal)
print(np.diag(np.dot(A, B)))
## Method 2: faster (diag(AB)[i] is the sum over j of A[i, j] * B[j, i])
print(np.sum(A * B.T, axis=1))
## Method 3: fast (einsum; the subscripts must be "ij,ji->i", not "ij,ij->i")
print(np.einsum("ij,ji->i", A, B))
# [[0.46626606 0.39535655 1.13705572]
#  [0.41883304 0.58240235 1.26570632]
#  [0.20773629 0.44994393 0.80665664]]
# [0.46626606 0.58240235 0.80665664]
# [0.46626606 0.58240235 0.80665664]
# [0.46626606 0.58240235 0.80665664]
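Since diag(AB)[i] = Σ_j A[i, j]·B[j, i], the three methods can be checked against each other. A sketch with a seeded generator (the seed 42 is arbitrary); it also shows that the subscripts "ij,ij->i" compute the row sums of A * B, which is a different quantity.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.uniform(0, 1, (3, 3))
B = rng.uniform(0, 1, (3, 3))

d1 = np.diag(A @ B)                  # full product, then the diagonal
d2 = np.sum(A * B.T, axis=1)         # element-wise product with the transpose
d3 = np.einsum("ij,ji->i", A, B)     # only the diagonal terms are computed
other = np.einsum("ij,ij->i", A, B)  # sum over j of A[i, j] * B[i, j]: not the diagonal

print(np.allclose(d1, d2), np.allclose(d1, d3))   # True True
print(np.allclose(other, np.sum(A * B, axis=1)))  # True
```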

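19. Find the p largest values in a random 1-D array

Z = np.random.randint(1, 100, 100)
print(Z)

p = 5
Z[np.argsort(Z)[-p:]]  ## argsort sorts in ascending order, so the last p indices belong to the largest values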
# [45  2 19 56 92  5  2 21 58 55 83 51 94 50 32 67 41 37 34 86 58 40 40 59  7 68 63 26 31  3 85 83 23  8 16 32 31 94 65 50 74 95 13 35 58 64  3 47 30 74  3 63  1 24 64 85 42 31 52 12 88 58 80 42 27 73 43 93 83 21 85 64 44 35 64 65 66 29 57 53 47 40 47 97 24 87 62 14 46 97  9 85 72 46 11 53 73 43 49 50]
# array([94, 94, 95, 97, 97])
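argsort sorts the whole array. When only the p largest values are needed, np.argpartition performs a partial partition (typically linear time) that places the p largest at the end, in no particular order; sorting just those p afterwards is cheap. A sketch with fixed data:

```python
import numpy as np

Z = np.array([5, 1, 9, 3, 7, 9, 2])
p = 3

by_sort = Z[np.argsort(Z)[-p:]]     # full sort: 7, 9, 9

idx = np.argpartition(Z, -p)[-p:]   # indices of the p largest, unordered
by_partition = np.sort(Z[idx])      # 7, 9, 9
print(by_sort, by_partition)
```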

20. Powers and print precision

x to the power n: np.power(x, n)
Number of decimal places used when printing: np.set_printoptions(precision=2)

Z = np.random.random((5, 5))
print(Z)

Z = np.power(Z, 4)
print(Z)
np.set_printoptions(precision=2)
Z
# [[0.29 0.35 0.36 0.08 0.8 ]
#  [0.96 0.68 0.85 1.   0.86]
#  [0.56 0.85 0.65 0.42 0.36]
#  [0.84 0.01 0.17 0.18 0.05]
#  [0.06 0.11 0.24 0.21 0.77]]
# [[6.74e-03 1.47e-02 1.64e-02 3.29e-05 4.01e-01]
#  [8.61e-01 2.12e-01 5.29e-01 9.97e-01 5.55e-01]
#  [9.53e-02 5.18e-01 1.74e-01 2.98e-02 1.71e-02]
#  [5.10e-01 8.32e-09 8.10e-04 1.04e-03 5.65e-06]
#  [1.74e-05 1.31e-04 3.30e-03 2.11e-03 3.58e-01]]
# array([[6.74e-03, 1.47e-02, 1.64e-02, 3.29e-05, 4.01e-01],
#        [8.61e-01, 2.12e-01, 5.29e-01, 9.97e-01, 5.55e-01],
#        [9.53e-02, 5.18e-01, 1.74e-01, 2.98e-02, 1.71e-02],
#        [5.10e-01, 8.32e-09, 8.10e-04, 1.04e-03, 5.65e-06],
#        [1.74e-05, 1.31e-04, 3.30e-03, 2.11e-03, 3.58e-01]])

21. Percentiles (25%, 50%, 75%)

a = np.random.randint(10, 19, 10)
print(a)

np.percentile(a, q = [25, 50, 75])
# [13 13 14 11 11 10 11 13 10 11]
# array([11., 11., 13.])
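np.quantile (added in NumPy 1.15, after the 1.14.3 used here) does the same with fractions in [0, 1] instead of percentages; both interpolate linearly between data points by default. A sketch with the array printed above:

```python
import numpy as np

a = np.array([13, 13, 14, 11, 11, 10, 11, 13, 10, 11])

pct = np.percentile(a, q=[25, 50, 75])   # 11, 11, 13
qnt = np.quantile(a, [0.25, 0.5, 0.75])  # the same values
print(pct, qnt)
```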

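22. Count the missing values (NaN) in an array and find their positions

Z = np.random.rand(10, 10)
## The random (row, column) pairs can repeat, so there may be fewer than 5 NaNs
Z[np.random.randint(10, size=5), np.random.randint(10, size=5)] = np.nan

print("Total number of missing values:\n", np.isnan(Z).sum())
print("Indices of missing values:\n", np.where(np.isnan(Z)))
# Total number of missing values:
#  5
# Indices of missing values:
#  (array([1, 4, 4, 7, 9]), array([3, 8, 9, 5, 8]))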

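23. Remove the rows that contain missing values from a random array
Boolean indexing: rows whose mask value is True are kept, and rows whose value is False are dropped.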

Z[np.sum(np.isnan(Z), axis=1) == 0]
# array([[0.95, 0.54, 0.2 , 0.01, 0.23, 0.25, 0.41, 0.75, 0.58, 0.36],
#        [0.57, 0.  , 0.04, 0.78, 0.86, 0.95, 0.93, 0.53, 0.06, 0.36],
#        [0.02, 0.65, 0.49, 0.73, 0.32, 0.75, 0.16, 0.29, 0.36, 0.92],
#        [0.51, 0.88, 0.43, 0.14, 0.47, 0.55, 0.69, 0.2 , 0.18, 0.3 ],
#        [0.5 , 0.8 , 0.63, 0.01, 0.82, 0.93, 0.21, 0.28, 0.63, 0.18]])
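An equivalent and arguably more readable mask is ~np.isnan(Z).any(axis=1), meaning "no NaN anywhere in the row". NaN has to be detected with np.isnan because it never compares equal to itself, and nan-aware reductions such as np.nanmean skip it. A sketch with a small fixed array:

```python
import numpy as np

Z = np.array([[1., 2.], [np.nan, 3.], [4., 5.]])

clean = Z[~np.isnan(Z).any(axis=1)]   # drops the middle row
print(clean)

print(np.nan == np.nan)               # False, hence np.isnan
print(np.mean(Z), np.nanmean(Z))      # nan versus the mean of the 5 numbers (3.0)
```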

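24. Count the occurrences of each element in a random array
np.unique(Z, return_counts=True): the first returned array holds the sorted unique values, and the second holds how many times each of them occurs.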

Z = np.random.randint(0, 100, 25).reshape(5, 5)
print(Z)
np.unique(Z, return_counts=True)
# [[32 87 26 30 86]
#  [ 6 65 64 49 73]
#  [34 41 53 28 89]
#  [13 12 68  3 79]
#  [ 6 19 31 80 66]]
# (array([ 3,  6, 12, 13, 19, 26, 28, 30, 31, 32, 34, 41, 49, 53, 64, 65, 66,
#         68, 73, 79, 80, 86, 87, 89]),
#  array([1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#         1, 1]))

25. Convert each element of an array to a text label according to given categories
The categories are:
1 → car
2 → bus
3 → train

Z = np.random.randint(1, 4, 10)
print(Z)

label_map = {1: "car", 2: "bus", 3: "train"}
[label_map[x] for x in Z]
# [1 3 3 3 2 1 2 1 2 1]
# ['car', 'train', 'train', 'train', 'bus', 'car', 'bus', 'car', 'bus', 'car']
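The list comprehension performs one Python-level dict lookup per element. For small integer codes, a vectorized alternative is to index a NumPy array of labels with the codes; the empty entry at index 0 is a placeholder because the codes start at 1. A sketch with fixed codes:

```python
import numpy as np

Z = np.array([1, 3, 3, 2, 1])
labels = np.array(["", "car", "bus", "train"])  # index 0 unused: codes start at 1

names = labels[Z]                               # fancy indexing, one label per code
print(names.tolist())                           # ['car', 'train', 'train', 'bus', 'car']
```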

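26. Concatenate several 1-D arrays into a single ndarray
np.concatenate()

Z1 = np.arange(3)
Z2 = np.arange(3, 7)
Z3 = np.arange(7, 10)

## The arrays have different lengths, so dtype=object is required (NumPy 1.24+ raises ValueError without it)
Z = np.array([Z1, Z2, Z3], dtype=object)
print(Z)

np.concatenate(Z)
# equivalent: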
np.concatenate((Z1, Z2, Z3), axis=0)
# [array([0, 1, 2]) array([3, 4, 5, 6]) array([7, 8, 9])]
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

27. Get the maximum of each row of a random 2-D array
np.amax()
np.apply_along_axis()

Z = np.random.randint(1, 100, [5, 5])
print(Z)

print(np.amax(Z, axis = 1))
print(np.apply_along_axis(np.max, arr=Z, axis=1))
# [[13 21 75 27  6]
#  [92 20 86 88 50]
#  [14 20 96  3 34]
#  [ 3 63  4 57 95]
#  [56 97  5  6 37]]
# [75 92 96 95 97]
# [75 92 96 95 97]

28. Compute the Euclidean distance between two arrays
np.linalg.norm()

a = np.array([1, 2])
b = np.array([7, 8])

np.linalg.norm(b - a)
# 8.48528137423857
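np.linalg.norm(b - a) is the square root of the summed squared differences. With broadcasting, the same idea yields all pairwise distances between the rows of a point array in one expression. A sketch in which the points P are arbitrary examples:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([7, 8])
manual = np.sqrt(np.sum((b - a) ** 2))    # sqrt(36 + 36)
print(manual, np.linalg.norm(b - a))

P = np.array([[0, 0], [3, 4], [6, 8]])
# (3, 1, 2) - (1, 3, 2) -> (3, 3, 2) differences; norm over the last axis
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
print(D)                                  # D[i, j] = distance from point i to point j
```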

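29. Print the real and imaginary parts of complex numbers
real: real part; imag: imaginary part

a = np.array([1 + 2j, 3 + 4j, 5 + 6j])
print("Real part:", a.real)
print("Imaginary part:", a.imag)
# Real part: [1. 3. 5.]
# Imaginary part: [2. 4. 6.]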

30. Compute the inverse of a given matrix and verify it
np.linalg.inv(matrix)

matrix = np.array([[1., 2.], [3., 4.]])
inverse_matrix = np.linalg.inv(matrix)

assert np.allclose(np.dot(matrix, inverse_matrix), np.eye(2))
inverse_matrix
# array([[-2. ,  1. ],
#        [ 1.5, -0.5]])
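When the inverse is only needed to solve a linear system Ax = b, np.linalg.solve is generally preferred over inv(A) @ b, since it avoids forming the inverse explicitly. A sketch with the same matrix and a right-hand side chosen so that the solution is [1, 2]:

```python
import numpy as np

matrix = np.array([[1., 2.], [3., 4.]])
b = np.array([5., 11.])               # 1*1 + 2*2 = 5 and 3*1 + 4*2 = 11

x = np.linalg.solve(matrix, b)
print(x)                              # [1. 2.]
print(np.allclose(matrix @ x, b))     # True

# A singular matrix has no inverse; NumPy raises LinAlgError
try:
    np.linalg.inv(np.array([[1., 2.], [2., 4.]]))
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)
```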

31. Standardize data with the Z-score method
Z-score formula:
$Z = \frac{X-\mathrm{mean}(X)}{\mathrm{sd}(X)}$

def zscore(x, axis = None):
    xmean = x.mean(axis=axis, keepdims=True)
    xstd = np.std(x, axis=axis, keepdims=True)   # population standard deviation (ddof=0)
    z = (x - xmean) / xstd
    return z

Z = np.random.randint(10, size=(5, 5))
print(Z)
zscore(Z)
# [[2 5 7 1 4]
#  [3 9 8 9 7]
#  [8 9 8 3 2]
#  [2 5 1 1 1]
#  [0 8 4 5 1]]
# array([[-0.83,  0.16,  0.82, -1.16, -0.17],
#        [-0.5 ,  1.48,  1.15,  1.48,  0.82],
#        [ 1.15,  1.48,  1.15, -0.5 , -0.83],
#        [-0.83,  0.16, -1.16, -1.16, -1.16],
#        [-1.5 ,  1.15, -0.17,  0.16, -1.16]])
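A quick sanity check for a z-score implementation: after standardization, the mean is 0 and the population standard deviation is 1 along the chosen axis. A sketch that redeclares the zscore function from above and standardizes each column (axis=0) of a small fixed array whose columns are not constant:

```python
import numpy as np

def zscore(x, axis=None):
    xmean = x.mean(axis=axis, keepdims=True)
    xstd = np.std(x, axis=axis, keepdims=True)
    return (x - xmean) / xstd

X = np.array([[1., 10.], [2., 20.], [3., 60.]])
Zs = zscore(X, axis=0)     # each column standardized separately

print(Zs.mean(axis=0))     # approximately [0, 0]
print(Zs.std(axis=0))      # approximately [1, 1]
```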

32. Normalize data with the Min-Max formula
Min-Max formula:
$Y = \frac{Z-\min(Z)}{\max(Z)-\min(Z)}$

def min_max(x, axis = None):
    xmin = x.min(axis=axis, keepdims=True)   # named to avoid shadowing the built-ins min/max
    xmax = x.max(axis=axis, keepdims=True)
    result = (x - xmin) / (xmax - xmin)
    return result

Z = np.random.randint(10, size=(5, 6))
print(Z)
min_max(Z)
# [[6 7 4 7 4 8]
#  [8 0 4 0 3 5]
#  [2 4 1 1 6 1]
#  [7 2 2 4 2 3]
#  [5 2 1 5 6 3]]
# array([[0.75, 0.88, 0.5 , 0.88, 0.5 , 1.  ],
#        [1.  , 0.  , 0.5 , 0.  , 0.38, 0.62],
#        [0.25, 0.5 , 0.12, 0.12, 0.75, 0.12],
#        [0.88, 0.25, 0.25, 0.5 , 0.25, 0.38],
#        [0.62, 0.25, 0.12, 0.62, 0.75, 0.38]])

33. Normalize data with the L2 norm
L2 norm of a vector x with n components:
$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}$

def l2_normalize(v, axis = -1, order = 2):
    l2 = np.linalg.norm(v, ord = order, axis=axis, keepdims=True)
    l2[l2 == 0] = 1   # leave all-zero rows unchanged instead of dividing by 0
    return v / l2

Z = np.random.randint(10, size=(5, 5))
print(Z)

l2_normalize(Z)
# [[7 7 2 6 7]
#  [1 6 2 1 1]
#  [6 7 4 2 7]
#  [8 2 5 0 5]
#  [3 6 9 6 1]]
# array([[0.51, 0.51, 0.15, 0.44, 0.51],
#        [0.15, 0.91, 0.3 , 0.15, 0.15],
#        [0.48, 0.56, 0.32, 0.16, 0.56],
#        [0.74, 0.18, 0.46, 0.  , 0.46],
#        [0.23, 0.47, 0.7 , 0.47, 0.08]])

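34. Compute the correlation coefficients between variables
np.corrcoef()
Pearson correlation coefficients lie in [-1, 1]: values close to 1 or -1 indicate a strong positive or negative linear correlation, and 0 indicates no linear correlation.

Z = np.array([
    [1, 2, 1, 9, 10, 3, 2, 6, 7],  # feature A
    [2, 1, 8, 3, 7, 5, 10, 7, 2],  # feature B
    [2, 1, 1, 8, 9, 4, 3, 5, 7]])  # feature C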
np.corrcoef(Z)
# array([[ 1.  , -0.06,  0.97],
#        [-0.06,  1.  , -0.01],
#        [ 0.97, -0.01,  1.  ]])
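Each entry of np.corrcoef's output is the Pearson coefficient r = cov(x, y) / (σ_x σ_y) between two rows (variables). A sketch that recomputes the feature A / feature C entry by hand:

```python
import numpy as np

A = np.array([1, 2, 1, 9, 10, 3, 2, 6, 7], dtype=float)   # feature A
C = np.array([2, 1, 1, 8, 9, 4, 3, 5, 7], dtype=float)    # feature C

dA, dC = A - A.mean(), C - C.mean()
r = np.sum(dA * dC) / np.sqrt(np.sum(dA ** 2) * np.sum(dC ** 2))
print(r, np.corrcoef(A, C)[0, 1])   # the two values agree (about 0.97)
```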

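35. Compute the eigenvalues and eigenvectors of a matrix
np.linalg.eig()

M = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
w, v = np.linalg.eig(M)
# w holds the eigenvalues; the columns of v are the corresponding eigenvectors
w, v
# (array([ 1.61e+01, -1.12e+00, -1.30e-15]), matrix([[-0.23, -0.79,  0.41],
#          [-0.53, -0.09, -0.82],
#          [-0.82,  0.61,  0.41]]))

Verify the eigendecomposition: v · diag(w) · inv(v) = M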

v * np.diag(w) * np.linalg.inv(v)
# matrix([[1., 2., 3.],
#         [4., 5., 6.],
#         [7., 8., 9.]])
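The defining property can also be checked column by column: each column v[:, i] satisfies M v_i = w_i v_i. A sketch with a plain ndarray instead of np.matrix (the NumPy documentation no longer recommends np.matrix), using @ for matrix products:

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
w, v = np.linalg.eig(M)

for i in range(len(w)):
    # M v_i = w_i v_i for every eigenpair
    print(np.allclose(M @ v[:, i], w[i] * v[:, i]))   # True

recon = v @ np.diag(w) @ np.linalg.inv(v)   # v diag(w) v^-1
print(np.allclose(recon, M))                # True
```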

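36. Compute the differences between adjacent elements of an ndarray
np.diff()

Z = np.random.randint(1, 10, 10)
# Z = np.array([1,2,3,4,5,6,8,9,9])
print(Z)

print(np.diff(Z, n = 1))    # differences between adjacent elements of Z
print(np.diff(Z, n = 2))    # apply the difference twice (second-order difference)
print(np.diff(Z, n = 3))    # apply it three times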
# [8 7 3 9 9 4 4 4 6 8]
# [-1 -4  6  0 -5  0  0  2  2]
# [-3 10 -6 -5  5  0  2  0]
# [ 13 -16   1  10  -5   2  -2]

37. Cumulative sum of the elements of an ndarray
np.cumsum()

Z = np.random.randint(1, 10, 10)
print(Z)
# running total
np.cumsum(Z)

# [3 2 1 7 1 7 5 2 6 3]
# array([ 3,  5,  6, 13, 14, 21, 26, 28, 34, 37])
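np.diff and np.cumsum are near-inverses: a running total of the differences, started from the first element, rebuilds the original array. For 2-D input, np.cumsum also takes an axis; without one it works on the flattened array. A sketch with fixed values:

```python
import numpy as np

Z = np.array([8, 7, 3, 9, 9])
d = np.diff(Z)                                           # -1, -4, 6, 0
rebuilt = np.concatenate(([Z[0]], Z[0] + np.cumsum(d)))
print(rebuilt)                                           # [8 7 3 9 9]

M = np.array([[1, 2], [3, 4]])
print(np.cumsum(M, axis=0))   # running totals down each column
print(np.cumsum(M, axis=1))   # running totals along each row
print(np.cumsum(M))           # flattened: 1, 3, 6, 10
```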

38. Join two arrays column-wise and row-wise with NumPy
np.c_[]
np.r_[]

M1 = np.array([1, 2, 3])
M2 = np.array([4, 5, 6])
print(np.c_[M1, M2])  # as the columns of a 2-D array
print(np.r_[M1, M2])  # end to end along the first axis

# array([[1, 4],
#        [2, 5],
#        [3, 6]])
# array([1, 2, 3, 4, 5, 6])

39. Print the 9x9 multiplication table
np.fromfunction()

np.fromfunction(lambda i, j: (i + 1) * (j + 1), (9, 9))

# array([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.],
#        [ 2.,  4.,  6.,  8., 10., 12., 14., 16., 18.],
#        [ 3.,  6.,  9., 12., 15., 18., 21., 24., 27.],
#        [ 4.,  8., 12., 16., 20., 24., 28., 32., 36.],
#        [ 5., 10., 15., 20., 25., 30., 35., 40., 45.],
#        [ 6., 12., 18., 24., 30., 36., 42., 48., 54.],
#        [ 7., 14., 21., 28., 35., 42., 49., 56., 63.],
#        [ 8., 16., 24., 32., 40., 48., 56., 64., 72.],
#        [ 9., 18., 27., 36., 45., 54., 63., 72., 81.]])
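np.outer builds the same table directly from two 1-D arrays and keeps an integer dtype, whereas np.fromfunction returns floats by default. np.tril then keeps only the lower triangle, the triangular layout in which the table is often printed. A sketch:

```python
import numpy as np

n = np.arange(1, 10)
table = np.outer(n, n)   # table[i, j] = (i + 1) * (j + 1), integer dtype
lower = np.tril(table)   # zeros above the main diagonal
print(table)
print(lower)
```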