Working with Data Using OpenCV and Python

1>Creating a new IPython session

Change to the opencv-machine-learning folder:

cd C:\Users\Kannyi\opencv-machine-learning

Activate the conda environment created earlier:

activate py38

Start a new IPython session:

ipython

2>Working with data using Python's NumPy package

Import the NumPy module and check its version:

import numpy
numpy.__version__

Import NumPy under the alias np and check its version:

import numpy as np
np.__version__

Use list() to create a list of integers; range(x) yields every integer from 0 to x-1:

int_list = list(range(10))
int_list
# Result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

A list comprehension can iterate over every element of int_list and apply str() to each one, producing a list of strings:

str_list = [str(i) for i in int_list]
str_list
# Result: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

Multiplying a list by 2 repeats the whole list twice; it does not double each element:

int_list * 2
# Result: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

To multiply every element of int_list by 2, convert the list to a NumPy array first:

int_arr = np.array(int_list)
int_arr
# Result: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

int_arr * 2
# Result: array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
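The difference between the two meanings of `*` can be seen side by side. This is a minimal sketch on a shorter list; the names `repeated`, `doubled_arr`, and `doubled_list` are illustrative and do not appear in the original session.

```python
import numpy as np

int_list = list(range(5))
int_arr = np.array(int_list)

# For a Python list, * means repetition: the list is concatenated with itself.
repeated = int_list * 2

# For a NumPy array, * is applied to every element.
doubled_arr = int_arr * 2

# The pure-Python equivalent of the element-wise product is a list comprehension.
doubled_list = [2 * i for i in int_list]

print(repeated)      # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(doubled_arr)   # [0 2 4 6 8]
print(doubled_list)  # [0, 2, 4, 6, 8]
```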

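Every NumPy array also has the following attributes:

size: the number of elements in the array.
ndim: the number of dimensions.
dtype: the data type of the array's elements.
shape: the size of each dimension.

print("int_arr size:", int_arr.size)
# Result: int_arr size: 10

print("int_arr ndim:", int_arr.ndim)
# Result: int_arr ndim: 1

print("int_arr dtype:", int_arr.dtype)
# Result: int_arr dtype: int32 (platform-dependent; typically int64 on Linux and macOS)

print("int_arr shape:", int_arr.shape)
# Result: int_arr shape: (10,)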
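The same attributes become more informative once an array has more than one dimension. The sketch below reshapes ten integers into a 2×5 array; the name `arr_2x5` is made up for this illustration.

```python
import numpy as np

# Reshape the integers 0..9 into 2 rows and 5 columns.
arr_2x5 = np.arange(10).reshape(2, 5)

print(arr_2x5.size)   # 10  (total number of elements)
print(arr_2x5.ndim)   # 2   (rows and columns)
print(arr_2x5.shape)  # (2, 5)

# The exact integer dtype depends on the platform, so only its kind is checked here.
print(np.issubdtype(arr_2x5.dtype, np.integer))  # True
```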

Access a single array element by its index:

int_arr[3]
# Result: 3

Negative indices count from the end of the array:

int_arr[-1]
# Result: 9

int_arr[-2]
# Result: 8

Slice the elements at indices 2 through 4 (the stop index 5 is excluded):

int_arr[2:5]
# Result: array([2, 3, 4])

Slice the elements at indices 0 through 4 (an omitted start defaults to 0):

int_arr[:5]
# Result: array([0, 1, 2, 3, 4])

Slice every second element, that is, the elements at even indices (including 0):

int_arr[::2]
# Result: array([0, 2, 4, 6, 8])

Slice the array in reverse order:

int_arr[::-1]
# Result: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
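All of the slices above are special cases of the general form start:stop:step. The sketch below combines the three parts and also shows that a basic slice is a view onto the original data, so writing through it changes the source array. The names `src`, `every_third`, `view`, and `independent` are illustrative only.

```python
import numpy as np

src = np.arange(10)

# General form start:stop:step; the stop index is excluded.
every_third = src[1:8:3]
print(every_third)  # [1 4 7]

# A basic slice is a view: modifying it modifies the original array.
view = src[2:5]
view[0] = 99
print(src[2])  # 99

# Use .copy() when an independent array is needed.
independent = src[5:8].copy()
independent[0] = -1
print(src[5])  # 5
```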

Create a two-dimensional array with 3 rows and 5 columns:

arr_2d = np.zeros((3, 5))
arr_2d
'''
Result:
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
'''

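Every element is initialized to 0. If no data type is specified, NumPy defaults to a floating-point type (float64).

Create a 3×2×4 three-dimensional array in which every element is initialized to 1:

arr_float_3d = np.ones((3, 2, 4))
arr_float_3d
'''
Result: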
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.]]])
'''

Use array slicing to get the first two-dimensional array in arr_float_3d:

arr_float_3d[0, :, :]
'''
Result:
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])
'''

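Set the array's dtype to unsigned 8-bit integer (uint8), then multiply every element by 255 to create a 3×2×4 three-dimensional array filled with 255:

arr_uint_3d = np.ones((3, 2, 4), dtype=np.uint8) * 255
arr_uint_3d
'''
Result: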
array([[[255, 255, 255, 255],
        [255, 255, 255, 255]],

       [[255, 255, 255, 255],
        [255, 255, 255, 255]],

       [[255, 255, 255, 255],
        [255, 255, 255, 255]]], dtype=uint8)
'''
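As an aside, the same array can be built in one step with np.full, and it is worth knowing that uint8 only holds values from 0 to 255. The sketch below is an illustration rather than part of the original session; the names `arr_full`, `arr_ones`, and `wrapped` are made up here.

```python
import numpy as np

# np.full fills a new array with a given value in one step.
arr_full = np.full((3, 2, 4), 255, dtype=np.uint8)
arr_ones = np.ones((3, 2, 4), dtype=np.uint8) * 255

print(np.array_equal(arr_full, arr_ones))  # True
print(arr_full.dtype)                      # uint8

# uint8 covers 0..255, so adding 1 to 255 wraps around to 0.
wrapped = arr_full + np.uint8(1)
print(wrapped.max())  # 0
```

This wrap-around matters for image data, where pixel values are commonly stored as uint8.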

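3>Loading external datasets in Python

Download the MNIST dataset of handwritten digits (0-9):

from sklearn import datasets
# as_frame=False (scikit-learn 0.22 and later) returns NumPy arrays instead of a pandas DataFrame
mnist_data = datasets.fetch_openml("mnist_784", as_frame=False)
x = mnist_data["data"]
y = mnist_data["target"]
mnist_data.data.shape
# Result: (70000, 784)

mnist_data.target.shape
# Result: (70000,)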

Check the distinct target values:

import numpy as np
np.unique(mnist_data.target)
# Result: array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], dtype=object)
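Note that the labels come back as strings, not numbers. A common follow-up is to convert them to integers with astype. The sketch below uses a small hand-written array, `labels_str`, as a stand-in for mnist_data.target so that it runs without downloading anything; both names are hypothetical.

```python
import numpy as np

# Stand-in for mnist_data.target: string labels stored in an object array.
labels_str = np.array(['5', '0', '4', '1', '9'], dtype=object)

# Convert the string labels to unsigned 8-bit integers.
labels_int = labels_str.astype(np.uint8)

print(labels_int)             # [5 0 4 1 9]
print(np.unique(labels_int))  # [0 1 4 5 9]
```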

4>Visualizing data with Matplotlib

Import the Matplotlib module under the alias mpl:

import matplotlib as mpl

Import the matplotlib.pyplot module under the alias plt:

import matplotlib.pyplot as plt

IPython magic command that makes plots appear automatically:

%matplotlib

Command for showing plots manually (usually unnecessary once the %matplotlib command above has been run):

plt.show()

Create 100 evenly spaced sample points from 0 to 10 along the x axis:

import numpy as np
x=np.linspace(0, 10, 100)
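Unlike range, np.linspace includes both endpoints, so 100 points over [0, 10] are spaced 10/99 apart rather than 0.1. The sketch below checks this; `step` and `x_open` are illustrative names, and endpoint=False is shown as the option that gives a spacing of exactly 10/100.

```python
import numpy as np

x = np.linspace(0, 10, 100)

print(x.size)       # 100
print(x[0], x[-1])  # 0.0 10.0

# Both endpoints are included, so there are 99 intervals.
step = x[1] - x[0]
print(np.isclose(step, 10 / 99))  # True

# With endpoint=False the stop value is excluded and the spacing becomes 10/100.
x_open = np.linspace(0, 10, 100, endpoint=False)
print(np.isclose(x_open[1] - x_open[0], 0.1))  # True
```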

Use NumPy's sin function to compute the sine of every x value, and draw the result with plt's plot function:

plt.plot(x, np.sin(x))

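This displays a sine curve over the interval from 0 to 10.

Save the plot as Figure1.png. Because the path is relative, the file is written to the current working directory (C:\Users\Kannyi in the original session):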

plt.savefig('Figure1.png')

Import sklearn's datasets module:

from sklearn import datasets

Load the actual data:

digits=datasets.load_digits()

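digits has two different data fields:

data: all the pixels of each image are laid out in one long vector.
images: keeps the 8×8 spatial arrangement of each image; this field is the better choice for plotting an individual image.

print(digits.data.shape)
# Result: (1797, 64)

print(digits.images.shape)
# Result: (1797, 8, 8)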
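The two shapes are related by a simple reshape: each 8×8 image flattens row by row into a vector of 64 values. The sketch below illustrates this with a small synthetic array standing in for digits.images, so it runs without scikit-learn; `images`, `data`, and `restored` are hypothetical names, and the claim that the real data field is the row-major flattening of images is an assumption worth verifying with np.array_equal on the actual dataset.

```python
import numpy as np

# Synthetic stand-in for digits.images: 3 "images" of 8x8 pixels.
images = np.arange(3 * 8 * 8).reshape(3, 8, 8)

# Flatten each image into one row of 64 values; -1 lets NumPy infer the size.
data = images.reshape(len(images), -1)
print(data.shape)  # (3, 64)

# Reshaping back recovers the original spatial layout.
restored = data.reshape(-1, 8, 8)
print(np.array_equal(restored, images))  # True

# Row-major order: pixel (r, c) of image i lands at column r * 8 + c.
print(data[1, 2 * 8 + 5] == images[1, 2, 5])  # True
```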

Use NumPy array slicing to pull a single image out of the dataset:

img=digits.images[0, :, :]

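This selects the first of the 1797 images, an 8×8 array of 64 pixels.

Use plt's imshow function to draw the image:

plt.imshow(img, cmap='gray')

The cmap parameter specifies a colormap; for grayscale images, the gray colormap is the more appropriate choice.

This displays the 8×8 digit image in grayscale.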

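We can use plt's subplot function to plot a sample of every digit:

for image_index in range(10):
    subplot_index = image_index + 1
    plt.subplot(2, 5, subplot_index)
    plt.imshow(digits.images[image_index, :, :], cmap='gray')

subplot takes the number of rows, the number of columns, and the index of the current subplot.

image_index starts at 0, whereas subplot_index starts at 1.

subplot determines where the next plot goes and can be thought of as providing the canvas; imshow then draws the image onto that canvas.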
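The 1-based subplot index counts cells row by row, starting from the top-left of the grid. The sketch below uses only plain Python to show which grid cell each index refers to in a 2×5 layout; `n_rows`, `n_cols`, and `positions` are illustrative names, not part of Matplotlib.

```python
n_rows, n_cols = 2, 5

positions = []
for image_index in range(n_rows * n_cols):
    subplot_index = image_index + 1            # subplot indices start at 1
    row, col = divmod(image_index, n_cols)     # 0-based row and column in the grid
    positions.append((subplot_index, row, col))

for subplot_index, row, col in positions:
    print(f"subplot {subplot_index:2d} -> row {row}, column {col}")

# Indices 1..5 fill the first row; indices 6..10 fill the second row.
```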

This displays the ten digit images in a grid of 2 rows and 5 columns.