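Note: this post is based on https://blog.csdn.net/u013733326/article/details/79639509 and records what I learned while working through it.
Python version: 3.6.x
Goal: build a simple neural network that recognizes pictures of cats.
Steps: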
I. Load and preprocess the data
Libraries imported at the start:
```python
import numpy as np
import matplotlib.pyplot as plt
import h5py
from lr_utils import load_dataset
```
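- numpy: the package for scientific computing
- matplotlib: plotting
- h5py: a package for working with datasets stored in H5 files
- lr_utils: a helper module shipped with the course materials that loads the bundled data
The code of lr_utils.py: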
```python
import numpy as np
import h5py


def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
```
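- train_set_x_orig: image data of the training set (209 images of 64×64 pixels)
- train_set_y_orig: labels of the training images (1 = cat, 0 = non-cat)
- test_set_x_orig: image data of the test set (50 images of 64×64 pixels)
- test_set_y_orig: labels of the test images (1 = cat, 0 = non-cat)
- classes: the two class names stored as bytes, i.e. [b'non-cat' b'cat']
Load the data and display some of the images: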
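```python
# Load the data (the code below refers to these names)
train_set_x, train_set_y, test_set_x, test_set_y, classes = load_dataset()

# Show training images 25-28 in a 2x2 grid
num = 1
for index in range(25, 29):
    plt.subplot(2, 2, num)
    num = num + 1
    plt.imshow(train_set_x[index])
plt.show()
```
Print the label of one image: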
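```python
index = 1
print("y = " + str(train_set_y[:, index]) + ", it's a '"
      + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
```
np.squeeze() removes dimensions of size 1. Here it turns the one-element array train_set_y[:, index] into a single value, so classes[...] returns one bytes object that can be decoded; indexing with the unsqueezed one-element array would return an array, which has no decode() method. For example: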
```python
print(train_set_y[:, index])              # output: [0]
print(np.squeeze(train_set_y[:, index]))  # output: 0
```
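The same behavior can be reproduced with small stand-in arrays, assumed to mirror the dataset's layout (labels of shape (1, m), class names as byte strings). Indexing classes with the one-element array returns another array, while indexing with the squeezed 0-d value returns a single bytes value that has decode().

```python
import numpy as np

# Stand-in data (an assumption): labels of shape (1, m), class names as bytes
classes = np.array([b"non-cat", b"cat"])
labels = np.array([[0, 1, 1]])

col = labels[:, 1]              # shape (1,): still a one-element array
picked = classes[col]           # indexing with an array returns an array
print(col.shape, type(picked).__name__)   # (1,) ndarray

scalar = np.squeeze(col)        # shape (): a 0-d array holding the value 1
name = classes[scalar].decode("utf-8")    # a single bytes value, which can be decoded
print(name)                     # cat
```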
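Sizes and dimensions of the loaded data:
- m_train: number of images in the training set
- m_test: number of images in the test set
- num_pixel: height and width of each image in pixels (all images are 64×64)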
```python
m_train = train_set_y.shape[1]
m_test = test_set_y.shape[1]
num_pixel = train_set_x.shape[1]

print("Number of training examples: m_train = " + str(m_train))
print("Number of test examples: m_test = " + str(m_test))
print("Height/width of each image: num_pixel = " + str(num_pixel))
print("Size of each image: (" + str(num_pixel) + ", " + str(num_pixel) + ", 3)")
print("Training images shape: " + str(train_set_x.shape))
print("Training labels shape: " + str(train_set_y.shape))
print("Test images shape: " + str(test_set_x.shape))
print("Test labels shape: " + str(test_set_y.shape))
```
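```
Number of training examples: m_train = 209
Number of test examples: m_test = 50
Height/width of each image: num_pixel = 64
Size of each image: (64, 64, 3)
Training images shape: (209, 64, 64, 3)
Training labels shape: (1, 209)
Test images shape: (50, 64, 64, 3)
Test labels shape: (1, 50)
```
The 3 in each image's size is the number of channels: every pixel is made of R, G and B values.
To simplify the later code, reshape the (209, 64, 64, 3) numpy array into a (64*64*3, 209) = (12288, 209) array in which each column is one image.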
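```python
# Flatten each training image into one row, then transpose so each column is one image
train_set_x_flatten = train_set_x.reshape(train_set_x.shape[0], -1).T  # -1: with the row count fixed, numpy infers the column count
# Do the same for the test set
test_set_x_flatten = test_set_x.reshape(test_set_x.shape[0], -1).T
```
After reshape(train_set_x.shape[0], -1) the array has 209 rows, one per training image. Passing -1 lets numpy work out the number of columns instead of computing it by hand; here it is 64 × 64 × 3 = 12288. The trailing .T transposes the result into 12288 rows and 209 columns. The test set is processed the same way.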
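The effect of reshape(m, -1).T can be checked on a synthetic array with the same shape as the training images (its values are just consecutive integers, an assumption for illustration): after the transpose, column j is exactly image j flattened. Reshaping straight to (-1, m) yields an array of the same shape whose columns mix pixels from different images, which is why the rows are fixed first and the transpose comes afterwards.

```python
import numpy as np

# Synthetic stand-in with the training images' shape (m, 64, 64, 3)
m = 209
images = np.arange(m * 64 * 64 * 3, dtype=np.int32).reshape(m, 64, 64, 3)

flat = images.reshape(images.shape[0], -1).T   # (12288, 209): one image per column
print(flat.shape)                              # (12288, 209)
print(np.array_equal(flat[:, 5], images[5].ravel()))    # True: column 5 is image 5

# Same final shape, but each column now mixes pixels from many images
wrong = images.reshape(-1, images.shape[0])
print(np.array_equal(wrong[:, 5], images[5].ravel()))   # False
```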
```python
print("Training images shape: " + str(train_set_x_flatten.shape))
print("Training labels shape: " + str(train_set_y.shape))
print("Test images shape: " + str(test_set_x_flatten.shape))
print("Test labels shape: " + str(test_set_y.shape))
```
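```
Training images shape: (12288, 209)
Training labels shape: (1, 209)
Test images shape: (12288, 50)
Test labels shape: (1, 50)
```
Note: each pixel consists of red, green and blue values in the range 0-255, so the data should be normalized before training; here we simply divide by 255, which maps every value into [0, 1].
```python
# Normalize: scale pixel values from 0-255 into [0, 1]
train_set_x = train_set_x_flatten / 255
test_set_x = test_set_x_flatten / 255
```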
II. Build the logistic regression model
1. Main steps in building a neural network:
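- Define the model structure (e.g. the number of input features)
- Initialize the model parameters (W, b)
- Loop:
  - Forward propagation (compute the cost)
  - Backward propagation (compute the current gradients)
  - Update the parameters (gradient descent with learning rate $\alpha$):
    - $W= W - \alpha \frac{\partial J\left (W ,b \right )}{\partial W}$
    - $b= b - \alpha \frac{\partial J\left (W ,b \right )}{\partial b}$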
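2. Flowchart of the logistic regression algorithm
3. The math behind the network (see Andrew Ng's Deep Learning course videos)
Forward propagation:
① A single example $\left \{ x^{\left ( i \right )},y^{\left ( i \right )} \right \}$: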
$$z^{\left ( i \right )} = w^{T}x^{\left ( i \right )} + b\quad\left ( 1 \right )$$
$$\hat{y}^{\left ( i \right )}=a^{\left ( i \right )}=\sigma\left ( z^{\left ( i \right )} \right )\quad\left ( 2 \right )$$
$$l\left ( a^{\left ( i \right )},y^{\left ( i \right )}\right )=-y^{\left ( i \right )}\log\left ( a^{\left ( i \right )} \right )-\left ( 1-y^{\left ( i \right )} \right )\log\left ( 1-a^{\left ( i \right )} \right )\quad\left ( 3 \right )$$
$$J\left ( w,b \right )=\frac{1}{m}\sum _{i=1}^{m}l\left ( a^{\left ( i \right )},y^{\left ( i \right )} \right )\quad\left ( 4 \right )$$
Note: $w^{T} = \left [ w_{1},w_{2},w_{3},\cdots,w_{n_{x}} \right ]$, $x^{\left ( i \right )}$ has shape $\left ( n_{x},1 \right )$, where $n_{x}$ is the number of features of a single example, and $b$ is a real number.
② m examples, stacked column by column into $X = \begin{bmatrix}x^{\left ( 1 \right )} &x^{\left ( 2 \right )} & \cdots &x^{\left ( m \right )}\end{bmatrix} \in \mathbb{R}^{n_{x}\times m}$:
$$\begin{aligned}
Z &= \begin{bmatrix} z^{\left ( 1 \right )} & z^{\left ( 2 \right )} & \cdots & z^{\left ( m \right )} \end{bmatrix}\\
&= \begin{bmatrix} w^{T}x^{\left ( 1 \right )}+b & w^{T}x^{\left ( 2 \right )}+b & \cdots & w^{T}x^{\left ( m \right )}+b \end{bmatrix}\\
&= w^{T}\begin{bmatrix} x^{\left ( 1 \right )} & x^{\left ( 2 \right )} & \cdots & x^{\left ( m \right )} \end{bmatrix} + \begin{bmatrix} b & b & \cdots & b \end{bmatrix}\\
&= w^{T}X + \begin{bmatrix} b & b & \cdots & b \end{bmatrix}\\
&= w^{T}X + \mathbf{b}
\end{aligned}$$
$$A=\begin{bmatrix}
a^{\left ( 1 \right )} &a^{\left ( 2 \right )} &\cdots &a^{\left ( m \right )}
\end{bmatrix}=\sigma \left ( Z \right )$$
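To see that the vectorized formula matches the per-example one, the sketch below computes $Z = w^{T}X + b$ once with numpy, where broadcasting adds the scalar b to every column (playing the role of $\mathbf{b}$), and once column by column. The sizes and random values are arbitrary and only serve the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, m = 4, 6                      # arbitrary sizes for the demo
X = rng.normal(size=(n_x, m))      # each column is one example x^(i)
w = rng.normal(size=(n_x, 1))
b = 0.3

# Vectorized: (1, n_x) @ (n_x, m) -> (1, m); the scalar b is broadcast to every column
Z = np.dot(w.T, X) + b
A = sigmoid(Z)

# Per-example loop: z^(i) = w^T x^(i) + b
Z_loop = np.zeros((1, m))
for i in range(m):
    Z_loop[0, i] = np.dot(w[:, 0], X[:, i]) + b

print(Z.shape, A.shape)            # (1, 6) (1, 6)
print(np.allclose(Z, Z_loop))      # True
```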
Backward propagation:
① A single example
$$da=\frac{\partial l\left ( a,y \right )}{\partial a}=-\frac{y}{a}+\frac{1-y}{ 1-a}\quad\left ( 1 \right )\\$$
$$dz=\frac{\partial l\left ( a,y \right )}{\partial z}=\left ( \frac{\partial l}{\partial a} \right )\cdot \left ( \frac{\partial a}{\partial z} \right )=\left ( -\frac{y}{a} + \frac{1-y}{1-a}\right )\cdot a\left ( 1-a \right )=a-y\quad\left ( 2 \right )\\$$
$$dw_{1}=x_{1}dz\quad\left ( 3 \right )\\$$
$$dw_{2}=x_{2}dz\quad\left ( 4 \right )\\$$
$$db=dz\quad\left ( 5 \right )$$
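Equation (2) can be checked numerically: a central finite difference of $l$ with respect to $z$ should agree with $a - y$. The sketch below does this for a few arbitrary values of z and both labels; the helper loss_of_z() is introduced only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_of_z(z, y):
    # l(a, y) with a = sigmoid(z), i.e. forward-pass equation (3) for one example
    a = sigmoid(z)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

eps = 1e-6
errors = []
for z in (-2.0, 0.0, 1.5):
    for y in (0, 1):
        numeric = (loss_of_z(z + eps, y) - loss_of_z(z - eps, y)) / (2 * eps)  # central difference
        analytic = sigmoid(z) - y                                               # dz = a - y
        errors.append(abs(numeric - analytic))

print(max(errors) < 1e-6)   # True
```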
② m examples
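$$\frac{\partial }{\partial w_{1}}J\left ( w,b \right )=\frac{1}{m}\sum_{i=1}^{m}\frac{\partial }{\partial w_{1}}l\left ( a^{\left ( i \right )} ,y^{\left ( i \right )}\right )$$
Since the overall cost is the average of the per-example losses, its derivative with respect to $w_{1}$ is the average of the per-example derivatives with respect to $w_{1}$, so: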
$$dz^{\left ( i \right )}=a^{\left ( i \right )}-y^{\left ( i \right )}\quad\left ( 1 \right )\\$$
$$dw_{1}=\frac{1}{m}\sum_{i=1}^{m}x_{1}^{\left ( i \right )}\left ( a^{\left ( i \right )}-y^{\left ( i \right )} \right )\quad\left ( 2 \right )\\$$
$$dw_{2}=\frac{1}{m}\sum_{i=1}^{m}x_{2}^{\left ( i \right )}\left ( a^{\left ( i \right )}-y^{\left ( i \right )} \right )\quad\left ( 3 \right )\\$$
$$db=\frac{1}{m}\sum_{i=1}^{m}\left ( a^{\left ( i \right )}-y^{\left ( i \right )} \right )\quad\left ( 4 \right )$$
After vectorization:
$$dZ = A - Y\\$$
$$d\mathbf{w}=\begin{bmatrix}
dw_{1}\\
dw_{2}\\
\vdots\\
dw_{n_{x}}
\end{bmatrix}=\frac{1}{m}\begin{bmatrix}
\sum_{i=1}^{m}x_{1}^{\left ( i \right )}dz^{\left ( i \right )}\\
\sum_{i=1}^{m}x_{2}^{\left ( i \right )}dz^{\left ( i \right )}\\
\vdots\\
\sum_{i=1}^{m}x_{n_{x}}^{\left ( i \right )}dz^{\left ( i \right )}
\end{bmatrix}=\frac{1}{m}\sum_{i=1}^{m}x^{\left ( i \right )}dz^{\left ( i \right )}=\frac{1}{m}X\,dZ^{T}$$
$$db = \frac{1}{m}\sum_{i=1}^{m}dz^{\left ( i \right )}$$
In numpy these become `dw = (1 / m) * np.dot(X, dZ.T)` and `db = (1 / m) * np.sum(dZ)`.
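The vectorized gradients can be checked against an explicit loop over the examples. The sketch below uses small random data (sizes and values are arbitrary) and computes dw and db both ways.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(2)
n_x, m = 3, 5                               # arbitrary demo sizes
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m))         # random 0/1 labels
w = rng.normal(size=(n_x, 1))
b = -0.1

# Loop version: accumulate x^(i) dz^(i) and dz^(i), then average over the m examples
dw_loop = np.zeros((n_x, 1))
db_loop = 0.0
for i in range(m):
    a_i = sigmoid(np.dot(w[:, 0], X[:, i]) + b)
    dz_i = a_i - Y[0, i]
    dw_loop[:, 0] += X[:, i] * dz_i
    db_loop += dz_i
dw_loop /= m
db_loop /= m

# Vectorized version
A = sigmoid(np.dot(w.T, X) + b)
dZ = A - Y
dw = (1 / m) * np.dot(X, dZ.T)
db = (1 / m) * np.sum(dZ)

print(np.allclose(dw, dw_loop), np.isclose(db, db_loop))   # True True
```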
4. The functions
① Activation function sigmoid()
```python
def sigmoid(z):
    """
    :param z: a scalar or numpy array of any size
    :return: s - sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s
```
② Parameter initialization initialize_with_zeros()
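```python
def initialize_with_zeros(dim):
    """
    :param dim: size of the w vector (number of parameters)
    :return: w - initialized vector of shape (dim, 1)
             b - initialized scalar (the bias)
    """
    w = np.zeros(shape=(dim, 1))
    b = 0
    # Use assertions to make sure the shape and type are correct
    assert w.shape == (dim, 1)
    assert isinstance(b, float) or isinstance(b, int)
    return (w, b)
```
③ Forward and backward propagation propagate()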
```python
def propagate(w, b, X, Y):
    """
    :param w: weights, a numpy array of shape (num_pixel * num_pixel * 3, 1)
    :param b: bias, a scalar
    :param X: data of shape (num_pixel * num_pixel * 3, number of examples)
    :param Y: true "label" vector (0 if non-cat, 1 if cat), of shape (1, number of examples)
    :return: grads - dict with dw (gradient of the cost w.r.t. w, same shape as w)
                     and db (gradient of the cost w.r.t. b, same shape as b)
             lost - negative log-likelihood cost of logistic regression
    """
    m = X.shape[1]

    # Forward propagation (compute the current cost)
    A = sigmoid(np.dot(w.T, X) + b)
    lost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation
    dZ = A - Y
    dw = (1 / m) * np.dot(X, dZ.T)
    db = (1 / m) * np.sum(dZ)

    # Use assertions to make sure the data are correct
    assert dw.shape == w.shape
    assert db.dtype == float
    lost = np.squeeze(lost)  # squeeze() removes size-1 dimensions from the shape
    assert lost.shape == ()

    # Collect the gradients in a dict
    grads = {"dw": dw, "db": db}
    return (grads, lost)
```
④ Parameter update optimize()
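A common way to confirm that propagate() returns the right gradients is a numerical gradient check: nudge each parameter by a small eps, recompute the cost and compare the central difference with the analytic gradient. The sketch below repeats propagate()'s formulas (without the assertions) on small random data; the sizes, values and eps are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    # Same computation as propagate() above, without the assertions
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    lost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dZ = A - Y
    grads = {"dw": (1 / m) * np.dot(X, dZ.T), "db": (1 / m) * np.sum(dZ)}
    return grads, float(lost)

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 10))                 # 4 features, 10 examples (arbitrary)
Y = rng.integers(0, 2, size=(1, 10))
w = rng.normal(scale=0.5, size=(4, 1))
b = 0.2

grads, _ = propagate(w, b, X, Y)
eps = 1e-6

# Numerical gradient for each weight: perturb w_j by +/- eps and difference the costs
num_dw = np.zeros_like(w)
for j in range(w.shape[0]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j, 0] += eps
    w_minus[j, 0] -= eps
    num_dw[j, 0] = (propagate(w_plus, b, X, Y)[1] - propagate(w_minus, b, X, Y)[1]) / (2 * eps)

num_db = (propagate(w, b + eps, X, Y)[1] - propagate(w, b - eps, X, Y)[1]) / (2 * eps)

print(np.max(np.abs(num_dw - grads["dw"])) < 1e-6)   # True
print(abs(num_db - grads["db"]) < 1e-6)              # True
```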
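```python
def optimize(w, b, X, Y, num_iterations, learning_rate, print_lost=False):
    """
    Optimize w and b with gradient descent.
    :param w: weights of shape (num_pixel * num_pixel * 3, 1)
    :param b: bias (a scalar)
    :param X: data of shape (num_pixel * num_pixel * 3, number of training examples)
    :param Y: label vector (0 = non-cat, 1 = cat) of shape (1, number of training examples)
    :param num_iterations: number of iterations (gradient descent steps)
    :param learning_rate: learning rate
    :param print_lost: if True, print the cost every 100 steps
    :return: params - dict containing the weights w and the bias b
             grads - dict containing the gradients of the weights and the bias
             losts - list of costs recorded during gradient descent, used to plot the learning curve
    Hints:
        1) compute the cost and gradients for the current parameters with propagate()
        2) update w and b with the gradient descent rule
    """
    losts = []
    for i in range(num_iterations):
        grads, lost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]

        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the cost every 100 iterations
        if i % 100 == 0:
            losts.append(lost)
        # Print the cost
        if print_lost and (i % 100 == 0):
            print("Iteration: %i, cost: %f" % (i, lost))

    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}
    return (params, grads, losts)
```
⑤ Prediction function predict()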
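To watch the update rule in isolation, the sketch below runs the same gradient descent loop on a small synthetic problem instead of the cat images: two Gaussian features, with a label that is 1 when the first feature plus some noise is positive (this toy data set is an assumption made only for illustration). cost_and_grads() is a hypothetical helper that repeats propagate()'s formulas. With w = 0 and b = 0 every prediction is 0.5, so the first recorded cost is log 2 ≈ 0.693, and the recorded costs should then go down.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    # Hypothetical helper: cross-entropy cost and its gradients, as in propagate()
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = float((-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    dZ = A - Y
    return cost, (1 / m) * np.dot(X, dZ.T), (1 / m) * np.sum(dZ)

# Toy data (assumption): label 1 when the first feature plus noise is positive
rng = np.random.default_rng(4)
X = rng.normal(size=(2, 200))
Y = (X[0:1, :] + 0.5 * rng.normal(size=(1, 200)) > 0).astype(float)

w, b = np.zeros((2, 1)), 0.0
learning_rate = 0.5
costs = []
for i in range(1000):
    cost, dw, db = cost_and_grads(w, b, X, Y)
    w = w - learning_rate * dw      # W = W - alpha * dJ/dW
    b = b - learning_rate * db      # b = b - alpha * dJ/db
    if i % 100 == 0:
        costs.append(cost)

print([round(c, 3) for c in costs])
```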
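```python
def predict(w, b, X):
    """
    Use the learned logistic regression parameters (w, b) to predict whether each label is 0 or 1.
    :param w: weights of shape (num_pixel * num_pixel * 3, 1)
    :param b: bias, a scalar
    :param X: data of shape (num_pixel * num_pixel * 3, number of examples)
    :return: Y_prediction - numpy array of shape (1, m) with the predictions for all images in X
    """
    m = X.shape[1]  # number of images
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Predicted probability that each image is a cat
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        # Turn the probability A[0, i] into a 0/1 prediction
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0

    # Check the output shape
    assert Y_prediction.shape == (1, m)
    return Y_prediction
```
⑥ Combine the functions above into one model()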
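The loop in predict() thresholds each probability at 0.5. The same result can be obtained in one vectorized step; the sketch below compares the two on a few made-up probabilities. A probability of exactly 0.5 is classified as non-cat, because the test is A > 0.5.

```python
import numpy as np

# Made-up probabilities, as sigmoid() might return for 5 images
A = np.array([[0.1, 0.5, 0.51, 0.93, 0.49]])

# Loop version, as in predict()
Y_loop = np.zeros((1, A.shape[1]))
for i in range(A.shape[1]):
    Y_loop[0, i] = 1 if A[0, i] > 0.5 else 0

# Vectorized version: the comparison yields booleans, astype(float) maps them to 0.0/1.0
Y_vec = (A > 0.5).astype(float)

print(Y_vec)                          # [[0. 0. 1. 1. 0.]]
print(np.array_equal(Y_loop, Y_vec))  # True
```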
```python
def Logistic_Regression_model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_lost=False):
    """
    Build the logistic regression model by calling the functions defined above.
    :param X_train: numpy array of shape (num_pixel * num_pixel * 3, m_train), the training set
    :param Y_train: numpy array of shape (1, m_train), the training labels
    :param X_test: numpy array of shape (num_pixel * num_pixel * 3, m_test), the test set
    :param Y_test: numpy array of shape (1, m_test), the test labels
    :param num_iterations: hyperparameter, number of iterations used to optimize the parameters
    :param learning_rate: hyperparameter, learning rate used in the update rule of optimize()
    :param print_lost: set to True to print the cost every 100 iterations
    :return: d - dict containing information about the model
    """
    w, b = initialize_with_zeros(X_train.shape[0])
    parameters, grads, losts = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_lost)
    w, b = parameters["w"], parameters["b"]

    # Predict on the test and training sets
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print the accuracy after training
    print("Training accuracy:", format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100), "%")
    print("Test accuracy:", format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100), "%")

    d = {
        "lost": losts,
        "Y_prediction_test": Y_prediction_test,
        "Y_prediction_train": Y_prediction_train,
        "learning_rate": learning_rate,
        "num_iterations": num_iterations,
        "w": w,
        "b": b,
    }
    return d
```
```python
print("---------------------------------Test model------------------------------")
# This uses the real data loaded and preprocessed above
d = Logistic_Regression_model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_lost=False)
```
```
Training accuracy: 99.04306220095694 %
Test accuracy: 70.0 %
```
⑦ Visualization
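For 0/1 labels, |prediction - label| is 1 exactly on the misclassified examples, so np.mean(np.abs(...)) is the error rate and 100 - 100 × (error rate) is the percentage of correct predictions. The training accuracy above equals 207/209, i.e. 2 of the 209 training images misclassified, and the test accuracy equals 35/50. A quick check with made-up predictions that contain 2 mistakes out of 209:

```python
import numpy as np

# Made-up labels and predictions: 209 examples, 2 of them wrong
Y_train = np.ones((1, 209))
Y_prediction_train = np.ones((1, 209))
Y_prediction_train[0, :2] = 0

error_rate = np.mean(np.abs(Y_prediction_train - Y_train))   # fraction misclassified
accuracy = 100 - error_rate * 100                             # same expression as in the model
direct = 100 * np.mean(Y_prediction_train == Y_train)        # percentage of exact matches

print(accuracy, direct)   # both about 99.043
```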
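```python
# ---------- Plot the cost against the number of iterations (alpha = 0.005) ----------
losts = np.squeeze(d["lost"])
plt.plot(losts)
plt.ylabel("Cost")
plt.xlabel("Iterations (per hundreds)")
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()
```
III. Interpretation of the logistic regression model
Assuming the training examples are statistically independent, minimizing the cost function is equivalent to maximizing the likelihood of the logistic regression model (see Andrew Ng's Deep Learning videos for the proof).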
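A condensed version of that argument, in the notation used above and reading $a = \hat{y}$ as the predicted probability $P\left ( y=1 \mid x \right )$: for a label $y \in \left \{ 0,1 \right \}$ the probability of the observed label is

$$P\left ( y \mid x \right ) = a^{y}\left ( 1-a \right )^{1-y}$$

so its logarithm is exactly the negative of the per-example loss (3):

$$\log P\left ( y \mid x \right ) = y\log a + \left ( 1-y \right )\log\left ( 1-a \right ) = -l\left ( a,y \right )$$

With independent examples the likelihood is a product, and its logarithm is a sum:

$$\log \prod_{i=1}^{m} P\left ( y^{\left ( i \right )} \mid x^{\left ( i \right )} \right ) = \sum_{i=1}^{m}\log P\left ( y^{\left ( i \right )} \mid x^{\left ( i \right )} \right ) = -\sum_{i=1}^{m} l\left ( a^{\left ( i \right )},y^{\left ( i \right )} \right ) = -m\,J\left ( w,b \right )$$

Since $m > 0$ is a constant, maximizing the log-likelihood and minimizing $J\left ( w,b \right )$ select the same $w$ and $b$.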