Python_lab-HW2

Introduction to the dataset

The MNIST dataset (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits collected by the US National Institute of Standards and Technology. It contains a training set of 60,000 examples and a test set of 10,000 examples. Each image is 28 pixels by 28 pixels. The label y takes values 0-9, representing the ten digits.

Load the data using TF

import tensorflow as tf
import matplotlib.pyplot as plt

(trainX,trainY), (testX,testY) = tf.keras.datasets.mnist.load_data()
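As a quick sanity check that the data matches the description above, we can print the array shapes (a small sketch using the variables just loaded):

print(trainX.shape, trainY.shape)   # (60000, 28, 28) (60000,)
print(testX.shape, testY.shape)     # (10000, 28, 28) (10000,)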

Since the labels y take values 0-9, each x is an image showing one of the digits 0-9. Let's display the first few images.

# display a few of the images in X
for i in range(9):
	# define subplot
	plt.subplot(331+i)
	# plot raw pixel data
	plt.imshow(trainX[i], cmap=plt.get_cmap('gray'))
# show the figure
plt.show()

[Figure: the first nine training images displayed in a 3×3 grid]

(1) Data Preprocessing

First, we need to hold out 20% of the training data as validation data. We can use the train_test_split() function to split off 20% of the data as the validation set.

from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(trainX,trainY,test_size=0.2)
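If you want the split to be reproducible, train_test_split also accepts a random_state argument; for example (the seed value here is just an illustrative choice):

X_train, X_valid, y_train, y_valid = train_test_split(trainX, trainY, test_size=0.2, random_state=2020)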

Next, flatten each 28×28 image into a 784-dimensional vector, then normalize by dividing by the maximum pixel value 255 so that the values are mapped into [0, 1].

Process the feature set X (flatten + normalize):
# flatten X_train, X_valid and the test images, then normalize
X_train_reshape = tf.cast(X_train.reshape((X_train.shape[0],-1))/255.0,dtype='float32')
X_valid_reshape = tf.cast(X_valid.reshape((X_valid.shape[0],-1))/255.0,dtype='float32')
X_test_reshape = tf.cast(testX.reshape((testX.shape[0],-1))/255.0,dtype='float32')
One-hot encode the target values y:

One-hot encoding: for example, y=1 becomes y=[0,1,0,0,0,0,0,0,0,0], i.e. the position in the array corresponds to the digit.

# one hot encode target values
y_train_encode = tf.one_hot(y_train,depth = 10)
y_valid_encode = tf.one_hot(y_valid,depth = 10)
y_test_encode = tf.one_hot(testY,depth = 10)
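To see the encoding concretely, here is a tiny sketch (the label values are chosen only for illustration):

print(tf.one_hot([1, 3], depth=10))
# [[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]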

(2) Model Setup

Set up a network with two layers. In detail:
in this neural network the input layer has 784 neurons, the hidden layer has 128 neurons, and the output layer has 10 neurons, used for classification. The formulas are
$h_1 = \mathrm{ReLU}(XW_1 + b_1)$
$Y = h_1 W_2 + b_2$

In the first layer, the input X_train_reshape has shape (48000, 784). W_1 has shape (784, 128) and b_1 has shape (1, 128). After multiplying by W_1 and adding b_1 we obtain 128 neurons, which pass through the ReLU activation to give the output h_1. That is the first layer.

In the next layer, W_2 has shape (128, 10) and b_2 has shape (1, 10). After multiplying by W_2 and adding b_2 we obtain 10 neurons; a softmax activation then gives the probability of each class, and the class with the highest probability is taken as the prediction.
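As a sanity check on these shapes, a small sketch with dummy tensors (not the trained weights) traces how a batch of 5 images flows through the two layers:

X_demo = tf.zeros([5, 784])
h1_demo = tf.nn.relu(tf.matmul(X_demo, tf.zeros([784, 128])) + tf.zeros([128]))
Y_demo = tf.matmul(h1_demo, tf.zeros([128, 10])) + tf.zeros([10])
print(h1_demo.shape, Y_demo.shape)   # (5, 128) (5, 10)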

Parameter initialization

Now that we understand the parameters the model needs, we define the initial values of W_1, W_2, b_1, b_2. Here the entries of W_1 and W_2 are initialized to 0.01, and b_1 and b_2 are initialized to 0.

# W_1: all entries 0.01, size 784*128
W_1 = tf.Variable(tf.ones(shape=[784,128])/100)
# b_1: all entries 0, size 1*128
b_1 = tf.Variable(tf.zeros(shape=[128]))

# W_2: all entries 0.01, size 128*10
W_2 = tf.Variable(tf.ones(shape=[128,10])/100)
# b_2: all entries 0, size 1*10
b_2 = tf.Variable(tf.zeros(shape=[10]))
Define the model function
def feedforward(X):
    '''
    Implement the feed forward step for a given X:
    hidden_out = ReLU(X*W_1 + b_1)
    output = hidden_out*W_2 + b_2
    '''
    # matmul is ordinary matrix multiplication; the inner dimensions must match, e.g. (3,2)*(2,3) = (3,3)
    hidden_in = tf.add(tf.matmul(X, W_1), b_1)        # X*W_1 + b_1
    hidden_out = tf.nn.relu(hidden_in)                # apply the ReLU activation
    output = tf.add(tf.matmul(hidden_out, W_2), b_2)  # hidden_out*W_2 + b_2
    return output
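A quick usage check (a sketch, assuming X_train_reshape from the preprocessing step is available): the logits for the first five training images should have shape (5, 10).

print(feedforward(X_train_reshape[:5]).shape)   # (5, 10)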
Start training the model using TF
# define the optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.5)

Set up the single training step

def one_step_train(x, y, optim):
  with tf.GradientTape() as tape:
    # y_pred = logits -- softmax has not been applied yet; the built-in TF loss below takes care of it
    train_logit = feedforward(x)
    # loss function: cross entropy (y is the one-hot encoded ground truth)
    train_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y,
                                                             logits = train_logit)
    train_loss = tf.reduce_mean(train_cross_en)  # average the per-sample losses

  # compute gradients w.r.t. the module-level W_1, b_1, W_2, b_2
  # (looked up from the enclosing scope, so re-initializing these variables later still works)
  grads = tape.gradient(train_loss,[W_1,b_1,W_2,b_2])
  optim.apply_gradients(zip(grads,[W_1,b_1,W_2,b_2]))  # update all the parameters
  return train_loss
Explanation of the TF built-in functions
  1. The tf.nn.softmax() function
# make up two samples: their one-hot y labels and the logits the model outputs for x
labels = [[0,0,1,0,0,0,0,0,0,0], [0,0,0,0,1,0,0,0,0,0]]
logits = [[0.1,0.2,0.3,0.5,0.2,0.3,0.4,0.5,0.2,0.1], [0.1,0.2,0.3,0.5,0.2,0.3,0.4,0.5,0.2,0.1]]
Compute the softmax value for each entry of the two samples; the result has shape (2, 10).
logit_scaled = tf.nn.softmax(logits)
print(logit_scaled)

[Output: the softmax-scaled logits, shape (2, 10)]

Compute the loss values
result = tf.nn.softmax_cross_entropy_with_logits(labels = labels, logits = logits)
print(result)

[Output: the two per-sample cross-entropy loss values]
These two values are the losses of the two samples; their mean, computed with tf.reduce_mean(), is used as the total loss.

train_loss = tf.reduce_mean(result)
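As a rough hand check: for a one-hot label the cross-entropy is logsumexp(logits) minus the logit of the true class. For the logits above, logsumexp ≈ 2.59, so the two per-sample losses are roughly 2.59 − 0.3 ≈ 2.29 and 2.59 − 0.2 ≈ 2.39, and their mean, which tf.reduce_mean returns, is about 2.34.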

Putting all of this together gives the one_step_train function:

def one_step_train(x, y, optim):
  with tf.GradientTape() as tape:
    # y_pred = logits -- softmax has not been applied yet; the built-in TF loss below takes care of it
    train_logit = feedforward(x)
    # loss function: cross entropy (y is the one-hot encoded ground truth)
    train_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y,
                                                             logits = train_logit)
    train_loss = tf.reduce_mean(train_cross_en)  # the mean of the per-sample losses is the final train_loss

  # compute gradients for W_1, b_1, W_2, b_2 and let the optimizer update them
  grads = tape.gradient(train_loss,[W_1,b_1,W_2,b_2])
  optim.apply_gradients(zip(grads,[W_1,b_1,W_2,b_2]))
  return train_loss
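A single call performs one full-batch update and returns the batch loss; for example (a sketch using the tensors and optimizer defined above; note that each call really does update the weights):

loss_0 = one_step_train(X_train_reshape, y_train_encode, optim = optimizer)
print(float(loss_0))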

(3) Training Iterations

# number of iterations: 1000
iterations = 1000
#store the training losses
train_loss_ls = []
#store the validation losses 
valid_loss_ls = []
for i in range(iterations):
  train_loss = one_step_train(X_train_reshape,y_train_encode,optim = optimizer)
  valid_logits = feedforward(X_valid_reshape)
  valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y_valid_encode,
                                                           logits = valid_logits)
  
  valid_loss = tf.reduce_mean(valid_cross_en)
  if i % 100 == 0:
    print('step {0:3d}: training loss {1:.4f} \n\t  validation loss {2:.4f}'.format(i, train_loss,valid_loss))
  train_loss_ls.append(train_loss)
  valid_loss_ls.append(valid_loss)

[Output: training and validation loss printed every 100 steps]

(4) Visualize the train loss and valid loss

fig, ax = plt.subplots(figsize=(8,4))
ax.plot(train_loss_ls, label='Train Loss')
ax.plot(valid_loss_ls, label='Valid Loss')
ax.set_xlabel("Epoch #")
ax.set_ylabel("Loss")
ax.legend();

[Figure: training and validation loss curves for model A]

(5) Model Comparison

Change the model's optimizer and its parameters and see which model performs best:

  • Option A: tf.keras.optimizers.SGD with learning rate = 0.5
  • Option B: tf.keras.optimizers.SGD with learning rate = 0.001
  • Option C: tf.keras.optimizers.Adam with learning rate = 0.001

Option A is the model we already trained above. Next we train models B and C.

Running model B
tf.keras.optimizers.SGD with learning rate = 0.001
  1. Initialize the parameters
W_1 = tf.Variable(tf.ones(shape=[784,128])/100)
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.ones(shape= [128,10])/100)
b_2 = tf.Variable(tf.zeros(shape= [10]))
  2. Set the optimizer's parameters
optim_B = tf.keras.optimizers.SGD(learning_rate = 0.001)
  3. Train the model
# number of training iterations
iterations = 2001
#store the training losses
train_loss_ls_B = []
#store the validation losses 
valid_loss_ls_B = []
# training loop
for i in range(iterations):
    # use one_step_train from above to compute the training loss and update the weights
    train_loss = one_step_train(X_train_reshape,y_train_encode, optim = optim_B)
    # compute the validation loss with the parameters that were just updated
    valid_logits = feedforward(X_valid_reshape)
    # per-sample losses over all validation samples
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y_valid_encode,logits = valid_logits)
    # mean loss as valid_loss
    valid_loss = tf.reduce_mean(valid_cross_en)
    if i % 100 == 0:
        print('step {0:3d}: training   loss {1:.4f} \n\t  validation loss {2:.4f}'.format(i, train_loss,valid_loss))
    train_loss_ls_B.append(train_loss)
    valid_loss_ls_B.append(valid_loss)

[Output: training and validation loss printed every 100 steps for model B]

Running model C
tf.keras.optimizers.Adam with learning rate = 0.001
  1. Initialize the parameters
W_1 = tf.Variable(tf.ones(shape=[784,128])/100)
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.ones(shape= [128,10])/100)
b_2 = tf.Variable(tf.zeros(shape= [10]))
  2. Set the optimizer's parameters
optim_C = tf.optimizers.Adam(learning_rate = 0.001)
  3. Train the model
# number of training iterations
iterations = 2001
#store the training losses
train_loss_ls_C = []
#store the validation losses 
valid_loss_ls_C = []
# training loop
for i in range(iterations):
    # use one_step_train from above to compute the training loss and update the weights
    train_loss = one_step_train(X_train_reshape,y_train_encode, optim = optim_C)
    # compute the validation loss with the parameters that were just updated
    valid_logits = feedforward(X_valid_reshape)
    # per-sample losses over all validation samples
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y_valid_encode,logits = valid_logits)
    # mean loss as valid_loss
    valid_loss = tf.reduce_mean(valid_cross_en)
    if i % 100 == 0:
        print('step {0:3d}: training   loss {1:.4f} \n\t  validation loss {2:.4f}'.format(i, train_loss,valid_loss))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)

[Output: training and validation loss printed every 100 steps for model C]

Summary

For model A, the training loss oscillates up and down, which shows that model A is not well suited for training on this data.
For model B, the loss actually goes up, so it does not improve the model.
For model C, the loss clearly decreases, so model C is the best fit.
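To see the three runs side by side, the stored loss lists can be drawn in one figure (a sketch; it assumes train_loss_ls, train_loss_ls_B and train_loss_ls_C from the runs above are still in memory, and note that model A was run for 1000 iterations while B and C were run for 2001):

fig, ax = plt.subplots(figsize=(8,4))
ax.plot(train_loss_ls,   label='A: SGD lr=0.5')
ax.plot(train_loss_ls_B, label='B: SGD lr=0.001')
ax.plot(train_loss_ls_C, label='C: Adam lr=0.001')
ax.set_xlabel("Epoch #")
ax.set_ylabel("Train Loss")
ax.legend();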

(6) Comparing Initial Values of W

From the comparison above, the Adam model works best. Now we compare how the choice of initial values for W affects the loss.

  • Option A: set the initial W to 0.001
  • Option B: draw the initial W randomly from the standard normal distribution (mean = 0, sd = 1)
  • Option C: draw the initial W randomly from a normal distribution (mean = 0, sd = 0.1)
Option A: set the initial W to 0.001
# set the initial W to 0.001
W_1 = tf.Variable(tf.ones(shape=[784,128])/1000)
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.ones(shape= [128,10])/1000)
b_2 = tf.Variable(tf.zeros(shape= [10]))
# choose the Adam optimizer
optim_C = tf.optimizers.Adam(learning_rate = 0.001) 

iterations = 2001
#store the training losses
train_loss_ls_C = []
#store the validation losses 
valid_loss_ls_C = []
for ii in range(iterations):
    train_loss = one_step_train(X_train_reshape,y_train_encode,optim = optim_C)
    #after update the weight check the validation result
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y_valid_encode,
                                                             logits = valid_logits)
    #compute validation loss 
    valid_loss = tf.reduce_mean(valid_cross_en)
    # Printing out the training progress per 100 epochs
    if ii % 100 == 0:
      print('step {0:3d}: training   loss {1:.4f} \n\t  validation loss {2:.4f}'.format(ii, train_loss,valid_loss))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)

[Output: training and validation loss printed every 100 steps for Option A]

Option B: draw the initial W randomly from the standard normal distribution
# draw the initial W_1 randomly from the standard normal distribution
W_1 = tf.Variable(tf.random.normal(shape = [784,128],mean=0,stddev=1,dtype=tf.float32 ,seed=2020))
b_1 = tf.Variable(tf.zeros(shape= [128]))
# draw the initial W_2 randomly from the standard normal distribution
W_2 = tf.Variable(tf.random.normal(shape= [128,10],mean=0,stddev=1,dtype=tf.float32 ,seed=2020))
b_2 = tf.Variable(tf.zeros(shape= [10]))

# Adam optimizer
optim_C = tf.optimizers.Adam(learning_rate = 0.001) 

iterations = 2001
#store the training losses
train_loss_ls_C = []
#store the validation losses 
valid_loss_ls_C = []
for ii in range(iterations):
    train_loss = one_step_train(X_train_reshape,y_train_encode,optim = optim_C)
    #after update the weight check the validation result
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y_valid_encode,
                                                              logits = valid_logits)
    #compute validation loss 
    valid_loss = tf.reduce_mean(valid_cross_en)
    # Printing out the training progress per 100 epochs
    if ii % 100 == 0:
      print('step {0:3d}: training   loss {1:.4f} \n\t  validation loss {2:.4f}'.format(ii, train_loss,valid_loss))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)

[Output: training and validation loss printed every 100 steps for Option B]

Option C: draw the initial W randomly from a normal distribution (mean = 0, sd = 0.1)
W_1 = tf.Variable(tf.random.normal(shape = [784,128],mean=0,stddev=0.1,dtype=tf.float32 ,seed=2020))
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.random.normal(shape= [128,10],mean=0,stddev=0.1,dtype=tf.float32 ,seed=2020))
b_2 = tf.Variable(tf.zeros(shape= [10]))

# Adam optimizer
optim_C = tf.optimizers.Adam(learning_rate = 0.001) 
#start the training process
iterations = 2001
#store the training losses
train_loss_ls_C = []
#store the validation losses 
valid_loss_ls_C = []
for ii in range(iterations):
    train_loss = one_step_train(X_train_reshape,y_train_encode,optim = optim_C)
    #after update the weight check the validation result
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y_valid_encode,
                                                              logits = valid_logits)
    #compute validation loss 
    valid_loss = tf.reduce_mean(valid_cross_en)
    # Printing out the training progress per 100 epochs
    if ii % 100 == 0:
       print('step {0:3d}: training   loss {1:.4f} \n\t  validation loss {2:.4f}'.format(ii, train_loss,valid_loss))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)

[Output: training and validation loss printed every 100 steps for Option C]

Summary: the loss curves clearly show that the third way of initializing W works best.

(7) Combine the best optimizer and the best initial W

Naturally we pick the Adam optimizer and the normal-distribution initialization of W. From the loss values we can see that valid_loss does not improve much after 700 epochs,
so we choose 700 epochs for the final run.
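One way to read that cut-off from the numbers instead of the plot (a sketch, assuming the valid_loss_ls_C list recorded for Option C above is still available):

import numpy as np
best_epoch = int(np.argmin([float(v) for v in valid_loss_ls_C]))
print(best_epoch)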

# draw the initial W randomly from a normal distribution (mean=0, sd=0.1)
W_1 = tf.Variable(tf.random.normal(shape = [784,128],mean=0,stddev=0.1,dtype=tf.float32 ,seed=2020))
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.random.normal(shape= [128,10],mean=0,stddev=0.1,dtype=tf.float32 ,seed=2020))
b_2 = tf.Variable(tf.zeros(shape= [10]))

# Adam optimizer
optim_C = tf.optimizers.Adam(learning_rate = 0.001) 
# train for 700 iterations
iterations = 701
#store the training losses
train_loss_ls_C = []
#store the validation losses 
valid_loss_ls_C = []
for ii in range(iterations):
    train_loss = one_step_train(X_train_reshape,y_train_encode,optim = optim_C)
    #after update the weight check the validation result
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels = y_valid_encode,
                                                              logits = valid_logits)
    #compute validation loss 
    valid_loss = tf.reduce_mean(valid_cross_en)
    # Printing out the training progress per 100 epochs
    if ii % 100 == 0:
      print('step {0:3d}: training   loss {1:.4f} \n\t  validation loss {2:.4f}'.format(ii, train_loss,valid_loss))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)

[Output: training and validation loss printed every 100 steps]

Visualization
fig, ax = plt.subplots(figsize=(8,4))
ax.plot(train_loss_ls_C, label='Train Loss')
ax.plot(valid_loss_ls_C, label='Valid Loss')
ax.set_xlabel("Epoch #")
ax.set_ylabel("Loss")
ax.legend();

[Figure: training and validation loss curves for the final model]

(8) Finally, compute the accuracy on the test data

#function defined in class
def accuracy(logits, y_true):
    correct_prediction = tf.equal(tf.argmax(logits, axis=1), tf.argmax(y_true, axis=1))
    return tf.reduce_mean(tf.cast(correct_prediction, dtype=tf.float64))
#compute the logits for testing data
test_logits = feedforward(X_test_reshape)
print(f'Testing Accuracy: {accuracy(test_logits,y_test_encode)*100:.2f} %')

Testing Accuracy: 97.06 %

[One point worth explaining: we did not apply softmax to get the 10 class probabilities per sample before taking argmax() as the prediction. Because softmax is a monotonically increasing function, taking the position of the largest raw logit gives exactly the same prediction,
i.e. tf.argmax(logits, axis=1) = tf.argmax(tf.nn.softmax(logits), axis=1).]
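A quick numerical check of that claim, reusing the toy logits from earlier (a sketch):

logits = [[0.1,0.2,0.3,0.5,0.2,0.3,0.4,0.5,0.2,0.1]]
print(tf.argmax(logits, axis=1))                   # index of the largest logit
print(tf.argmax(tf.nn.softmax(logits), axis=1))    # the same index after softmax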


Reposted from blog.csdn.net/Jiana_Feng/article/details/109653521