Machine Learning Study Notes (TensorFlow)

numpy basics
python list/tuple/set/dict
define constants, variables, operations in tensorflow
overfit
dropout
global_variables_initializer()
opencv
save and restore model
Global and local variables in Python
basic concepts
Overfitting (high variance)
The role of each layer in a neural network
How to choose an optimization algorithm
BN
L1 and L2 regularization
RNN

numpy basics

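In NumPy, matrix is a subclass of array (ndarray) and is strictly two-dimensional. The difference is that * means matrix multiplication for a matrix, but element-wise multiplication for an array.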

	matrix product	element-wise product
array	`dot`, `@`	`multiply`, `*`
matrix	`dot`, `@`, `*`	`multiply`

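'@' is the operator recommended in newer versions (Python 3.5+) for matrix multiplication.
NumPy does not distinguish row and column vectors for 1-D arrays: a column sliced out of a 2-D array comes back as a 1-D array, and transposing a 1-D array has no effect. Use reshape when an explicit row or column vector is needed.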

np.array([1,2,3])    # a 1-D array, shape (3,); .T has no effect on it, use reshape instead
np.array([[1,2,3]])  # a 2-D array (a 1x3 matrix), shape (1, 3)
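
The difference between the two products, and the effect of transposing a 1-D array, can be checked with a short sketch. The array values here are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b          # element-wise product on arrays
matmul = a @ b               # matrix product, same result as np.dot(a, b)

v = np.array([1, 2, 3])      # 1-D array, shape (3,)
col = v.reshape(-1, 1)       # explicit column vector, shape (3, 1)
row = np.array([[1, 2, 3]])  # 2-D array, shape (1, 3)

print(elementwise.tolist())
print(matmul.tolist())
print(v.T.shape, col.shape, row.T.shape)
```

Transposing v leaves its shape at (3,), while the 2-D row becomes a (3, 1) column after .T.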

python list/tuple/set/dict

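list: written with []

tuple: written with ()

cannot be modified after it is created
set: written with {}

contains no duplicate elements
dict: written with {}

contains no duplicate keys, and supports indexing by string keys

All of these types allow mixing, i.e. storing values of different types in the same container.
An empty set cannot be created with {}, because {} creates an empty dict; use set() instead.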
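
A minimal sketch of these four containers follows; the variable names and values are made up for illustration.

```python
items = [1, "two", 3.0]            # list: mutable, mixed types allowed
point = (1, "two")                 # tuple: immutable
unique = {1, 2, 2, 3}              # set: the duplicate 2 is dropped
ages = {"alice": 30, "bob": 25}    # dict: values are looked up by key

items.append(4)

try:
    point[0] = 99                  # tuples reject item assignment
    tuple_modified = True
except TypeError:
    tuple_modified = False

empty_braces = {}                  # this is a dict, not a set
empty_set = set()                  # the way to create an empty set

print(items, unique, ages["alice"], tuple_modified)
print(type(empty_braces).__name__, type(empty_set).__name__)
```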

define constants, variables, operations in tensorflow

import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

# first, create a TensorFlow constant
const = tf.constant(2.0, name="const")
    
# create TensorFlow variables
b = tf.Variable(2.0, name='b')
c = tf.Variable(1.0, name='c')

# now create some operations
d = tf.add(b, c, name='d')
e = tf.add(c, const, name='e')
a = tf.multiply(d, e, name='a')

# setup the variable initialisation
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    # compute the output of the graph
    a_out = sess.run(a)
    print("Variable a is {}".format(a_out))
    print("Constant const is {}".format(sess.run(const)))

As the Python code runs through these commands, the variables are not actually assigned values as they would be with a standard Python declaration (i.e. b = 2.0). Instead, these lines only add the constants, variables and operations to the computational graph; the variables receive their values when the initialisation op is run inside a session.

d, e and a are not variables but operations.

overfit

Use a larger dataset
L1, L2, and dropout regularization
- L1: cost = (wx - y)^2 + abs(w)
- L2: cost = (wx - y)^2 + w^2
dropout
randomly ignores nodes during training
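
The two penalties can be sketched for a single weight and a single sample as below. The weighting factor lam is an assumption added for illustration (the notes use an implicit weight of 1), and the numbers are made up.

```python
def squared_error(w, x, y):
    return (w * x - y) ** 2

def l1_cost(w, x, y, lam=1.0):
    # lam is a hypothetical regularization strength, not part of the notes
    return squared_error(w, x, y) + lam * abs(w)

def l2_cost(w, x, y, lam=1.0):
    return squared_error(w, x, y) + lam * w ** 2

w, x, y = 2.0, 3.0, 5.0
print(squared_error(w, x, y))  # (2*3 - 5)^2
print(l1_cost(w, x, y))        # squared error plus |w|
print(l2_cost(w, x, y))        # squared error plus w^2
```

In both cases the penalty is added to the data term, so a larger weight always raises the cost.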

dropout

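Generally used on fully connected layers to prevent overfitting.
step1: randomly ignore some nodes (set their values to 0)
step2: scale up the remaining nodes (by 1 / (1 - dropout level), i.e. inversely proportional to the retain probability)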

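import numpy as np

# implementation of the dropout function
def dropout(x, level):
    if level < 0. or level >= 1:  # level is the drop probability and must lie in [0, 1)
        raise ValueError('Dropout level must be in interval [0, 1[.')
    retain_prob = 1. - level
    # Draw a mask with the same shape as x from a binomial distribution. This is like
    # flipping one coin per neuron: p is the probability of heads, n is the number of
    # flips per neuron (one flip is enough, so n=1), and size is the number of coins.
    sample = np.random.binomial(n=1, p=retain_prob, size=x.shape)  # 0/1 mask; 0 means the neuron is dropped
    print(sample)
    x = x * sample  # multiplying by the mask sets the dropped neurons to 0
    print(x)
    x = x / retain_prob  # scale up the survivors so the expected activation is unchanged
    return x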

Reference link

global_variables_initializer()

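A variable is also a special kind of operation. global_variables_initializer() collects all global variables into a single op, and running that op in a session initializes them.

Variables must be initialized before any other operation that uses them is run.

Reference link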

opencv

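Show an MNIST image with OpenCV

     import cv2
     import numpy

     # batch_xs: a batch of flattened MNIST images with values in [0, 1]
     x_input = batch_xs[0].reshape(28, 28, 1)
     x_input = x_input * 255                # rescale to 0~255
     x_input = x_input.astype(numpy.uint8)  # convert to uint8 for display
     cv2.imshow("test image input", x_input)
     cv2.waitKey()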

type must be uint8

shape must be 2-D (H, W) or 3-D (H, W, channels)

the range must be 0~255

save and restore model

meta format
Model files fall into two categories:

Meta graph: the network structure
.meta
Checkpoint files: the parameter values
.data-00000-of-00001
.index: a table; the key is the name of a variable, and the value is the entry pointing to the real data.

save step

saver = tf.train.Saver()
saver.save(sess, 'my_test_model',global_step=1000) # global_step is appended to the file name, e.g. my_test_model-1000

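The Saver must be created before the session starts running (after the variables are defined), and save() must be called inside the session.

Do not modify the graph while saving (for example by creating new ops inside the training loop), otherwise the .meta file keeps growing. Calling tf.get_default_graph().finalize() guarantees this by making the graph read-only.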

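pb format
Variables are frozen, so the model cannot be trained further. There are three ways to save it.
Method 1:

    from tensorflow.python.framework import graph_util

    # convert_variables_to_constants needs output_node_names, a list that may contain several names
    constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph_def, ['op_to_store'])
    # write the serialized PB file
    with tf.gfile.FastGFile(pb_file_path+'model.pb', mode='wb') as f:
        f.write(constant_graph.SerializeToString())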

Method 2:

    constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph_def, ['op_to_store'])
    # write the frozen graph (constant_graph), not the original graph_def
    tf.train.write_graph(constant_graph, pb_file_path, 'flower_model_save.pb', as_text=False)

Method 3:

	builder = tf.saved_model.builder.SavedModelBuilder(pb_file_path+'savemodel')
	# describe what to save: the session, the tags, plus optional input/output signature dicts and extra info
	builder.add_meta_graph_and_variables(sess,['cpu_server_1'])
	builder.save()  # save the PB model
	# after saving, the saved_model_dir directory contains a saved_model.pb file and a variables folder

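Methods 1 and 2 are equivalent: both keep only the graph from the session, written as a single frozen .pb file in which the weights are baked in as constants, with no separate variable data.
Method 3 also keeps the variables themselves, in the variables folder.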

Reference

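Global and local variables in Python

Using the global keyword is recommended when working with global variables.
Without the global keyword, a global variable can still be read and mutated in place: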

def fun():
    print(g_var)
    g_var.a=1

But rebinding the name, e.g. g_var="123", actually creates a new local variable.
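
The three cases (in-place mutation, local rebinding, rebinding with global) can be compared in a short sketch. The Config class and the g_name variable are hypothetical names introduced for illustration.

```python
class Config:
    pass

g_var = Config()
g_name = "original"

def mutate():
    g_var.a = 1           # mutates the global object in place; no `global` needed

def rebind_local():
    g_name = "123"        # creates a new local name; the global is untouched
    return g_name

def rebind_global():
    global g_name
    g_name = "123"        # rebinds the module-level name

mutate()
local_result = rebind_local()
after_local = g_name      # still "original"
rebind_global()
after_global = g_name     # now "123"
print(g_var.a, local_result, after_local, after_global)
```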

basic concepts

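Feature scaling speeds up convergence.

Logistic regression solves classification problems. Using the cost function from linear regression would make the cost non-convex, with many local minima that get in the way of finding the global minimum.

The logistic regression cost function is defined piecewise (one case for y = 1 and one for y = 0); the two cases can be merged into a single expression.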
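
As a sketch of that merge, assuming the notes refer to the standard cross-entropy cost with prediction h and label y in {0, 1}, the piecewise and merged forms give the same value. The function names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_piecewise(h, y):
    # one branch per label
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)

def cost_merged(h, y):
    # single expression: the factor y or (1 - y) switches off the unused branch
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)

h = sigmoid(0.8)  # an arbitrary prediction in (0, 1)
for y in (0, 1):
    print(y, cost_piecewise(h, y), cost_merged(h, y))
```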

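Superscripts and subscripts denote the layer number and the unit within that layer, respectively.

The normal equation can solve for the optimal parameter values directly, without iteration.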
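
A minimal sketch of the normal equation, theta = (X^T X)^(-1) X^T y, on made-up data generated from y = 1 + 2x. Solving the linear system instead of forming the inverse explicitly is a design choice of this sketch, not something stated in the notes.

```python
import numpy as np

# toy data generated exactly from y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

X = np.column_stack([np.ones_like(x), x])  # prepend a bias column of ones

# solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # intercept and slope
```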

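Regularization adds a penalty term to the cost function, which helps prevent overfitting.

The output of logistic regression is a scalar, while the output of a neural network can be a vector.

Logistic regression is in effect a single-layer neural network.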

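Overfitting (high variance)

Cause: the hypothesis polynomial has too high a degree, or the neural network is too deep, which leads to overfitting; too low a degree or too few layers leads to underfitting (high bias) instead.

Solutions:

Use cross-validation to choose a suitable degree or depth; judged on the training set alone, the cost function only keeps decreasing as the polynomial degree increases
Add a regularization penalty term that penalizes the model's parameter vector
Enlarge the training set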
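
The first point can be sketched with polynomial fits of increasing degree. This is a simplified illustration: a single hold-out split stands in for full cross-validation, and the data (a noisy sine curve) is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.shape)

# hold out every other point for validation
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

def mse(coeffs, xs, ys):
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

train_err, val_err = {}, {}
for degree in range(1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(coeffs, x_train, y_train)
    val_err[degree] = mse(coeffs, x_val, y_val)

best_degree = min(val_err, key=val_err.get)
print(train_err)  # does not increase as the degree grows
print(val_err)    # used to pick the degree
print("degree chosen by validation error:", best_degree)
```

The training error alone would always favor the highest degree, which is why the degree is picked from the validation error instead.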

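The role of each layer in a neural network

Convolutional layer: extracts features
Pooling layer: reduces the spatial size of the image (sometimes called downsampling or subsampling)
Activation layer: introduces nonlinearity, so the network can solve problems that a linear model cannot
Fully connected layer: a matrix-vector product
softmax: normalizes the outputs into probabilities
BN: batch normalization, generally applied before the activation function; it normalizes the inputs to help prevent vanishing or exploding gradients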
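
Several of these layers can be sketched in a few lines of NumPy. This is only an illustration on a made-up 4x4 input with hypothetical weights W and bias b; it omits convolution and batch normalization.

```python
import numpy as np

def relu(x):
    # activation layer: introduces nonlinearity
    return np.maximum(x, 0.0)

def max_pool_2x2(img):
    # pooling layer: keep the maximum of each non-overlapping 2x2 block
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    # subtract the maximum for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

img = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(img)        # shape (2, 2)
features = pooled.reshape(-1)     # flatten to 4 features

W = np.array([[0.1, 0.0, 0.0, 0.0],
              [0.0, 0.1, 0.0, 0.0],
              [0.0, 0.0, 0.1, 0.0]])  # hypothetical weights
b = np.zeros(3)
logits = W @ features + b         # fully connected layer: matrix-vector product
probs = softmax(relu(logits))     # probabilities that sum to 1
print(pooled.tolist(), probs)
```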

How to choose an optimization algorithm

Reference link

BN

Reference link

L1 and L2 regularization

Reference link

RNN

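Note: in the RNN cells used in TensorFlow, the hidden layer is also the output layer (the output is the hidden state itself), which is not the standard RNN definition.

Detailed explanation of RNN principles
TensorFlow reference link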
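
The distinction can be sketched in NumPy. This is not TensorFlow code: it assumes a plain tanh cell without biases, and all sizes, weight names and function names are made up. The second function mirrors how a basic cell such as tf.nn.rnn_cell.BasicRNNCell returns its new state as the output.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 2

W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hy = rng.normal(size=(output_size, hidden_size))

def step_textbook(x, h_prev):
    # standard definition: a separate output layer on top of the hidden state
    h = np.tanh(W_xh @ x + W_hh @ h_prev)
    y = W_hy @ h
    return y, h

def step_cell_style(x, h_prev):
    # cell-style definition: the output is the hidden state itself
    h = np.tanh(W_xh @ x + W_hh @ h_prev)
    return h, h

x = rng.normal(size=input_size)
h0 = np.zeros(hidden_size)
y_textbook, h_textbook = step_textbook(x, h0)
out_cell, h_cell = step_cell_style(x, h0)
print(y_textbook.shape, out_cell.shape)
```

With the cell-style step, any projection to the final output size has to be added as a separate layer after the RNN.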