Grid Search is the traditional tuning method for the Hyperparameter Tuning stage, also known as Exhaustive Search. Before tuning, you need to pick the algorithm, decide which parameters to tune, and list the candidate values for each. Grid Search then exhaustively tries every combination in the parameter space and picks the best-performing one.
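Before reaching for a library, the idea fits in a few lines: enumerate the Cartesian product of the candidate values, score each combination, and keep the best. A minimal sketch, where toy_score is a hypothetical stand-in for a real evaluation such as a cross-validation score:

from itertools import product

param_grid = {'kernel': ['linear', 'rbf'], 'C': [1, 10]}

def toy_score(params):
    # hypothetical scoring rule: pretend 'linear' with larger C scores higher
    return params['C'] if params['kernel'] == 'linear' else -params['C']

keys = list(param_grid)
best_params, best_score = None, float('-inf')
for values in product(*(param_grid[k] for k in keys)):  # every combination
    params = dict(zip(keys, values))
    score = toy_score(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # {'kernel': 'linear', 'C': 10}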
Outline:
- This article uses sklearn's Boston house price dataset.
- First, we play the 'library caller' and use sklearn's built-in GridSearchCV.
- Then we implement the same search in TensorFlow, mainly to show how Grid Search works in a Deep Learning setting.
Boston House Price Problem:
It is one of the classic datasets, built into sklearn.datasets, so we load it directly:
from sklearn import datasets

boston = datasets.load_boston()
X = boston["data"]
Y = boston["target"]
print(X.shape)
print(Y.shape)
From the output we know there are 506 samples with 13 columns (attributes), and the goal is to predict the house price (a Regression Problem). To keep things simple, we won't dig into the meaning of each attribute.
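For the curious, the attribute names are available on the loaded Bunch object. A quick optional peek, not needed for the tuning steps below:

print(boston.feature_names)  # 13 names such as 'CRIM', 'RM', 'LSTAT', ...
print(Y[:5])                 # target is the median house price, in $1000s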
Grid Search in Sklearn:
sklearn.model_selection.GridSearchCV implements this. In the example below, we pick SVR as the model to predict Boston house prices. Two parameters are tuned: the kernel type (linear or RBF) and C (1 or 10). 5-fold Cross-Validation is used to evaluate each model. The code is as follows:
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

model = SVR(gamma='scale')
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
reg = GridSearchCV(model, parameters, cv=5)
reg.fit(X, Y)
print(sorted(reg.cv_results_.keys()))  # list the available result fields
print(reg.cv_results_)
From the printed results, we can see how the model built from each parameter combination scores; in the end, the linear kernel with C=1 wins.
{...
 'params': [{'C': 1, 'kernel': 'linear'}, {'C': 1, 'kernel': 'rbf'},
            {'C': 10, 'kernel': 'linear'}, {'C': 10, 'kernel': 'rbf'}],
 'split0_test_score': array([ 0.77285459,  0.12029639,  0.77953306, -0.04157249]),
 'split1_test_score': array([ 0.72771739, -0.08134385,  0.72810716,  0.01592944]),
 'split2_test_score': array([ 0.56131914, -0.79967714,  0.63566857, -0.38338425]),
 'split3_test_score': array([ 0.15056451,  0.09037651,  0.02786433,  0.25941567]),
 'split4_test_score': array([ 0.08212844, -0.90391602, -0.07224368, -0.62731013]),
 'mean_test_score': array([ 0.45953725, -0.31399285,  0.42049685, -0.15515943]),
 'std_test_score': array([0.289307  , 0.44498376, 0.36516833, 0.31248031]),
 'rank_test_score': array([1, 4, 2, 3]),
 'split0_train_score': array([0.70714979, 0.39723582, 0.70149448, 0.70558716]),
 'split1_train_score': array([0.68986786, 0.39850963, 0.68696465, 0.68704436]),
 'split2_train_score': array([0.62838757, 0.37872469, 0.64670086, 0.66406787]),
 'split3_train_score': array([0.82850586, 0.38276233, 0.82941506, 0.73598928]),
 'split4_train_score': array([0.69005814, 0.29652628, 0.69148868, 0.64436246]),
 'mean_train_score': array([0.70879385, 0.37075175, 0.71121274, 0.68741023]),
 'std_train_score': array([0.06558667, 0.03791872, 0.06197584, 0.03190121])
}
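Rather than reading the winner out of cv_results_ by hand, the fitted GridSearchCV object also exposes convenience attributes (with the default refit=True, best_estimator_ has already been retrained on the full data):

print(reg.best_params_)     # {'C': 1, 'kernel': 'linear'}
print(reg.best_score_)      # mean CV score of the winner, ~0.4595 here
print(reg.best_estimator_)  # the refitted SVR, ready for predictions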
As for why we chose SVR rather than a Neural Network, which would have made a more direct comparison with the TensorFlow code in the next section: arbitrarily combined Neural Network parameters can easily leave the model unable to converge, or even push the Cost Function to NaN and throw an exception. So...
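If you do want to grid-search a neural network, it is worth wrapping each evaluation so that a diverging configuration is recorded as a failure instead of aborting the whole search. A minimal sketch, where score_config is a hypothetical stand-in for "train the model and return the test loss":

import math

def safe_score(score_config, cfg):
    # Run one configuration; map crashes and NaN/Inf losses to a worst-case
    # value so the surrounding search loop can simply continue.
    try:
        score = score_config(cfg)
    except Exception:
        return float('inf')
    if math.isnan(score) or math.isinf(score):
        return float('inf')
    return score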
Grid Search in TensorFlow Deep Learning:
In the example below, we write our own loop to traverse all the parameter combinations, building a different model for each and evaluating its performance. We first define two helper functions. The first needs the parameter scope to be fixed; the model_configs function then generates the configuration list:
def model_configs():
    # define scope of configs
    learning_rate = [0.0001, 0.01]
    layer1_nodes = [16, 32]
    layer2_nodes = [8, 4]
    # create configs
    configs = list()
    for i in learning_rate:
        for j in layer1_nodes:
            for k in layer2_nodes:
                cfg = [i, j, k]
                configs.append(cfg)
    print('Total configs: %d' % len(configs))
    return configs
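The nested loops are just a Cartesian product, so the same list can also be built with itertools.product (model_configs_alt is purely an illustrative name):

from itertools import product

def model_configs_alt():
    learning_rate = [0.0001, 0.01]
    layer1_nodes = [16, 32]
    layer2_nodes = [8, 4]
    # product() yields every (lr, l1, l2) combination, same as the loops above
    configs = [list(c) for c in product(learning_rate, layer1_nodes, layer2_nodes)]
    print('Total configs: %d' % len(configs))
    return configs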
The second is add_layer, which adds a hidden layer to the TensorFlow neural network:
def add_layer(name1, inputs, in_size, out_size, activation_function=None):
    Weights = tf.get_variable(name1, [in_size, out_size],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs
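As a quick sanity check (a hypothetical snippet, assuming TF 1.x with tf.contrib available), the layer maps a [batch, in_size] input to [batch, out_size]:

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
x = tf.placeholder(tf.float32, [None, 13])
h = add_layer('h_check', x, 13, 16, activation_function=tf.nn.relu)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(h, {x: np.random.rand(4, 13)}).shape)  # (4, 16)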
Finally, the main control flow: it walks through every parameter combination in the configuration list and builds the corresponding TensorFlow Neural Network. During training, MSE serves as the Cost Function and the Adam Optimizer does the optimization. Each trained model is then evaluated, again by MSE, on a 20% held-out test set.
import tensorflow as tf
from sklearn.model_selection import train_test_split

cfg_list = model_configs()
error_list = []
for cfg in cfg_list:
    # unzip hyperparameters
    learning_rate, layer1_nodes, layer2_nodes = cfg
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, shuffle=True)

    # define model; reset the graph so each config starts from scratch
    tf.reset_default_graph()
    tf_x = tf.placeholder(tf.float32, [None, 13])
    tf_y = tf.placeholder(tf.float32, [None, 1])  # float32, not int32: regression targets
    l1 = add_layer('l1', tf_x, 13, layer1_nodes, activation_function=tf.nn.relu)
    l2 = add_layer('l2', l1, layer1_nodes, layer2_nodes, activation_function=tf.nn.relu)
    pred = add_layer('out', l2, layer2_nodes, 1, activation_function=tf.nn.relu)
    with tf.name_scope('loss'):
        loss = tf.losses.mean_squared_error(tf_y, pred)
        tf.summary.scalar("loss", tensor=loss)
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())

    sess = tf.Session()
    sess.run(init_op)
    for j in range(10000):
        sess.run(train_op, {tf_x: X_train,
                            tf_y: y_train.reshape([y_train.shape[0], 1])})
    cost_ = sess.run(loss, {tf_x: X_train,
                            tf_y: y_train.reshape([y_train.shape[0], 1])})  # final train loss
    # evaluate on the held-out 20% test split (a new name, so the loss tensor
    # is not shadowed by the evaluated float)
    test_loss = sess.run(loss, feed_dict={tf_x: X_test,
                                          tf_y: y_test.reshape([y_test.shape[0], 1])})
    print('test loss: %.2f' % test_loss)
    error_list.append(test_loss)
    sess.close()

print(cfg_list)
print(error_list)
[[0.0001, 16, 8], [0.0001, 16, 4], [0.0001, 32, 8], [0.0001, 32, 4], [0.01, 16, 8], [0.01, 16, 4], [0.01, 32, 8], [0.01, 32, 4]]
[659.03925, 15.627606, 34.55378, 598.14703, 579.9314, 10.684119, 25.026648, 103.17941]
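Picking the winner out of the two lists is a one-liner (here [0.01, 16, 4], with a test MSE of about 10.68):

import numpy as np

best_idx = int(np.argmin(error_list))  # index of the lowest test MSE
print('Best config:', cfg_list[best_idx], '-> MSE %.2f' % error_list[best_idx])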
Finally, my personal feeling is that naive Grid Search is not a great fit for Deep Learning problems. First, notice how much was simply assumed in the example: the model type, the number of layers, ReLU as the activation, Adam as the only optimizer; and there are still plenty of other tunable choices, such as the initialization scheme, the Cost Function, and Regularization. In practice, the Data Scientist has to lay out the optimization plan for the Deep Learning model, fix most of these choices, and only then let Grid Search sweep the remaining few parameters. Moreover, with so many parameter dimensions, traversing every combination is infeasible, so smarter tuning methods should be brought in.
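One such smarter option, staying within sklearn, is RandomizedSearchCV, which samples a fixed number of configurations instead of enumerating them all. A sketch reusing the SVR setup from earlier (loguniform assumes scipy >= 1.4):

from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

param_dist = {'kernel': ['linear', 'rbf'], 'C': loguniform(1e-1, 1e2)}
search = RandomizedSearchCV(SVR(gamma='scale'), param_dist,
                            n_iter=8, cv=5)  # only 8 sampled configs
search.fit(X, Y)
print(search.best_params_)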