LR: a linear weighted sum of the features passed through a sigmoid activation. Keep it distinct from linear regression; the principles differ: linear regression is fit by least squares, while LR is fit by maximum likelihood.
The derivation is based on maximum likelihood; the weight vector W is then updated iteratively by gradient descent.
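The two points above can be sketched in a few lines: the gradient of the negative log-likelihood of LR is X^T(p - y), and gradient descent just steps along it. This is a minimal illustration on toy data, not a production trainer; `fit_lr` and the toy arrays are made up for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_lr(X, y, lr=0.1, n_iter=1000):
    """Gradient descent on the negative log-likelihood of LR."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)               # predicted probabilities
        grad = X.T @ (p - y) / len(y)    # gradient of the NLL w.r.t. w
        w -= lr * grad
    return w

# toy data: one feature with a clear threshold between the classes
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
Xb = np.hstack([X, np.ones((4, 1))])     # append a bias column
w = fit_lr(Xb, y)
```

After training, `sigmoid(Xb @ w)` gives probabilities below 0.5 for the class-0 points and above 0.5 for the class-1 points.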
The hyperparameters are as follows; for details see https://zhuanlan.zhihu.com/p/39780207
1. Regularization: L1 and L2
L1: suits high-dimensional data with many features (it pushes weights to exactly zero, giving a sparse model)
L2: suits data that is not high-dimensional
2. Regularization coefficient
Usually written C; note that in sklearn C is the inverse of the regularization strength, so a smaller C means stronger regularization
3. solver: the optimization algorithm
With large sample sizes, consider stochastic/mini-batch gradient methods (sag/saga in sklearn)
With small sample sizes, quasi-Newton methods such as lbfgs work well
With L1 regularization, use liblinear (coordinate descent) or saga to optimize the loss: the L1 term is the sum of the absolute values of the weights, which is not differentiable at 0 and hence has no second derivative, so smooth solvers such as lbfgs and newton-cg cannot handle it.
For online serving, simply dump the learned coefficients and have the server load them for scoring.
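The "dump the coefficients, load them on the server" idea needs nothing more than a text file and a dot product through the sigmoid. A minimal sketch; the file name, format, and helper names here are assumptions for illustration:

```python
import math

def save_weights(weights, path):
    # one line of comma-separated floats
    with open(path, "w") as f:
        f.write(",".join(str(w) for w in weights))

def load_weights(path):
    with open(path) as f:
        return [float(v) for v in f.read().split(",")]

def score(weights, features):
    # serving-side LR: sigmoid of the weighted sum
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

save_weights([0.5, -1.2, 0.3], "lr_weights.txt")
w = load_weights("lr_weights.txt")
p = score(w, [1.0, 0.0, 1.0])  # probability for one sample
```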
In ranking, what we care about most is the model's ranking ability, i.e. its AUC. In practice a model with AUC above 0.7 is considered usable.
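Putting the hyperparameters and the AUC criterion together, here is a sketch using scikit-learn's LogisticRegression on a synthetic dataset (the dataset and parameter values are assumptions for illustration, not from the original notes). Note again that C is the inverse of the regularization strength.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# synthetic binary-classification data
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# L1 penalty requires liblinear (or saga); lbfgs only handles L2
clf = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
clf.fit(X_tr, y_tr)

# evaluate ranking ability with AUC on the held-out split
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```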
Feature combination:
Continuous features are rarely crossed; combinations are usually built from discrete features. Note that the dimensionality of the new feature is the product of the one-hot lengths of the two original (discretized) features.
See the code below:
def add(str_one, str_two):
    """Cross two one-hot encoded features.

    Args:
        str_one: one-hot string such as "0,0,1,0"
        str_two: one-hot string such as "1,0,0,0"
    Return:
        one-hot string of length len(str_one) * len(str_two), e.g.
        "0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0" for the inputs above
    """
    list_one = str_one.split(",")
    list_two = str_two.split(",")
    list_two_len = len(list_two)
    return_list = [0] * (len(list_one) * list_two_len)
    try:
        index_one = list_one.index("1")
    except ValueError:  # no "1" present: fall back to position 0
        index_one = 0
    try:
        index_two = list_two.index("1")
    except ValueError:
        index_two = 0
    # hot position of the crossed feature
    return_list[index_one * list_two_len + index_two] = 1
    return ",".join(str(ele) for ele in return_list)
import sys  # for sys.exit below

def combine_feature(feature_one, feature_two, new_feature,
                    train_data_df, test_data_df, feature_num_dict):
    """Add a crossed feature column to the train and test DataFrames.

    Args:
        feature_one: name of the first feature column
        feature_two: name of the second feature column
        new_feature: name of the combined feature column
        train_data_df: training DataFrame
        test_data_df: test DataFrame
        feature_num_dict: ndim of every feature; key is the feature name,
            value is the length of its one-hot encoding
    Return:
        ndim of the new feature (product of the two original dims)
    """
    train_data_df[new_feature] = train_data_df.apply(
        lambda row: add(row[feature_one], row[feature_two]), axis=1)
    test_data_df[new_feature] = test_data_df.apply(
        lambda row: add(row[feature_one], row[feature_two]), axis=1)
    if feature_one not in feature_num_dict:
        print("error: %s not in feature_num_dict" % feature_one)
        sys.exit()
    if feature_two not in feature_num_dict:
        print("error: %s not in feature_num_dict" % feature_two)
        sys.exit()
    return feature_num_dict[feature_one] * feature_num_dict[feature_two]
# example call
new_feature_len = combine_feature("age", "capital-gain", "age_gain", train_data_df, test_data_df, feature_num_dict)