Reposted; click here for the original article.
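In Feature Processing I introduced the Cartesian product as a way to construct combined features. The method is simple, but it makes the number of features grow explosively. For example, taking the Cartesian product of a categorical feature with $N$ distinct values and a categorical feature with $M$ distinct values yields $N \times M$ combined features.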
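As a small illustration of this growth, the sketch below crosses two categorical features. The feature names and values (`genders`, `cities`) are made up for the example and do not come from the original article.

```python
from itertools import product

# Hypothetical categorical features, chosen only for illustration.
genders = ["male", "female"]                     # N = 2 distinct values
cities = ["beijing", "shanghai", "guangzhou"]    # M = 3 distinct values

# Each (gender, city) pair becomes one combined (crossed) feature.
crossed = [f"{g}&{c}" for g, c in product(genders, cities)]

def one_hot(gender, city):
    """One-hot encode a sample over the crossed feature space."""
    key = f"{gender}&{city}"
    return [1 if name == key else 0 for name in crossed]

print(len(crossed))                    # 6 == 2 * 3
print(one_hot("female", "shanghai"))   # exactly one position is set
```

Replacing one of the two lists with a list of user ids makes the crossed space as large as the user base times the other feature's cardinality.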
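The problem of having too many features is especially acute in personalization. If we treat the user id as a categorical feature, the number of values it can take equals the number of users. Taking the Cartesian product of this user id feature with other features produces an enormous feature set. Ad-tech companies often claim that their models contain billions or even tens of billions of features, and this is basically how they get there.
Of course, having too many features is an old problem, and many dimensionality reduction methods already exist; clustering and PCA are common examples [1]. However, when both the number of features and the number of samples are large, these methods are themselves computationally expensive, so they are of little help on large-scale problems.
This article introduces a very simple dimensionality reduction method: feature hashing [2, 3].
The goal of feature hashing is to compress the original high-dimensional feature vector into a lower-dimensional one while losing as little of the original features' expressive power as possible.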
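Let $x \in \mathbb{R}^N$ denote the feature vector before hashing. We want to compress this $N$-dimensional feature vector into $M$ dimensions ($M < N$). Let $h(n): \{1, \dots, N\} \to \{1, \dots, M\}$ be a chosen uniform hash function, and let $\xi(n): \{1, \dots, N\} \to \{-1, 1\}$ be another chosen uniform hash function. $h(n)$ and $\xi(n)$ are chosen independently of each other. The $i$-th element of the hashed $M$-dimensional feature vector $\phi \in \mathbb{R}^M$ is computed as follows ($\phi$ depends on $x$, so it is sometimes written $\phi(x)$):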
$$
\phi_i = \sum_{j:\, h(j) = i} \xi(j)\, x_j
$$
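It can be shown that the new feature vector $\phi$ generated this way preserves, in a probabilistic sense, the inner products and distances of the original feature space [2]: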
$$
\begin{aligned}
x^T x' &\approx \phi^T \phi' \\
\|x - x'\| &\approx \|\phi - \phi'\|
\end{aligned}
$$
where $x$ and $x'$ are two original feature vectors, and $\phi$ and $\phi'$ are the corresponding hashed feature vectors.
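To see where the sign hash $\xi$ enters, here is a short derivation sketch for the inner product. It relies on an idealization that the text does not spell out: the signs $\xi(j)$ are treated as independent random variables, each uniform on $\{-1, 1\}$, and the expectation is taken over $\xi$ with $h$ held fixed.

$$
\begin{aligned}
\phi(x)^T \phi(x') &= \sum_{i=1}^{M} \Big(\sum_{j:\, h(j)=i} \xi(j)\, x_j\Big) \Big(\sum_{k:\, h(k)=i} \xi(k)\, x'_k\Big) = \sum_{j,k:\, h(j)=h(k)} \xi(j)\, \xi(k)\, x_j\, x'_k, \\
\mathbb{E}_{\xi}\big[\phi(x)^T \phi(x')\big] &= \sum_{j=1}^{N} x_j\, x'_j = x^T x'.
\end{aligned}
$$

The second line uses $\mathbb{E}[\xi(j)\,\xi(k)] = 1$ when $j = k$ and $0$ otherwise, so under this assumption the cross terms created by collisions ($j \neq k$ with $h(j) = h(k)$) cancel in expectation.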
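Once $x$ has been transformed into $\phi$ with the hashing method above, $\phi$ can be fed directly into a machine learning algorithm. That is the whole procedure for reducing the number of features with feature hashing. Note that the two hash functions $h$ and $\xi$ are not required to map integers to integers; they only need to hash the original features uniformly onto the new feature vector. For example, in NLP each feature represents a word, so it is enough that $h$ and $\xi$ hash words uniformly to $\{1, \dots, M\}$ and $\{-1, 1\}$, respectively.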
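The procedure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by the author: the helper names (`_stable_hash`, `h`, `xi`, `hash_features`), the use of salted MD5 as a stand-in for the two hash functions, and the sample document are all assumptions made for the example. Indices are 0-based here, whereas the text uses $\{1, \dots, M\}$.

```python
import hashlib

def _stable_hash(token: str, salt: str) -> int:
    """Deterministic integer hash of a string (MD5 is used only for mixing)."""
    digest = hashlib.md5(f"{salt}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def h(token: str, M: int) -> int:
    """Bucket hash: maps a raw feature to an index in {0, ..., M-1}."""
    return _stable_hash(token, "h") % M

def xi(token: str) -> int:
    """Sign hash: maps a raw feature to -1 or +1, using a different salt than h."""
    return 1 if _stable_hash(token, "xi") % 2 == 0 else -1

def hash_features(x: dict, M: int) -> list:
    """Compute phi_i = sum over features j with h(j) == i of xi(j) * x_j."""
    phi = [0.0] * M
    for token, value in x.items():
        if value != 0:  # only non-zero raw features contribute
            phi[h(token, M)] += xi(token) * value
    return phi

# A bag-of-words style input: word -> count (illustrative data).
doc = {"feature": 2.0, "hashing": 1.0, "is": 1.0, "simple": 1.0}
phi = hash_features(doc, M=8)
print(phi)
```

Because the words themselves are hashed, no vocabulary or word-to-index dictionary has to be built or stored, and the output length stays at `M` no matter how many distinct words appear.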
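Next, we show concretely how to apply feature hashing to multitask learning. Multitask learning means solving several problems at the same time. Personalization is a typical multitask learning problem: it learns the interests and preferences of many users simultaneously.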
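At Jiayuan (世纪佳缘) we use a Logistic Regression (LogReg) model to learn each male user's dating preferences, in order to predict the probability that he sends a message to a woman with given features. Learning the preferences of one male user is one learning task. Let $\mathbf{U}$ denote the set of male users and $d$ the dimension of the extracted female features. For each user $u \in \mathbf{U}$ we learn a parameter vector $w_u \in \mathbb{R}^d$. Together with a global parameter vector $w_0$, there are $N \triangleq d \cdot (1 + |\mathbf{U}|)$ parameters in total. This formulation amounts to taking the Cartesian product of the male user id with all features $x$. The figure below shows an example of the expanded feature vector for each user when there are 3 users and $x$ has length 2. The LogReg model computes the score $(w_0 + w_u)^T x$ and maps it through the logistic function to obtain the final predicted probability.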
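The expanded representation for the 3-user, length-2 case can be sketched as follows. The block layout (global block first, then one block per user), the helper names `expand` and `dot`, and all numeric values are assumptions chosen for illustration.

```python
def expand(x, user_index, num_users):
    """Copy x into the global block (block 0) and into the block of one user.

    Blocks are laid out as [global, user 1, ..., user num_users];
    user_index is 1-based.
    """
    d = len(x)
    expanded = [0.0] * (d * (1 + num_users))
    expanded[0:d] = x                      # shared block, pairs with w_0
    start = d * user_index
    expanded[start:start + d] = x          # user block, pairs with w_u
    return expanded

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x = [1.0, 2.0]                                   # d = 2
w0 = [0.5, -1.0]                                 # global parameters
w_users = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # w_1, w_2, w_3
w = w0 + [v for wu in w_users for v in wu]       # concatenated, N = 2 * (1 + 3) = 8

for u in (1, 2, 3):
    xu = expand(x, u, num_users=3)
    score_expanded = dot(w, xu)
    score_direct = dot([a + b for a, b in zip(w0, w_users[u - 1])], x)
    print(u, xu, score_expanded, score_direct)
```

For each user the two scores agree, which is the sense in which crossing the user id with $x$ is the same as learning $w_0 + w_u$ per user.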
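This problem can also be viewed in the hashed feature space. We introduce a different transformation function $\phi_u(x)$ for each user; in general, taking $\phi_u(x) = \phi((u, x))$ is enough. The expanded vector for user $u$ then becomes, after hashing,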
$$
x_u^h \triangleq \phi_0(x) + \phi_u(x)
$$
The weight parameters $[w_0^T, \dots, w_{|\mathbf{U}|}^T]$ that correspond to the expanded vector become, after hashing,
$$
w^h \triangleq \phi_0(w_0) + \sum_{u \in \mathbf{U}} \phi_u(w_u)
$$
Then, in the hashed space,
$$
\begin{aligned}
(w^h)^T x_u^h &= \Big(\phi_0(w_0) + \sum_{u \in \mathbf{U}} \phi_u(w_u)\Big)^T \big(\phi_0(x) + \phi_u(x)\big) \\
&\approx \phi_0(w_0)^T \phi_0(x) + \phi_u(w_u)^T \phi_u(x) \\
&\approx w_0^T x + w_u^T x \\
&= (w_0 + w_u)^T x
\end{aligned}
$$
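This shows theoretically that feature hashing can be applied to this multitask learning problem. The second approximation follows from the inner-product preservation property stated earlier. The first approximation in the formula above uses the result [2] that, across different tasks, the hashed parameters $\phi_u(w_u)$ and the hashed features $\phi_{u'}(x)$ are approximately uncorrelated, i.e.: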
$$
\phi_u(w_u)^T \phi_{u'}(x) \approx 0, \quad \forall\, u \neq u'
$$
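When implementing the algorithm, we do not need to care about $w^h$; we only need to hash the original features $x$ into $x_u^h$. The rest is the standard machine learning workflow.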
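One way to build $x_u^h = \phi_0(x) + \phi_u(x)$ is to prefix every feature name with a task identifier before hashing, so that each task gets its own hash mapping into the same $M$ buckets. The sketch below follows that idea; the function names, the `"0"` and `"user:..."` namespace strings, the salted MD5 hash, and the sample features are all assumptions for illustration, not the author's production code.

```python
import hashlib

def _stable_hash(key: str, salt: str) -> int:
    """Deterministic integer hash of a string (MD5 is used only for mixing)."""
    digest = hashlib.md5(f"{salt}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def phi(task: str, x: dict, M: int) -> list:
    """Hash x under one task's namespace, i.e. phi_task(x) = phi((task, x))."""
    out = [0.0] * M
    for name, value in x.items():
        if value != 0:
            key = f"{task}|{name}"
            index = _stable_hash(key, "h") % M
            sign = 1.0 if _stable_hash(key, "xi") % 2 == 0 else -1.0
            out[index] += sign * value
    return out

def hashed_multitask_features(user_id: str, x: dict, M: int) -> list:
    """x_u^h = phi_0(x) + phi_u(x); task "0" is the shared global task."""
    shared = phi("0", x, M)
    personal = phi(f"user:{user_id}", x, M)
    return [s + p for s, p in zip(shared, personal)]

x = {"age_25_30": 1.0, "city_beijing": 1.0, "has_photo": 1.0}
xh_alice = hashed_multitask_features("alice", x, M=16)
xh_bob = hashed_multitask_features("bob", x, M=16)
print(len(xh_alice), len(xh_bob))   # both 16, regardless of the number of users
```

The resulting vectors can be passed to any standard linear learner, which then fits a single weight vector of length `M` that plays the role of $w^h$. A new user needs no change to the model's shape: their features simply hash into the same `M` positions.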
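Feature hashing reduces the number of features, which speeds up training and prediction and lowers memory consumption. The price is that a model learned after hashing is hard to inspect: it is difficult to give a reasonable interpretation of the trained parameters. Another issue is that several original features can be hashed to the same position, which is the collision phenomenon familiar from hashing. In practice, however, experiments show that such collisions have very little effect on accuracy [3].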
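Finally, here is a summary of the advantages of feature hashing over other dimensionality reduction methods in machine learning:
- It is simple to implement and requires little extra computation.
- New tasks (such as new users) or new original features can be added while the length of the hashed feature vector stays the same, which suits problems where the number of tasks changes frequently (such as new users and new items in personalized recommendation).
- It preserves the sparsity of the original features, since only non-zero original features contribute to the hashed vector.
- It is possible to hash only part of the original features and keep the rest unhashed (for example, important features for which a collision would hurt accuracy badly).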