- Reinforcement learning
Slides from my talk at a machine learning reading group, covering the DP, MC, and TD methods:
https://mp.weixin.qq.com/s/r8wZw4iZwFCz0nnakutY3Q
- Recommendation
Reinforcement Learning at Alibaba: Technical Evolution and Business Innovation (强化学习在阿里的技术演进与业务创新)
https://www.jiqizhixin.com/articles/2018-02-06-3
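Reinforcement learning in the Taobao Jinnang (锦囊) recommendation system
Jinnang: keyword cards that refine a search
State:
User: gender, age, purchasing power, preferences, current behavior, page_id, features of viewed/clicked items
Query: type, overall user preference for this query type
Action:
Jinnang: type (more than 20,000 kinds)
Reward: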
R1 = is_click * (1 + alpha * exp{-page_num})
R2 = is_click * exp{-item_click_this_user_per_recent_100_pv}
R = R1 + beta * R2
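A minimal sketch of the reward above, for checking the formulas by hand. The default values of `alpha` and `beta` are placeholders (the notes do not give them), and `recent_clicks_per_100_pv` is a shortened, hypothetical name for item_click_this_user_per_recent_100_pv. As read from the formulas, R1 favors clicks on earlier pages and R2 shrinks as the user's recent click count grows.

```python
import math


def jinnang_reward(is_click, page_num, recent_clicks_per_100_pv,
                   alpha=1.0, beta=1.0):
    """Return R = R1 + beta * R2 for one impression.

    alpha and beta are assumed weights; the notes do not state their values.
    """
    # R1: click reward with a bonus that decays with the page number.
    r1 = is_click * (1.0 + alpha * math.exp(-page_num))
    # R2: click reward that decays with the user's recent click count.
    r2 = is_click * math.exp(-recent_clicks_per_100_pv)
    return r1 + beta * r2
```

Without a click both terms vanish, so the reward is zero regardless of `alpha` and `beta`.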
Algorithm:
Value-based: DQN
- Search:
Reinforcement Learning to Rank with Markov Decision Process
http://www.bigdatalab.ac.cn/~junxu/publications/SIGIR2017_RL_L2R.pdf
State:
Rank position, candidate document set
(t, {D_t})
Action(s_t):
a_t: select document d_idx(a_t) from {D_t}
Trans(S,A):
(t, {D_t}) -> (t+1, {D_t}\d_idx(a_t))
Reward(S,A):
R(s_t, a_t) =
2^y_idx(a_t) - 1 for t=0;
(2^y_idx(a_t) - 1) / log_2(t+1) for t>0
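The per-step reward can be sketched as below. Here `label` stands for the relevance label y of the selected document and `t` is the zero-based rank position; both function names are made up for illustration. Summing the rewards over an episode gives a DCG-style score for the ranking.

```python
import math


def rank_reward(label, t):
    """Reward for placing a document with relevance `label` at position t."""
    gain = 2 ** label - 1
    if t == 0:
        # No discount at the first position.
        return float(gain)
    # Discounted by log_2(t + 1) for t > 0.
    return gain / math.log2(t + 1)


def episode_return(labels_in_ranked_order):
    """Sum of per-step rewards for a full ranking."""
    return sum(rank_reward(y, t) for t, y in enumerate(labels_in_ranked_order))
```

Note that log_2(t+1) equals 1 at t = 1, so the first two positions are both undiscounted under this definition.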
Policy(a|s):
exp{w^T d_idx(a_t)} / sum_{a in A(s_t)} exp{w^T d_idx(a)}
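A small sketch of this softmax policy over the remaining candidates, assuming each document is a plain list of feature values with the same length as `w`. Subtracting the maximum score before exponentiating is a numerical-stability detail added here; it does not change the probabilities.

```python
import math


def policy_probs(w, candidates):
    """Return pi(a | s) for every candidate document in the current state."""
    # Linear score w^T d for each candidate document d.
    scores = [sum(wi * xi for wi, xi in zip(w, d)) for d in candidates]
    # Shift by the max score so exp() cannot overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

With `w` set to all zeros every candidate gets the same probability, which is a convenient starting point for learning.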
Learning:
Policy gradient
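
The notes only say "policy gradient"; the sketch below assumes a REINFORCE-style (Monte Carlo) update, which is one common choice and is not confirmed by these notes. It ties the pieces together: sample a ranking from the softmax policy, remove each chosen document from the candidate set, and move `w` along the return-weighted gradient of log pi. The learning rate `lr`, the discount `gamma`, and all function names are illustrative assumptions.

```python
import math
import random


def softmax_probs(w, docs):
    scores = [sum(wi * xi for wi, xi in zip(w, d)) for d in docs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def reward(label, t):
    gain = 2 ** label - 1
    return float(gain) if t == 0 else gain / math.log2(t + 1)


def sample_episode(w, docs, labels, rng):
    """Rank all docs by sampling from the policy.

    Returns one (grad_log_pi, reward) pair per rank position.
    """
    remaining = list(range(len(docs)))
    steps = []
    for t in range(len(docs)):
        cand = [docs[i] for i in remaining]
        probs = softmax_probs(w, cand)
        k = rng.choices(range(len(cand)), weights=probs)[0]
        # Gradient of log softmax: chosen features minus expected features.
        expected = [sum(p * d[j] for p, d in zip(probs, cand))
                    for j in range(len(w))]
        grad = [cand[k][j] - expected[j] for j in range(len(w))]
        steps.append((grad, reward(labels[remaining[k]], t)))
        # Transition: drop the chosen document from the candidate set.
        remaining.pop(k)
    return steps


def reinforce_update(w, steps, lr=0.1, gamma=1.0):
    """Return new weights after one REINFORCE update on a sampled episode."""
    # Long-term return G_t for every step, computed backwards.
    g = 0.0
    returns = []
    for _, r in reversed(steps):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    new_w = list(w)
    for t, ((grad, _), g_t) in enumerate(zip(steps, returns)):
        for j in range(len(w)):
            new_w[j] += lr * (gamma ** t) * g_t * grad[j]
    return new_w
```

A possible training loop on toy data (the features and labels here are invented for illustration):

```python
rng = random.Random(0)
docs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = [2, 0, 1]
w = [0.0, 0.0]
for _ in range(200):
    w = reinforce_update(w, sample_episode(w, docs, labels, rng))
```

This version uses no baseline, so the updates are noisy; subtracting a baseline from the return is a standard way to reduce that variance.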