Paper notes on "word2vec Parameter Learning Explained", Section 1.1: deriving the update formula for the one-word context case of the CBOW model

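word2vec comprises the CBOW and Skip-Gram models, and the paper "word2vec Parameter Learning Explained" derives in detail how the parameters of both models are learned. While reading Section 1.1 (One-word context), I was puzzled by the derivation of equation (8) and spent some time working through it. The original text reads: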
“Let us now derive the update equation of the weights between hidden and output layers. Take the derivative of $E$ with regard to $j$-th unit’s net input $u_j$, we obtain $\frac{\partial E}{\partial u_j}=y_j-t_j:=e_j$ where $t_j=\mathbb{1}(j=j^*)$, i.e., $t_j$ will only be 1 when the $j$-th unit is the output word, otherwise $t_j=0$.”
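At first I did not see how this step was obtained, but the derivation turns out to be straightforward. The first term of $E$ differentiates to the softmax output $y_j$, and the second term $u_{j^*}$ depends on $u_j$ only when $j=j^*$, so its derivative is the indicator $t_j$:
$\begin{aligned} E & =\log\sum_{j'=1}^V\exp(u_{j'})-u_{j^*} \\ e_j=\frac{\partial E}{\partial u_j} & =\frac{\exp(u_j)}{\sum_{j'=1}^V\exp(u_{j'})}-\frac{\partial u_{j^*}}{\partial u_j} \\ & =y_j-\mathbb{1}(j=j^*) \\ & =y_j-t_j \end{aligned}$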
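As a sanity check on the result $\frac{\partial E}{\partial u_j}=y_j-t_j$, the sketch below compares the analytic gradient against a central finite difference of $E=\log\sum_{j'}\exp(u_{j'})-u_{j^*}$. It is only an illustration, not code from the paper: the function names, the example scores `u`, and the target index `j_star` are all made up for this check.

```python
import math

def softmax(u):
    """Softmax outputs y_j, shifted by max(u) for numerical stability."""
    m = max(u)
    exps = [math.exp(x - m) for x in u]
    total = sum(exps)
    return [e / total for e in exps]

def loss(u, j_star):
    """E = log(sum_j' exp(u_j')) - u_{j*}, using the same max shift."""
    m = max(u)
    return m + math.log(sum(math.exp(x - m) for x in u)) - u[j_star]

def analytic_grad(u, j_star):
    """e_j = y_j - t_j, where t_j = 1 if j == j_star else 0."""
    y = softmax(u)
    return [y_j - (1.0 if j == j_star else 0.0) for j, y_j in enumerate(y)]

def numeric_grad(u, j_star, h=1e-6):
    """Central finite-difference approximation of dE/du_j."""
    grad = []
    for j in range(len(u)):
        up = list(u)
        up[j] += h
        down = list(u)
        down[j] -= h
        grad.append((loss(up, j_star) - loss(down, j_star)) / (2 * h))
    return grad

# Hypothetical net inputs for a vocabulary of size V = 4; word 2 is the target.
u = [0.5, -1.2, 2.0, 0.3]
j_star = 2

for j, (a, n) in enumerate(zip(analytic_grad(u, j_star), numeric_grad(u, j_star))):
    print(f"j={j}  analytic={a:+.6f}  numeric={n:+.6f}")
```

The two columns should agree to several decimal places. The entry for the target word is negative because $y_{j^*}<1$ while $t_{j^*}=1$, and the entries sum to zero because the $y_j$ sum to 1.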

K_Snail
