Matrix Calculus 1: Numerator Layout and Denominator Layout

Main reference 1
Main reference 2

output = function(input)
Both the input and the output can be a scalar, a vector, or a matrix (every vector in this article is a column vector, so its transpose is a row vector). The input is written $x$, $\pmb{x}$, $\pmb{X}$ and the output $f$, $\pmb{f}$, $\pmb{F}$, which gives 9 cases in total:
$f(x),\ f(\pmb x),\ f(\pmb X),\ \pmb f(x),\ \pmb f(\pmb x),\ \pmb f(\pmb X),\ \pmb F(x),\ \pmb F(\pmb x),\ \pmb F(\pmb X)$
Each case can be written in two ways: numerator layout and denominator layout.
Numerator layout: the numerator is treated as a column vector and the denominator as a row vector.
Denominator layout: the denominator is treated as a column vector and the numerator as a row vector.
$$
\begin{gathered}
\pmb{x}=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n \end{bmatrix}\\
\pmb{X}=\begin{bmatrix} x_{11}&x_{12}&\cdots&x_{1n}\\x_{21}&x_{22}&\cdots&x_{2n}\\\vdots&\vdots&\ddots&\vdots\\x_{m1}&x_{m2}&\cdots&x_{mn} \end{bmatrix}\\
\pmb{f}=\begin{bmatrix} f_1\\f_2\\\vdots\\f_n \end{bmatrix}\\
\pmb{F}=\begin{bmatrix} f_{11}&f_{12}&\cdots&f_{1n}\\f_{21}&f_{22}&\cdots&f_{2n}\\\vdots&\vdots&\ddots&\vdots\\f_{m1}&f_{m2}&\cdots&f_{mn} \end{bmatrix}\\
\operatorname{vec}(\pmb X)=[x_{11},x_{21},\dots,x_{m1},x_{12},x_{22},\dots,x_{m2},\dots,x_{1n},x_{2n},\dots,x_{mn}]^T
\end{gathered}
$$
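Notes:
1) In deep learning, the denominator layout is the more common choice.
2) The two layouts are simply notational conventions from two camps; authors in different fields adopt one or the other.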

  1. $f(x)$
    $$\frac{\partial f}{\partial x}\tag{1}$$

  2. $f(\pmb x)$
    Denominator layout (gradient vector form, a column vector of partial derivatives):
    $$\nabla_{\pmb x}f(\pmb x)=\frac{\partial f(\pmb x)}{\partial \pmb x}=\begin{bmatrix} \frac{\partial f}{\partial x_1}\\\frac{\partial f}{\partial x_2}\\\vdots\\\frac{\partial f}{\partial x_n} \end{bmatrix}\tag{2}$$

    Numerator layout (a row vector of partial derivatives):
    $$D_{\pmb x}f(\pmb x)=\frac{\partial f(\pmb x)}{\partial \pmb x^T}=\begin{bmatrix} \frac{\partial f}{\partial x_1}&\frac{\partial f}{\partial x_2}&\cdots&\frac{\partial f}{\partial x_n} \end{bmatrix}\tag{3}$$
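To make the difference between (2) and (3) concrete, here is a minimal pure-Python sketch. The test function `f`, the helper `partials`, and the step size `h` are assumptions chosen for illustration and do not come from the original article. The two layouts hold exactly the same numbers; only the shape differs.

```python
def partials(f, x, h=1e-6):
    """Central-difference estimates of df/dx_i for a scalar-valued f."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grads.append((f(up) - f(down)) / (2 * h))
    return grads


def f(x):
    # Hypothetical test function: f(x) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]


x0 = [1.0, 2.0]
p = partials(f, x0)

# Denominator layout, formula (2): an n x 1 column, one row per x_i.
grad_column = [[v] for v in p]

# Numerator layout, formula (3): a 1 x n row, the transpose of the column.
deriv_row = [p]

print(grad_column)
print(deriv_row)
```

Analytically, the partial derivatives at `x0` are $2x_1 + 3x_2 = 8$ and $3x_1 = 3$, so both printed structures should contain values close to 8 and 3.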

  3. $f(\pmb X)$

    (4) and (6) are transposes of each other, as are (5) and (7).

    When $\pmb X$ is a column vector, (2), (4), and (5) coincide, and (3), (6), and (7) coincide.

    Gradient vector form (a column vector of partial derivatives):
    $$\nabla_{\operatorname{vec}\pmb X}f(\pmb X)=\frac{\partial f(\pmb X)}{\partial \operatorname{vec}\pmb X}=\begin{bmatrix} \frac{\partial f}{\partial x_{11}}\\\frac{\partial f}{\partial x_{21}}\\\vdots\\\frac{\partial f}{\partial x_{m1}}\\ \frac{\partial f}{\partial x_{12}}\\\frac{\partial f}{\partial x_{22}}\\\vdots\\\frac{\partial f}{\partial x_{m2}}\\\vdots\\\frac{\partial f}{\partial x_{1n}}\\ \frac{\partial f}{\partial x_{2n}}\\\vdots\\\frac{\partial f}{\partial x_{mn}} \end{bmatrix}\tag{4}$$

    Gradient matrix:
    $$\nabla_{\pmb X}f(\pmb X)=\frac{\partial f(\pmb X)}{\partial \pmb X}=\begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}}\\ \frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{2n}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}\tag{5}$$

    Row vector of partial derivatives:
    $$D_{\operatorname{vec}\pmb X}f(\pmb X)=\frac{\partial f(\pmb X)}{\partial \operatorname{vec}^T\pmb X}=\begin{bmatrix} \frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{21}}& \cdots& \frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{12}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{1n}}& \frac{\partial f}{\partial x_{2n}}& \cdots& \frac{\partial f}{\partial x_{mn}} \end{bmatrix}\tag{6}$$

    Jacobian matrix:
    $$D_{\pmb X}f(\pmb X)=\frac{\partial f(\pmb X)}{\partial \pmb X^T}=\begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}}\\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f}{\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}\tag{7}$$
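The four forms (4) to (7) can be illustrated on a small example. The sketch below assumes the bilinear function $f(\pmb X)=\pmb a^T\pmb X\pmb b$, for which $\partial f/\partial x_{ij}=a_i b_j$. The vectors `a` and `b` and the helpers `vec` and `transpose` are hypothetical names introduced here, not part of the original article.

```python
def vec(M):
    """Stack the columns of M (given as a list of rows) into one flat list."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]


def transpose(M):
    """Return the transpose of M (given as a list of rows)."""
    return [list(col) for col in zip(*M)]


a = [1.0, 2.0]        # length m = 2
b = [3.0, 4.0, 5.0]   # length n = 3

# For f(X) = a^T X b, the partial derivative df/dx_ij equals a_i * b_j.
gradient_matrix = [[ai * bj for bj in b] for ai in a]  # formula (5), m x n
jacobian_matrix = transpose(gradient_matrix)           # formula (7), n x m

# Entries of the mn x 1 gradient vector (4); the row form (6) holds the
# same entries in the same order, laid out as a 1 x mn row.
gradient_vector = vec(gradient_matrix)

print(gradient_matrix)   # [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
print(jacobian_matrix)   # [[3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]
print(gradient_vector)   # [3.0, 6.0, 4.0, 8.0, 5.0, 10.0]
```

Note that `vec` walks the matrix column by column, matching the ordering $x_{11},x_{21},\dots,x_{m1},x_{12},\dots$ used in the definition of $\operatorname{vec}(\pmb X)$.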


    The following is my own summary and may contain errors.

  4. $\pmb f(x)$
    $$
    \begin{gathered}
    \nabla_x\pmb f(x)= \frac{\partial \pmb f^T(x)}{\partial x}= \begin{bmatrix} \frac{\partial f_1}{\partial x}& \frac{\partial f_2}{\partial x}& \cdots& \frac{\partial f_n}{\partial x} \end{bmatrix}\\
    D_x\pmb f(x)= \frac{\partial \pmb f(x)}{\partial x}= \begin{bmatrix} \frac{\partial f_1}{\partial x}\\ \frac{\partial f_2}{\partial x}\\ \vdots\\ \frac{\partial f_n}{\partial x} \end{bmatrix}
    \end{gathered}
    $$

  5. $\pmb f(\pmb x)$
    $$
    \begin{gathered}
    \nabla_{\pmb x}\pmb f(\pmb x)=\frac{\partial \pmb f^T(\pmb x)}{\partial \pmb x}=\begin{bmatrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_2}{\partial x_1}&\cdots&\frac{\partial f_n}{\partial x_1}\\ \frac{\partial f_1}{\partial x_2}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_n}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_1}{\partial x_n}&\frac{\partial f_2}{\partial x_n}&\cdots&\frac{\partial f_n}{\partial x_n} \end{bmatrix}\\
    D_{\pmb x}\pmb f(\pmb x)=\frac{\partial \pmb f(\pmb x)}{\partial \pmb x^T}=\begin{bmatrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\cdots&\frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_2}{\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_n}{\partial x_1}&\frac{\partial f_n}{\partial x_2}&\cdots&\frac{\partial f_n}{\partial x_n} \end{bmatrix}
    \end{gathered}
    $$
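The vector-to-vector case can be checked numerically. This is a minimal sketch assuming a hypothetical map `f` and a central-difference helper `jacobian`, neither of which appears in the original article. It builds $D_{\pmb x}\pmb f$ (entry $(i,j)$ is $\partial f_i/\partial x_j$) and then obtains $\nabla_{\pmb x}\pmb f$ as its transpose.

```python
def jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian D_x f: entry (i, j) estimates df_i/dx_j."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        f_up, f_down = f(up), f(down)
        for i in range(n_out):
            J[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return J


def f(x):
    # Hypothetical example: f1 = x1 * x2, f2 = x1 + 2 * x2
    return [x[0] * x[1], x[0] + 2 * x[1]]


D = jacobian(f, [3.0, 4.0])            # rows follow f_i, columns follow x_j
grad = [list(col) for col in zip(*D)]  # denominator layout: the transpose

print(D)
print(grad)
```

At the point $(3, 4)$ the exact Jacobian is $\begin{bmatrix}4&3\\1&2\end{bmatrix}$, so `D` should be close to that matrix and `grad` close to its transpose.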

  6. $\pmb f(\pmb X)$
    $$
    \begin{gathered}
    \nabla_{\pmb X}\pmb f(\pmb X)=\frac{\partial \pmb f^T(\pmb X)}{\partial \operatorname{vec}\pmb X}\\
    D_{\pmb X}\pmb f(\pmb X)=\frac{\partial \pmb f(\pmb X)}{\partial \operatorname{vec}^T\pmb X}
    \end{gathered}
    $$

  7. $\pmb F(x)$
    $$
    \begin{gathered}
    \nabla_x\pmb F(x)= \frac{\partial \pmb F^T(x)}{\partial x}= \begin{bmatrix} \frac{\partial f_{11}}{\partial x}&\frac{\partial f_{21}}{\partial x}&\cdots&\frac{\partial f_{m1}}{\partial x}\\ \frac{\partial f_{12}}{\partial x}&\frac{\partial f_{22}}{\partial x}&\cdots&\frac{\partial f_{m2}}{\partial x}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_{1n}}{\partial x}&\frac{\partial f_{2n}}{\partial x}&\cdots&\frac{\partial f_{mn}}{\partial x} \end{bmatrix}\\
    D_x\pmb F(x)= \frac{\partial \pmb F(x)}{\partial x}= \begin{bmatrix} \frac{\partial f_{11}}{\partial x}&\frac{\partial f_{12}}{\partial x}&\cdots&\frac{\partial f_{1n}}{\partial x}\\ \frac{\partial f_{21}}{\partial x}&\frac{\partial f_{22}}{\partial x}&\cdots&\frac{\partial f_{2n}}{\partial x}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_{m1}}{\partial x}&\frac{\partial f_{m2}}{\partial x}&\cdots&\frac{\partial f_{mn}}{\partial x} \end{bmatrix}
    \end{gathered}
    $$

  8. $\pmb F(\pmb x)$
    $$
    \begin{gathered}
    \nabla_{\pmb x}\pmb F(\pmb x)=\frac{\partial \operatorname{vec}^T(\pmb F(\pmb x))}{\partial \pmb x}\\
    D_{\pmb x}\pmb F(\pmb x)=\frac{\partial \operatorname{vec}(\pmb F(\pmb x))}{\partial \pmb x^T}
    \end{gathered}
    $$

  9. $\pmb F(\pmb X)$
    $$
    \begin{gathered}
    \nabla_{\pmb X}\pmb F(\pmb X)=\frac{\partial \operatorname{vec}^T(\pmb F(\pmb X))}{\partial \operatorname{vec}\pmb X}\\
    D_{\pmb X}\pmb F(\pmb X)=\frac{\partial \operatorname{vec}(\pmb F(\pmb X))}{\partial \operatorname{vec}^T\pmb X}
    \end{gathered}
    $$
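As a worked check of the matrix-to-matrix case, consider the linear map $\pmb F(\pmb X)=\pmb A\pmb X\pmb B$. The constant matrices $\pmb A$ ($p\times m$) and $\pmb B$ ($n\times q$) are assumptions introduced for this illustration and are not part of the original article. With the column-stacking $\operatorname{vec}$ defined earlier, the standard Kronecker-product identity gives:

$$
\begin{gathered}
\operatorname{vec}(\pmb A\pmb X\pmb B)=(\pmb B^T\otimes\pmb A)\operatorname{vec}\pmb X\\
D_{\pmb X}\pmb F(\pmb X)=\frac{\partial \operatorname{vec}(\pmb A\pmb X\pmb B)}{\partial \operatorname{vec}^T\pmb X}=\pmb B^T\otimes\pmb A\\
\nabla_{\pmb X}\pmb F(\pmb X)=\frac{\partial \operatorname{vec}^T(\pmb A\pmb X\pmb B)}{\partial \operatorname{vec}\pmb X}=(\pmb B^T\otimes\pmb A)^T=\pmb B\otimes\pmb A^T
\end{gathered}
$$

Here $D_{\pmb X}\pmb F$ has size $pq\times mn$ and $\nabla_{\pmb X}\pmb F$ has size $mn\times pq$, so the two forms are again transposes of each other, just as in the scalar-valued cases.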


Reposted from blog.csdn.net/qq_42283621/article/details/123838150