1 Problem
Given $\mathbf{x} \in \mathbb{R}^{n \times 1}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, and $\mathbf{f}(\mathbf{x})=\sqrt{(\mathbf{A}\mathbf{x}) \odot (\mathbf{A}\mathbf{x})}$, where $\sqrt{(\cdot)}$ denotes the Hadamard root (element-wise square root), i.e. the square root applied to each entry separately. Find $\frac{\partial \mathbf{f}}{\partial \mathbf{x}}$.
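To make the definition concrete, here is a minimal pure-Python sketch of $\mathbf{f}(\mathbf{x})$. The helper names `matvec` and `f` and the sample values of `A` and `x` are assumptions chosen for illustration; they are not part of the original problem.

```python
import math

def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def f(A, x):
    """Element-wise square root of (A x) ⊙ (A x)."""
    b = matvec(A, x)
    return [math.sqrt(b_i * b_i) for b_i in b]

# Illustrative values only (assumed, not from the original post).
A = [[1.0, 2.0], [3.0, -4.0]]
x = [1.0, 1.0]

print(f(A, x))  # b = [3.0, -1.0], so f = [3.0, 1.0]
```

Note that each entry of the result equals the absolute value of the corresponding entry of $\mathbf{A}\mathbf{x}$.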
2 Solution
2.1 First remove the square root using the Hadamard product
Let $\mathbf{b} = \mathbf{A}\mathbf{x}$. Then $d\mathbf{b} = d(\mathbf{A}\mathbf{x}) = \mathbf{A}\, d\mathbf{x}$.
2.2 Matrix-by-matrix derivatives usually start by vectorizing the matrices
$$
\begin{aligned}
\mathbf{f} \odot \mathbf{f} &= (\mathbf{A}\mathbf{x}) \odot (\mathbf{A}\mathbf{x}) \\
&= \mathbf{b} \odot \mathbf{b}
\end{aligned}
$$
By the differential rule for the Hadamard product, $d(\mathbf{X} \odot \mathbf{Y}) = \mathbf{X} \odot d\mathbf{Y} + d\mathbf{X} \odot \mathbf{Y}$,
we have:
$$
\begin{aligned}
d(\mathbf{f} \odot \mathbf{f}) &= \mathbf{f} \odot d\mathbf{f} + d\mathbf{f} \odot \mathbf{f} \\
&= \mathbf{f} \odot d\mathbf{f} + \mathbf{f} \odot d\mathbf{f} \\
&= 2\,\mathbf{f} \odot d\mathbf{f}
\end{aligned}
$$
In the same way, $d(\mathbf{b} \odot \mathbf{b}) = 2\,\mathbf{b} \odot d\mathbf{b}$. Since $\mathbf{f} \odot \mathbf{f} = \mathbf{b} \odot \mathbf{b}$, the two differentials are equal, and cancelling the factor 2 gives $\mathbf{f} \odot d\mathbf{f} = \mathbf{b} \odot d\mathbf{b}$. Applying the property $\operatorname{vec}(\mathbf{A} \odot \mathbf{X}) = \operatorname{diag}(\mathbf{A}) \operatorname{vec}(\mathbf{X})$ to both sides:
$$
\operatorname{diag}(\mathbf{f}) \operatorname{vec}(d\mathbf{f}) = \operatorname{diag}(\mathbf{b}) \operatorname{vec}(d\mathbf{b})
$$
Here $\operatorname{diag}(\mathbf{f})$ is the $n \times n$ diagonal matrix whose diagonal entries are the elements of $\mathbf{f}$ after column-wise vectorization; $\operatorname{diag}(\mathbf{b})$ is defined in the same way.
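The vectorization property used above, $\operatorname{vec}(\mathbf{A} \odot \mathbf{X}) = \operatorname{diag}(\mathbf{A}) \operatorname{vec}(\mathbf{X})$, can be checked on a small case. The following is only a sketch: the helpers `vec`, `diag`, `matvec`, and `hadamard` and the two sample matrices are hypothetical names and values introduced for this illustration.

```python
def vec(M):
    """Stack the columns of a list-of-rows matrix into one flat list."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]

def diag(v):
    """Build a square diagonal matrix with the entries of v on the diagonal."""
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def hadamard(A, X):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[a * x for a, x in zip(row_a, row_x)] for row_a, row_x in zip(A, X)]

# Illustrative values only (assumed, not from the original post).
A = [[1.0, 2.0], [3.0, 4.0]]
X = [[5.0, 6.0], [7.0, 8.0]]

lhs = vec(hadamard(A, X))           # vec(A ⊙ X)
rhs = matvec(diag(vec(A)), vec(X))  # diag(vec(A)) vec(X)
print(lhs, rhs)
```

Both sides come out as the same flat list, with the diagonal built from the column-stacked entries of `A`.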
$$
\operatorname{vec}(d\mathbf{f}) = \operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b}) \operatorname{vec}(d\mathbf{b})
$$
$$
\mathbf{b} \in \mathbb{R}^{n \times 1} \implies \operatorname{vec}(d\mathbf{b}) = d\mathbf{b}
$$
$$
\therefore \operatorname{diag}(\mathbf{f})\, d\mathbf{f} = \operatorname{diag}(\mathbf{b})\, \mathbf{A}\, d\mathbf{x}
$$
$$
\operatorname{vec}(d\mathbf{f}) = \operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b})\, \mathbf{A}\, d\mathbf{x}
$$
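As a side remark that is not part of the original derivation: inverting $\operatorname{diag}(\mathbf{f})$ assumes every entry of $\mathbf{f}$ is nonzero, i.e. $b_i \neq 0$ for all $i$. Under that assumption, each entry satisfies $f_i = \sqrt{b_i^2} = |b_i|$, so the diagonal factor reduces to a matrix of signs:

$$
\operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b})
= \operatorname{diag}\!\left(\frac{b_1}{|b_1|}, \ldots, \frac{b_n}{|b_n|}\right)
= \operatorname{diag}\big(\operatorname{sign}(\mathbf{b})\big)
$$

Where some $b_i = 0$, the corresponding entry of $\mathbf{f}$ behaves like $|b_i|$ at zero and is not differentiable there.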
For matrix-by-matrix derivatives in denominator layout, we have:
$$
\operatorname{vec}(d\mathbf{f}) = \left(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right)^{T} \operatorname{vec}(d\mathbf{x})
$$
In numerator layout, we have:
$$
\operatorname{vec}(d\mathbf{f}) = \left(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right) \operatorname{vec}(d\mathbf{x})
$$
Therefore, for this problem, in denominator layout:
$$
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \left(\operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b})\, \mathbf{A}\right)^{T}
$$
In numerator layout:
$$
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b})\, \mathbf{A}
$$
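The numerator-layout result can be sanity-checked against central finite differences. This is a sketch under stated assumptions: the function names (`analytic_jacobian`, `numeric_jacobian`), the step size `h`, and the sample `A` and `x` are all illustrative choices, and the check only makes sense where every entry of $\mathbf{A}\mathbf{x}$ is nonzero.

```python
import math

def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def f(A, x):
    """Element-wise square root of (A x) ⊙ (A x)."""
    return [math.sqrt(b_i * b_i) for b_i in matvec(A, x)]

def analytic_jacobian(A, x):
    """Numerator-layout Jacobian diag(f)^{-1} diag(b) A (needs all b_i != 0)."""
    b = matvec(A, x)
    fx = [math.sqrt(b_i * b_i) for b_i in b]
    # Row i of A is scaled by b_i / f_i, which is what the diagonal factors do.
    return [[(b_i / f_i) * a_ij for a_ij in row]
            for row, b_i, f_i in zip(A, b, fx)]

def numeric_jacobian(A, x, h=1e-6):
    """Central-difference approximation; entry [i][j] is d f_i / d x_j."""
    m, n = len(A), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(A, x_plus), f(A, x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

# Illustrative values only (assumed, not from the original post).
A = [[1.0, 2.0], [3.0, -4.0]]
x = [1.0, 1.0]

J_exact = analytic_jacobian(A, x)
J_num = numeric_jacobian(A, x)
max_err = max(abs(J_exact[i][j] - J_num[i][j])
              for i in range(len(A)) for j in range(len(x)))
print(J_exact, max_err)
```

For the denominator-layout version, transpose `J_exact`; the finite-difference matrix built here follows the numerator-layout convention.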