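Suppose we have training pairs $(x^{(i)}, y^{(i)})$, $i = 1, \dots, N$, with sample size $N$, where $x^{(i)} \in \mathbb{R}^D$. The linear model predicts

$$\hat{y} = \sum_{j=1}^{D} \beta_j x_j$$

and we use the squared loss

$$L(a, b) = \frac{1}{2}(a - b)^2$$

The empirical risk is then

$$\varepsilon(\beta_1, \dots, \beta_D) = \frac{1}{N}\sum_{i=1}^{N} L\big(\hat{y}^{(i)}, y^{(i)}\big) = \frac{1}{2N}\sum_{i=1}^{N}\big(\hat{y}^{(i)} - y^{(i)}\big)^2 = \frac{1}{2N}\sum_{i=1}^{N}\Big(\sum_{j=1}^{D}\beta_j x_j^{(i)} - y^{(i)}\Big)^2$$

(An intercept $\beta_0$ can be absorbed by appending a constant feature equal to 1 to every $x^{(i)}$, so the sums run over $j = 1, \dots, D$ only.)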
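Take the derivative with respect to $\beta_j$:

$$\frac{\partial \varepsilon}{\partial \beta_j} = \frac{1}{N}\sum_{i=1}^{N} x_j^{(i)}\big(\hat{y}^{(i)} - y^{(i)}\big) = \frac{1}{N}\sum_{i=1}^{N} x_j^{(i)}\Big(\sum_{j'=1}^{D}\beta_{j'} x_{j'}^{(i)} - y^{(i)}\Big) = \frac{1}{N}\sum_{j'=1}^{D}\Big(\sum_{i=1}^{N} x_j^{(i)} x_{j'}^{(i)}\Big)\beta_{j'} - \frac{1}{N}\sum_{i=1}^{N} x_j^{(i)} y^{(i)}$$

(Pay attention to the last step, since this is exactly the part that was confusing: expand the product, then swap the order of the sums over $i$ and $j'$ so that $\beta_{j'}$ can be factored out.)

Let $A_{jj'} = \frac{1}{N}\sum_{i=1}^{N} x_j^{(i)} x_{j'}^{(i)}$, so that $A \in \mathbb{R}^{D \times D}$, and $c_j = \frac{1}{N}\sum_{i=1}^{N} x_j^{(i)} y^{(i)}$, so that $c \in \mathbb{R}^{D}$. Then:

$$\frac{\partial \varepsilon}{\partial \beta_j} = \sum_{j'=1}^{D} A_{jj'}\beta_{j'} - c_j \overset{\text{set}}{=} 0$$

Let $X \in \mathbb{R}^{N \times D}$ be the matrix whose rows are the inputs,

$$X = \begin{bmatrix} x^{(1)\top} \\ x^{(2)\top} \\ \vdots \\ x^{(N)\top} \end{bmatrix}$$

so that $A = \frac{1}{N}X^\top X$ and $c = \frac{1}{N}X^\top y$. Stacking the $D$ equations into one vector equation gives

$$\nabla_\beta\,\varepsilon = A\beta - c \overset{\text{set}}{=} 0$$

and therefore, assuming $X^\top X$ is invertible,

$$\hat{\beta} = A^{-1}c = (X^\top X)^{-1}X^\top y$$

Finally solved!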
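To make the closed form concrete, here is a minimal NumPy sketch that builds $A = \frac{1}{N}X^\top X$ and $c = \frac{1}{N}X^\top y$ and solves $A\beta = c$. The synthetic data, the name `fit_normal_equations`, and the coefficients in `beta_true` are assumptions made up for illustration; they are not part of the derivation.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Solve A beta = c, with A = X^T X / N and c = X^T y / N."""
    N = X.shape[0]
    A = X.T @ X / N          # D x D matrix
    c = X.T @ y / N          # length-D vector
    # Solving the linear system avoids forming A^{-1} explicitly.
    return np.linalg.solve(A, c)

# Hypothetical synthetic data: y is a noisy linear function of x.
rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=N)

beta_hat = fit_normal_equations(X, y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# At the solution the gradient A beta - c should vanish (up to rounding).
gradient_at_solution = X.T @ (X @ beta_hat - y) / N
print(beta_hat, beta_lstsq, gradient_at_solution)
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical choice; both correspond to the same formula when $X^\top X$ is invertible.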
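A simpler approach is to rewrite the risk directly in matrix form:

$$\varepsilon(\beta_1, \dots, \beta_D) = \frac{1}{2N}\sum_{i=1}^{N}\Big(\sum_{j=1}^{D}\beta_j x_j^{(i)} - y^{(i)}\Big)^2 = \frac{1}{2N}(X\beta - y)^\top (X\beta - y)$$

Setting the gradient $\frac{1}{N}X^\top(X\beta - y)$ to zero gives $X^\top X\beta = X^\top y$. Finally, the least-squares estimate, which coincides with the MLE under an i.i.d. Gaussian noise model, is

$$\hat{\beta} = (X^\top X)^{-1}X^\top y$$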
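As a sanity check on the matrix form, the sketch below compares the double-sum risk with $\frac{1}{2N}(X\beta - y)^\top(X\beta - y)$, and compares the gradient $\frac{1}{N}X^\top(X\beta - y)$ (the standard result of differentiating the matrix form) against central finite differences. The function names and the random test data are hypothetical and chosen only for this illustration.

```python
import numpy as np

def risk_sum(X, y, beta):
    """Empirical risk written as the explicit double sum."""
    N, D = X.shape
    total = 0.0
    for i in range(N):
        pred = sum(beta[j] * X[i, j] for j in range(D))
        total += (pred - y[i]) ** 2
    return total / (2 * N)

def risk_matrix(X, y, beta):
    """Empirical risk written as (X beta - y)^T (X beta - y) / (2N)."""
    r = X @ beta - y
    return (r @ r) / (2 * len(y))

def risk_gradient(X, y, beta):
    """Gradient of the matrix-form risk: X^T (X beta - y) / N."""
    return X.T @ (X @ beta - y) / len(y)

# Arbitrary random inputs, used only to compare the two forms.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
beta = rng.normal(size=4)

# Central finite differences along each coordinate direction.
eps = 1e-6
numeric_grad = np.array([
    (risk_matrix(X, y, beta + eps * e) - risk_matrix(X, y, beta - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

print(risk_sum(X, y, beta), risk_matrix(X, y, beta))
print(numeric_grad, risk_gradient(X, y, beta))
```

Because the risk is quadratic in $\beta$, central differences agree with the analytic gradient up to floating-point rounding.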
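This is only an estimate computed from a single training set. What we really want is the true error (the prediction error), which can be defined as:

$$\varepsilon_{\text{true}}(\beta_1, \dots, \beta_D) = \frac{1}{2}\,\mathbb{E}\Big(\sum_{j=1}^{D}\beta_j x_j - y\Big)^2 = \frac{1}{2}\int_x \Big(\sum_{j=1}^{D}\beta_j x_j - y\Big)^2 p(x)\,dx$$

Here the expectation is over inputs $x$ drawn from $p(x)$, and $y$ is the target associated with $x$.

To read more about the bias-variance tradeoff in the linear regression model, see: https://courses.cs.washington.edu/courses/cse546/12wi/slides/cse546wi12LinearRegression.pdf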
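The true error usually cannot be computed exactly because $p(x)$ is unknown, but in a simulation where we pick the data distribution ourselves it can be approximated by averaging the loss over a large fresh sample. The sketch below does this under assumptions invented for the example: standard normal inputs, a hypothetical coefficient vector `beta_star`, and Gaussian noise with standard deviation `noise_std`.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 5
beta_star = rng.normal(size=D)   # hypothetical "true" coefficients
noise_std = 0.5                  # hypothetical noise level

def sample(n):
    """Draw n pairs from the assumed distribution: x ~ N(0, I), y = x.beta_star + noise."""
    X = rng.normal(size=(n, D))
    y = X @ beta_star + noise_std * rng.normal(size=n)
    return X, y

def risk(X, y, beta):
    """Average squared loss (with the 1/2 factor) of beta on the sample (X, y)."""
    r = X @ beta - y
    return (r @ r) / (2 * len(y))

# Fit on one small training set.
X_train, y_train = sample(20)
beta_hat = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
train_risk = risk(X_train, y_train, beta_hat)

# Monte Carlo approximation of the true error: average over a large fresh sample.
X_big, y_big = sample(200_000)
true_risk_mc = risk(X_big, y_big, beta_hat)

print("training risk of beta_hat:     ", train_risk)
print("approx. true risk of beta_hat: ", true_risk_mc)
print("approx. true risk of beta_star:", risk(X_big, y_big, beta_star))
```

On the training set, `beta_hat` never does worse than `beta_star`, since it minimizes the empirical risk. On fresh data the ordering reverses, which is why the training risk tends to be an optimistic estimate of the true error.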