Logistic Regression

Tags (space-separated): ML Stanford-video-notes

Classification Problems

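In a binary classification problem, the two possible values of the dependent variable y are called the positive class (1) and the negative class (0).
Logistic regression is designed so that the model's output always lies between 0 and 1, which the output of linear regression does not guarantee.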

Hypothesis Representation

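The hypothesis should be a function whose predictions lie between 0 and 1. The output of linear regression can fall outside that range, so logistic regression wraps the linear term in a function g: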

$h_\theta(x)=g(\theta^Tx)$

$z=\theta^Tx$

$g(z)=\frac{1}{1+e^{-z}}$

The function g(z) is called the logistic function or sigmoid function. It maps any real z into the interval (0, 1), with g(0) = 0.5.

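The output $h_\theta(x)$ can be interpreted as a probability: the estimated probability that y = 1 for input x, that is, $h_\theta(x)=P(y=1|x;\theta)$.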
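
As a minimal sketch of the hypothesis above, the following Python snippet evaluates g(z) and $h_\theta(x)$ using only the standard library. The parameter and feature values are made up for illustration, and x is assumed to carry a leading bias term x_0 = 1.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is expected to include the bias term x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [-3.0, 1.0, 1.0]   # hypothetical parameters, chosen only for illustration
x = [1.0, 2.0, 2.0]        # x_0 = 1, then two feature values
print(hypothesis(theta, x))  # estimated P(y = 1 | x; theta), here g(1)
```

Splitting on the sign of z is one common way to avoid overflow for very negative inputs; the plain formula gives the same values for moderate z.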
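Decision boundary: the set of points where $\theta^Tx=0$, which separates the region predicted as y = 1 from the region predicted as y = 0. Because g(z) ≥ 0.5 exactly when z ≥ 0, thresholding $h_\theta(x)$ at 0.5 gives: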

$\theta^Tx \geq 0 \Rightarrow y=1$

$\theta^Tx<0 \Rightarrow y=0$
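
The decision rule can be sketched in a few lines of Python. The parameters below are hypothetical and chosen so that the boundary is the line x1 + x2 = 3; each input is assumed to start with the bias term x_0 = 1.

```python
def predict(theta, x):
    """Return 1 if theta^T x >= 0, else 0 (equivalent to h_theta(x) >= 0.5)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

# Hypothetical parameters: theta^T x = -3 + x1 + x2, so the boundary is x1 + x2 = 3.
theta = [-3.0, 1.0, 1.0]
print(predict(theta, [1.0, 2.0, 2.0]))  # z = 1  -> class 1
print(predict(theta, [1.0, 1.0, 1.0]))  # z = -1 -> class 0
```

Note that the sigmoid never has to be evaluated for a hard prediction: the sign of $\theta^Tx$ is enough.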

Cost Function

$J(\theta)=\frac{1}{m}\sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)})$

$\mathrm{Cost}(h_\theta(x),y)=-\log(h_\theta(x)) \qquad \text{if } y=1$

$\mathrm{Cost}(h_\theta(x),y)=-\log(1-h_\theta(x)) \qquad \text{if } y=0$
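These formulas show that the further $h_\theta(x)$ is from y, the larger the cost, and the cost grows without bound as the prediction approaches the wrong extreme. When $h_\theta(x)$ equals y, the log term is 0 and so is the cost. Using this logarithmic cost also makes J(θ) convex, so gradient descent is not trapped in local minima. Because y is always 0 or 1, the two cases can be merged into a single expression: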

$\mathrm{Cost}(h_\theta(x),y) = - y \; \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$

$J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$
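
To make the summation concrete, here is a small Python sketch that evaluates J(θ) on a toy data set. The data and the parameter values are invented for illustration only, and each row of X is assumed to begin with x_0 = 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m

# Toy data, made up for illustration; each row starts with x_0 = 1.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

print(cost([0.0, 0.0], X, y))    # h = 0.5 everywhere, so J = log(2)
print(cost([-4.5, 2.0], X, y))   # a theta that separates the toy data gives a smaller J
```

This sketch does not guard against h reaching exactly 0 or 1, where the logarithm is undefined; practical implementations usually clip h or work with log-probabilities directly.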
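Vectorized form, where X is the design matrix whose rows are the training examples, and g and log are applied element-wise: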

$\begin{aligned} & h = g(X\theta) \\ & J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h)-(1-y)^{T}\log(1-h)\right) \end{aligned}$
With the cost function in hand, gradient descent can be used to find the parameters θ that minimize it:

$\begin{aligned} & \text{Repeat} \; \lbrace \\ & \quad \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta) \\ & \rbrace \end{aligned}$
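Working out the partial derivative, using $g'(z)=g(z)(1-g(z))$, gives the update below, which is applied simultaneously to every $\theta_j$: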

$\begin{aligned} & \text{Repeat} \; \lbrace \\ & \quad \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \\ & \rbrace \end{aligned}$
Vectorized form:

$\theta := \theta - \frac{\alpha}{m} X^{T} (g(X \theta ) - \vec{y})$
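
The update rule can be sketched as a short batch gradient descent loop in plain Python. The toy data, the learning rate alpha = 0.3, and the iteration count are assumptions chosen for illustration, not values from the course.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.3, iterations=5000):
    """Batch gradient descent: theta_j -= (alpha/m) * sum((h - y) * x_j)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Errors h_theta(x^(i)) - y^(i) for every example, using the current theta.
        errors = [sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        # Compute the full gradient first so all theta_j are updated simultaneously.
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy data, made up for illustration; each row starts with x_0 = 1.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

theta = gradient_descent(X, y)
predictions = [1 if sum(t * v for t, v in zip(theta, xi)) >= 0 else 0 for xi in X]
print(theta, predictions)
```

Building the whole gradient before touching theta is what makes the update simultaneous: every $\theta_j$ is computed from the same old parameter vector.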

Minimizing the Cost Function with a Library Optimizer

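Instead of hand-written gradient descent, an optimization routine from a library can do the minimization. It needs a function that, for a given theta, returns both the cost and its gradient: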

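function [jVal, gradient] = costFunction(theta)
  % jVal: the scalar value of J(theta) at the given theta
  jVal = [...code to compute J(theta)...];
  % gradient: vector of partial derivatives of J, same size as theta
  gradient = [...code to compute derivative of J(theta)...];
end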

MATLAB provides fminunc for finding the minimum of an unconstrained function:

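% 'GradObj' 'on': costFunction returns the gradient as its second output
% 'MaxIter' 100: stop after at most 100 iterations
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);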

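Multiclass classification (one-vs-all): pick one class as the positive class and treat all remaining classes as a single negative class, then train a binary classifier. Repeat this for every class, so the multiclass problem reduces to a series of binary classifications.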

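With y ∈ {0, 1, 2, ..., n} (n + 1 classes), each classifier estimates the probability that x belongs to its class, and the prediction is the class with the highest probability: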

$\begin{aligned} & y \in \lbrace 0, 1, \dots, n \rbrace \\ & h_\theta^{(0)}(x) = P(y = 0 | x ; \theta) \\ & h_\theta^{(1)}(x) = P(y = 1 | x ; \theta) \\ & \cdots \\ & h_\theta^{(n)}(x) = P(y = n | x ; \theta) \\ & \mathrm{prediction} = \arg\max_i \, h_\theta^{(i)}(x) \end{aligned}$
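
The prediction step of one-vs-all can be sketched as follows. The three parameter vectors are hypothetical stand-ins for classifiers that have already been trained, one per class, and each input is assumed to start with x_0 = 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """Return the class i whose classifier output h_theta^(i)(x) is largest."""
    probs = [sigmoid(sum(t * v for t, v in zip(theta, x))) for theta in thetas]
    return max(range(len(probs)), key=lambda i: probs[i])

# Hypothetical parameters for three binary classifiers (classes 0, 1, 2).
thetas = [
    [1.0, -2.0, -2.0],   # class 0: confident when x1 and x2 are both small
    [-1.0, 2.0, -2.0],   # class 1: confident when x1 is large and x2 is small
    [-1.0, -2.0, 2.0],   # class 2: confident when x2 is large and x1 is small
]

print(predict_one_vs_all(thetas, [1.0, 0.0, 0.0]))  # class 0
print(predict_one_vs_all(thetas, [1.0, 2.0, 0.0]))  # class 1
print(predict_one_vs_all(thetas, [1.0, 0.0, 2.0]))  # class 2
```

Training is not shown here: each row of thetas would come from fitting a separate binary logistic regression with that class relabeled as 1 and every other class as 0.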
