I. Theory in Detail
1. Probability
1.1 Definition: Probability (P) is a measure of how likely an event is to occur.
1.2 Range: 0 <= P <= 1
1.3 Ways to obtain it:
- from personal belief
- from historical data
- from simulated data
1.4 Conditional probability: P(A|B) = P(A∩B) / P(B), the probability that A occurs given that B has occurred (see the simulation sketch below).
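As a small illustration of the simulation approach listed above, the following sketch estimates a conditional probability by counting outcomes of two dice; the scenario and the function name estimate_conditional are my own choices for the example, not from the original text.

import random

# Estimate P(A|B) = P(A∩B) / P(B) by simulation, where
# A = "the sum of two dice is 7" and B = "the first die shows 3".
# (Illustrative scenario, not from the original text.)
def estimate_conditional(trials=100000):
    count_b = 0
    count_ab = 0
    for _ in range(trials):
        d1 = random.randint(1, 6)
        d2 = random.randint(1, 6)
        if d1 == 3:
            count_b += 1
            if d1 + d2 == 7:
                count_ab += 1
    return count_ab / count_b

print(estimate_conditional())  # should be close to 1/6 ≈ 0.167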
2. Logistic Regression
2.1 Example
Logistic regression is used for classification: the hypothesis h(x) outputs a value in (0, 1), and we predict the positive class when h(x) exceeds a chosen threshold, e.g.
h(x) > 0.5
The cutoff can be adjusted to fit the application, e.g. h(x) > 0.2.
2.2 Basic Model
The test data is X(x0, x1, x2, ..., xn)
The parameters to learn are Θ(θ0, θ1, θ2, ..., θn)
Vector form: z = Θ^T X = θ0x0 + θ1x1 + ... + θnxn
To handle binary outcomes, the Sigmoid function is introduced to smooth the curve: g(z) = 1 / (1 + e^(-z))
Prediction function: h(x) = g(Θ^T X) = 1 / (1 + e^(-Θ^T X))
Expressed as probabilities:
Positive class (y=1): P(y=1 | x; Θ) = h(x)
Negative class (y=0): P(y=0 | x; Θ) = 1 - h(x)
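A minimal sketch of the prediction function above in NumPy; the names sigmoid and predict are my own, and the parameter values are illustrative (x includes the bias feature x0 = 1).

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)): maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # h(x) = g(theta^T x): the estimated probability that y = 1
    return sigmoid(np.dot(theta, x))

theta = np.array([-4.0, 1.0])  # illustrative parameters
x = np.array([1.0, 5.0])       # x0 = 1 is the bias feature
p = predict(theta, x)          # sigmoid(1) ≈ 0.73
print(p, "-> y = 1" if p > 0.5 else "-> y = 0")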
2.3 Cost Function
Linear regression: J(θ) = (1 / 2m) * Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2
Goal: find θ0, θ1, ... that minimize the expression above.
Logistic regression:
Cost function: J(θ) = -(1 / m) * Σ_{i=1..m} [ y^(i) * log(h_θ(x^(i))) + (1 - y^(i)) * log(1 - h_θ(x^(i))) ]
Goal: find θ0, θ1, ... that minimize J(θ).
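A small sketch of the logistic cost above; the function name cost and the clipping by eps (to keep log() away from zero) are my own additions.

import numpy as np

def cost(theta, X, y, eps=1e-12):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))
    h = np.clip(h, eps, 1 - eps)  # numerical safety: avoid log(0)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

# toy data: two examples, each with a bias column x0 = 1
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
print(cost(np.zeros(2), X, y))  # log(2) ≈ 0.693 for all-zero theta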
2.4 Solution: Gradient Descent
Update rule: θ_j := θ_j - α * ∂J(θ)/∂θ_j = θ_j - α * (1/m) * Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) * x_j^(i)
α is the learning rate.
Update all θ_j simultaneously.
Repeat the updates until convergence.
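The code in Part II below applies this update rule to a linear model; for completeness, here is a minimal sketch of the same rule applied to logistic regression (the function name logistic_gradient_descent and the toy data are my own).

import numpy as np

def logistic_gradient_descent(X, y, theta, alpha, num_iterations):
    # repeat: theta := theta - alpha * (1/m) * X^T (h - y)
    m = len(y)
    for _ in range(num_iterations):
        h = 1.0 / (1.0 + np.exp(-X.dot(theta)))  # h_theta(x) for all examples
        gradient = X.T.dot(h - y) / m            # averaged over the m examples
        theta = theta - alpha * gradient         # update all theta_j simultaneously
    return theta

# toy example: learn to separate x1 < 0 from x1 > 0
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(logistic_gradient_descent(X, y, np.zeros(2), 0.1, 10000))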
II. Code Implementation
The main purpose of this code is to verify that a simple gradient descent implementation works; the data are randomly generated (note that it fits a straight line rather than a logistic model).
import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradient_descent(x, y, theta, alpha, m, num_iterations):
    x_trans = x.transpose()
    for i in range(0, num_iterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here,
        # but to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(x_trans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

def generator_data(num_points, bias, variance):
    x = np.zeros(shape=(num_points, 2))
    y = np.zeros(shape=num_points)
    # basically a straight line
    for i in range(0, num_points):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = generator_data(100, 25, 10)
# print(x)
# print(y)
m, n = np.shape(x)
num_iterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradient_descent(x, y, theta, alpha, m, num_iterations)
print(theta)
Gradient descent is an important algorithm in deep learning for finding optimal points and minima, so it is worth building a solid grounding in its underlying mathematics. Later topics include variants such as stochastic gradient descent, which updates on a single example (or a small batch) at a time to cut the cost of each iteration on large datasets.
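As a preview of that variant, here is a minimal sketch of stochastic gradient descent for the same linear model as above; each step updates theta from one randomly chosen example instead of the full batch (the function name sgd is my own).

import numpy as np
import random

def sgd(x, y, theta, alpha, m, num_iterations):
    # like gradient_descent above, but each update uses a single example
    for _ in range(num_iterations):
        i = random.randrange(m)           # pick one example at random
        loss = np.dot(x[i], theta) - y[i]
        gradient = loss * x[i]            # gradient from this one example
        theta = theta - alpha * gradient
    return theta

# can be run on the same data as above, e.g.:
# theta = sgd(x, y, np.ones(n), 0.0005, m, 100000)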