Training a Binary Classifier
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load only the first two iris classes (a binary problem)
iris = datasets.load_iris()
features = iris.data[:100, :]
target = iris.target[:100]

# Standardize the features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create and train the logistic regression model
logistic_regression = LogisticRegression(random_state=0)
model = logistic_regression.fit(features_standardized, target)
- Logistic regression is a binary classifier: on its own it can only separate two classes
- Create a new observation and predict its class
>>> new_observation = [[0.5, 0.5, 0.5, 0.5]]
>>> model.predict(new_observation)
array([1])
>>> model.predict_proba(new_observation)
array([[0.17738424, 0.82261576]])
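The probability returned by predict_proba comes from applying the logistic (sigmoid) function to the model's linear score. A minimal sketch that refits the binary model above and verifies this against decision_function:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Refit the binary model (first two iris classes), as above
iris = datasets.load_iris()
features_standardized = StandardScaler().fit_transform(iris.data[:100, :])
target = iris.target[:100]
model = LogisticRegression(random_state=0).fit(features_standardized, target)

new_observation = [[0.5, 0.5, 0.5, 0.5]]

# decision_function returns the raw linear score w·x + b;
# the sigmoid 1 / (1 + e^(-score)) turns it into P(y=1 | x)
score = model.decision_function(new_observation)
manual_proba = 1 / (1 + np.exp(-score))

assert np.allclose(manual_proba, model.predict_proba(new_observation)[:, 1])
```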
Training a Multiclass Classifier
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load all three iris classes
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize the features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# multi_class="ovr" trains one-vs-rest: one binary classifier per class
logistic_regression = LogisticRegression(random_state=0, multi_class="ovr")
model = logistic_regression.fit(features_standardized, target)
>>> new_observations = [[0.7,0.7,0.7,0.7]]
>>> model.predict(new_observations)
array([2])
>>> model.predict_proba(new_observations)
array([[0.0141224 , 0.22656313, 0.75931447]])
Reducing Variance Through Regularization
- Regularization reduces variance by penalizing complex models
- It works by adding a penalty term to the loss function we want to minimize
- LogisticRegressionCV selects the regularization strength C by cross-validation; its Cs parameter controls the candidate values
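A minimal sketch of choosing the regularization strength with LogisticRegressionCV (Cs=10 asks for ten candidate values of C on a log scale, which is also the default):

```python
from sklearn.linear_model import LogisticRegressionCV
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features_standardized = StandardScaler().fit_transform(iris.data)
target = iris.target

# Cs=10 generates 10 candidate values of C (the inverse of the
# regularization strength) between 1e-4 and 1e4 on a log scale;
# cross-validation picks the best one, exposed afterwards as model.C_
logistic_regression_cv = LogisticRegressionCV(penalty="l2", Cs=10, random_state=0)
model = logistic_regression_cv.fit(features_standardized, target)
```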
Training a Logistic Regression Model with Stochastic Average Gradient (sag)
- The stochastic average gradient (sag) solver lets us train a classifier on very large datasets faster than the other solvers
- In most cases scikit-learn chooses the best solver for us automatically, or emits a warning when a solver is unsuitable
logistic_regression = LogisticRegression(random_state=0, solver="sag")
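The one-line constructor above, expanded into a full runnable sketch (sag is sensitive to feature scale, so standardizing first matters; max_iter is raised here only as a precaution against convergence warnings):

```python
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features = iris.data
target = iris.target

# sag converges fastest when features are on a similar scale
features_standardized = StandardScaler().fit_transform(features)

# solver="sag" uses stochastic average gradient descent;
# max_iter=1000 gives the solver room to converge
logistic_regression = LogisticRegression(random_state=0, solver="sag", max_iter=1000)
model = logistic_regression.fit(features_standardized, target)
```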
Handling Imbalanced Classes
- LogisticRegression has a built-in way of dealing with imbalanced classes
- If the classes are highly imbalanced, we can weight them with the class_weight parameter
LogisticRegression(random_state=0, class_weight="balanced")
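A minimal sketch that makes the iris data artificially imbalanced and fits with class_weight="balanced", which weights each class inversely proportional to its frequency, i.e. n_samples / (n_classes * np.bincount(y)):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
# Make the data imbalanced by dropping 40 of the first 50 observations
features = iris.data[40:, :]
target = iris.target[40:]
# Collapse to a binary problem: class 0 vs. everything else
target = np.where(target == 0, 0, 1)

features_standardized = StandardScaler().fit_transform(features)

# class_weight="balanced" up-weights the rare class 0 automatically
logistic_regression = LogisticRegression(random_state=0, class_weight="balanced")
model = logistic_regression.fit(features_standardized, target)
```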