1. Classifiers
1.1. Logistic Regression
Logistic regression in sklearn can be fitted with different solvers: solver ∈ {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs'. When solver is 'sag', 'saga' or 'liblinear', random_state should be set (the seed of the pseudo random number generator to use when shuffling the data).
Changed in version 0.22: The default solver changed from ‘liblinear’ to ‘lbfgs’ in 0.22.
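So on scikit-learn versions before 0.22, where the default solver is 'liblinear', random_state needs to be set if you use the default parameters and want reproducible results.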
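A minimal sketch of the reproducibility point, assuming a synthetic binary dataset from make_classification (not part of the original text) and an explicit solver='liblinear'; the helper name fit_coef is hypothetical. Two fits with the same random_state should give identical coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary dataset, assumed here purely for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)


def fit_coef(seed):
    """Hypothetical helper: fit liblinear logistic regression with a seed, return coef_."""
    clf = LogisticRegression(solver="liblinear", random_state=seed)
    clf.fit(X, y)
    return clf.coef_


# With the same seed, the data shuffling inside liblinear is the same on every run
coef_a = fit_coef(0)
coef_b = fit_coef(0)
print(np.array_equal(coef_a, coef_b))
```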
If fitting logistic regression produces the following warning:
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
then increasing max_iter is enough to fix it (alternatively, scale the data, as the warning suggests).
Before:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
lr = LogisticRegression(random_state=0)
lr.fit(X, y)
print(lr.score(X, y))
After:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
lr = LogisticRegression(random_state=0, max_iter=5000)
lr.fit(X, y)
print(lr.score(X, y))
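The warning's other suggestion, scaling the data, can be sketched as follows. This assumes standardization with StandardScaler inside a pipeline (one common choice, not the only one) and keeps the default max_iter=100; it records warnings raised during fit and checks n_iter_ to confirm the solver converged.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize the features first; max_iter stays at its default of 100
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised during fit
    pipe.fit(X, y)

convergence_warnings = [w for w in caught if issubclass(w.category, ConvergenceWarning)]
lr = pipe[-1]  # the fitted LogisticRegression step
print(len(convergence_warnings), lr.n_iter_, pipe.score(X, y))
```

Scaling usually helps because lbfgs converges faster on well-conditioned features, whereas raising max_iter only gives the solver more time on the original scale.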
Every call to fit on a logistic regression re-initializes coef_ and intercept_. An excerpt of the fit code:
def fit(self, X, y, sample_weight=None):
    # ... (excerpt; surrounding code omitted)
    self.coef_ = list()
    self.intercept_ = np.zeros(n_classes)
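A quick check of this behavior, as a sketch: with the default warm_start=False, fitting the same estimator twice on the same data restarts the solver from scratch, so the deterministic lbfgs solver should return the same coefficients both times.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# warm_start defaults to False, so every fit starts from scratch
lr = LogisticRegression(max_iter=5000)
lr.fit(X, y)
first_coef = lr.coef_.copy()
first_intercept = lr.intercept_.copy()

lr.fit(X, y)  # previous solution is discarded; the solver starts over
print(np.allclose(first_coef, lr.coef_), np.allclose(first_intercept, lr.intercept_))
```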
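Suppose we need to call fit several times and want each fit to continue training from where the previous one left off. Then simply set warm_start=True.
warm_start: bool, default=False. When True, the solution of the previous call to fit is reused as the initialization for the next fit; otherwise the previous solution is erased. It has no effect with the liblinear solver.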
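One way to make the effect of warm_start visible, sketched under the assumption that capping lbfgs at a single iteration (max_iter=1) is an acceptable probe; the helper name coefs_after_two_fits is hypothetical. Without warm start, the second one-iteration fit repeats the first exactly; with warm start, it continues from the first fit's coefficients and moves further.

```python
import warnings

import numpy as np
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)


def coefs_after_two_fits(warm_start):
    """Hypothetical helper: run two single-iteration fits, return coef_ after each."""
    lr = LogisticRegression(max_iter=1, warm_start=warm_start)
    with warnings.catch_warnings():
        # max_iter=1 stops early on purpose, so silence the expected warning
        warnings.simplefilter("ignore", ConvergenceWarning)
        lr.fit(X, y)
        first = lr.coef_.copy()
        lr.fit(X, y)
    return first, lr.coef_.copy()


cold_first, cold_second = coefs_after_two_fits(warm_start=False)
warm_first, warm_second = coefs_after_two_fits(warm_start=True)
print(np.allclose(cold_first, cold_second))  # restarted from zeros: same one step again
print(np.allclose(warm_first, warm_second))  # started from the first fit: took a further step
```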
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
lr = LogisticRegression(random_state=0, max_iter=5000, warm_start=True)

# First fit: starts from zero-initialized coefficients
lr.fit(X, y)
print(lr.coef_)
print(lr.intercept_)

# Second fit: starts from the coef_ / intercept_ learned above
lr.fit(X, y)
print(lr.coef_)
print(lr.intercept_)
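Comparing the two printouts, coef_ and intercept_ differ slightly, which shows that warm_start takes effect: the second fit starts from the first fit's solution rather than from scratch.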
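Another way to observe the warm start, as a sketch: compare the iteration counts reported in n_iter_. Because the second fit begins at (or very near) the first fit's optimum, it is expected to need no more iterations than the first, and usually far fewer; exact counts depend on the scikit-learn and SciPy versions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

lr = LogisticRegression(max_iter=5000, warm_start=True)
lr.fit(X, y)
first_iters = int(lr.n_iter_.max())   # iterations needed from the zero initialization

lr.fit(X, y)                          # starts from the previous coef_ / intercept_
second_iters = int(lr.n_iter_.max())  # usually much smaller, since it starts near the optimum

print(first_iters, second_iters)
```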