1. Classifiers

1.1. Logistic regression

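In scikit-learn, LogisticRegression can be fitted with several optimization algorithms, selected through the solver parameter: solver{‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default=’lbfgs’. When solver is ‘sag’, ‘saga’ or ‘liblinear’, the data is shuffled during fitting, so random_state should be set to get reproducible results (the documentation describes it as "The seed of the pseudo random number generator to use when shuffling the data").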
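
A minimal sketch of what random_state buys you, assuming the ‘saga’ solver and standardized iris features (both are illustrative choices, not taken from the text above): two models built with the same seed should end up with the same coefficients.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
# Standardizing is an assumption made here so that 'saga' converges quickly.
X_scaled = StandardScaler().fit_transform(X)

# 'saga' picks samples at random, so the seed controls the sequence it follows.
lr_a = LogisticRegression(solver="saga", random_state=0, max_iter=1000).fit(X_scaled, y)
lr_b = LogisticRegression(solver="saga", random_state=0, max_iter=1000).fit(X_scaled, y)

same_seed_match = np.allclose(lr_a.coef_, lr_b.coef_)
print(same_seed_match)
```

With ‘lbfgs’ or ‘newton-cg’ the seed plays no role, which is why it only matters for the solvers listed above.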

Changed in version 0.22: The default solver changed from ‘liblinear’ to ‘lbfgs’ in 0.22.

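So with scikit-learn older than 0.22, the default parameters select ‘liblinear’, and a random seed should be specified for reproducible results.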
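
To find out which case applies to an installed version, one option is to print the version and read the solver of a default-constructed estimator. A small sketch:

```python
import sklearn
from sklearn.linear_model import LogisticRegression

print(sklearn.__version__)

# get_params() reports the constructor arguments, including the defaults.
default_solver = LogisticRegression().get_params()["solver"]
print(default_solver)
```

On 0.22 and later this is expected to report lbfgs; older releases may report something else.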
With the default ‘lbfgs’ solver, fitting may instead produce the following warning:

ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)

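When this happens, increasing max_iter is usually enough; scaling the features, as the warning itself suggests, is the other option.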

Before:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
lr = LogisticRegression(random_state=0)
lr.fit(X, y)
print(lr.score(X, y))

After:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
lr = LogisticRegression(random_state=0, max_iter=5000)
lr.fit(X, y)
print(lr.score(X, y))
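
The warning also suggests scaling the data. A sketch of that alternative, assuming a StandardScaler placed in front of the model with make_pipeline (this pipeline is not part of the original example):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize each feature before the solver sees it; max_iter stays at its default.
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
pipe.fit(X, y)

scaled_score = pipe.score(X, y)
# n_iter_ holds the number of iterations the solver actually used.
n_iter = pipe[-1].n_iter_
print(scaled_score, n_iter)
```

Putting the scaler inside the pipeline means the same transformation is applied automatically at prediction time.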

By default, every call to fit re-initializes coef_ and intercept_, as this part of the fit code shows:

def fit(self, X, y, sample_weight=None):
    # Excerpt only: the fitted attributes are reset on every call
    self.coef_ = list()
    self.intercept_ = np.zeros(n_classes)
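If we need to call fit several times and want each call to continue from the result of the previous one, passing warm_start=True is enough.
warm_start: bool, default=False. When True, the solution of the previous call to fit is reused as the initialization; otherwise the previous solution is discarded. It has no effect with the ‘liblinear’ solver.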

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import sklearn

X, y = load_iris(return_X_y=True)
lr = LogisticRegression(random_state=0, max_iter=5000, warm_start=True)

lr.fit(X, y)
lr.coef_
lr.intercept_

[Figure: coef_ and intercept_ after the first fit]

lr.fit(X, y)
lr.coef_
lr.intercept_

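[Figure: coef_ and intercept_ after the second fit]
As the two outputs show, coef_ and intercept_ differ slightly between the two fits: the second fit started from the first solution rather than from scratch, which indicates that warm_start is taking effect.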
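
Another way to observe the effect, without comparing coefficients by eye, is the iteration count. A hedged sketch, assuming the n_iter_ attribute is compared between the two calls: the second fit starts from the previous solution, so it should need no more iterations than the first.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
lr = LogisticRegression(random_state=0, max_iter=5000, warm_start=True)

lr.fit(X, y)
iters_first = int(lr.n_iter_.max())

lr.fit(X, y)  # initialized from the coef_ and intercept_ of the first fit
iters_second = int(lr.n_iter_.max())

print(iters_first, iters_second)
```

The exact counts depend on the scikit-learn version and solver, so only the relative order is meaningful here.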
