Linear Regression
Linear regression is simple: fit the data with a linear function, measure the loss (cost) with mean squared error (MSE), and then use gradient descent to find the set of weights that minimizes the MSE.
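As a minimal sketch (not from the course; the function name, learning rate, and iteration count are illustrative choices), batch gradient descent on the MSE loss looks like this:

import numpy as np

def linear_regression_gd(X, y, lr=0.01, n_iters=1000):
    """Fit y = X @ w + b by minimizing MSE with batch gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y
        # Gradients of MSE = mean((X @ w + b - y)^2) w.r.t. w and b
        grad_w = 2 * X.T @ error / n_samples
        grad_b = 2 * error.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + rng.normal(scale=0.1, size=100)
w, b = linear_regression_gd(X, y)
print(w, b)  # close to [3, -2] and 1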
Lasso regression and ridge regression are simply standard linear regression with L1 and L2 regularization added, respectively.
Regularization
Ridge Regression
Lasso is great for feature selection, but when building regression models, Ridge regression should be your first choice. ([DataCamp](https://campus.datacamp.com/courses/supervised-learning-with-scikit-learn/regression-2?ex=13))
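For comparison with the lasso loss given below, ridge adds an L2 penalty, the sum of squared coefficients:

Loss function = OLS loss function \(+\ \alpha \sum_{i=1}^{n} a_{i}^{2}\)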
# Import necessary modules
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Set up the array of alphas and lists to store scores
# (X and y are the feature matrix and target loaded by the exercise)
alpha_space = np.logspace(-4, 0, 50)
ridge_scores = []
ridge_scores_std = []

# Create a ridge regressor: ridge
# Note: normalize= requires scikit-learn < 1.2; on newer versions,
# standardize X with StandardScaler instead
ridge = Ridge(normalize=True)

# Compute scores over the range of alphas -- trying many alpha values
# like this is hyperparameter tuning
for alpha in alpha_space:
    # Specify the alpha value to use: ridge.alpha
    ridge.alpha = alpha
    # Perform 10-fold cross-validation for the tuning: ridge_cv_scores
    ridge_cv_scores = cross_val_score(ridge, X, y, cv=10)
    # Append the mean of ridge_cv_scores to ridge_scores
    ridge_scores.append(np.mean(ridge_cv_scores))
    # Append the std of ridge_cv_scores to ridge_scores_std
    ridge_scores_std.append(np.std(ridge_cv_scores))

# Display the plot
display_plot(ridge_scores, ridge_scores_std)
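display_plot is a helper defined by the DataCamp exercise, not a library function; a minimal sketch of what it does (mean CV score versus alpha, with a one-standard-deviation band) might look like:

import numpy as np
import matplotlib.pyplot as plt

def display_plot(cv_scores, cv_scores_std):
    # Relies on alpha_space defined above
    cv_scores = np.asarray(cv_scores)
    cv_scores_std = np.asarray(cv_scores_std)
    fig, ax = plt.subplots()
    # Mean 10-fold CV score for each alpha, on a log-scaled x-axis
    ax.plot(alpha_space, cv_scores)
    # Shade +/- one standard deviation around the mean score
    ax.fill_between(alpha_space, cv_scores - cv_scores_std,
                    cv_scores + cv_scores_std, alpha=0.2)
    ax.set_xscale('log')
    ax.set_xlabel('alpha')
    ax.set_ylabel('CV score (R^2)')
    plt.show()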
Lasso Regression (Least Absolute Shrinkage and Selection Operator)
Loss function = OLS loss function \(+\ \alpha \sum_{i=1}^{n}\left|a_{i}\right|\)
Because the L1 penalty can shrink coefficients exactly to zero, lasso effectively performs feature selection, which is why some features drop out of the coefficient plot below.
# Import Lasso and plotting utilities
from sklearn.linear_model import Lasso
import matplotlib.pyplot as plt

# Instantiate a lasso regressor: lasso
# Note: normalize= requires scikit-learn < 1.2; on newer versions,
# standardize X with StandardScaler instead
lasso = Lasso(alpha=0.4, normalize=True)

# Fit the regressor to the data
# (X, y and df_columns come from the exercise's dataset)
lasso.fit(X, y)

# Compute and print the coefficients
lasso_coef = lasso.coef_
print(lasso_coef)

# Plot one coefficient per feature; lasso drives some exactly to zero
plt.plot(range(len(df_columns)), lasso_coef)
plt.xticks(range(len(df_columns)), df_columns.values, rotation=60)
plt.margins(0.02)
plt.show()
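The same 10-fold CV search over alphas used for ridge above applies to lasso as well; as a sketch under the same assumptions about X and y, scikit-learn's LassoCV automates the search (the alphas grid and cv value here are arbitrary choices, and StandardScaler stands in for the removed normalize= option):

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# LassoCV picks the best alpha by internal cross-validation
lasso_cv = make_pipeline(StandardScaler(),
                         LassoCV(alphas=np.logspace(-4, 0, 50), cv=10))
lasso_cv.fit(X, y)
print(lasso_cv.named_steps['lassocv'].alpha_)  # chosen alpha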