Regression Analysis: LinearRegression

I. Official Documentation

class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

1. Parameters

Ordinary least squares linear regression.

Parameters:

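fit_intercept : boolean, optional, default True. Whether to calculate the intercept for the model. It is calculated by default; if set to False, no intercept is used and the data is expected to be centered already.

normalize : boolean, optional, default False. Ignored when fit_intercept is set to False. If True, the regressors X are normalized before regression (this is feature normalization, not regularization). To standardize the data instead, use sklearn.preprocessing.StandardScaler before calling fit. Note that this parameter belongs to the older API shown above: it was deprecated in scikit-learn 1.0 and removed in 1.2.

copy_X : boolean, optional, default True. If True, X will be copied; else, it may be overwritten.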

n_jobs : int, optional, default 1. The number of jobs to use for the computation.
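
The advice about StandardScaler can be sketched as follows. This is a minimal illustration under stated assumptions, not the only way to do it: the toy data, the random seed, and the variable names are made up, and make_pipeline is used simply as a convenient way to chain the scaler and the regressor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two features on very different scales.
rng = np.random.RandomState(0)
X = np.column_stack([rng.normal(0, 1, 50), rng.normal(0, 1000, 50)])
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + 5.0

# Standardize each feature to zero mean and unit variance, then fit OLS.
model = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=True))
model.fit(X, y)

# Scaling does not change the predictions of ordinary least squares,
# only the scale on which the coefficients are expressed.
plain = LinearRegression().fit(X, y)
same_predictions = np.allclose(model.predict(X), plain.predict(X))
print(same_predictions)
```

Putting the scaler inside the pipeline means the same scaling learned on the training data is applied automatically at prediction time.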

Attributes:

coef_ : array, shape (n_features, ) or (n_targets, n_features). Estimated coefficients for the linear regression problem.

intercept_ : array. Independent term in the linear model.
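
To make the fit_intercept parameter and the two attributes concrete, here is a small sketch on made-up data that lies exactly on the line y = 2x + 1. The variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Points that lie exactly on the line y = 2x + 1 (made-up data).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0

with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
print(with_intercept.coef_)       # slope, one entry per feature
print(with_intercept.intercept_)  # fitted intercept

# With fit_intercept=False the line is forced through the origin,
# so intercept_ is fixed at 0.0 and the slope changes to compensate.
no_intercept = LinearRegression(fit_intercept=False).fit(X, y)
print(no_intercept.coef_, no_intercept.intercept_)

# With several targets, coef_ has shape (n_targets, n_features).
Y = np.column_stack([y, -y])
multi = LinearRegression().fit(X, Y)
print(multi.coef_.shape)
```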

2. Methods

  • fit(X, y[, sample_weight]): Fit linear model.
  • get_params([deep]): Get parameters for this estimator.
  • predict(X): Predict using the linear model.
  • score(X, y[, sample_weight]): Returns the coefficient of determination R^2 of the prediction.
  • set_params(**params): Set the parameters of this estimator.

(1) fit(X, y, sample_weight=None)

Fit linear model.

Parameters:

X : numpy array or sparse matrix of shape [n_samples, n_features]. Training data.

y : numpy array of shape [n_samples, n_targets]. Target values.

sample_weight : numpy array of shape [n_samples]. Individual weights for each sample.

Returns:

self : returns an instance of self.
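
The effect of sample_weight is easiest to see with an outlier. The sketch below uses made-up data: three points on y = 2x + 1 and one point far off the line, which is then given a weight of zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: three points on y = 2x + 1 plus one obvious outlier.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 100.0])

unweighted = LinearRegression().fit(X, y)

# A weight of 0 removes the outlier's influence on the fit.
weights = np.array([1.0, 1.0, 1.0, 0.0])
weighted = LinearRegression().fit(X, y, sample_weight=weights)

print(unweighted.coef_, unweighted.intercept_)
print(weighted.coef_, weighted.intercept_)

# fit returns the estimator itself, so calls can be chained.
returned = weighted.fit(X, y, sample_weight=weights)
print(returned is weighted)
```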

(2) get_params(deep=True)

Parameters:

deep : boolean, optional. If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any. Parameter names mapped to their values.
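
A quick sketch of what get_params returns. The exact set of keys depends on the installed scikit-learn version, so only keys that appear in the signature above are accessed by name here; the variable names are illustrative.

```python
from sklearn.linear_model import LinearRegression

regr = LinearRegression(fit_intercept=False)

# get_params returns a dict mapping constructor argument names to values.
# The exact set of keys depends on the installed scikit-learn version.
params = regr.get_params()
print(params["fit_intercept"])
print(sorted(params))

# The returned mapping can be passed back to the constructor to build
# an unfitted estimator with the same settings.
same_settings = LinearRegression(**params)
```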


(3) predict(X)

Predict using the linear model

Parameters:

X : {array-like, sparse matrix}, shape = (n_samples, n_features)

Returns:

C : array, shape = (n_samples,). Returns predicted values.
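
For a fitted linear model, predict amounts to a matrix product with coef_ plus intercept_. The following sketch checks this on made-up data; the arrays and names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data with two features.
X_train = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 5.0]])
y_train = np.array([3.0, 2.5, 4.0, 9.0])
regr = LinearRegression().fit(X_train, y_train)

X_new = np.array([[0.0, 0.0], [5.0, 2.0]])
y_pred = regr.predict(X_new)

# For a fitted linear model the prediction is X @ coef_ + intercept_.
manual = X_new @ regr.coef_ + regr.intercept_
print(y_pred.shape)
print(np.allclose(y_pred, manual))
```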

(4) score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse).


Parameters:

X : array-like, shape = (n_samples, n_features). Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs). True values for X.

sample_weight : array-like, shape = [n_samples], optional. Sample weights.

Returns:

score : float. R^2 of self.predict(X) with respect to y.
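
The definition R^2 = 1 - u/v can be checked by hand against score and sklearn.metrics.r2_score. This is a minimal sketch on made-up noisy data; the seed and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up noisy data around y = 2x + 1.
rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(30, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1.0, size=30)

regr = LinearRegression().fit(X, y)
y_pred = regr.predict(X)

u = ((y - y_pred) ** 2).sum()      # residual sum of squares
v = ((y - y.mean()) ** 2).sum()    # total sum of squares
r2_manual = 1.0 - u / v

print(np.isclose(r2_manual, regr.score(X, y)))
print(np.isclose(r2_manual, r2_score(y, y_pred)))

# Predictions worse than always predicting the mean give a negative score.
bad_pred = -y_pred
print(r2_score(y, bad_pred) < 0)
```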

(5) set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
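
The <component>__<parameter> form can be sketched with a two-step pipeline. The step names "scale" and "ols" are arbitrary labels chosen for this example, not names required by scikit-learn.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# The step names "scale" and "ols" are arbitrary labels chosen for this sketch.
pipe = Pipeline([("scale", StandardScaler()), ("ols", LinearRegression())])

# Simple estimator: set a parameter directly.
regr = LinearRegression()
regr.set_params(fit_intercept=False)

# Nested object: address a step's parameter as <component>__<parameter>.
pipe.set_params(ols__fit_intercept=False, scale__with_mean=False)

print(regr.fit_intercept)
print(pipe.get_params()["ols__fit_intercept"])
print(pipe.named_steps["scale"].with_mean)
```

The same double-underscore names are what a parameter grid would use when tuning a pipeline step.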

II. Code Implementation

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes = datasets.load_diabetes()


# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]  # all rows except the last 20
diabetes_X_test = diabetes_X[-20:]   # the last 20 rows

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create the linear regression model
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# Coefficient of determination (R^2): 1 is perfect prediction
print('Variance score: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

Output:

diabetes_y_pred
Out[84]: 
array([225.9732401 , 115.74763374, 163.27610621, 114.73638965,
       120.80385422, 158.21988574, 236.08568105, 121.81509832,
        99.56772822, 123.83758651, 204.73711411,  96.53399594,
       154.17490936, 130.91629517,  83.3878227 , 171.36605897,
       137.99500384, 137.99500384, 189.56845268,  84.3990668 ])

print('Coefficients: \n', regr.coef_)
Coefficients: 
 [938.23786125]

print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
Mean squared error: 2548.07

print('Variance score: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))
Variance score: 0.47

plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
Out[88]: <matplotlib.collections.PathCollection at 0x1e435f09550>

Reposted from blog.csdn.net/weixin_39541558/article/details/80692999