Principal Component Composite Scoring - Loan_aply

Principal Component Analysis

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

pd.set_option('display.max_columns', None)

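A financial services company wants to gauge the creditworthiness of its loan customers and assign each of them a credit rating. To do this, it uses the 5C method, which is common in credit rating, to describe how likely a customer is to default.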

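Field        Meaning
Character    The customer's reputation
Capacity     The customer's ability to repay
Capital      The customer's financial strength and financial condition
Collateral   How fully the requested loan is covered by collateral or guarantees
Conditions   How the external economic and policy environment affects the customer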

• Each individual item is scored by experts.

Principal component analysis documentation (scikit-learn)

loan = pd.read_csv("Loan_aply.csv")
loan.head()
ID X1 X2 X3 X4 X5
0 1 76.5 81.5 76.0 75.8 71.7
1 2 70.6 73.0 67.6 68.1 78.5
2 3 90.7 87.3 91.0 81.5 80.0
3 4 77.5 73.6 70.9 69.8 74.8
4 5 85.6 68.5 70.0 62.2 76.5
plt.figure(figsize=(2, 2))
plt.scatter(loan['X1'], loan['X2'])
plt.title('Scatter')

plt.show()

[Figure: scatter plot of X1 against X2 (output_5_0.png)]

import seaborn as sns

sns.pairplot(loan.loc[:, 'X1':])
plt.show()

[Figure: seaborn pair plot of X1 to X5 (output_6_0.png)]

Compute the correlation coefficient matrix

loan.loc[:, 'X1':].corr(method='pearson')
X1 X2 X3 X4 X5
X1 1.000000 0.726655 0.825342 0.676314 0.685563
X2 0.726655 1.000000 0.929080 0.938382 0.841413
X3 0.825342 0.929080 1.000000 0.883457 0.733482
X4 0.676314 0.938382 0.883457 1.000000 0.762563
X5 0.685563 0.841413 0.733482 0.762563 1.000000
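The sketch below makes explicit what `corr(method='pearson')` computes. It uses only NumPy. Each column is standardised with its sample standard deviation, and the correlation between two columns is the sum of the products of their z-scores divided by n - 1. The ten data rows are copied from the joined output at the end of this post, reordered by ID. The helper name `pearson_matrix` is hypothetical.

```python
import numpy as np

# Customers ID 1..10, columns X1..X5, copied from the joined output at the end of the post
X = np.array([
    [76.5, 81.5, 76.0, 75.8, 71.7],
    [70.6, 73.0, 67.6, 68.1, 78.5],
    [90.7, 87.3, 91.0, 81.5, 80.0],
    [77.5, 73.6, 70.9, 69.8, 74.8],
    [85.6, 68.5, 70.0, 62.2, 76.5],
    [85.0, 79.2, 80.3, 84.4, 76.5],
    [94.0, 94.0, 87.5, 89.5, 92.0],
    [84.6, 66.9, 68.8, 64.8, 66.4],
    [57.7, 60.4, 57.4, 60.8, 65.0],
    [70.0, 69.2, 71.7, 64.9, 68.9],
])

def pearson_matrix(data):
    """Pearson correlation between columns, built from z-scores."""
    # Standardise every column with its sample (ddof=1) standard deviation
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Dividing by n - 1 matches the ddof=1 used for the standard deviation
    return z.T @ z / (data.shape[0] - 1)

R = pearson_matrix(X)
print(np.round(R, 6))
```

Every off-diagonal entry is well above 0.6, so the five expert scores largely move together. That shared variation is what a single principal component can summarise.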

A first look at the share of variance explained by each principal component

from sklearn.decomposition import PCA

# Fit all five components to see how much variance each one explains
pca = PCA()
pca.fit(loan.loc[:, 'X1':])

print(pca.explained_variance_ratio_)
[0.84585431 0.08914623 0.04259067 0.01663007 0.00577872]
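The first component alone explains about 84.6% of the variance, which is presumably why only one component is kept further down. One common way to make that choice explicit is to keep the smallest number of components whose cumulative ratio reaches a threshold. In the sketch below, the 0.80 threshold is an assumed rule of thumb that the original does not state, and `n_components_for` is a hypothetical helper. The ratios are copied from the output above.

```python
import numpy as np

# explained_variance_ratio_ as printed above
ratios = np.array([0.84585431, 0.08914623, 0.04259067, 0.01663007, 0.00577872])

def n_components_for(ratios, threshold):
    """Smallest k whose first k components explain at least `threshold` of the variance."""
    cumulative = np.cumsum(ratios)
    reached = cumulative >= threshold
    if not reached.any():
        # Threshold never reached (e.g. rounding below 1.0): keep every component
        return len(ratios)
    # argmax returns the first index where the condition is True
    return int(np.argmax(reached)) + 1

print(np.cumsum(ratios))
print(n_components_for(ratios, 0.80))  # 1
```

scikit-learn can make a similar selection itself if you pass a float, e.g. `PCA(n_components=0.8, svd_solver='full')`.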
print(pca.components_)
[[ 0.46881402  0.48487556  0.47274449  0.46174663  0.32925948]
 [ 0.83061232 -0.32991571  0.02117417 -0.43090441 -0.12293025]
 [ 0.0214065   0.0148012  -0.4127194  -0.24084475  0.87805421]
 [ 0.25465387 -0.28771993 -0.58858207  0.70628304 -0.0842856 ]
 [ 0.15808149  0.75700032 -0.50921327 -0.2104032  -0.31367674]]
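The two arrays above can be reproduced without scikit-learn. Take the eigendecomposition of the sample covariance matrix. The eigenvalues, sorted in descending order and divided by their sum, give `explained_variance_ratio_`. The matching eigenvectors, taken as rows, give `components_`. The sketch below is a NumPy-only reimplementation of that textbook route, not scikit-learn's own code path, which uses an SVD. Eigenvector signs are arbitrary, so individual rows may come out negated compared with the printed `components_`. The data rows are copied from the joined output at the end of the post, and `pca_eig` is a hypothetical name.

```python
import numpy as np

# Customers ID 1..10, columns X1..X5, copied from the joined output at the end of the post
X = np.array([
    [76.5, 81.5, 76.0, 75.8, 71.7],
    [70.6, 73.0, 67.6, 68.1, 78.5],
    [90.7, 87.3, 91.0, 81.5, 80.0],
    [77.5, 73.6, 70.9, 69.8, 74.8],
    [85.6, 68.5, 70.0, 62.2, 76.5],
    [85.0, 79.2, 80.3, 84.4, 76.5],
    [94.0, 94.0, 87.5, 89.5, 92.0],
    [84.6, 66.9, 68.8, 64.8, 66.4],
    [57.7, 60.4, 57.4, 60.8, 65.0],
    [70.0, 69.2, 71.7, 64.9, 68.9],
])

def pca_eig(data):
    """PCA via the eigendecomposition of the sample covariance matrix."""
    cov = np.cov(data, rowvar=False)         # 5x5 covariance with ddof=1
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # largest variance first
    # Transpose so each row is one component, laid out like components_
    return eigvals[order], eigvecs[:, order].T

eigvals, components = pca_eig(X)
ratio = eigvals / eigvals.sum()
print(np.round(ratio, 8))
print(np.round(components, 8))
```

The computation works on the covariance matrix rather than the correlation matrix, matching `PCA()` applied to the unstandardised scores. This is reasonable here because all five items are expert marks on the same scale.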
# Keep only the first principal component; whiten=True rescales its scores to unit variance
pca1 = PCA(n_components=1, whiten=True)
pca1.fit(loan.loc[:, 'X1':])
PCA(copy=True, iterated_power='auto', n_components=1, random_state=None,
  svd_solver='auto', tol=0.0, whiten=True)
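With `whiten=True`, scikit-learn's documented behaviour is to centre each row and project it onto the component. The projection is then divided by the square root of that component's explained variance, so the score for customer x is v1ᵀ(x - x̄) / sqrt(λ1). The sketch below computes this by hand with NumPy. As an assumption, it fixes the arbitrary eigenvector sign so that the loadings sum to a positive number, which means higher expert marks give a higher score. The data rows are copied from the joined output at the end of the post, and `whitened_pc1_scores` is a hypothetical name.

```python
import numpy as np

# Customers ID 1..10, columns X1..X5, copied from the joined output at the end of the post
X = np.array([
    [76.5, 81.5, 76.0, 75.8, 71.7],
    [70.6, 73.0, 67.6, 68.1, 78.5],
    [90.7, 87.3, 91.0, 81.5, 80.0],
    [77.5, 73.6, 70.9, 69.8, 74.8],
    [85.6, 68.5, 70.0, 62.2, 76.5],
    [85.0, 79.2, 80.3, 84.4, 76.5],
    [94.0, 94.0, 87.5, 89.5, 92.0],
    [84.6, 66.9, 68.8, 64.8, 66.4],
    [57.7, 60.4, 57.4, 60.8, 65.0],
    [70.0, 69.2, 71.7, 64.9, 68.9],
])
ids = np.arange(1, 11)

def whitened_pc1_scores(data):
    """Centre, project onto the first component, divide by sqrt of its variance."""
    centered = data - data.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
    # eigh sorts ascending, so the last column is the first principal component
    v1, lam1 = eigvecs[:, -1], eigvals[-1]
    # Assumed sign convention: higher raw marks should give a higher score
    if v1.sum() < 0:
        v1 = -v1
    return centered @ v1 / np.sqrt(lam1)

scores = whitened_pc1_scores(X)
for i in np.argsort(scores)[::-1]:
    print(ids[i], round(float(scores[i]), 6))
```

The resulting scores have mean 0 and sample standard deviation 1. A score of 1.77 therefore reads as "1.77 standard deviations above the average applicant".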

Join the scores to the original data

# Project every applicant onto the first component and attach the scores to the original rows
score = pd.DataFrame(pca1.transform(loan.loc[:, 'X1':]),
                     columns=['score'])
loan.join(score).sort_values(by='score', ascending=False)
ID X1 X2 X3 X4 X5 score
6 7 94.0 94.0 87.5 89.5 92.0 1.770770
2 3 90.7 87.3 91.0 81.5 80.0 1.238404
5 6 85.0 79.2 80.3 84.4 76.5 0.672219
0 1 76.5 81.5 76.0 75.8 71.7 0.156252
3 4 77.5 73.6 70.9 69.8 74.8 -0.215028
4 5 85.6 68.5 70.0 62.2 76.5 -0.316231
1 2 70.6 73.0 67.6 68.1 78.5 -0.444657
7 8 84.6 66.9 68.8 64.8 66.4 -0.510540
9 10 70.0 69.2 71.7 64.9 68.9 -0.682753
8 9 57.7 60.4 57.4 60.8 65.0 -1.668435
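The post stops at a ranked score, but the stated goal is a credit rating level. One hypothetical way to get there is to split the scores into equal-frequency bands. The three grades "A", "B" and "C" and the tercile cut-offs in the sketch below are assumptions for illustration only, not the company's or the author's rule. The IDs and scores are copied from the sorted output above, and `assign_grades` is a hypothetical helper.

```python
import numpy as np

# IDs and whitened PC1 scores, copied from the sorted output above
ids = np.array([7, 3, 6, 1, 4, 5, 2, 8, 10, 9])
scores = np.array([1.770770, 1.238404, 0.672219, 0.156252, -0.215028,
                   -0.316231, -0.444657, -0.510540, -0.682753, -1.668435])

def assign_grades(scores, labels=("C", "B", "A")):
    """Equal-frequency bands; the highest-scoring band gets the last label."""
    # Interior quantiles, e.g. 1/3 and 2/3 when there are three labels
    qs = np.linspace(0, 1, len(labels) + 1)[1:-1]
    cuts = np.quantile(scores, qs)
    # digitize gives 0 below the first cut and len(cuts) at or above the last cut
    return np.array(labels)[np.digitize(scores, cuts)]

grades = assign_grades(scores)
for i, g, s in zip(ids, grades, scores):
    print(i, g, s)
```

In practice the cut-offs would come from business rules or historical default rates rather than sample quantiles.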


Reposted from blog.csdn.net/weixin_40903057/article/details/95317653