for dataset in combine:
    dataset['IsAlone'] = 0
    dataset.loc[dataset['FamilySize'] == 1, 'IsAlone'] = 1

train_df[['IsAlone', 'Survived']].groupby(['IsAlone'], as_index=False).mean()
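The `IsAlone` flag derives from `FamilySize`, which this style of notebook typically computes earlier as `SibSp + Parch + 1`; `combine` and `train_df` come from the original notebook. A minimal, self-contained sketch of the same transformation on toy data:

```python
import pandas as pd

# Toy frame standing in for train_df; FamilySize = SibSp + Parch + 1
df = pd.DataFrame({'SibSp': [1, 0, 0, 3],
                   'Parch': [0, 0, 2, 1],
                   'Survived': [1, 0, 1, 0]})
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1

# IsAlone is 1 for passengers travelling with no family members
df['IsAlone'] = 0
df.loc[df['FamilySize'] == 1, 'IsAlone'] = 1

# Mean survival rate per IsAlone group
print(df[['IsAlone', 'Survived']].groupby(['IsAlone'], as_index=False).mean())
```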
# machine learning
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
# Logistic Regression
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
logistic = LogisticRegression()
logistic.fit(X_train, Y_train)
y_pred = logistic.predict(X_train)
acc_log = round(logistic.score(X_train, Y_train) * 100, 2)
print(acc_log)
print('Treating all the data as the training set')
print('Logistic regression accuracy: {}'.format(logistic.score(X_train, Y_train)))
y_pred = logistic.predict(X_train)
print('Logistic regression precision: {}'.format(precision_score(Y_train, y_pred)))
print('Logistic regression recall: {}'.format(recall_score(Y_train, y_pred)))
print('Logistic regression F1-score: {}'.format(f1_score(Y_train, y_pred)))
fpr, tpr, _ = roc_curve(Y_train, logistic.predict_proba(X_train)[:,1])
roc_auc = auc(fpr, tpr)

# Plot of a ROC curve for a specific class
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)'% roc_auc)
plt.plot([0,1],[0,1],'k--')
plt.xlim([0.0,1.0])
plt.ylim([0.0,1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()
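All of the scores above are computed on the same data the model was fit on, which overstates performance. `train_test_split` is already imported but unused; a hedged sketch of holding out a validation set (synthetic data here, since `X_train`/`Y_train` come from the original notebook):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the engineered Titanic features;
# the real notebook would split X_train / Y_train instead.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_tr, y_tr)

# Held-out accuracy is the honest estimate of generalization
print('train accuracy:', model.score(X_tr, y_tr))
print('validation accuracy:', accuracy_score(y_val, model.predict(X_val)))
```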
77.1
65.88
# Decision Tree
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
Y_pred = decision_tree.predict(X_test)
acc_decision_tree = round(decision_tree.score(X_train, Y_train) * 100, 2)
acc_decision_tree
print('Treating all the data as the training set')
print('Decision tree accuracy: {}'.format(decision_tree.score(X_train, Y_train)))
y_pred = decision_tree.predict(X_train)
print('Decision tree precision: {}'.format(precision_score(Y_train, y_pred)))
print('Decision tree recall: {}'.format(recall_score(Y_train, y_pred)))
print('Decision tree F1-score: {}'.format(f1_score(Y_train, y_pred)))
fpr, tpr, _ = roc_curve(Y_train, decision_tree.predict_proba(X_train)[:,1])
roc_auc = auc(fpr, tpr)

# Plot of a ROC curve for a specific class
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)'% roc_auc)
plt.plot([0,1],[0,1],'k--')
plt.xlim([0.0,1.0])
plt.ylim([0.0,1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()
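An unconstrained `DecisionTreeClassifier` can memorize its training set, so the training-set ROC curve just plotted will look near-perfect regardless of how the tree generalizes. A hedged sketch on synthetic data showing the effect and one common remedy, a `max_depth` limit:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=1)

deep = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)

# The unconstrained tree fits the training set perfectly...
print('deep tree train accuracy:', deep.score(X_tr, y_tr))
# ...while a depth limit trades training fit for generalization.
print('deep tree validation accuracy:', deep.score(X_val, y_val))
print('shallow tree validation accuracy:', shallow.score(X_val, y_val))
```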
xgboost scores: [0.78114478 0.80808081 0.82154882]
Treating all the data as the training set
xgboost accuracy: 0.8395061728395061
xgboost precision: 0.8419243986254296
xgboost recall: 0.716374269005848
xgboost F1-score: 0.7740916271721958
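The three xgboost scores printed above have the shape of a 3-fold cross-validation, which `cross_val_score` produces. Since xgboost itself may not be installed, a hedged sketch using sklearn's `GradientBoostingClassifier` as a stand-in, on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the engineered Titanic features
X, y = make_classification(n_samples=600, n_features=8, random_state=2)

# Three-fold cross-validation: one accuracy score per held-out fold
scores = cross_val_score(GradientBoostingClassifier(random_state=2), X, y, cv=3)
print('cross-validation scores:', scores)
print('mean accuracy:', scores.mean())
```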
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
logistic = LogisticRegression()
logistic.fit(X_train, Y_train)
print('Treating all the data as the training set, the prediction results are as follows:')
print('Logistic regression accuracy: {}'.format(logistic.score(X_train, Y_train)))
y_pred = logistic.predict(X_train)
print('Logistic regression precision: {}'.format(precision_score(Y_train, y_pred)))
print('Logistic regression recall: {}'.format(recall_score(Y_train, y_pred)))
print('Logistic regression F1-score: {}'.format(f1_score(Y_train, y_pred)))
fpr, tpr, _ = roc_curve(Y_train, logistic.predict_proba(X_train)[:,1])
roc_auc = auc(fpr, tpr)

# Plot of a ROC curve for a specific class
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)'% roc_auc)
plt.plot([0,1],[0,1],'k--')
plt.xlim([0.0,1.0])
plt.ylim([0.0,1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()