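Dataset information:

The data for this exercise can be downloaded from https://pan.baidu.com/s/1dtHJiV6zMbf_fWPi-dZ95g
Note: this is a financial dataset (already preprocessed, not the raw data). The goal is to predict whether a loan customer will become overdue. The "status" column in the table is the label: 0 means not overdue, 1 means overdue.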

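Task 1.1 - Model building

  1. Split the financial dataset 70/30 (70% training, 30% test)
  2. Use random seed 2018
  3. Using sklearn, build three simple models (logistic regression, SVM, and decision tree) and score each one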
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv('data_all.csv')
df.head()   # show the first five records (displayed in a notebook cell)
df.describe()   # summary statistics for each column: count, mean, std, min, quartiles, max

# separate the label from the features, then split 70/30 into train and test sets
y = df['status']
x = df.drop('status', axis=1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=2018)
print(x_train.shape,y_train.shape)
print(x_test.shape,y_test.shape)
#(3327, 84) (3327,)
#(1427, 84) (1427,)

# train three models and score them
lr = LogisticRegression(random_state=2018)
lr.fit(x_train,y_train)

svm = SVC(random_state=2018)
svm.fit(x_train,y_train)

dtree = DecisionTreeClassifier(random_state=2018)
dtree.fit(x_train, y_train)

score_lr = lr.score(x_test,y_test)
score_svm = svm.score(x_test,y_test)
score_dtree = dtree.score(x_test,y_test)
print("LogisticRegression: ", score_lr)
print("SVM: ", score_svm)
print("DecisionTreeClassifier: ", score_dtree)
#LogisticRegression:  0.7484232655921513
#SVM:  0.7484232655921513
#DecisionTreeClassifier:  0.6846531184302733
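
The same split-fit-score flow can also be written as a loop over the models, which keeps the three steps of the task in one place. The snippet below is only a minimal sketch: it assumes synthetic stand-in data from `make_classification` rather than the real `data_all.csv`, and it adds a majority-class `DummyClassifier` (not part of the original task) as a reference point for reading the accuracy values. The names `SEED`, `models`, and `scores` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 2018

# Stand-in data (assumption): 1000 synthetic rows, not the real data_all.csv
x, y = make_classification(n_samples=1000, n_features=20, random_state=SEED)

# 70/30 split with a fixed seed, as in the task description
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=SEED
)

models = {
    # Always predicts the most frequent training label (added for reference only)
    "MajorityBaseline": DummyClassifier(strategy="most_frequent"),
    "LogisticRegression": LogisticRegression(random_state=SEED, max_iter=1000),
    "SVM": SVC(random_state=SEED),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=SEED),
}

# Fit each model on the training set and record its test-set accuracy
scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = model.score(x_test, y_test)
    print(f"{name}: {scores[name]:.4f}")
```

Because the data here is synthetic, the printed accuracies will not match the figures reported above. To run it on the real dataset, replace the `make_classification` call with the `x` and `y` built from `data_all.csv`.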