1. Import the required libraries

import pandas as pd
from lightgbm import LGBMClassifier 
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier
from xgboost import XGBClassifier

2. Load and prepare the data

# Load the dataset
test = pd.read_csv("/home/tarena/test/web/day11/data_all.csv")
x = test.drop(['status'], axis=1)
y = test["status"]
# 70/30 train/test split, random seed 2018
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=2018)
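To make the split concrete, here is a minimal sketch on a made-up table. The DataFrame `demo` and its columns `f0`..`f4` are hypothetical stand-ins for data_all.csv, which is not reproduced here; only the `status` column name and the split parameters come from the code above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data_all.csv: 1000 rows, 5 numeric features, binary `status`.
rng = np.random.default_rng(2018)
demo = pd.DataFrame(rng.normal(size=(1000, 5)), columns=[f"f{i}" for i in range(5)])
demo["status"] = rng.integers(0, 2, size=1000)

# Same steps as above: separate the label, then split 70/30 with a fixed seed.
x = demo.drop(["status"], axis=1)
y = demo["status"]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=2018)

print(X_train.shape, X_test.shape)
```

With 1000 rows and test_size=0.3, 700 rows go to training and 300 to testing, and the label column is no longer among the features.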

3. Build the models and score them

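# Random forest
Rfc = RandomForestClassifier()
Rfc.fit(X_train,y_train)
# Note: score() is called on X_train here and for the models below, so these are training-set accuracies
Rfc_score = Rfc.score(X_train,y_train)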

#GBDT
Gbdt = GradientBoostingClassifier()
Gbdt.fit(X_train,y_train)
Gbdt_score = Gbdt.score(X_train,y_train)

#XGBoost
Xgb = XGBClassifier()
Xgb.fit(X_train,y_train)
Xgb_score = Xgb.score(X_train,y_train)

#LightGBM
Lgb = LGBMClassifier()
Lgb.fit(X_train,y_train)
lgb_score = Lgb.score(X_train,y_train)

4. Print the results

print(Rfc_score,Gbdt_score,Xgb_score,lgb_score)
0.9861737300871656 0.8623384430417794 0.852419597234746 0.9972948602344455
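The four numbers above come from score(X_train, y_train), so they measure accuracy on the data the models were fitted to. A rough sketch of how training and held-out accuracy could be reported side by side is shown below. It uses synthetic data from make_classification as an assumption, since data_all.csv is not available here, and only the two scikit-learn models to stay self-contained; the numbers it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data_all.csv (assumption: the real file is not available here).
X, y = make_classification(n_samples=1000, n_features=20, random_state=2018)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)

models = {
    "RandomForest": RandomForestClassifier(random_state=2018),
    "GBDT": GradientBoostingClassifier(random_state=2018),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on the data the model has seen
    test_acc = model.score(X_test, y_test)     # accuracy on the held-out 30%
    scores[name] = (train_acc, test_acc)
    print(f"{name}: train={train_acc:.4f} test={test_acc:.4f}")
```

The same loop would work for XGBClassifier and LGBMClassifier, since both expose fit and score in the scikit-learn style used above.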

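This time I mostly copied the Day 1 content and then, after searching on Baidu, swapped in different libraries. I saw that some classmates had standardized and normalized the data. I read up on it but still did not understand the principle. When I tried similar preprocessing and reran the models, the results did not change, so I removed the data preprocessing step.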
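One likely reason the results did not change: all four models here are built from decision trees, which split each feature at a threshold. Standardization only shifts and rescales each feature, which preserves the ordering of its values, so the trees can find equivalent splits either way. The sketch below illustrates this with a random forest. It assumes synthetic data from make_classification rather than data_all.csv, and StandardScaler as the preprocessing step, since the original preprocessing code is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data_all.csv (assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=2018)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2018)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Same model and seed, once on raw features and once on standardized features.
raw_model = RandomForestClassifier(random_state=2018).fit(X_train, y_train)
std_model = RandomForestClassifier(random_state=2018).fit(X_train_std, y_train)

raw_pred = raw_model.predict(X_test)
std_pred = std_model.predict(X_test_std)

# Fraction of test rows where both models predict the same class.
agreement = (raw_pred == std_pred).mean()
print(f"prediction agreement: {agreement:.3f}")
```

The agreement should be at or very close to 1.0, with any small difference coming from floating-point rounding. Models that rely on distances or gradient magnitudes, such as logistic regression or k-nearest neighbors, are the ones where scaling usually matters.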
