Lab 4: Building Models with scikit-learn (Python Data Analysis and Applications, textbook p. 196, Exercises 1 to 4)

Exercise 1

# Read the data
import pandas as pd
wine = pd.read_csv(r'D:\桌面\实验四\data\wine.csv')
winequality = pd.read_csv(r'D:\桌面\实验四\data\winequality.csv', sep=';')

# Separate the features from the labels
wine_data = wine.iloc[:, 1:]
wine_target = wine['Class']
print('Features of the wine dataset:\n', wine_data)
print('Labels of the wine dataset:\n', wine_target)
winequality_data = winequality.iloc[:, :-1]
winequality_target = winequality['quality']
print('Features of the winequality dataset:\n', winequality_data)
print('Labels of the winequality dataset:\n', winequality_target)

(Output: the wine features form a 178 x 13 DataFrame and its Class labels a Series of length 178; the winequality features form a 1599 x 11 DataFrame and its quality labels a Series of length 1599.)

# Split each dataset into training and test sets
from sklearn.model_selection import train_test_split
wine_data_train, wine_data_test, wine_target_train, wine_target_test = train_test_split(
    wine_data, wine_target, test_size=0.1, random_state=6)
winequality_data_train, winequality_data_test, winequality_target_train, winequality_target_test = train_test_split(
    winequality_data, winequality_target, test_size=0.1, random_state=6)
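Not part of the original exercise: a quick sanity check of the split sizes. With test_size=0.1 and the shapes reported above (178 x 13 and 1599 x 11), the test sets should hold 18 and 160 rows; a minimal sketch:

# Sketch: verify the train/test split sizes created above
print(wine_data_train.shape, wine_data_test.shape)                # expected: (160, 13) (18, 13)
print(winequality_data_train.shape, winequality_data_test.shape)  # expected: (1439, 11) (160, 11)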

# Standardise both datasets (fit the scaler on the training data only)
from sklearn.preprocessing import StandardScaler
stdScale = StandardScaler().fit(wine_data_train)
wine_trainScaler = stdScale.transform(wine_data_train)
wine_testScaler = stdScale.transform(wine_data_test)
stdScale = StandardScaler().fit(winequality_data_train)
winequality_trainScaler = stdScale.transform(winequality_data_train)
winequality_testScaler = stdScale.transform(winequality_data_test)

# Reduce both datasets to 5 components with PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=5).fit(wine_trainScaler)
wine_trainPca = pca.transform(wine_trainScaler)
wine_testPca = pca.transform(wine_testScaler)
pca = PCA(n_components=5).fit(winequality_trainScaler)
winequality_trainPca = pca.transform(winequality_trainScaler)
winequality_testPca = pca.transform(winequality_testScaler)
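Not in the original exercise: the fitted PCA object reports how much of the variance the five components keep, which is worth checking before building the downstream models. A sketch, assuming the wine pipeline above (the PCA is refitted under a new name because pca is reused for winequality):

# Sketch: variance retained by the 5 wine components
import numpy as np
from sklearn.decomposition import PCA
pca_wine = PCA(n_components=5).fit(wine_trainScaler)
print(pca_wine.explained_variance_ratio_)             # share of variance per component
print(np.cumsum(pca_wine.explained_variance_ratio_))  # cumulative share retained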

Exercise 2

# Using the processed wine data from Exercise 1, build a K-Means model with 3 clusters
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=1).fit(wine_trainScaler)
print('The fitted KMeans model:\n', kmeans)

# Compare the true labels with the cluster labels via the FMI
from sklearn.metrics import fowlkes_mallows_score
score = fowlkes_mallows_score(wine_target_train, kmeans.labels_)
print('FMI of the wine dataset: %f' % score)

(Output: FMI of the wine dataset: 0.924119)

# Search for the best number of clusters between 2 and 10
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=123).fit(wine_trainScaler)
    score = fowlkes_mallows_score(wine_target_train, kmeans.labels_)
    print('wine data, %d clusters, FMI: %f' % (i, score))

(Output: the FMI peaks at 3 clusters, at about 0.94, and falls for every larger number of clusters.)
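The loop above only prints each score; a minimal sketch (same data and settings) that also records the scores and reports the number of clusters with the highest FMI:

# Sketch: keep the FMI values and pick the best k
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score
fmi_scores = []
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=123).fit(wine_trainScaler)
    fmi_scores.append(fowlkes_mallows_score(wine_target_train, kmeans.labels_))
best_k = range(2, 11)[fmi_scores.index(max(fmi_scores))]
print('Best number of clusters by FMI:', best_k)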

# Compute silhouette coefficients and plot them to choose the number of clusters
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
silhouettteScore = []
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=1).fit(wine)
    score = silhouette_score(wine, kmeans.labels_)
    silhouettteScore.append(score)
plt.figure(figsize=(10, 6))
plt.plot(range(2, 11), silhouettteScore, linewidth=1.5, linestyle='-')
plt.show()
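In the exercise the best k is read off the line plot by eye; a small sketch that also reports the maximum directly. Note that, as written, these loops cluster the raw wine DataFrame (Class column included) rather than the standardised training data, which is worth keeping in mind when comparing the scores with the FMI results above.

# Sketch: number of clusters with the largest silhouette coefficient
import numpy as np
best_k = range(2, 11)[int(np.argmax(silhouettteScore))]
print('Best number of clusters by silhouette:', best_k, 'score: %f' % max(silhouettteScore))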

# Compute the Calinski-Harabasz index to choose the number of clusters
from sklearn.metrics import calinski_harabaz_score
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=1).fit(wine)
    score = calinski_harabaz_score(wine, kmeans.labels_)
    print('wine data, %d clusters, Calinski-Harabasz index: %f' % (i, score))

On one machine this raised "ImportError: cannot import name 'calinski_harabaz_score' from 'sklearn.metrics'" (from ...\anaconda\lib\site-packages\sklearn\metrics\__init__.py): newer scikit-learn releases renamed the function to calinski_harabasz_score and later dropped the old spelling, so the code only runs as written against an older installation. On a second computer with an older scikit-learn it produced results, with the index rising steadily with the number of clusters and reaching roughly 1817 at 10 clusters.
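A version-tolerant sketch (assuming the same wine data and loop as above) that works with both the old and the new spelling of the function:

# Sketch: import whichever spelling of the Calinski-Harabasz score the installed scikit-learn provides
try:
    from sklearn.metrics import calinski_harabasz_score       # scikit-learn >= 0.20
except ImportError:
    from sklearn.metrics import calinski_harabaz_score as calinski_harabasz_score  # older releases
from sklearn.cluster import KMeans
for i in range(2, 11):
    kmeans = KMeans(n_clusters=i, random_state=1).fit(wine)
    score = calinski_harabasz_score(wine, kmeans.labels_)
    print('wine data, %d clusters, Calinski-Harabasz index: %f' % (i, score))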

Exercise 3

# Read the wine dataset and separate the labels from the features
import pandas as pd
wine = pd.read_csv(r'D:\桌面\实验四\data\wine.csv')
wine_data = wine.iloc[:, 1:]
wine_target = wine['Class']

# Split the wine dataset into training and test sets
from sklearn.model_selection import train_test_split
wine_data_train, wine_data_test, wine_target_train, wine_target_test = train_test_split(
    wine_data, wine_target, test_size=0.1, random_state=6)

# Apply min-max scaling to the wine dataset
from sklearn.preprocessing import MinMaxScaler
stdScale = MinMaxScaler().fit(wine_data_train)
wine_trainScaler = stdScale.transform(wine_data_train)
wine_testScaler = stdScale.transform(wine_data_test)

# Build an SVM model and predict the test set
from sklearn.svm import SVC
svm = SVC().fit(wine_trainScaler, wine_target_train)
print('The fitted SVM model:\n', svm)
wine_target_pred = svm.predict(wine_testScaler)
print('First 10 predictions on the test set:\n', wine_target_pred[:10])

# Print the classification report to assess the classifier
from sklearn.metrics import classification_report
print('Classification report for the SVM predictions on the wine data:\n',
      classification_report(wine_target_test, wine_target_pred))

(Output: precision, recall and F1-score are all 1.00 for each of the three classes, with an overall accuracy of 1.00 on the 18 test samples.)
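Not in the original exercise: with such a small test set, a confusion matrix and a plain accuracy number make the (here perfect) result explicit. A sketch, assuming the predictions above:

# Sketch: confusion matrix and accuracy for the SVM predictions
from sklearn.metrics import confusion_matrix, accuracy_score
print('Confusion matrix:\n', confusion_matrix(wine_target_test, wine_target_pred))
print('Accuracy: %f' % accuracy_score(wine_target_test, wine_target_pred))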

Exercise 4

# Using the processed winequality data from Exercise 1, build a linear regression model
from sklearn.linear_model import LinearRegression
clf = LinearRegression().fit(winequality_trainPca, winequality_target_train)
y_pred = clf.predict(winequality_testPca)
print('First 10 predictions of the linear regression model:\n', y_pred[:10])

(Output: ten continuous predictions, roughly between 5.2 and 6.4.)

# Using the same data, build a gradient boosting regression model
from sklearn.ensemble import GradientBoostingRegressor
GBR_wine = GradientBoostingRegressor().fit(winequality_trainPca, winequality_target_train)
wine_target_pred = GBR_wine.predict(winequality_testPca)
print('First 10 predictions of the gradient boosting regression model:\n', wine_target_pred[:10])
print('First 10 true labels:\n', list(winequality_target_test[:10]))

(Output: the gradient boosting predictions lie roughly between 5.2 and 6.7, while the corresponding true quality labels are integers, mostly 5s and 6s.)
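Not part of the original exercise: lining the two sets of predictions up against the true scores makes the printouts above easier to compare. A minimal sketch, assuming y_pred and wine_target_pred from the two models above:

# Sketch: true quality vs. the two models' predictions for the first 10 test samples
import pandas as pd
comparison = pd.DataFrame({'true': list(winequality_target_test[:10]),
                           'linear_regression': y_pred[:10].round(2),
                           'gradient_boosting': wine_target_pred[:10].round(2)})
print(comparison)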

# Using the true and predicted scores, compute the mean absolute error, mean squared error,
# median absolute error, explained variance and R-squared, and judge the models by these scores
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import median_absolute_error
from sklearn.metrics import explained_variance_score
from sklearn.metrics import r2_score
print('Evaluation of the linear regression model:')
print('Mean absolute error of the linear regression on winequality:',
      mean_absolute_error(winequality_target_test, y_pred))
print('Mean squared error of the linear regression on winequality:',
      mean_squared_error(winequality_target_test, y_pred))
print('Median absolute error of the linear regression on winequality:',
      median_absolute_error(winequality_target_test, y_pred))
print('Explained variance of the linear regression on winequality:',
      explained_variance_score(winequality_target_test, y_pred))
print('R-squared of the linear regression on winequality:',
      r2_score(winequality_target_test, y_pred))

print('Evaluation of the gradient boosting regression model:')
from sklearn.metrics import explained_variance_score, mean_absolute_error, mean_squared_error, median_absolute_error, r2_score
print('Mean absolute error of the gradient boosting regression on winequality:',
      mean_absolute_error(winequality_target_test, wine_target_pred))
print('Mean squared error of the gradient boosting regression on winequality:',
      mean_squared_error(winequality_target_test, wine_target_pred))
print('Median absolute error of the gradient boosting regression on winequality:',
      median_absolute_error(winequality_target_test, wine_target_pred))
print('Explained variance of the gradient boosting regression on winequality:',
      explained_variance_score(winequality_target_test, wine_target_pred))
print('R-squared of the gradient boosting regression on winequality:',
      r2_score(winequality_target_test, wine_target_pred))
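The block above repeats the same five calls for each model; an equivalent, more compact sketch (same metrics, same data) that loops over both models:

# Sketch: compute the five regression metrics for both models in one pass
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, explained_variance_score, r2_score)
metrics = [('mean absolute error', mean_absolute_error),
           ('mean squared error', mean_squared_error),
           ('median absolute error', median_absolute_error),
           ('explained variance', explained_variance_score),
           ('R-squared', r2_score)]
for model_name, pred in [('linear regression', y_pred),
                         ('gradient boosting regression', wine_target_pred)]:
    print('Evaluation of the %s model on winequality:' % model_name)
    for metric_name, fn in metrics:
        print('  %s: %f' % (metric_name, fn(winequality_target_test, pred)))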
