S1EP3: Playing with Pandas
Playing with Pandas: an API quick reference. Pandas builds on NumPy and SciPy and adds a large set of data-manipulation features on top of them.

0. Getting started: why?

What would an ordinary programmer do when handed a data set? Probably something like this: download the file, write it to disk, then read it back line by line.

In [1]:
    import codecs
    import requests
    import numpy as np
    import scipy as sp
    import pandas as pd
    import datetime
    import json

In [2]:
    r = requests.get(...)   # the Iris data URL; lost in extraction
    with codecs.open('S1EP3_Iris.txt', 'w', encoding='utf-8') as f:
        f.write(r.text)
    print r.text

Out (the classic 150-row Iris data set, truncated here):

    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    ...
    6.9,3.1,5.1,2.3,Iris-virginica

In [3]:
    with codecs.open('S1EP3_Iris.txt', 'r', encoding='utf-8') as f:
        lines = f.readlines()
    for line in lines:
        print line,

Out: the same 150 rows again (truncated).
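Since the download URL did not survive extraction, here is a self-contained sketch of the same read path with a few Iris rows inlined as a string (values taken from the data set) and parsed via pd.read_csv; the column names match those used later in the tutorial.

```python
import io

import pandas as pd

# A few rows of the Iris data set, inlined so the example needs no network.
raw = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

cnames = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
df = pd.read_csv(io.StringIO(raw), header=None, names=cnames)
print(df.shape)   # (4, 5)
```

Reading from an in-memory buffer and from a file path go through exactly the same parser, so the parameters shown later (sep, na_values, and so on) apply unchanged.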
1. Why Pandas matters

The same file, loaded in one line:

In [4]:
    irisdata = pd.read_csv('S1EP3_Iris.txt', header=None, encoding='utf-8')
    irisdata

Out: a DataFrame of 150 rows x 5 columns, with default integer column labels 0-4 (truncated).

Give the columns meaningful names:

In [5]:
    cnames = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
    irisdata.columns = cnames
    irisdata

Out: the same 150 x 5 table, now with named columns.

Quick per-column statistics:

In [6]:
    for x in cnames[:4]:
        s = irisdata[x]
        print '{0:<12}'.format(x.upper()), "Statistics:", \
              '{0:>5}{1:>5}{2:>5}{3:>5}'.format(s.max(), s.min(), s.mean(), s.std())

A quick aggregation:

In [7]:
    # (these two lambdas were mangled in extraction; reconstructed here so that
    # entpy is exp of the Shannon entropy of a vector of counts)
    slogs = lambda x: sp.log(x) * x if x != 0 else 0
    entpy = lambda x: sp.exp((slogs(x.sum()) - x.map(slogs).sum()) / x.sum())
    irisdata.groupby('class').agg(entpy)   # applied per class; exact call truncated

2. Welcome to the Pandas world

The important Pandas data types:

- Index: the row index, i.e. row-level metadata
- Series: Pandas' spear (one column or row of a data table; an observation vector; a 1-D array). In the data world, a complete observation of any individual, or observations of one attribute across a group of individuals, can all be abstracted as a Series.
- DataFrame: an ordered collection of Series (introduced below)

2.1 Series

Build a Series from values; a default index is generated:

In [8]:
    Series1 = pd.Series(np.random.randn(4))
    print Series1
    print Series1.index
    print Series1.values

Out:
    0    0.030480
    1    0.072746
    2   -0.186607
    3         ...
    dtype: float64
    Int64Index([0, 1, 2, 3], dtype='int64')
    [ 0.03048042  0.07274621 -0.18660749 ...]
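The entropy lambdas above were badly mangled in extraction, so here is a hedged Python 3 reconstruction of the idea, assuming entpy was meant to be exp of the Shannon entropy of a count vector (the "effective number of categories"); the function name `perplexity` is ours, not the notebook's.

```python
import numpy as np
import pandas as pd

# exp(Shannon entropy) of a vector of category counts; 1.0 means one class
# dominates completely, k means a perfectly uniform k-way split.
def perplexity(counts: pd.Series) -> float:
    c = counts[counts > 0].astype(float)
    n = c.sum()
    entropy = np.log(n) - (c * np.log(c)).sum() / n
    return float(np.exp(entropy))

print(perplexity(pd.Series([50, 50, 50])))   # close to 3.0: a uniform 3-way split
```

Applied to the Iris class counts (50/50/50) this yields 3, which is why the notebook prints one value per class-balanced grouping.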
Series filtering works just like NumPy boolean masks:

In [9]:
    print Series1 > 0
    print Series1[Series1 > 0]

Arithmetic is of course supported, element-wise:

In [10]:
    print Series1 * 2
    print Series1 + 5

As are NumPy universal functions:

In [11]:
    # a NumPy universal function built from a plain Python lambda
    f_np = np.frompyfunc(lambda x: np.exp(x), 1, 1)
    print f_np(Series1)

Using row labels on a Series itself, instead of building a two-column table, makes it easy to tell which part is data and which part is metadata:

In [12]:
    Series2 = pd.Series(Series1.values, index=['norm_' + unicode(i) for i in xrange(4)])
    print Series2
    print Series2.index
    print Series2.values

Although the rows are ordered, data can still be reached through the row index, much like an OrderedDict (not exactly, though: row labels may even repeat; repeated labels are not recommended, but they still work):

In [13]:
    print Series2['norm_0']
    print Series2[['norm_0', 'norm_3']]

In [14]:
    print 'norm_0' in Series2
    print 'norm_6' in Series2

Out:
    True
    False

The default row index behaves just like row numbers:

In [15]:
    print Series1.index

Out:
    Int64Index([0, 1, 2, 3], dtype='int64')

Defining a Series from a dict (or an OrderedDict with unique keys) means never worrying about duplicate row labels:

In [16]:
    Series3_Dict = {"Japan": "Tokyo", "S.Korea": "Seoul", "China": "Beijing"}
    Series3_pdSeries = pd.Series(Series3_Dict)
    print Series3_pdSeries
    print Series3_pdSeries.values
    print Series3_pdSeries.index

Out:
    China      Beijing
    Japan        Tokyo
    S.Korea      Seoul
    dtype: object
    ['Beijing' 'Tokyo' 'Seoul']
    Index([u'China', u'Japan', u'S.Korea'], dtype='object')

Want the entries stored in your own order? No problem at all, even with missing values:

In [17]:
    Series4_IndexList = ["Japan", "China", "Singapore", "S.Korea"]
    Series4_pdSeries = pd.Series(Series3_Dict, index=Series4_IndexList)
    print Series4_pdSeries
    print Series4_pdSeries.values
    print Series4_pdSeries.index
    print Series4_pdSeries.isnull()
    print Series4_pdSeries.notnull()

Out:
    Japan        Tokyo
    China      Beijing
    Singapore      NaN
    S.Korea      Seoul
    dtype: object
    ['Tokyo' 'Beijing' nan 'Seoul']
    Index([u'Japan', u'China', u'Singapore', u'S.Korea'], dtype='object')
    (isnull / notnull boolean Series, dtype: bool)

Series-level metadata: once the series and its index both have names, later data association becomes much more convenient:

In [18]:
    Series4_pdSeries.name = "CapitalSeries"
    Series4_pdSeries.index.name = "Nation"
    print Series4_pdSeries
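One point worth a quick illustration (our own example, not from the notebook): arithmetic between two Series aligns on the index, not on position, and labels present in only one operand yield NaN.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

# Alignment by label: 'a' exists only in s1, so the sum there is NaN.
total = s1 + s2
print(total)   # a -> NaN, b -> 12.0, c -> 23.0
```

This label alignment is exactly why named indexes matter: it is what later makes merges and reindexing work without manual bookkeeping.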
Out:
    Nation
    Japan        Tokyo
    China      Beijing
    Singapore      NaN
    S.Korea      Seoul
    Name: CapitalSeries, dtype: object

A "dict"? Not quite: row labels may repeat, even though that is not recommended:

In [19]:
    Series5_IndexList = ['A', 'B', 'B', 'C']
    Series5 = pd.Series(Series1.values, index=Series5_IndexList)
    print Series5
    print Series5[['B', 'A']]

2.2 DataFrame: an ordered collection of Series, as convenient as R's data.frame

Think about it carefully: the vast majority of data shapes can be expressed as a DataFrame.

Defining one from a NumPy 2-D array, from a file, or from a database: the data is good, but don't forget the index.

In [20]:
    dataNumPy = np.asarray([('Japan', 'Tokyo', 4000),
                            ('S.Korea', 'Seoul', 1300),
                            ('China', 'Beijing', 9100)])
    DF1 = pd.DataFrame(dataNumPy, columns=['nation', 'capital', 'GDP'])
    DF1

Equal-length columns kept in a dict (JSON-style) work too; unfortunately, dict keys are unordered:

In [21]:
    dataDict = {'nation': ['Japan', 'S.Korea', 'China'],
                'capital': ['Tokyo', 'Seoul', 'Beijing'],
                'GDP': [4000, 1300, 9100]}
    DF2 = pd.DataFrame(dataDict)
    DF2

Defining a DataFrame from another DataFrame (ah, the perfectionist's move):

In [22]:
    DF21 = pd.DataFrame(DF2, columns=['nation', 'capital', 'GDP'])
    DF21

In [23]:
    DF22 = pd.DataFrame(DF2, columns=['nation', 'capital', 'GDP'], index=[2, 0, 1])
    DF22

Taking a column out of a DataFrame? Two ways (exactly as in JavaScript); the '[]' form is the safest:

In [24]:
    print DF22.nation
    print DF22['capital']
    print DF22['GDP']

Taking a row out? (At least) two ways:

In [25]:
    print DF22[0:1]    # this actually gives a one-row DataFrame slice
    print DF22.ix[0]   # this selects the row whose index label is 0

NumPy-style slicing, the ultimate move:

In [26]:
    print DF22.ix[2]
    print DF22.ix[:, 'nation']

"So you come from ALTER TABLE," the panda smiles. A new column, however, cannot be added with the "." syntax; only "[]" works:

In [27]:
    DF22['population'] = [1600, 130, 55]
    DF22

2.3 Index: Pandas' wildcard for data fusion (the row-level index)

Row-level index metadata can be swapped with column names, and can be stacked and unstacked, achieving Excel-pivot-like effects. Common index types:

- pd.Index (plain)
- Int64Index (numeric index)
- MultiIndex (hierarchical index)
- DatetimeIndex / PeriodIndex (time-based; PeriodIndex uses time periods as index values)

Directly defining a plain index looks much like defining an ordinary Series:

In [28]:
    index_names = ['a', 'b', 'c']
    Series_for_Index = pd.Series(index_names)
    print pd.Index(index_names)
    print pd.Index(Series_for_Index)

Out:
    Index([u'a', u'b', u'c'], dtype='object')
    Index([u'a', u'b', u'c'], dtype='object')

Unfortunately an Index is immutable; keep that firmly in mind:

In [29]:
    index_names = ['a', 'b', 'c']
    index0 = pd.Index(index_names)
    print index0.get_values()
    index0[2] = 'd'

Out:
    ['a' 'b' 'c']
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-31-f34da0a8623c> in <module>()
          2 index0 = pd.Index(index_names)
          3 print index0.get_values()
    ----> 4 index0[2] = 'd'

    /Users/wangweiyang/anaconda/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in __setitem__(self, key, value)
    -> 1057     raise TypeError("Indexes does not support mutable operations")
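A caveat for modern readers: the .ix indexer shown above has since been removed from pandas. A sketch of the current equivalents, .loc (label-based) and .iloc (position-based), on the same toy data:

```python
import pandas as pd

df = pd.DataFrame({'nation': ['Japan', 'S.Korea', 'China'],
                   'capital': ['Tokyo', 'Seoul', 'Beijing'],
                   'GDP': [4000, 1300, 9100]},
                  index=[2, 0, 1])

print(df.loc[0])         # row whose index LABEL is 0 (S.Korea)
print(df.iloc[0])        # first row by POSITION (Japan, labelled 2)
print(df.loc[:, 'GDP'])  # a column, as a Series
```

With a shuffled integer index like [2, 0, 1], label-based and position-based access give different rows, which is exactly the ambiguity that led to .ix being retired.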
    TypeError: Indexes does not support mutable operations

Throw in a list of tuples, and you get a MultiIndex. (Unfortunately, if that list comprehension is rewritten with parentheses, i.e. as a generator, it no longer works.)

In [30]:
    multi1 = pd.Index([('Row_' + str(x + 1), 'Col_' + str(y + 1))
                       for x in xrange(4) for y in xrange(4)])
    multi1.names = ['index1', 'index2']
    print multi1

Out:
    MultiIndex(levels=[[u'Row_1', u'Row_2', u'Row_3', u'Row_4'],
                       [u'Col_1', u'Col_2', u'Col_3', u'Col_4']],
               labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                       [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]],
               names=[u'index1', u'index2'])

For a Series with a MultiIndex, the data folds along the index levels:

In [31]:
    data_for_multi1 = pd.Series(xrange(0, 16), index=multi1)
    data_for_multi1

Out: a 16-element Series grouped under Row_1..Row_4 (truncated).

In [32]:
    data_for_multi1.unstack()

Out: a 4 x 4 table, Row_1..Row_4 as rows and Col_1..Col_4 as columns, holding 0-15.

In [33]:
    data_for_multi1.unstack().stack()

Out: folded back into the original 16-element Series.

Now look at an unbalanced example: Row_1..5 and Col_1..4 are not a full cross product.

In [34]:
    multi2 = pd.Index([('Row_' + str(x + 1), 'Col_' + str(y + 1))
                       for x in xrange(5) for y in xrange(x)])
    multi2

Out:
    MultiIndex(levels=[[u'Row_2', u'Row_3', u'Row_4', u'Row_5'],
                       [u'Col_1', u'Col_2', u'Col_3', u'Col_4']],
               labels=[[0, 1, 1, 2, 2, 2, 3, 3, 3, 3],
                       [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]])

In [35]:
    data_for_multi2 = pd.Series(np.arange(10), index=multi2)
    data_for_multi2

Out:
    Row_2  Col_1    0
    Row_3  Col_1    1
           Col_2    2
    Row_4  Col_1    3
           Col_2    4
           Col_3    5
    Row_5  Col_1    6
           Col_2    7
           Col_3    8
           Col_4    9
    dtype: int64

In [36]:
    data_for_multi2.unstack()          # missing cells become NaN
    data_for_multi2.unstack().stack()  # and fold away again

The datetime standard library is so handy, you deserve it:

In [37]:
    dates = [datetime.datetime(2015, 1, 1),
             datetime.datetime(2015, 1, 8),
             datetime.datetime(2015, 1, 30)]
    pd.DatetimeIndex(dates)

Out:
    DatetimeIndex(['2015-01-01', '2015-01-08', '2015-01-30'], dtype='datetime64[ns]', freq=None)

If you need not only a uniform time format but also a uniform time frequency, use a PeriodIndex:

In [38]:
    periodindex1 = pd.period_range('2015-01', '2015-04', freq='M')
    print periodindex1

Out:
    PeriodIndex(['2015-01', '2015-02', '2015-03', '2015-04'], dtype='int64', freq='M')

How do month-level and day-level precision convert? Some companies use the 1st to stand for the month, others use the last day; converting by hand is painful, but asfreq handles it:

In [39]:
    print periodindex1.asfreq('D', how='start')
    print periodindex1.asfreq('D', how='end')

Out:
    PeriodIndex(['2015-01-01', '2015-02-01', '2015-03-01', '2015-04-01'], dtype='int64', freq='D')
    PeriodIndex(['2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30'], dtype='int64', freq='D')

And finally, to truly match up two time series of different precision:

In [40]:
    periodindex_mon = pd.period_range('2015-01', '2015-03', freq='M').asfreq('D', how='start')
    periodindex_day = pd.period_range('2015-01-01', '2015-03-31', freq='D')
    print periodindex_mon
    print periodindex_day
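The fold/unfold pair can be shown end to end with a tiny MultiIndex of our own (the names index1/index2 follow the tutorial's convention):

```python
import numpy as np
import pandas as pd

# A balanced 2x2 MultiIndex built from tuples, as in the tutorial.
idx = pd.MultiIndex.from_tuples(
    [(f'Row_{r}', f'Col_{c}') for r in range(1, 3) for c in range(1, 3)],
    names=['index1', 'index2'])
s = pd.Series(np.arange(4), index=idx)

table = s.unstack()    # index2 levels become columns: a 2x2 table
print(table)
print(table.stack())   # folds back into the original Series
```

unstack moves the innermost index level into the columns; stack is its inverse, so a balanced round trip recovers the same values.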
Out:
    PeriodIndex(['2015-01-01', '2015-02-01', '2015-03-01'], dtype='int64', freq='D')
    PeriodIndex(['2015-01-01', '2015-01-02', '2015-01-03', ...,
                 '2015-03-29', '2015-03-30', '2015-03-31'],
                dtype='int64', freq='D')
    (90 daily periods; listing truncated here)

Coarse-grained data can then be broadcast onto the fine-grained index:

In [41]:
    full_ts = pd.Series(periodindex_mon, index=periodindex_mon) \
                .reindex(periodindex_day, method='ffill')
    full_ts

Out:
    2015-01-01    2015-01-01
    2015-01-02    2015-01-01
    ...
    2015-03-30    2015-03-01
    2015-03-31    2015-03-01
    Freq: D, dtype: object

(every day in January maps to 2015-01-01, every day in February to 2015-02-01, and so on; output truncated)

About indexes, there are more convenient operations. As described earlier, an index is ordered and may contain repeats, yet to a degree it can still be reached by key; that means set-style operations are supported:

In [42]:
    index1 = pd.Index(['A', 'B', 'B', 'C', 'C'])
    index2 = pd.Index(['C', 'D', 'E', 'E', 'F'])
    index3 = pd.Index(['B', 'C', 'A'])
    print index1.append(index2)
    print index1.difference(index2)
    print index1.intersection(index2)
    print index1.union(index2)
    print index1.isin(index2)
    print index1.delete(2)
    print index1.insert(0, 'K')   # note: this returns a new Index
    print index3.drop('A')        # drop requires a unique-value Index

Out:
    Index([u'A', u'B', u'B', u'C', u'C', u'C', u'D', u'E', u'E', u'F'], dtype='object')
    Index([u'A', u'B'], dtype='object')
    Index([u'C', u'C'], dtype='object')
    Index([u'A', u'B', u'B', u'C', u'C', u'D', u'E', u'E', u'F'], dtype='object')
    [False False False  True  True]
    Index([u'A', u'B', u'C', u'C'], dtype='object')
    Index([u'K', u'A', u'B', u'B', u'C', u'C'], dtype='object')
    Index([u'B', u'C'], dtype='object')

3. In and out of the panda world with ease: Pandas I/O

An old topic: starting from the basics, we still care about how pandas exchanges data with external storage.

3.1 CSV: read_csv and to_csv
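The coarse-to-fine reindex above can be restated as a runnable Python 3 sketch (same variable names as the tutorial; behavior as in current pandas):

```python
import pandas as pd

# Month-start periods at day precision, and the full run of days they cover.
periodindex_mon = pd.period_range('2015-01', '2015-03', freq='M').asfreq('D', how='start')
periodindex_day = pd.period_range('2015-01-01', '2015-03-31', freq='D')

# Each month's start value is carried forward over that month's days.
full_ts = pd.Series(periodindex_mon, index=periodindex_mon) \
            .reindex(periodindex_day, method='ffill')
print(len(full_ts))   # 90 daily entries for Jan-Mar 2015
```

method='ffill' fills each missing daily slot with the most recent earlier value, which is exactly the "use the 1st to represent the whole month" convention.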
read_csv and to_csv are a pair of input/output tools: read_csv returns a pandas.DataFrame directly, while to_csv writes the file with a single call.

Related readers:

- read_fwf: handles fixed-width files
- read_excel and to_excel: convenient exchange with Excel

Remember the example at the very beginning?

- names gives the column names to use as the final column labels
- encoding gives the character encoding of the data set; by convention, data meant for easy file transfer is encoded as utf-8

In [43]:
    irisdata = pd.read_csv('S1EP3_Iris.txt', header=None, names=cnames, encoding='utf-8')
    print cnames
    irisdata

Out:
    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
    (the 150 x 5 table, truncated)

For the full parameter list, head over to the API documentation; here are some commonly used ones.

Row handling:

- skiprows: skip a given number of leading rows
- skipfooter: a fixed number of trailing rows that are never read in

Content handling:

- sep / delimiter: the separator matters; common ones are comma, whitespace and tab
- na_values: strings that should be treated as NaN values
- thousands: the thousands separator when parsing numbers; formats vary by locale (1.234.567,89 or 1,234,567.89), so converting strings to numbers needs the separator made explicit

3.2 Excel

For Excel files holding extremely regular data, Excel is not really necessary as the container, although Pandas does provide perfectly friendly I/O interfaces:

In [44]:
    irisdata.to_excel('S1EP3_irisdata.xls', index=None, encoding='utf-8')
    irisdata_from_excel = pd.read_excel('S1EP3_irisdata.xls', header=0, encoding='utf-8')
    irisdata_from_excel

Out: the same 150 rows x 5 columns.

The one important extra parameter: sheetname=k selects the k-th sheet of the workbook (counting from 0).

3.3 JSON: a data format commonly used in network transfer

Note that the keys, i.e. the metadata, are stored inside the data itself:

In [45]:
    json_data = [...]   # a list of employee records; lost in extraction
    data_employee = pd.read_json(json.dumps(json_data))
    data_employee_ri = data_employee.reindex(columns=['name', 'job', 'sal', 'report'])
    data_employee_ri

4. Deeper into Pandas data manipulation

Building on part one, data can be handled in more ways:

- concatenating records (like UNION ALL) or associating them (join)
- missing-value handling
- pivot tables as flexible as Excel's (covered in more detail in a later part)

4.1 Concatenation and merging

Horizontal concatenation is direct:

In [46]:
    pd.concat([data_employee_ri, data_employee_ri])

Merging on a key column:

In [47]:
    pd.merge(data_employee_ri, data_employee_ri, on='name')

left_on and right_on can be used when the key columns are named differently:

In [48]:
    pd.merge(data_employee_ri, data_employee_ri, left_on='name', right_on='name')

To associate on the index instead, use left_index and right_index directly:

In [49]:
    data_employee_ri.index.name = 'index1'
    pd.merge(data_employee_ri, data_employee_ri, left_index=True, right_index=True)

TIPS: add the how keyword to control the join type:

- how='inner'
- how='left'
- how='right'

Combined with how, merge essentially reproduces the functionality SQL joins provide, while keeping the code concise.
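The how keyword can be illustrated with two tiny frames (our own example, not the notebook's employee data):

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [4, 5, 6]})

inner = pd.merge(left, right, on='key', how='inner')   # keys in both: b, c
left_j = pd.merge(left, right, on='key', how='left')   # all left keys; rval NaN for 'a'
outer = pd.merge(left, right, on='key', how='outer')   # all keys from both sides
print(inner)
print(outer)
```

These correspond one-to-one with SQL's INNER JOIN, LEFT JOIN and FULL OUTER JOIN; how='right' mirrors how='left' with the operands swapped.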
4.2 map: transforming column values

In [50]:
    dataNumPy32 = np.asarray([('Japan', 'Tokyo', 4000),
                              ('S.Korea', 'Seoul', 1300),
                              ('China', 'Beijing', 9100)])
    DF32 = pd.DataFrame(dataNumPy32, columns=['nation', 'capital', 'GDP'])
    DF32

In [51]:
    def GDP_Factorize(v):
        fv = np.float64(v)
        if fv > 6000.0:
            return 'High'
        elif fv < 2000.0:
            return 'Low'
        else:
            return 'Medium'

    DF32['GDP_Level'] = DF32['GDP'].map(GDP_Factorize)
    DF32['NATION'] = DF32.nation.map(str.upper)
    DF32

4.3 Sorting

- sort: sort rows by the values of one or more columns
- sort_index: sort by the index values; the axis argument decides whether rows or columns get reordered

In [52]:
    dataNumPy33 = np.asarray([('Japan', 'Tokyo', 4000),
                              ('S.Korea', 'Seoul', 1300),
                              ('China', 'Beijing', 9100)])
    DF33 = pd.DataFrame(dataNumPy33, columns=['nation', 'capital', 'GDP'])
    DF33

In [53]:
    DF33.sort(['GDP'], ascending=False)

In [54]:
    DF33.sort_index(axis=1, ascending=True)

4.4 rank: a handy function

In [55]:
    DF33.rank()

Ranks are assigned column by column. Note the handling of tied data (equal values):

- method='average' (the default)
- method='min'
- method='max'
- method='first'

4.5 unstack and missing values

In [56]:
    DF34 = data_for_multi2.unstack()
    DF34

Out: a table with columns Col_1..Col_4, holding 0-9 and NaN for the missing cells.

Missing values are skipped by default when aggregating:

In [57]:
    DF34.sum()

If you would rather not skip them, bring out fillna:

In [58]:
    DF34.fillna(0).sum()

5. A "group" of pandas: Pandas' groupby

groupby works like SQL's GROUP BY keyword, following the Split-Apply-Combine pattern: split the data by some rule, apply a function to each group independently, then combine the results. Pandas' groupby is considerably more flexible.

In [59]:
    from IPython.display import Image
    Image(...)   # the notebook displays a Split-Apply-Combine diagram here; path lost

The concrete grouping:

In [60]:
    irisdata_group = irisdata.groupby('class')
    irisdata_group

Out:
    <pandas.core.groupby.DataFrameGroupBy object at 0x...>

In [61]:
    for level, subsetDF in irisdata_group:
        print level
        print subsetDF

Out: each of the three Iris class labels, followed by its 50-row subset of the table (truncated).
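The Split-Apply-Combine loop above can be collapsed into a single aggregation call; a minimal sketch with toy data of our own (not the Iris table):

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'A', 'B', 'B'],
                   'petal_length': [1.4, 1.6, 4.5, 4.7]})

# Split by 'class', apply mean() to each group, combine into one Series.
means = df.groupby('class')['petal_length'].mean()
print(means)
```

Swapping mean for agg(['mean', 'std', 'max']) yields several statistics per group at once, which is the usual next step on the real irisdata table.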
