14 Best Python Pandas Features

Pandas is the most widely used tool for data munging. It contains high-level data structures and manipulation tools designed to make data analysis fast and easy.

In this post, I am going to discuss the most frequently used pandas features. I will be using the olive oil data set for this tutorial; you can download the data set from this page (scroll down to the data section). Apart from serving as a quick reference, I hope this post will help you quickly start extracting value from Pandas. So let's get started!

1) Loading Data

"The Olive Oils data set has eight explanatory variables (levels of fatty acids in the oils) and nine classes (areas of Italy)". For more information you can check my IPython notebook.

I am importing the numpy, pandas and matplotlib modules.

    %matplotlib inline
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

I am using pd.read_csv to load the olive oil data set. The head method returns the first n rows of the data loaded from olive.csv. Here I am returning the first 5 rows.

2) Rename Function

I am going to rename the first column (Unnamed: 0) to area_Idili. The rename function takes as an argument a dictionary with the column names to be renamed as keys (olive_oil.columns[0]) and the new titles (area_Idili) as values. olive_oil.columns returns the column names. inplace=True is used when you want to modify the existing DataFrame.

3) Map

One thing that I want to do is clean the area_Idili column and remove the numbers. I am using the map method to perform this operation. map applies a function to every element of a column. I am applying the split function to the area_Idili column. split returns a list, and indexing that list with -1 returns its last element. For example, 'a.b.c'.split('.') returns ['a', 'b', 'c'], and taking [-1] of that list gives 'c'. A detailed explanation of lambda is given here.

4) Apply and Applymap

I have a list of acids called list_of_acids. apply is a pretty flexible function; it applies a function along any axis of the DataFrame.
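To make "along any axis" concrete, here is a minimal sketch on a toy frame. The frame `toy` and its values are made up for illustration and are not taken from the olive oil data; only the column names are borrowed from it.

```python
import pandas as pd

# Toy stand-in for two acid columns; the values are invented for illustration.
toy = pd.DataFrame({"palmitic": [1000.0, 1200.0], "oleic": [7000.0, 7400.0]})

# axis=0 (the default): the function receives each column as a Series.
col_max = toy.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series.
row_sum = toy.apply(lambda row: row.sum(), axis=1)

print(col_max)
print(row_sum)
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row.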
I will be using the apply function to divide each acid value by 100.

    list_of_acids = ['palmitic', 'palmitoleic', 'stearic', 'oleic', 'linoleic', 'linolenic', 'arachidic', 'eicosenoic']
    df = olive_oil[list_of_acids].apply(lambda x: x / 100.00)
    df.head(5)

Similar to apply, the applymap function works element-wise on a DataFrame (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map). Summing up, apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.

5) Shape and Columns

The shape property returns a tuple with the dimensions of the data frame. olive_oil.columns gives you the column names.

6) Unique Function

olive_oil.region.unique() returns the unique entries in the region column; there are three unique regions (1, 2, 3). I am applying the same unique method to the area column; there are 9 unique areas.

7) Cross Tab

Cross tab computes a simple cross tabulation of two factors. Here I am applying cross tabulation to the area and region columns.

8) Accessing Sub Data Frames

To index multiple columns, pass a list of column names inside the brackets. To index a single column you can use olive_oil['palmitic'] or olive_oil.palmitic.

9) Plotting

You can plot a histogram using the plt.hist function, for example plt.hist(olive_oil.palmitic).

You can also generate subplots from a pandas data frame. Here I am generating 4 different subplots for the palmitic and linolenic columns. You can set the size of the figure using the figsize argument; nrows and ncols are the number of rows and columns of subplots.

10) Groupby and Statistics

groupby groups the data into 3 parts (regions 1, 2 and 3). The groupby function gives a dictionary-like object. Here I am grouping by region: olive_oil.groupby('region').

I am applying describe on the group; describe takes any data frame and computes statistics on it. This is a quick way of getting statistics by group for any data frame. You can also calculate the standard deviation of each region group using olive_oil.groupby('region').std().

11) Aggregate Function

The aggregate function takes a function as an argument and applies it to the columns of each groupby sub data frame.
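As a rough sketch of how aggregate behaves, the example below groups a toy frame by region and aggregates each column. The frame `toy`, its values, and the helper names `region_mean` and `region_range` are assumptions for illustration, not part of the original post. The string 'mean' is used here instead of a NumPy function, which is my choice rather than the author's.

```python
import pandas as pd

# Toy data standing in for olive_oil; the values are invented for illustration.
toy = pd.DataFrame({
    "region": [1, 1, 2, 2, 3, 3],
    "palmitic": [10.0, 12.0, 11.0, 13.0, 9.0, 11.0],
    "oleic": [70.0, 72.0, 73.0, 75.0, 78.0, 80.0],
})

by_region = toy.groupby("region")

# Built-in aggregation selected by name: one row per region, one value per column.
region_mean = by_region.aggregate("mean")

# Any function that reduces a Series to a scalar also works, e.g. the value range.
region_range = by_region.aggregate(lambda s: s.max() - s.min())

print(region_mean)
print(region_range)
```

In both results the index is the region and the columns are the acids, which is what makes the later join on the region index possible.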
I am applying np.mean (which computes the mean) on all three regions.

12) Join

I am renaming ol mean and olstd columns.

    list_of_acids = ['palmitic', 'palmitoleic', 'stearic', 'oleic', 'linoleic', 'linolenic', 'arachidic', 'eicosenoic']

Pandas can do general merges. When we do that along an index, it's called a join. Here I make two sub data frames and join them on the common region index.

13) Masking

You can also mask a particular part of the data frame. olive_oil.eicosenoic < 0.05 checks whether each value in the eicosenoic column is less than 0.05; it returns True where the value is less than 0.05 and False otherwise.

    eico = (olive_oil.eicosenoic < 0.05)

14) Handling Missing Values

Missing data is common in most data analysis applications. I find the dropna and fillna functions very useful when handling missing data. I am creating a new data frame.

dropna can be used to drop rows or columns with missing data (None). By default, it drops all rows with any missing entry.

fillna can be used to fill missing data (None). First, I am creating a data frame with a single column. I am using fillna to replace the missing values with the mean of the DataFrame (data).

Conclusion

These are some of the important functions I use frequently while cleaning data. I highly recommend Wes McKinney's book Python for Data Analysis for learning pandas. Is there any other
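The missing-value handling from section 14 can be sketched as follows. The single-column frame `data` and its values are hypothetical; the original post's own data frame is not reproduced here.

```python
import pandas as pd

# Hypothetical single-column frame; None entries are stored as NaN.
data = pd.DataFrame({"value": [1.0, None, 3.0, None, 5.0]})

# dropna() with default arguments drops every row that has any missing entry.
dropped = data.dropna()

# data.mean() skips NaN by default, so it averages 1.0, 3.0 and 5.0;
# fillna() then fills each column's gaps with that column's mean.
filled = data.fillna(data.mean())

print(dropped)
print(filled)
```

dropna keeps only complete rows, while fillna keeps every row and substitutes a value, so the choice depends on whether losing rows is acceptable for the analysis.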