版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
PythonForDataScienceCheatSheet
PythonBasics
LearnMorePythonforDataScienceInteractivelyat
VariablesandDataTypes
NumpyArrays
AlsoseeLists
SelectingNumpyArrayElements
Indexstartsat0
NumpyArrayOperations
NumpyArrayFunctions
DataCamp
LearnPythonforDataScienceInteractively
my_2darray[rows,columns]
Selectitemsatindex0and1
Selectitematindex1
Subset
>>>my_array[1]
2
Slice
>>>my_array[0:2]
array([1,2])
Subset2DNumpyarrays
>>>my_2darray[:,0]
array([1,4])
>>>my_array>3
array([False,False,False,True],dtype=bool)
>>>my_array*2
array([2,4,6,8])
>>>my_array+np.array([5,6,7,8])
array([6,8,10,12])
>>>my_list=[1,2,3,4]
>>>my_array=np.array(my_list)
>>>my_2darray=np.array([[1,2,3],[4,5,6]])
VariableAssignment
Lists
Selectitematindex1Select3rdlastitem
Selectitemsatindex1and2Selectitemsafterindex0Selectitemsbeforeindex3Copymy_list
my_list[list][itemOfList]
Subset
>>>my_list[1]
>>>my_list[-3]
Slice
>>>my_list[1:3]
>>>my_list[1:]
>>>my_list[:3]
>>>my_list[:]
SubsetListsofLists
>>>my_list2[1][0]
>>>my_list2[1][:2]
>>>a='is'
>>>b='nice'
>>>my_list=['my','list',a,b]
>>>my_list2=[[4,5,6,7],[3,4,5,6]]
SelectingListElements
AlsoseeNumPyArrays
Libraries
Importlibraries
>>>importnumpy
>>>importnumpyasnp
Selectiveimport
>>>frommathimportpi
InstallPython
Dataanalysis
Machinelearning
Scientificcomputing
2Dplotting
Leadingopendatascienceplatform FreeIDEthatisincluded CreateandsharepoweredbyPython withAnaconda documentswithlivecode,
visualizations,text,...
Indexstartsat0
>>>x=5
>>>x5
CalculationsWithVariables
>>>x+2
Sumoftwovariables
7
>>>x-2
Subtractionoftwovariables
3
>>>x*2
Multiplicationoftwovariables
10
>>>x**2
Exponentiationofavariable
25
>>>x%2
Remainderofavariable
1
>>>x/float(2)
Divisionofavariable
2.5
GettheindexofanitemCountanitem
AppendanitematatimeRemoveanitem
RemoveanitemReversethelistAppendanitemRemoveanitemInsertanitemSortthelist
>>>my_list.index(a)
>>>my_list.count(a)
>>>my_list.append('!')
>>>my_list.remove('!')
>>>del(my_list[0:1])
>>>my_list.reverse()
>>>my_list.extend('!')
>>>my_list.pop(-1)
>>>my_list.insert(0,'!')
>>>my_list.sort()
str()
'5','3.45','True'
Variablestostrings
int()
5,3,1
Variablestointegers
float()
5.0,1.0
Variablestofloats
bool()
True,True,True
Variablestobooleans
TypesandTypeConversion
ListOperations
>>>my_list+my_list
['my','list','is','nice','my','list','is','nice']
>>>my_list*2
['my','list','is','nice','my','list','is','nice']
>>>my_list2>4
True
ListMethods
AskingForHelp
>>>help(str)
>>>my_string='thisStringIsAwesome'
>>>my_string
'thisStringIsAwesome'
Strings
>>>my_string*2
'thisStringIsAwesomethisStringIsAwesome'
>>>my_string+'Innit'
'thisStringIsAwesomeInnit'
>>>'m'inmy_string
True
StringOperations
StringOperations
>>>my_string[3]
>>>my_string[4:9]
StringMethods
Indexstartsat0
>>>my_array.shape
>>>np.append(other_array)
GetthedimensionsofthearrayAppenditemstoanarray
>>>np.insert(my_array,1,5)
Insertitemsinanarray
>>>np.delete(my_array,[1])
Deleteitemsinanarray
>>>np.mean(my_array)
Meanofthearray
>>>np.median(my_array)
Medianofthearray
>>>my_array.corrcoef()
Correlationcoefficient
>>>np.std(my_array)
Standarddeviation
StringtouppercaseStringtolowercaseCountStringelementsReplaceStringelementsStripwhitespaces
>>>my_string.upper()
>>>my_string.lower()
>>>my_string.count('w')
>>>my_string.replace('e','i')
>>>my_string.strip()
PythonForDataScienceCheatSheet
JupyterNotebook
LearnMorePythonforDataScienceInteractivelyat
www.DataC
WorkingwithDifferentProgrammingLanguages
Kernelsprovidecomputationandcommunicationwithfront-endinterfaceslikethenotebooks.Therearethreemainkernels:
IRkernel IJulia
InstallingJupyterNotebookwillautomaticallyinstalltheIPythonkernel.
Widgets
Notebookwidgetsprovidetheabilitytovisualizeandcontrolchangesinyourdata,oftenasacontrollikeaslider,textbox,etc.
YoucanusethemtobuildinteractiveGUIsforyournotebooksortosynchronizestatefulandstatelessinformationbetweenPythonandJavaScript.
Saving/LoadingNotebooks
Createnewnotebook
Makeacopyofthecurrentnotebook
Savecurrentnotebookandrecordcheckpoint
Previewoftheprintednotebook
Closenotebook&stoprunninganyscripts
Openanexistingnotebook
Renamenotebook
Revertnotebooktoapreviouscheckpoint
Downloadnotebookas
IPythonnotebook
Python
HTML
Markdown
reST
Restartkernel
Restartkernel&runallcells
Restartkernel&runallcells
CommandMode:
Interruptkernel
Interruptkernel&clearalloutput
Connectbacktoaremotenotebook
Runotherinstalledkernels
Downloadserializedstateofallwidgetmodelsinuse
Savenotebookwithinteractivewidgets
Embedcurrentwidgets
15
13 14
WritingCodeAndText
LaTeX
1 2 3 4 5 67 8910 11 12
Codeandtextareencapsulatedby3basiccelltypes:markdowncells,codecells,andrawNBConvertcells.
EditCells
EditMode:
Saveandcheckpoint
Insertcellbelow
Interruptkernel
Restartkernel
Cutcurrentlyselectedcellstoclipboard
Pastecellsfromclipboardabovecurrentcell
Pastecellsfrom
Copycellsfromclipboardtocurrentcursorposition
Pastecellsfromclipboardbelowcurrentcell
ExecutingCells
Runselectedcell(s) Runcurrentcellsdownandcreateanewone
below
Cutcell
Copycell(s)
Pastecell(s)below
Movecellup
Movecelldown
Runcurrentcell
AskingForHelp
Displaycharacteristics
Opencommandpalette
Currentkernel
Kernelstatus
Logoutfromnotebookserver
clipboardontopofcurrentcel
Revert“DeleteCells”
invocation
Mergecurrentcellwiththeoneabove
Movecurrentcellup
Adjustmetadataunderlyingthecurrentnotebook
Removecellattachments
Pasteattachmentsofcurrentcell
InsertCells
Addnewcellabovethecurrentone
Deletecurrentcells
Splitupacellfromcurrentcursorposition
Mergecurrentcellwiththeonebelow
Movecurrentcelldown
Findandreplaceinselectedcells
Copyattachmentsofcurrentcell
Insertimageinselectedcells
Addnewcellbelowthecurrentone
Runcurrentcellsdownandcreateanewoneabove
Runallcellsabovethecurrentcell
Changethecelltypeofcurrentcell
toggle,togglescrollingandclearalloutput
ViewCells
ToggledisplayofJupyterlogoandfilename
Togglelinenumbersincells
Runallcells
Runallcellsbelowthecurrentcell
toggle,togglescrollingandclearcurrentoutputs
Toggledisplayoftoolbar
Toggledisplayofcellactionicons:
None
Editmetadata
Rawcellformat
Slideshow
Attachments
Tags
WalkthroughaUItour
Editthebuilt-inkeyboardshortcuts
Descriptionofmarkdownavailableinnotebook
PythonhelptopicsNumPyhelptopicsMatplotlibhelptopics
Pandashelptopics
DataCamp
Listofbuilt-inkeyboardshortcuts
Notebookhelptopics
InformationonunofficialJupyterNotebookextensions
IPythonhelptopicsSciPyhelptopicsSymPyhelptopics
AboutJupyterNotebook
LearnPythonforDataScienceInteractively
NumPy
TheNumPylibraryisthecorelibraryforscientificcomputinginPython.Itprovidesahigh-performancemultidimensionalarrayobject,andtoolsforworkingwiththesearrays.
Usethefollowingimportconvention:
>>>importnumpyasnp
NumPyArrays
2
1Darray
2Darray
axis1
axis0
3Darray
axis2
axis1
axis0
CreatingArrays
InitialPlaceholders
I/O
Saving&LoadingOnDisk
Saving&LoadingTextFiles
DataTypes
CreateanarrayofzerosCreateanarrayofonesCreateanarrayofevenlyspacedvalues(stepvalue)
Createanarrayofevenly
spacedvalues(numberofsamples)
CreateaconstantarrayCreatea2X2identitymatrix
CreateanarraywithrandomvaluesCreateanemptyarray
>>>np.zeros((3,4))
>>>np.ones((2,3,4),dtype=16)
>>>d=np.arange(10,25,5)
>>>np.linspace(0,2,9)
>>>e=np.full((2,2),7)
>>>f=np.eye(2)
>>>np.random.random((2,2))
>>>np.empty((3,2))
>>>np.loadtxt("myfile.txt")
>>>np.genfromtxt("my_file.csv",delimiter=',')
>>>np.savetxt("myarray.txt",a,delimiter="")
>>>np.save('my_array',a)
>>>np.savez('array.npz',a,b)
>>>np.load('my_array.npy')
>>>a=np.array([1,2,3])
>>>b=np.array([(1.5,2,3),(4,5,6)],dtype=float)
>>>c=np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]],
dtype=float)
1 2
3
Subsetting,Slicing,Indexing
AlsoseeLists
Subsetting
>>>a[2]
3
>>>b[1,2]
6.0
Slicing
>>>a[0:2]
array([1,2])
>>>b[0:2,1]
array([2.,5.])
>>>b[:1]
array([[1.5,2.,3.]])
>>>c[1,...]
array([[[3.,2.,1.],
[4.,5.,6.]]])
>>>a[::-1]
array([3,2,1])
BooleanIndexing
>>>a[a<2]
array([1])
12
3
1.52
4
5
3
6
Selecttheelementatthe2ndindex
Selecttheelementatrow0column2(equivalenttob[1][2])
123 Selectitemsatindex0and1
Selectitemsatrows0and1incolumn1
Selectallitemsatrow0(equivalenttob[0:1,:])
Sameas[1,:,:]
Reversedarraya
123 Selectelementsfromalessthan2
FancyIndexing
>>>b[[1,0,1,0],[0,1,2,0]] Selectelements(1,0),(0,1),(1,2)and(0,0)
array([4.,2.,6.,1.5])
>>>b[[1,0,1,0]][:,[0,1,2,0]] Selectasubsetofthematrix’srows
array([[4.,5.,6.,4.], andcolumns
[1.5,2.,3.,1.5],
[4.,5.,6.,4.],
[1.5,2.,3.,1.5]])
ArrayManipulation
CopyingArrays
SortingArrays
Sortanarray
Sorttheelementsofanarray'saxis
>>>a.sort()
>>>c.sort(axis=0)
InspectingYourArray
>>>a.shape
Arraydimensions
>>>len(a)
Lengthofarray
>>>b.ndim
Numberofarraydimensions
>>>e.size
Numberofarrayelements
>>>b.dtype
Datatypeofarrayelements
>>>
Nameofdatatype
>>>b.astype(int)
Convertanarraytoadifferenttype
>>>a.sum()
Array-wisesum
>>>a.min()
Array-wiseminimumvalue
>>>b.max(axis=0)
Maximumvalueofanarrayrow
>>>b.cumsum(axis=1)
Cumulativesumoftheelements
>>>a.mean()
Mean
>>>b.median()
Median
>>>a.corrcoef()
Correlationcoefficient
>>>np.std(b)
Standarddeviation
TransposingArray
>>>i=np.transpose(b)
>>>i.T
PermutearraydimensionsPermutearraydimensions
ChangingArrayShape
>>>b.ravel()
>>>g.reshape(3,-2)
Flattenthearray
Reshape,butdon’tchangedata
Adding/RemovingElements
>>>h.resize((2,6))
Returnanewarraywithshape(2,6)
>>>np.append(h,g)
Appenditemstoanarray
>>>np.insert(a,1,5)
Insertitemsinanarray
>>>np.delete(a,[1])
Deleteitemsfromanarray
CombiningArrays
>>>np.concatenate((a,d),axis=0)
Concatenatearrays
array([1,2,3,10,15,20])
>>>np.vstack((a,b))
Stackarraysvertically(row-wise)
array([[1.,2.,3.],
[1.5,2.,3.],
[4.,5.,6.]])
>>>np.r_[e,f]
Stackarraysvertically(row-wise)
>>>np.hstack((e,f))
array([[7.,7.,1.,0.],
Stackarrayshorizontally(column-wise)
[7.,7.,0.,1.]])
>>>np.column_stack((a,d))
Createstackedcolumn-wisearrays
array([[1,10],
[2,15],
[3,20]])
>>>np.c_[a,d]
Createstackedcolumn-wisearrays
SplittingArrays
>>>np.hsplit(a,3)
[array([1]),array([2]),array([3])]
>>>np.vsplit(c,2)
[array([[[1.5,2.,1.],
[4.,5.,6.]]]),
array([[[3.,2.,3.],
[4.,5.,6.]]])]
Splitthearrayhorizontallyatthe3rdindex
Splitthearrayverticallyatthe2ndindex
DataCamp
LearnPythonforDataScienceInteractively
>>>h=a.view()
>>>np.copy(a)
>>>h=a.copy()
CreateaviewofthearraywiththesamedataCreateacopyofthearray
Createadeepcopyofthearray
>>>64
Signed64-bitintegertypes
>>>np.float32
Standarddouble-precisionfloatingpoint
>>>plex
Complexnumbersrepresentedby128floats
>>>np.bool
BooleantypestoringTRUEandFALSEvalues
>>>np.object
Pythonobjecttype
>>>np.string_
Fixed-lengthstringtype
>>>np.unicode_
Fixed-lengthunicodetype
PythonForDataScienceCheatSheet
NumPyBasics
ArrayMathematics
ArithmeticOperations
Comparison
AggregateFunctions
Element-wisecomparison
Element-wisecomparisonArray-wisecomparison
>>>a==b
array([[False,True,True],
[False,False,False]],dtype=bool)
>>>a<2
array([True,False,False],dtype=bool)
>>>np.array_equal(a,b)
Subtraction
SubtractionAddition
AdditionDivision
DivisionMultiplication
MultiplicationExponentiationSquareroot
PrintsinesofanarrayElement-wisecosine
Element-wisenaturallogarithmDotproduct
>>>np.divide(a,b)
>>>a*b
array([[1.5, 4., 9.],
[4.,10.,18.]])
>>>np.multiply(a,b)
>>>np.exp(b)
>>>np.sqrt(b)
>>>np.sin(a)
>>>np.cos(b)
>>>np.log(a)
>>>e.dot(f)
array([[7.,7.],
[7.,7.]])
],
]])
>>>g=a-b
array([[-0.5,0.,0.],
[-3.,-3.,-3.]])
>>>np.subtract(a,b)
>>>b+a
array([[2.5,4.,6.],
[5.,7.,9.]])
>>>np.add(b,a)
>>>a/b
array([[0.66666667,1. ,1.
[0.25 ,0.4 ,0.5
LearnPythonforDataScienceInteractivelyat
www.DataC
AskingForHelp
>>>(np.ndarray.dtype)
1.5
4
2
3
5
6
1.523
4
5
6
1.5
2
3
4
5
6
PythonForDataScienceCheatSheet
SciPy-LinearAlgebra
LearnMorePythonforDataScienceInteractivelyat
LinearAlgebra
You’llusethelinalgandsparsemodules.Notethatscipy.linalgcontainsandexpandsonnumpy.linalg.
>>>fromscipyimportlinalg,sparse
MatrixFunctions
SciPy
TheSciPylibraryisoneofthecorepackagesforscientificcomputingthatprovidesmathematicalalgorithmsandconveniencefunctionsbuiltontheNumPyextensionofPython.
Addition
>>>np.add(A,D)
Subtraction
>>>np.subtract(A,D)
Division
>>>np.divide(A,D)
Multiplication
>>>np.multiply(D,A)
>>>np.dot(A,D)
>>>np.vdot(A,D)
>>>np.inner(A,D)
>>>np.outer(A,D)
>>>np.tensordot(A,D)
>>>np.kron(A,D)
ExponentialFunctions
>>>linalg.expm(A)
>>>linalg.expm2(A)
>>>linalg.expm3(D)
AdditionSubtractionDivision
MultiplicationDotproduct
Vectordotproduct
InnerproductOuterproductTensordotproductKroneckerproduct
Matrixexponential
Matrixexponential(TaylorSeries)
Matrixexponential(eigenvalue
decomposition)
LogarithmFunction
>>>linalg.logm(A)
TrigonometricTunctions
>>>linalg.sinm(D)
>>>linalg.cosm(D)
>>>linalg.tanm(A)
HyperbolicTrigonometricFunctions
>>>linalg.sinhm(D)
>>>linalg.coshm(D)
>>>linalg.tanhm(A)
MatrixSignFunction
>>>np.sigm(A)
MatrixSquareRoot
>>>linalg.sqrtm(A)
ArbitraryFunctions
>>>linalg.funm(A,lambdax:x*x)
Matrixlogarithm
MatrixsineMatrixcosineMatrixtangent
HypberbolicmatrixsineHyperbolicmatrixcosineHyperbolicmatrixtangent
MatrixsignfunctionMatrixsquarerootEvaluatematrixfunction
CreatingMatrices
AlsoseeNumPy
>>>A=np.matrix(np.random.random((2,2)))
>>>B=np.asmatrix(b)
>>>C=np.mat(np.random.random((10,5)))
>>>D=np.mat([[3,4],[5,6]])
InverseInverse
Tranposematrix
ConjugatetranspositionTrace
Frobeniusnorm
L1norm(maxcolumnsum)Linfnorm(maxrowsum)
MatrixrankDeterminant
SolverfordensematricesSolverfordensematrices
Least-squaressolutiontolinearmatrixequation
Computethepseudo-inverseofamatrix(least-squaressolver)
Computethepseudo-inverseofamatrix(SVD)
Inverse
>>>A.I
>>>linalg.inv(A)
>>>A.T
>>>A.H
>>>np.trace(A)
Norm
>>>linalg.norm(A)
>>>linalg.norm(A,1)
>>>linalg.norm(A,np.inf)
Rank
>>>np.linalg.matrix_rank(C)
Determinant
>>>linalg.det(A)
Solvinglinearproblems
>>>linalg.solve(A,b)
>>>E=np.mat(a).T
>>>linalg.lstsq(D,E)
Generalizedinverse
>>>linalg.pinv(C)
>>>linalg.pinv2(C)
>>>np.mgrid[0:5,0:5]
Createadensemeshgrid
>>>np.ogrid[0:2,0:2]
Createanopenmeshgrid
>>>np.r_[[3,[0]*5,-1:1:10j]
Stackarraysvertically(row-wise)
>>>np.c_[b,c]
Createstackedcolumn-wisearrays
BasicMatrixRoutines
InteractingWithNumPy
AlsoseeNumPy
IndexTricks
ShapeManipulation
Polynomials
VectorizingFunctions
TypeHandling
OtherUsefulFunctions
ReturntheangleofthecomplexargumentCreateanarrayofevenlyspacedvalues
(numberofsamples)
Unwrap
Createanarrayofevenlyspacedvalues(logscale)Returnvaluesfromalistofarraysdependingonconditions
Factorial
CombineNthingstakenatktimeWeightsforNp-pointcentralderivative
Findthen-thderivativeofafunctionatapoint
>>>np.angle(b,deg=True)
>>>g=np.linspace(0,np.pi,num=5)
>>>g[3:]+=np.pi
>>>np.unwrap(g)
>>>np.logspace(0,10,3)
>>>np.select([c<4],[c*2])
>>>misc.factorial(a)
>>>b(10,3,exact=True)
>>>misc.central_diff_weights(3)
>>>misc.derivative(myfunc,1.0)
ReturntherealpartofthearrayelementsReturntheimaginarypartofthearrayelementsReturnarealarrayifcomplexpartscloseto0Castobjecttoadatatype
>>>np.real(c)
>>>np.imag(c)
>>>np.real_if_close(c,tol=1000)
>>>np.cast['f'](np.pi)
Vectorizefunctions
>>>defmyfunc(a):
ifa<0:returna*2
else:
returna/2
>>>np.vectorize(myfunc)
Createapolynomialobject
>>>fromnumpyimportpoly1d
>>>p=poly1d([3,4,5])
>>>importnumpyasnp
>>>a=np.array([1,2,3])
>>>b=np.array([(1+5j,2j,3j),(4j,5j,6j)])
>>>c=np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]])
Createa2X2identitymatrixCreatea2x2identitymatrix
CompressedSparseRowmatrixCompressedSparseColumnmatrixDictionaryOfKeysmatrix
Sparsematrixtofullmatrix
Identifysparsematrix
>>>F=np.eye(3,k=1)
>>>G=np.mat(np.identity(2))
>>>C[C>0.5]=0
>>>H=sparse.csr_matrix(C)
>>>I=sparse.csc_matrix(D)
>>>J=sparse.dok_matrix(A)
>>>E.todense()
>>>sparse.isspmatrix_csc(A)
>>>np.transpose(b)
Permutearraydimensions
>>>b.flatten()
Flattenthearray
>>>np.hstack((b,c))
Stackarrayshorizontally(column-wise)
>>>np.vstack((a,b))
Stackarraysvertically(row-wise)
>>>np.hsplit(c,2)
Splitthearrayhorizontallyatthe2ndindex
>>>np.vpslit(d,2)
Splitthearrayverticallyatthe2ndindex
CreatingSparseMatrices
Decompositions
InverseNorm
Solverforsparsematrices
Inverse
>>>sparse.linalg.inv(I)
Norm
>>>sparse.linalg.norm(I)
Solvinglinearproblems
>>>sparse.linalg.spsolve(H,I)
SolveordinaryorgeneralizedeigenvalueproblemforsquarematrixUnpackeigenvalues
FirsteigenvectorSecondeigenvectorUnpackeigenvalues
SingularValueDecomposition(SVD)ConstructsigmamatrixinSVD
LUDecomposition
EigenvaluesandEigenvectors
>>>la,v=linalg.eig(A)
>>>l1,l2=la
>>>v[:,0]
>>>v[:,1]
>>>linalg.eigvals(A)
SingularValueDecomposition
>>>U,s,Vh=linalg.svd(B)
>>>M,N=B.shape
>>>Sig=linalg.diagsvd(s,M,N)
LUDecomposition
>>>P,L,U=linalg.lu(C)
SparseMatrixRoutines
SparseMatrixFunctions
SparseMatrixDecompositions
Sparsematrixexponential
>>>sparse.linalg.expm(I)
DataCamp
LearnPythonforDataScienceInteractively
EigenvaluesandeigenvectorsSVD
>>>la,v=sparse.linalg.eigs(F,1)
>>>sparse.linalg.svds(H,2)
AskingForHelp
>>>help(scipy.linalg.diagsvd)
>>>(np.matrix)
PythonForDataScienceCheatSheet
PandasBasics
LearnPythonforDataScienceInteractivelyat
www.DataC
Pandas
ThePandaslibraryisbuiltonNumPyandprovideseasy-to-usedatastructuresanddataanalysistoolsforthePythonprogramminglanguage.
Usethefollowingimportconvention:
>>>importpandasaspd
PandasDataStructures
Series
a
3
b
-5
c
7
d
4
Aone-dimensionallabeledarraycapableofholdinganydatatype
AskingForHelp
>>>help(pd.Series.loc)
Selection
>>>s['b']
-5
>>>df[1:]
Country
India
Brazil
CapitalNewDelhiBrasília
Population1303171035
207847528
Getoneelement
GetsubsetofaDataFrame
Getting
ByPosition
>>>df.iloc([0],[0])
'Belgium'
>>>df.iat([0],[0])
'Belgium'
ByLabel
>>>df.loc([0],['Country'])
'Belgium'
>>>df.at([0],['Country'])
'Belgium'
ByLabel/Position
>>>df.ix[2]
Country Brazil
Capital BrasíliaPopulation207847528
>>>df.ix[:,'Capital']
Brussels
NewDelhi
Brasília
>>>df.ix[1,'Capital']
'NewDelhi'
BooleanIndexing
>>>s[~(s>1)]
>>>s[(s<-1)|(s>2)]
>>>df[df['Population']>1200000000]
Setting
>>>s['a']=6
Selecting,BooleanIndexing&Setting
AlsoseeNumPyArrays
Selectsinglevaluebyrow&column
Dropping
Dropvaluesfromrows(axis=0)Dropvaluesfromcolumns(axis=1)
>>>s.drop(['a','c'])
>>>df.drop('Country',axis=1)
Sort&Rank
SortbylabelsalonganaxisSortbythevaluesalonganaxisAssignrankstoentries
>>>df.sort_index()
>>>df.sort_values(by='Country')
>>>df.rank()
RetrievingSeries/DataFrameInformation
BasicInformation
>>>df.shape
(rows,columns)
>>>df.index
Describeindex
>>>df.columns
DescribeDataFramecolumns
>>>()
InfoonDataFrame
>>>df.count()
Numberofnon-NAvalues
>>>df.sum()
Sumofvalues
>>>df.cumsum()
Cummulativesumofvalues
>>>df.min()/df.max()
Minimum/maximumvalues
>>>df.idxmin()/df.idxmax()
Minimum/Maximumindexvalue
>>>df.describe()
Summarystatistics
>>>df.mean()
Meanofvalues
>>>df.median()
Medianofvalues
Summary
Selectsinglevaluebyrow&columnlabels
Index
>>>s=pd.Series([3,-5,7,4],index=['a','b','c','d'])
DataFrame
Country
Capital
Population
0
Belgium
Brussels
11190846
1
India
NewDelhi
1303171035
2
Brazil
Brasília
207847528
Columns
Applyfunction
Applyfunctionelement-wise
>>>f=lambdax:x*2
>>>df.apply(f)
>>>df.applymap(f)
ApplyingFunctions
Index
Atwo-dimensionallabeleddatastructurewithcolumnsofpotentiallydifferenttypes
Selectsinglerowofsubsetofrows
Selectasinglecolumnofsubsetofcolumns
DataAlignment
InternalDataAlignment
NAvaluesareintroducedintheindicesthatdon’toverlap:
Selectrowsandcolumns
>>>data={'Country':['Belgium','India','Brazil'],
'Capital':['Brussels','NewDelhi','Brasília'],'Population':[11190846,1303171035,207847528]}
>>>df=pd.DataFrame(data,
columns=['Country','Capital','Population'])
I/O
>>>pd.read_csv('file.csv',header=None,nrows=5)
>>>df.to_csv('myDataFrame.csv')
ReadandWritetoCSV
ReadandWritetoExcel
Seriesswherevalueisnot>1
swherevalueis<-1or>2
UsefiltertoadjustDataFrame
SetindexaofSeriessto6
ReadandWritetoSQLQueryorDatabaseTable
>>>fromsqlalchemyimportcreate_engine
>>>engine=create_engine('sqlite:///:memory:')
>>>pd.read_sql("SELECT*FROMmy_table;",engine)
>>>pd.read_sql_table('my_table',engine)
>>>pd.read_sql_query("SELECT*FROMmy_table;",engine)
>>>pd.read_excel('file.xlsx')
>>>pd.to_excel('dir/myDataFrame.xlsx',sheet_name='Sheet1')
Readmultiplesheetsfromthesamefile
>>>xlsx=pd.ExcelFile('file.xls')
>>>df=pd.read_excel(xlsx,'Sheet1')
read_sql()isaconveniencewrapperaroundread_sql_table()and
read_sql_query()
>>>s3=pd.Series([7,-2,3],index=['a','c','d'])
>>>s+s3
a 10.0
b NaN
c 5.0
d 7.0
ArithmeticOperationswithFillMethods
Youcanalsodotheinternaldataalignmentyourselfwiththehelpofthefillmethods:
>>>s.add(s3,fill_value=0)
a
10.0
b
-5.0
c
5.0
d
7.0
>>>s.sub(s3,fill_value=2)
>>>s.div(s3,fill_value=4)
>>>s.mul(s3,fill_value=3)
>>>pd.to_sql('myDf',engine)
DataCamp
LearnPythonforDataScienceInteractively
LoadingTheData
AlsoseeNumPy&Pandas
YourdataneedstobenumericandstoredasNumPyarraysorSciPysparsematrices.Othertypesthatareconvertibletonumericarrays,suchasPandasDataFrame,arealsoacceptable.
>>>importnumpyasnp
>>>X=np.random.random((10,5))
>>>y=np.array(['M','M','F','F','M','F','M','M','F','F','F'])
>>>X[X<0.7]=0
CreateYourModel
SupervisedLearningEstimators
UnsupervisedLearningEstimators
PrincipalComponentAnalysis(PCA)
>>>fromsklearn.decompositionimportPCA
>>>pca=PCA(n_components=0.95)
KMeans
>>>fromsklearn.clusterimportKMeans
>>>k_means=KMeans(n_clusters=3,random_state=0)
LinearRegression
>>>fromsklearn.linear_modelimportLinearRegression
>>>lr=LinearRegression(normalize=True)
SupportVectorMachines(SVM)
>>>fromsklearn.svmimportSVC
>>>svc=SVC(kernel='linear')
NaiveBayes
>>>fromsklearn.naive_bayesimportGaussianNB
>>>gnb=GaussianNB()
KNN
>>>fromsklearnimportneighbors
>>>knn=neighbors.KNeighborsClassifier(n_neighbors=5)
ModelFitting
Fitthemodeltothedata
Fittodata,thentransformit
Fitthemodeltothedata
Supervisedlearning
>>>lr.fit(X,y)
>>>knn.fit(X_train,y_train)
>>>svc.fit(X_train,y_train)
UnsupervisedLearning
>>>k_means.fit(X_train)
>>>pca_model=pca.fit_transform(X_train)
TuneYourModel
Prediction
PredictlabelsPredictlabels
Estimateprobabilityofalabel
Predictlabelsinclusteringalgos
SupervisedEstimators
>>>y_pred=svc.predict(np.random.random((2,5)))
>>>y_pred=lr.predict(X_test)
>>>y_pred=knn.predict_proba(X_test)
UnsupervisedEstimators
>>>y_pred=k_means.predict(X_test)
EvaluateYourModel’sPerformance
ClassificationMetrics
RegressionMetrics
ClusteringMetrics
Cross-Validation
>>>fromsklearn.cross_validationimportcross_val_score
>>>print(cross_val_score(knn,X_train,y_train,cv=4))
>>>print(cross_val_score(lr,X,y,cv=2))
AdjustedRandIndex
>>>fromsklearn.metricsimportadjusted_rand_score
>>>adjusted_rand_score(y_true,y_pred)
Homogeneity
>>>fromsklearn.metricsimporthomogeneity_score
>>>homogeneity_score(y_true,y_pred)
V-measure
>>>fromsklearn.metricsimportv_measure_score
>>>metrics.v_measure_score(y_true,y_pred)
MeanAbsoluteError
>>>fromsklearn.metricsimportmean_absolute_error
>>>y_true=[3,-0.5,2]
>>>mean_absolute_error(y_true,y_pred)
MeanSquaredError
>>>fromsklearn.metricsimportmean_squared_error
>>>mean_squared_error(y_test,y_pred)
R²Score
>>>fromsklearn.metricsimportr2_score
>>>r2_score(y_true,y_pred)
Standardization
EncodingCategoricalFeatures
Normalization
ImputingMissingValues
Binarization
GeneratingPolynomialFeatures
PreprocessingTheData
>>>fromsklearn.preprocessingimportPolynomialFeatures
>>>poly=PolynomialFeatures(5)
>>>poly.fit_transform(X)
>>>fromsklearn.preprocessingimportBinarizer
>>>binarizer=Binarizer(threshold=0.0).fit(X)
>>>binary_X=binarizer.transform(X)
>>>fromsklearn.preprocessingimportImputer
>>>imp=Imputer(missing_values=0,strategy='mean',axis=0)
>>>imp.fit_transform(X_train)
>>>fromsklearn.preprocessingimportNormalizer
>>>scaler=Normalizer().fit(X_train)
>>>normalized_X=scaler.transform(X_train)
>>>normalized_X_test=scaler.transform(X_test)
>>>fromsklearn.preprocessingimportLabelEncoder
>>>enc=LabelEncoder()
>>>y=enc.fit_transform(y)
>>>fromsklearn.preprocessingimportStandardScaler
>>>scaler=StandardScaler().fit(X_train)
>>>standardized_X=scaler.transform(X_train)
>>>standardized_X_test=scaler.transform(X_test)
TrainingAndTestData
>>>fromsklearn.model_selectionimporttrain_test_split
>>>X_train,X_test,y_train,y_test=train_test_split(X,
y,random_state=0)
PythonForDataScienceCheatSheet
AccuracyScore
>>>knn.score(X_test,y_test)
Estimatorscoremethod
>>>fromsklearn.metricsimportaccuracy_score
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2024年门市转让合同
- 2024年公司内部的车辆管理制度范本
- 买方土地转让居间合同范本2024年
- 2024年二手变压器购销合同
- 2024年收银员合同
- 2024年项目开发权转让合同书
- 2024年合同样本 集体劳动合同样本
- 2024年合作意向协议范本
- 2024年金融公司劳务合同范本
- 不锈钢保温水箱销售合同2024年
- 智慧黑板采购投标方案(技术方案)
- 大一新生个人生涯发展报告
- 2024年度2024行政复议法培训
- 外出作业安全教育的课件
- 《支部五化建设》课件2
- 学堂课程在线美好生活“油”此而来:油科学概论(暨大)课后作业答案
- 2024年公司与公司之间借款协议(多场合应用)
- 毕业论文写作指导A课件
- 讲课学前儿童注意的发展课件
- 规范网络行为课件
- 光伏星业务技能考试附有答案
评论
0/150
提交评论