版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
PythonForDataScienceCheatSheet
PythonBasics
LearnMorePythonforDataScienceInteractivelyat
VariablesandDataTypes
NumpyArrays
AlsoseeLists
SelectingNumpyArrayElements
Indexstartsat0
NumpyArrayOperations
NumpyArrayFunctions
DataCamp
LearnPythonforDataScienceInteractively
my_2darray[rows,columns]
Selectitemsatindex0and1
Selectitematindex1
Subset
>>>my_array[1]
2
Slice
>>>my_array[0:2]
array([1,2])
Subset2DNumpyarrays
>>>my_2darray[:,0]
array([1,4])
>>>my_array>3
array([False,False,False,True],dtype=bool)
>>>my_array*2
array([2,4,6,8])
>>>my_array+np.array([5,6,7,8])
array([6,8,10,12])
>>>my_list=[1,2,3,4]
>>>my_array=np.array(my_list)
>>>my_2darray=np.array([[1,2,3],[4,5,6]])
VariableAssignment
Lists
Selectitematindex1Select3rdlastitem
Selectitemsatindex1and2Selectitemsafterindex0Selectitemsbeforeindex3Copymy_list
my_list[list][itemOfList]
Subset
>>>my_list[1]
>>>my_list[-3]
Slice
>>>my_list[1:3]
>>>my_list[1:]
>>>my_list[:3]
>>>my_list[:]
SubsetListsofLists
>>>my_list2[1][0]
>>>my_list2[1][:2]
>>>a='is'
>>>b='nice'
>>>my_list=['my','list',a,b]
>>>my_list2=[[4,5,6,7],[3,4,5,6]]
SelectingListElements
AlsoseeNumPyArrays
Libraries
Importlibraries
>>>importnumpy
>>>importnumpyasnp
Selectiveimport
>>>frommathimportpi
InstallPython
Dataanalysis
Machinelearning
Scientificcomputing
2Dplotting
Leadingopendatascienceplatform FreeIDEthatisincluded CreateandsharepoweredbyPython withAnaconda documentswithlivecode,
visualizations,text,...
Indexstartsat0
>>>x=5
>>>x5
CalculationsWithVariables
>>>x+2
Sumoftwovariables
7
>>>x-2
Subtractionoftwovariables
3
>>>x*2
Multiplicationoftwovariables
10
>>>x**2
Exponentiationofavariable
25
>>>x%2
Remainderofavariable
1
>>>x/float(2)
Divisionofavariable
2.5
GettheindexofanitemCountanitem
AppendanitematatimeRemoveanitem
RemoveanitemReversethelistAppendanitemRemoveanitemInsertanitemSortthelist
>>>my_list.index(a)
>>>my_list.count(a)
>>>my_list.append('!')
>>>my_list.remove('!')
>>>del(my_list[0:1])
>>>my_list.reverse()
>>>my_list.extend('!')
>>>my_list.pop(-1)
>>>my_list.insert(0,'!')
>>>my_list.sort()
str()
'5','3.45','True'
Variablestostrings
int()
5,3,1
Variablestointegers
float()
5.0,1.0
Variablestofloats
bool()
True,True,True
Variablestobooleans
TypesandTypeConversion
ListOperations
>>>my_list+my_list
['my','list','is','nice','my','list','is','nice']
>>>my_list*2
['my','list','is','nice','my','list','is','nice']
>>>my_list2>4
True
ListMethods
AskingForHelp
>>>help(str)
>>>my_string='thisStringIsAwesome'
>>>my_string
'thisStringIsAwesome'
Strings
>>>my_string*2
'thisStringIsAwesomethisStringIsAwesome'
>>>my_string+'Innit'
'thisStringIsAwesomeInnit'
>>>'m'inmy_string
True
StringOperations
StringOperations
>>>my_string[3]
>>>my_string[4:9]
StringMethods
Indexstartsat0
>>>my_array.shape
>>>np.append(other_array)
GetthedimensionsofthearrayAppenditemstoanarray
>>>np.insert(my_array,1,5)
Insertitemsinanarray
>>>np.delete(my_array,[1])
Deleteitemsinanarray
>>>np.mean(my_array)
Meanofthearray
>>>np.median(my_array)
Medianofthearray
>>>my_array.corrcoef()
Correlationcoefficient
>>>np.std(my_array)
Standarddeviation
StringtouppercaseStringtolowercaseCountStringelementsReplaceStringelementsStripwhitespaces
>>>my_string.upper()
>>>my_string.lower()
>>>my_string.count('w')
>>>my_string.replace('e','i')
>>>my_string.strip()
PythonForDataScienceCheatSheet
JupyterNotebook
LearnMorePythonforDataScienceInteractivelyat
www.DataC
WorkingwithDifferentProgrammingLanguages
Kernelsprovidecomputationandcommunicationwithfront-endinterfaceslikethenotebooks.Therearethreemainkernels:
IRkernel IJulia
InstallingJupyterNotebookwillautomaticallyinstalltheIPythonkernel.
Widgets
Notebookwidgetsprovidetheabilitytovisualizeandcontrolchangesinyourdata,oftenasacontrollikeaslider,textbox,etc.
YoucanusethemtobuildinteractiveGUIsforyournotebooksortosynchronizestatefulandstatelessinformationbetweenPythonandJavaScript.
Saving/LoadingNotebooks
Createnewnotebook
Makeacopyofthecurrentnotebook
Savecurrentnotebookandrecordcheckpoint
Previewoftheprintednotebook
Closenotebook&stoprunninganyscripts
Openanexistingnotebook
Renamenotebook
Revertnotebooktoapreviouscheckpoint
Downloadnotebookas
IPythonnotebook
Python
HTML
Markdown
reST
Restartkernel
Restartkernel&runallcells
Restartkernel&runallcells
CommandMode:
Interruptkernel
Interruptkernel&clearalloutput
Connectbacktoaremotenotebook
Runotherinstalledkernels
Downloadserializedstateofallwidgetmodelsinuse
Savenotebookwithinteractivewidgets
Embedcurrentwidgets
15
13 14
WritingCodeAndText
LaTeX
1 2 3 4 5 67 8910 11 12
Codeandtextareencapsulatedby3basiccelltypes:markdowncells,codecells,andrawNBConvertcells.
EditCells
EditMode:
Saveandcheckpoint
Insertcellbelow
Interruptkernel
Restartkernel
Cutcurrentlyselectedcellstoclipboard
Pastecellsfromclipboardabovecurrentcell
Pastecellsfrom
Copycellsfromclipboardtocurrentcursorposition
Pastecellsfromclipboardbelowcurrentcell
ExecutingCells
Runselectedcell(s) Runcurrentcellsdownandcreateanewone
below
Cutcell
Copycell(s)
Pastecell(s)below
Movecellup
Movecelldown
Runcurrentcell
AskingForHelp
Displaycharacteristics
Opencommandpalette
Currentkernel
Kernelstatus
Logoutfromnotebookserver
clipboardontopofcurrentcel
Revert“DeleteCells”
invocation
Mergecurrentcellwiththeoneabove
Movecurrentcellup
Adjustmetadataunderlyingthecurrentnotebook
Removecellattachments
Pasteattachmentsofcurrentcell
InsertCells
Addnewcellabovethecurrentone
Deletecurrentcells
Splitupacellfromcurrentcursorposition
Mergecurrentcellwiththeonebelow
Movecurrentcelldown
Findandreplaceinselectedcells
Copyattachmentsofcurrentcell
Insertimageinselectedcells
Addnewcellbelowthecurrentone
Runcurrentcellsdownandcreateanewoneabove
Runallcellsabovethecurrentcell
Changethecelltypeofcurrentcell
toggle,togglescrollingandclearalloutput
ViewCells
ToggledisplayofJupyterlogoandfilename
Togglelinenumbersincells
Runallcells
Runallcellsbelowthecurrentcell
toggle,togglescrollingandclearcurrentoutputs
Toggledisplayoftoolbar
Toggledisplayofcellactionicons:
None
Editmetadata
Rawcellformat
Slideshow
Attachments
Tags
WalkthroughaUItour
Editthebuilt-inkeyboardshortcuts
Descriptionofmarkdownavailableinnotebook
PythonhelptopicsNumPyhelptopicsMatplotlibhelptopics
Pandashelptopics
DataCamp
Listofbuilt-inkeyboardshortcuts
Notebookhelptopics
InformationonunofficialJupyterNotebookextensions
IPythonhelptopicsSciPyhelptopicsSymPyhelptopics
AboutJupyterNotebook
LearnPythonforDataScienceInteractively
NumPy
TheNumPylibraryisthecorelibraryforscientificcomputinginPython.Itprovidesahigh-performancemultidimensionalarrayobject,andtoolsforworkingwiththesearrays.
Usethefollowingimportconvention:
>>>importnumpyasnp
NumPyArrays
2
1Darray
2Darray
axis1
axis0
3Darray
axis2
axis1
axis0
CreatingArrays
InitialPlaceholders
I/O
Saving&LoadingOnDisk
Saving&LoadingTextFiles
DataTypes
CreateanarrayofzerosCreateanarrayofonesCreateanarrayofevenlyspacedvalues(stepvalue)
Createanarrayofevenly
spacedvalues(numberofsamples)
CreateaconstantarrayCreatea2X2identitymatrix
CreateanarraywithrandomvaluesCreateanemptyarray
>>>np.zeros((3,4))
>>>np.ones((2,3,4),dtype=16)
>>>d=np.arange(10,25,5)
>>>np.linspace(0,2,9)
>>>e=np.full((2,2),7)
>>>f=np.eye(2)
>>>np.random.random((2,2))
>>>np.empty((3,2))
>>>np.loadtxt("myfile.txt")
>>>np.genfromtxt("my_file.csv",delimiter=',')
>>>np.savetxt("myarray.txt",a,delimiter="")
>>>np.save('my_array',a)
>>>np.savez('array.npz',a,b)
>>>np.load('my_array.npy')
>>>a=np.array([1,2,3])
>>>b=np.array([(1.5,2,3),(4,5,6)],dtype=float)
>>>c=np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]],
dtype=float)
1 2
3
Subsetting,Slicing,Indexing
AlsoseeLists
Subsetting
>>>a[2]
3
>>>b[1,2]
6.0
Slicing
>>>a[0:2]
array([1,2])
>>>b[0:2,1]
array([2.,5.])
>>>b[:1]
array([[1.5,2.,3.]])
>>>c[1,...]
array([[[3.,2.,1.],
[4.,5.,6.]]])
>>>a[::-1]
array([3,2,1])
BooleanIndexing
>>>a[a<2]
array([1])
12
3
1.52
4
5
3
6
Selecttheelementatthe2ndindex
Selecttheelementatrow0column2(equivalenttob[1][2])
123 Selectitemsatindex0and1
Selectitemsatrows0and1incolumn1
Selectallitemsatrow0(equivalenttob[0:1,:])
Sameas[1,:,:]
Reversedarraya
123 Selectelementsfromalessthan2
FancyIndexing
>>>b[[1,0,1,0],[0,1,2,0]] Selectelements(1,0),(0,1),(1,2)and(0,0)
array([4.,2.,6.,1.5])
>>>b[[1,0,1,0]][:,[0,1,2,0]] Selectasubsetofthematrix’srows
array([[4.,5.,6.,4.], andcolumns
[1.5,2.,3.,1.5],
[4.,5.,6.,4.],
[1.5,2.,3.,1.5]])
ArrayManipulation
CopyingArrays
SortingArrays
Sortanarray
Sorttheelementsofanarray'saxis
>>>a.sort()
>>>c.sort(axis=0)
InspectingYourArray
>>>a.shape
Arraydimensions
>>>len(a)
Lengthofarray
>>>b.ndim
Numberofarraydimensions
>>>e.size
Numberofarrayelements
>>>b.dtype
Datatypeofarrayelements
>>>
Nameofdatatype
>>>b.astype(int)
Convertanarraytoadifferenttype
>>>a.sum()
Array-wisesum
>>>a.min()
Array-wiseminimumvalue
>>>b.max(axis=0)
Maximumvalueofanarrayrow
>>>b.cumsum(axis=1)
Cumulativesumoftheelements
>>>a.mean()
Mean
>>>b.median()
Median
>>>a.corrcoef()
Correlationcoefficient
>>>np.std(b)
Standarddeviation
TransposingArray
>>>i=np.transpose(b)
>>>i.T
PermutearraydimensionsPermutearraydimensions
ChangingArrayShape
>>>b.ravel()
>>>g.reshape(3,-2)
Flattenthearray
Reshape,butdon’tchangedata
Adding/RemovingElements
>>>h.resize((2,6))
Returnanewarraywithshape(2,6)
>>>np.append(h,g)
Appenditemstoanarray
>>>np.insert(a,1,5)
Insertitemsinanarray
>>>np.delete(a,[1])
Deleteitemsfromanarray
CombiningArrays
>>>np.concatenate((a,d),axis=0)
Concatenatearrays
array([1,2,3,10,15,20])
>>>np.vstack((a,b))
Stackarraysvertically(row-wise)
array([[1.,2.,3.],
[1.5,2.,3.],
[4.,5.,6.]])
>>>np.r_[e,f]
Stackarraysvertically(row-wise)
>>>np.hstack((e,f))
array([[7.,7.,1.,0.],
Stackarrayshorizontally(column-wise)
[7.,7.,0.,1.]])
>>>np.column_stack((a,d))
Createstackedcolumn-wisearrays
array([[1,10],
[2,15],
[3,20]])
>>>np.c_[a,d]
Createstackedcolumn-wisearrays
SplittingArrays
>>>np.hsplit(a,3)
[array([1]),array([2]),array([3])]
>>>np.vsplit(c,2)
[array([[[1.5,2.,1.],
[4.,5.,6.]]]),
array([[[3.,2.,3.],
[4.,5.,6.]]])]
Splitthearrayhorizontallyatthe3rdindex
Splitthearrayverticallyatthe2ndindex
DataCamp
LearnPythonforDataScienceInteractively
>>>h=a.view()
>>>np.copy(a)
>>>h=a.copy()
CreateaviewofthearraywiththesamedataCreateacopyofthearray
Createadeepcopyofthearray
>>>64
Signed64-bitintegertypes
>>>np.float32
Standarddouble-precisionfloatingpoint
>>>plex
Complexnumbersrepresentedby128floats
>>>np.bool
BooleantypestoringTRUEandFALSEvalues
>>>np.object
Pythonobjecttype
>>>np.string_
Fixed-lengthstringtype
>>>np.unicode_
Fixed-lengthunicodetype
PythonForDataScienceCheatSheet
NumPyBasics
ArrayMathematics
ArithmeticOperations
Comparison
AggregateFunctions
Element-wisecomparison
Element-wisecomparisonArray-wisecomparison
>>>a==b
array([[False,True,True],
[False,False,False]],dtype=bool)
>>>a<2
array([True,False,False],dtype=bool)
>>>np.array_equal(a,b)
Subtraction
SubtractionAddition
AdditionDivision
DivisionMultiplication
MultiplicationExponentiationSquareroot
PrintsinesofanarrayElement-wisecosine
Element-wisenaturallogarithmDotproduct
>>>np.divide(a,b)
>>>a*b
array([[1.5, 4., 9.],
[4.,10.,18.]])
>>>np.multiply(a,b)
>>>np.exp(b)
>>>np.sqrt(b)
>>>np.sin(a)
>>>np.cos(b)
>>>np.log(a)
>>>e.dot(f)
array([[7.,7.],
[7.,7.]])
],
]])
>>>g=a-b
array([[-0.5,0.,0.],
[-3.,-3.,-3.]])
>>>np.subtract(a,b)
>>>b+a
array([[2.5,4.,6.],
[5.,7.,9.]])
>>>np.add(b,a)
>>>a/b
array([[0.66666667,1. ,1.
[0.25 ,0.4 ,0.5
LearnPythonforDataScienceInteractivelyat
www.DataC
AskingForHelp
>>>(np.ndarray.dtype)
1.5
4
2
3
5
6
1.523
4
5
6
1.5
2
3
4
5
6
PythonForDataScienceCheatSheet
SciPy-LinearAlgebra
LearnMorePythonforDataScienceInteractivelyat
LinearAlgebra
You’llusethelinalgandsparsemodules.Notethatscipy.linalgcontainsandexpandsonnumpy.linalg.
>>>fromscipyimportlinalg,sparse
MatrixFunctions
SciPy
TheSciPylibraryisoneofthecorepackagesforscientificcomputingthatprovidesmathematicalalgorithmsandconveniencefunctionsbuiltontheNumPyextensionofPython.
Addition
>>>np.add(A,D)
Subtraction
>>>np.subtract(A,D)
Division
>>>np.divide(A,D)
Multiplication
>>>np.multiply(D,A)
>>>np.dot(A,D)
>>>np.vdot(A,D)
>>>np.inner(A,D)
>>>np.outer(A,D)
>>>np.tensordot(A,D)
>>>np.kron(A,D)
ExponentialFunctions
>>>linalg.expm(A)
>>>linalg.expm2(A)
>>>linalg.expm3(D)
AdditionSubtractionDivision
MultiplicationDotproduct
Vectordotproduct
InnerproductOuterproductTensordotproductKroneckerproduct
Matrixexponential
Matrixexponential(TaylorSeries)
Matrixexponential(eigenvalue
decomposition)
LogarithmFunction
>>>linalg.logm(A)
TrigonometricTunctions
>>>linalg.sinm(D)
>>>linalg.cosm(D)
>>>linalg.tanm(A)
HyperbolicTrigonometricFunctions
>>>linalg.sinhm(D)
>>>linalg.coshm(D)
>>>linalg.tanhm(A)
MatrixSignFunction
>>>np.sigm(A)
MatrixSquareRoot
>>>linalg.sqrtm(A)
ArbitraryFunctions
>>>linalg.funm(A,lambdax:x*x)
Matrixlogarithm
MatrixsineMatrixcosineMatrixtangent
HypberbolicmatrixsineHyperbolicmatrixcosineHyperbolicmatrixtangent
MatrixsignfunctionMatrixsquarerootEvaluatematrixfunction
CreatingMatrices
AlsoseeNumPy
>>>A=np.matrix(np.random.random((2,2)))
>>>B=np.asmatrix(b)
>>>C=np.mat(np.random.random((10,5)))
>>>D=np.mat([[3,4],[5,6]])
InverseInverse
Tranposematrix
ConjugatetranspositionTrace
Frobeniusnorm
L1norm(maxcolumnsum)Linfnorm(maxrowsum)
MatrixrankDeterminant
SolverfordensematricesSolverfordensematrices
Least-squaressolutiontolinearmatrixequation
Computethepseudo-inverseofamatrix(least-squaressolver)
Computethepseudo-inverseofamatrix(SVD)
Inverse
>>>A.I
>>>linalg.inv(A)
>>>A.T
>>>A.H
>>>np.trace(A)
Norm
>>>linalg.norm(A)
>>>linalg.norm(A,1)
>>>linalg.norm(A,np.inf)
Rank
>>>np.linalg.matrix_rank(C)
Determinant
>>>linalg.det(A)
Solvinglinearproblems
>>>linalg.solve(A,b)
>>>E=np.mat(a).T
>>>linalg.lstsq(D,E)
Generalizedinverse
>>>linalg.pinv(C)
>>>linalg.pinv2(C)
>>>np.mgrid[0:5,0:5]
Createadensemeshgrid
>>>np.ogrid[0:2,0:2]
Createanopenmeshgrid
>>>np.r_[[3,[0]*5,-1:1:10j]
Stackarraysvertically(row-wise)
>>>np.c_[b,c]
Createstackedcolumn-wisearrays
BasicMatrixRoutines
InteractingWithNumPy
AlsoseeNumPy
IndexTricks
ShapeManipulation
Polynomials
VectorizingFunctions
TypeHandling
OtherUsefulFunctions
ReturntheangleofthecomplexargumentCreateanarrayofevenlyspacedvalues
(numberofsamples)
Unwrap
Createanarrayofevenlyspacedvalues(logscale)Returnvaluesfromalistofarraysdependingonconditions
Factorial
CombineNthingstakenatktimeWeightsforNp-pointcentralderivative
Findthen-thderivativeofafunctionatapoint
>>>np.angle(b,deg=True)
>>>g=np.linspace(0,np.pi,num=5)
>>>g[3:]+=np.pi
>>>np.unwrap(g)
>>>np.logspace(0,10,3)
>>>np.select([c<4],[c*2])
>>>misc.factorial(a)
>>>b(10,3,exact=True)
>>>misc.central_diff_weights(3)
>>>misc.derivative(myfunc,1.0)
ReturntherealpartofthearrayelementsReturntheimaginarypartofthearrayelementsReturnarealarrayifcomplexpartscloseto0Castobjecttoadatatype
>>>np.real(c)
>>>np.imag(c)
>>>np.real_if_close(c,tol=1000)
>>>np.cast['f'](np.pi)
Vectorizefunctions
>>>defmyfunc(a):
ifa<0:returna*2
else:
returna/2
>>>np.vectorize(myfunc)
Createapolynomialobject
>>>fromnumpyimportpoly1d
>>>p=poly1d([3,4,5])
>>>importnumpyasnp
>>>a=np.array([1,2,3])
>>>b=np.array([(1+5j,2j,3j),(4j,5j,6j)])
>>>c=np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]])
Createa2X2identitymatrixCreatea2x2identitymatrix
CompressedSparseRowmatrixCompressedSparseColumnmatrixDictionaryOfKeysmatrix
Sparsematrixtofullmatrix
Identifysparsematrix
>>>F=np.eye(3,k=1)
>>>G=np.mat(np.identity(2))
>>>C[C>0.5]=0
>>>H=sparse.csr_matrix(C)
>>>I=sparse.csc_matrix(D)
>>>J=sparse.dok_matrix(A)
>>>E.todense()
>>>sparse.isspmatrix_csc(A)
>>>np.transpose(b)
Permutearraydimensions
>>>b.flatten()
Flattenthearray
>>>np.hstack((b,c))
Stackarrayshorizontally(column-wise)
>>>np.vstack((a,b))
Stackarraysvertically(row-wise)
>>>np.hsplit(c,2)
Splitthearrayhorizontallyatthe2ndindex
>>>np.vpslit(d,2)
Splitthearrayverticallyatthe2ndindex
CreatingSparseMatrices
Decompositions
InverseNorm
Solverforsparsematrices
Inverse
>>>sparse.linalg.inv(I)
Norm
>>>sparse.linalg.norm(I)
Solvinglinearproblems
>>>sparse.linalg.spsolve(H,I)
SolveordinaryorgeneralizedeigenvalueproblemforsquarematrixUnpackeigenvalues
FirsteigenvectorSecondeigenvectorUnpackeigenvalues
SingularValueDecomposition(SVD)ConstructsigmamatrixinSVD
LUDecomposition
EigenvaluesandEigenvectors
>>>la,v=linalg.eig(A)
>>>l1,l2=la
>>>v[:,0]
>>>v[:,1]
>>>linalg.eigvals(A)
SingularValueDecomposition
>>>U,s,Vh=linalg.svd(B)
>>>M,N=B.shape
>>>Sig=linalg.diagsvd(s,M,N)
LUDecomposition
>>>P,L,U=linalg.lu(C)
SparseMatrixRoutines
SparseMatrixFunctions
SparseMatrixDecompositions
Sparsematrixexponential
>>>sparse.linalg.expm(I)
DataCamp
LearnPythonforDataScienceInteractively
EigenvaluesandeigenvectorsSVD
>>>la,v=sparse.linalg.eigs(F,1)
>>>sparse.linalg.svds(H,2)
AskingForHelp
>>>help(scipy.linalg.diagsvd)
>>>(np.matrix)
PythonForDataScienceCheatSheet
PandasBasics
LearnPythonforDataScienceInteractivelyat
www.DataC
Pandas
ThePandaslibraryisbuiltonNumPyandprovideseasy-to-usedatastructuresanddataanalysistoolsforthePythonprogramminglanguage.
Usethefollowingimportconvention:
>>>importpandasaspd
PandasDataStructures
Series
a
3
b
-5
c
7
d
4
Aone-dimensionallabeledarraycapableofholdinganydatatype
AskingForHelp
>>>help(pd.Series.loc)
Selection
>>>s['b']
-5
>>>df[1:]
Country
India
Brazil
CapitalNewDelhiBrasília
Population1303171035
207847528
Getoneelement
GetsubsetofaDataFrame
Getting
ByPosition
>>>df.iloc([0],[0])
'Belgium'
>>>df.iat([0],[0])
'Belgium'
ByLabel
>>>df.loc([0],['Country'])
'Belgium'
>>>df.at([0],['Country'])
'Belgium'
ByLabel/Position
>>>df.ix[2]
Country Brazil
Capital BrasíliaPopulation207847528
>>>df.ix[:,'Capital']
Brussels
NewDelhi
Brasília
>>>df.ix[1,'Capital']
'NewDelhi'
BooleanIndexing
>>>s[~(s>1)]
>>>s[(s<-1)|(s>2)]
>>>df[df['Population']>1200000000]
Setting
>>>s['a']=6
Selecting,BooleanIndexing&Setting
AlsoseeNumPyArrays
Selectsinglevaluebyrow&column
Dropping
Dropvaluesfromrows(axis=0)Dropvaluesfromcolumns(axis=1)
>>>s.drop(['a','c'])
>>>df.drop('Country',axis=1)
Sort&Rank
SortbylabelsalonganaxisSortbythevaluesalonganaxisAssignrankstoentries
>>>df.sort_index()
>>>df.sort_values(by='Country')
>>>df.rank()
RetrievingSeries/DataFrameInformation
BasicInformation
>>>df.shape
(rows,columns)
>>>df.index
Describeindex
>>>df.columns
DescribeDataFramecolumns
>>>()
InfoonDataFrame
>>>df.count()
Numberofnon-NAvalues
>>>df.sum()
Sumofvalues
>>>df.cumsum()
Cummulativesumofvalues
>>>df.min()/df.max()
Minimum/maximumvalues
>>>df.idxmin()/df.idxmax()
Minimum/Maximumindexvalue
>>>df.describe()
Summarystatistics
>>>df.mean()
Meanofvalues
>>>df.median()
Medianofvalues
Summary
Selectsinglevaluebyrow&columnlabels
Index
>>>s=pd.Series([3,-5,7,4],index=['a','b','c','d'])
DataFrame
Country
Capital
Population
0
Belgium
Brussels
11190846
1
India
NewDelhi
1303171035
2
Brazil
Brasília
207847528
Columns
Applyfunction
Applyfunctionelement-wise
>>>f=lambdax:x*2
>>>df.apply(f)
>>>df.applymap(f)
ApplyingFunctions
Index
Atwo-dimensionallabeleddatastructurewithcolumnsofpotentiallydifferenttypes
Selectsinglerowofsubsetofrows
Selectasinglecolumnofsubsetofcolumns
DataAlignment
InternalDataAlignment
NAvaluesareintroducedintheindicesthatdon’toverlap:
Selectrowsandcolumns
>>>data={'Country':['Belgium','India','Brazil'],
'Capital':['Brussels','NewDelhi','Brasília'],'Population':[11190846,1303171035,207847528]}
>>>df=pd.DataFrame(data,
columns=['Country','Capital','Population'])
I/O
>>>pd.read_csv('file.csv',header=None,nrows=5)
>>>df.to_csv('myDataFrame.csv')
ReadandWritetoCSV
ReadandWritetoExcel
Seriesswherevalueisnot>1
swherevalueis<-1or>2
UsefiltertoadjustDataFrame
SetindexaofSeriessto6
ReadandWritetoSQLQueryorDatabaseTable
>>>fromsqlalchemyimportcreate_engine
>>>engine=create_engine('sqlite:///:memory:')
>>>pd.read_sql("SELECT*FROMmy_table;",engine)
>>>pd.read_sql_table('my_table',engine)
>>>pd.read_sql_query("SELECT*FROMmy_table;",engine)
>>>pd.read_excel('file.xlsx')
>>>pd.to_excel('dir/myDataFrame.xlsx',sheet_name='Sheet1')
Readmultiplesheetsfromthesamefile
>>>xlsx=pd.ExcelFile('file.xls')
>>>df=pd.read_excel(xlsx,'Sheet1')
read_sql()isaconveniencewrapperaroundread_sql_table()and
read_sql_query()
>>>s3=pd.Series([7,-2,3],index=['a','c','d'])
>>>s+s3
a 10.0
b NaN
c 5.0
d 7.0
ArithmeticOperationswithFillMethods
Youcanalsodotheinternaldataalignmentyourselfwiththehelpofthefillmethods:
>>>s.add(s3,fill_value=0)
a
10.0
b
-5.0
c
5.0
d
7.0
>>>s.sub(s3,fill_value=2)
>>>s.div(s3,fill_value=4)
>>>s.mul(s3,fill_value=3)
>>>pd.to_sql('myDf',engine)
DataCamp
LearnPythonforDataScienceInteractively
LoadingTheData
AlsoseeNumPy&Pandas
YourdataneedstobenumericandstoredasNumPyarraysorSciPysparsematrices.Othertypesthatareconvertibletonumericarrays,suchasPandasDataFrame,arealsoacceptable.
>>>importnumpyasnp
>>>X=np.random.random((10,5))
>>>y=np.array(['M','M','F','F','M','F','M','M','F','F','F'])
>>>X[X<0.7]=0
CreateYourModel
SupervisedLearningEstimators
UnsupervisedLearningEstimators
PrincipalComponentAnalysis(PCA)
>>>fromsklearn.decompositionimportPCA
>>>pca=PCA(n_components=0.95)
KMeans
>>>fromsklearn.clusterimportKMeans
>>>k_means=KMeans(n_clusters=3,random_state=0)
LinearRegression
>>>fromsklearn.linear_modelimportLinearRegression
>>>lr=LinearRegression(normalize=True)
SupportVectorMachines(SVM)
>>>fromsklearn.svmimportSVC
>>>svc=SVC(kernel='linear')
NaiveBayes
>>>fromsklearn.naive_bayesimportGaussianNB
>>>gnb=GaussianNB()
KNN
>>>fromsklearnimportneighbors
>>>knn=neighbors.KNeighborsClassifier(n_neighbors=5)
ModelFitting
Fitthemodeltothedata
Fittodata,thentransformit
Fitthemodeltothedata
Supervisedlearning
>>>lr.fit(X,y)
>>>knn.fit(X_train,y_train)
>>>svc.fit(X_train,y_train)
UnsupervisedLearning
>>>k_means.fit(X_train)
>>>pca_model=pca.fit_transform(X_train)
TuneYourModel
Prediction
PredictlabelsPredictlabels
Estimateprobabilityofalabel
Predictlabelsinclusteringalgos
SupervisedEstimators
>>>y_pred=svc.predict(np.random.random((2,5)))
>>>y_pred=lr.predict(X_test)
>>>y_pred=knn.predict_proba(X_test)
UnsupervisedEstimators
>>>y_pred=k_means.predict(X_test)
EvaluateYourModel’sPerformance
ClassificationMetrics
RegressionMetrics
ClusteringMetrics
Cross-Validation
>>>fromsklearn.cross_validationimportcross_val_score
>>>print(cross_val_score(knn,X_train,y_train,cv=4))
>>>print(cross_val_score(lr,X,y,cv=2))
AdjustedRandIndex
>>>fromsklearn.metricsimportadjusted_rand_score
>>>adjusted_rand_score(y_true,y_pred)
Homogeneity
>>>fromsklearn.metricsimporthomogeneity_score
>>>homogeneity_score(y_true,y_pred)
V-measure
>>>fromsklearn.metricsimportv_measure_score
>>>metrics.v_measure_score(y_true,y_pred)
MeanAbsoluteError
>>>fromsklearn.metricsimportmean_absolute_error
>>>y_true=[3,-0.5,2]
>>>mean_absolute_error(y_true,y_pred)
MeanSquaredError
>>>fromsklearn.metricsimportmean_squared_error
>>>mean_squared_error(y_test,y_pred)
R²Score
>>>fromsklearn.metricsimportr2_score
>>>r2_score(y_true,y_pred)
Standardization
EncodingCategoricalFeatures
Normalization
ImputingMissingValues
Binarization
GeneratingPolynomialFeatures
PreprocessingTheData
>>>fromsklearn.preprocessingimportPolynomialFeatures
>>>poly=PolynomialFeatures(5)
>>>poly.fit_transform(X)
>>>fromsklearn.preprocessingimportBinarizer
>>>binarizer=Binarizer(threshold=0.0).fit(X)
>>>binary_X=binarizer.transform(X)
>>>fromsklearn.preprocessingimportImputer
>>>imp=Imputer(missing_values=0,strategy='mean',axis=0)
>>>imp.fit_transform(X_train)
>>>fromsklearn.preprocessingimportNormalizer
>>>scaler=Normalizer().fit(X_train)
>>>normalized_X=scaler.transform(X_train)
>>>normalized_X_test=scaler.transform(X_test)
>>>fromsklearn.preprocessingimportLabelEncoder
>>>enc=LabelEncoder()
>>>y=enc.fit_transform(y)
>>>fromsklearn.preprocessingimportStandardScaler
>>>scaler=StandardScaler().fit(X_train)
>>>standardized_X=scaler.transform(X_train)
>>>standardized_X_test=scaler.transform(X_test)
TrainingAndTestData
>>>fromsklearn.model_selectionimporttrain_test_split
>>>X_train,X_test,y_train,y_test=train_test_split(X,
y,random_state=0)
PythonForDataScienceCheatSheet
AccuracyScore
>>>knn.score(X_test,y_test)
Estimatorscoremethod
>>>fromsklearn.metricsimportaccuracy_score
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 4 《地球-我们的家园》(教学实录)部编版道德与法治六年级下册
- 化妆合同范例 简易范例
- 开发项目技术合同范例
- 2025年马鞍山货运上岗证考试题库
- 大学商铺合同范例
- 无锡农村平房买卖合同范例
- 再生钢材采购合同范例
- 农村合伙购房合同范例
- 技术成果合同范例
- 汕头律师合同范例
- 广西桂林市(2024年-2025年小学三年级语文)部编版期末考试(上学期)试卷(含答案)
- 公园建设投标方案(技术标)
- 2024年海南海口市省中考数学试题
- 国开(FJ)形考复习资料电大2024《资产评估》形成性考核二
- 【甲子光年】2024自动驾驶行业报告-“端到端”渐行渐近
- 2024至2030年中国防弹衣行业市场全景分析及投资策略研究报告
- 山西省医疗服务项目价格-(1)全文
- 2024秋国家开放大学“开放本科”行管专业《管理英语3》期末考试真题3试
- 2023-2024学年全国小学二年级上语文人教版期末试卷(含答案解析)
- 简约英文教学课件模板
- 加强企业文化建设提升员工凝聚力三篇
评论
0/150
提交评论