




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
PythonForDataScienceCheatSheet
PythonBasics
LearnMorePythonforDataScienceInteractivelyat
VariablesandDataTypes
NumpyArrays
AlsoseeLists
SelectingNumpyArrayElements
Indexstartsat0
NumpyArrayOperations
NumpyArrayFunctions
DataCamp
LearnPythonforDataScienceInteractively
my_2darray[rows,columns]
Selectitemsatindex0and1
Selectitematindex1
Subset
>>>my_array[1]
2
Slice
>>>my_array[0:2]
array([1,2])
Subset2DNumpyarrays
>>>my_2darray[:,0]
array([1,4])
>>>my_array>3
array([False,False,False,True],dtype=bool)
>>>my_array*2
array([2,4,6,8])
>>>my_array+np.array([5,6,7,8])
array([6,8,10,12])
>>>my_list=[1,2,3,4]
>>>my_array=np.array(my_list)
>>>my_2darray=np.array([[1,2,3],[4,5,6]])
VariableAssignment
Lists
Selectitematindex1Select3rdlastitem
Selectitemsatindex1and2Selectitemsafterindex0Selectitemsbeforeindex3Copymy_list
my_list[list][itemOfList]
Subset
>>>my_list[1]
>>>my_list[-3]
Slice
>>>my_list[1:3]
>>>my_list[1:]
>>>my_list[:3]
>>>my_list[:]
SubsetListsofLists
>>>my_list2[1][0]
>>>my_list2[1][:2]
>>>a='is'
>>>b='nice'
>>>my_list=['my','list',a,b]
>>>my_list2=[[4,5,6,7],[3,4,5,6]]
SelectingListElements
AlsoseeNumPyArrays
Libraries
Importlibraries
>>>importnumpy
>>>importnumpyasnp
Selectiveimport
>>>frommathimportpi
InstallPython
Dataanalysis
Machinelearning
Scientificcomputing
2Dplotting
Leadingopendatascienceplatform FreeIDEthatisincluded CreateandsharepoweredbyPython withAnaconda documentswithlivecode,
visualizations,text,...
Indexstartsat0
>>>x=5
>>>x5
CalculationsWithVariables
>>>x+2
Sumoftwovariables
7
>>>x-2
Subtractionoftwovariables
3
>>>x*2
Multiplicationoftwovariables
10
>>>x**2
Exponentiationofavariable
25
>>>x%2
Remainderofavariable
1
>>>x/float(2)
Divisionofavariable
2.5
GettheindexofanitemCountanitem
AppendanitematatimeRemoveanitem
RemoveanitemReversethelistAppendanitemRemoveanitemInsertanitemSortthelist
>>>my_list.index(a)
>>>my_list.count(a)
>>>my_list.append('!')
>>>my_list.remove('!')
>>>del(my_list[0:1])
>>>my_list.reverse()
>>>my_list.extend('!')
>>>my_list.pop(-1)
>>>my_list.insert(0,'!')
>>>my_list.sort()
str()
'5','3.45','True'
Variablestostrings
int()
5,3,1
Variablestointegers
float()
5.0,1.0
Variablestofloats
bool()
True,True,True
Variablestobooleans
TypesandTypeConversion
ListOperations
>>>my_list+my_list
['my','list','is','nice','my','list','is','nice']
>>>my_list*2
['my','list','is','nice','my','list','is','nice']
>>>my_list2>4
True
ListMethods
AskingForHelp
>>>help(str)
>>>my_string='thisStringIsAwesome'
>>>my_string
'thisStringIsAwesome'
Strings
>>>my_string*2
'thisStringIsAwesomethisStringIsAwesome'
>>>my_string+'Innit'
'thisStringIsAwesomeInnit'
>>>'m'inmy_string
True
StringOperations
StringOperations
>>>my_string[3]
>>>my_string[4:9]
StringMethods
Indexstartsat0
>>>my_array.shape
>>>np.append(other_array)
GetthedimensionsofthearrayAppenditemstoanarray
>>>np.insert(my_array,1,5)
Insertitemsinanarray
>>>np.delete(my_array,[1])
Deleteitemsinanarray
>>>np.mean(my_array)
Meanofthearray
>>>np.median(my_array)
Medianofthearray
>>>my_array.corrcoef()
Correlationcoefficient
>>>np.std(my_array)
Standarddeviation
StringtouppercaseStringtolowercaseCountStringelementsReplaceStringelementsStripwhitespaces
>>>my_string.upper()
>>>my_string.lower()
>>>my_string.count('w')
>>>my_string.replace('e','i')
>>>my_string.strip()
PythonForDataScienceCheatSheet
JupyterNotebook
LearnMorePythonforDataScienceInteractivelyat
www.DataC
WorkingwithDifferentProgrammingLanguages
Kernelsprovidecomputationandcommunicationwithfront-endinterfaceslikethenotebooks.Therearethreemainkernels:
IRkernel IJulia
InstallingJupyterNotebookwillautomaticallyinstalltheIPythonkernel.
Widgets
Notebookwidgetsprovidetheabilitytovisualizeandcontrolchangesinyourdata,oftenasacontrollikeaslider,textbox,etc.
YoucanusethemtobuildinteractiveGUIsforyournotebooksortosynchronizestatefulandstatelessinformationbetweenPythonandJavaScript.
Saving/LoadingNotebooks
Createnewnotebook
Makeacopyofthecurrentnotebook
Savecurrentnotebookandrecordcheckpoint
Previewoftheprintednotebook
Closenotebook&stoprunninganyscripts
Openanexistingnotebook
Renamenotebook
Revertnotebooktoapreviouscheckpoint
Downloadnotebookas
IPythonnotebook
Python
HTML
Markdown
reST
Restartkernel
Restartkernel&runallcells
Restartkernel&runallcells
CommandMode:
Interruptkernel
Interruptkernel&clearalloutput
Connectbacktoaremotenotebook
Runotherinstalledkernels
Downloadserializedstateofallwidgetmodelsinuse
Savenotebookwithinteractivewidgets
Embedcurrentwidgets
15
13 14
WritingCodeAndText
LaTeX
1 2 3 4 5 67 8910 11 12
Codeandtextareencapsulatedby3basiccelltypes:markdowncells,codecells,andrawNBConvertcells.
EditCells
EditMode:
Saveandcheckpoint
Insertcellbelow
Interruptkernel
Restartkernel
Cutcurrentlyselectedcellstoclipboard
Pastecellsfromclipboardabovecurrentcell
Pastecellsfrom
Copycellsfromclipboardtocurrentcursorposition
Pastecellsfromclipboardbelowcurrentcell
ExecutingCells
Runselectedcell(s) Runcurrentcellsdownandcreateanewone
below
Cutcell
Copycell(s)
Pastecell(s)below
Movecellup
Movecelldown
Runcurrentcell
AskingForHelp
Displaycharacteristics
Opencommandpalette
Currentkernel
Kernelstatus
Logoutfromnotebookserver
clipboardontopofcurrentcel
Revert“DeleteCells”
invocation
Mergecurrentcellwiththeoneabove
Movecurrentcellup
Adjustmetadataunderlyingthecurrentnotebook
Removecellattachments
Pasteattachmentsofcurrentcell
InsertCells
Addnewcellabovethecurrentone
Deletecurrentcells
Splitupacellfromcurrentcursorposition
Mergecurrentcellwiththeonebelow
Movecurrentcelldown
Findandreplaceinselectedcells
Copyattachmentsofcurrentcell
Insertimageinselectedcells
Addnewcellbelowthecurrentone
Runcurrentcellsdownandcreateanewoneabove
Runallcellsabovethecurrentcell
Changethecelltypeofcurrentcell
toggle,togglescrollingandclearalloutput
ViewCells
ToggledisplayofJupyterlogoandfilename
Togglelinenumbersincells
Runallcells
Runallcellsbelowthecurrentcell
toggle,togglescrollingandclearcurrentoutputs
Toggledisplayoftoolbar
Toggledisplayofcellactionicons:
None
Editmetadata
Rawcellformat
Slideshow
Attachments
Tags
WalkthroughaUItour
Editthebuilt-inkeyboardshortcuts
Descriptionofmarkdownavailableinnotebook
PythonhelptopicsNumPyhelptopicsMatplotlibhelptopics
Pandashelptopics
DataCamp
Listofbuilt-inkeyboardshortcuts
Notebookhelptopics
InformationonunofficialJupyterNotebookextensions
IPythonhelptopicsSciPyhelptopicsSymPyhelptopics
AboutJupyterNotebook
LearnPythonforDataScienceInteractively
NumPy
TheNumPylibraryisthecorelibraryforscientificcomputinginPython.Itprovidesahigh-performancemultidimensionalarrayobject,andtoolsforworkingwiththesearrays.
Usethefollowingimportconvention:
>>>importnumpyasnp
NumPyArrays
2
1Darray
2Darray
axis1
axis0
3Darray
axis2
axis1
axis0
CreatingArrays
InitialPlaceholders
I/O
Saving&LoadingOnDisk
Saving&LoadingTextFiles
DataTypes
CreateanarrayofzerosCreateanarrayofonesCreateanarrayofevenlyspacedvalues(stepvalue)
Createanarrayofevenly
spacedvalues(numberofsamples)
CreateaconstantarrayCreatea2X2identitymatrix
CreateanarraywithrandomvaluesCreateanemptyarray
>>>np.zeros((3,4))
>>>np.ones((2,3,4),dtype=16)
>>>d=np.arange(10,25,5)
>>>np.linspace(0,2,9)
>>>e=np.full((2,2),7)
>>>f=np.eye(2)
>>>np.random.random((2,2))
>>>np.empty((3,2))
>>>np.loadtxt("myfile.txt")
>>>np.genfromtxt("my_file.csv",delimiter=',')
>>>np.savetxt("myarray.txt",a,delimiter="")
>>>np.save('my_array',a)
>>>np.savez('array.npz',a,b)
>>>np.load('my_array.npy')
>>>a=np.array([1,2,3])
>>>b=np.array([(1.5,2,3),(4,5,6)],dtype=float)
>>>c=np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]],
dtype=float)
1 2
3
Subsetting,Slicing,Indexing
AlsoseeLists
Subsetting
>>>a[2]
3
>>>b[1,2]
6.0
Slicing
>>>a[0:2]
array([1,2])
>>>b[0:2,1]
array([2.,5.])
>>>b[:1]
array([[1.5,2.,3.]])
>>>c[1,...]
array([[[3.,2.,1.],
[4.,5.,6.]]])
>>>a[::-1]
array([3,2,1])
BooleanIndexing
>>>a[a<2]
array([1])
12
3
1.52
4
5
3
6
Selecttheelementatthe2ndindex
Selecttheelementatrow0column2(equivalenttob[1][2])
123 Selectitemsatindex0and1
Selectitemsatrows0and1incolumn1
Selectallitemsatrow0(equivalenttob[0:1,:])
Sameas[1,:,:]
Reversedarraya
123 Selectelementsfromalessthan2
FancyIndexing
>>>b[[1,0,1,0],[0,1,2,0]] Selectelements(1,0),(0,1),(1,2)and(0,0)
array([4.,2.,6.,1.5])
>>>b[[1,0,1,0]][:,[0,1,2,0]] Selectasubsetofthematrix’srows
array([[4.,5.,6.,4.], andcolumns
[1.5,2.,3.,1.5],
[4.,5.,6.,4.],
[1.5,2.,3.,1.5]])
ArrayManipulation
CopyingArrays
SortingArrays
Sortanarray
Sorttheelementsofanarray'saxis
>>>a.sort()
>>>c.sort(axis=0)
InspectingYourArray
>>>a.shape
Arraydimensions
>>>len(a)
Lengthofarray
>>>b.ndim
Numberofarraydimensions
>>>e.size
Numberofarrayelements
>>>b.dtype
Datatypeofarrayelements
>>>
Nameofdatatype
>>>b.astype(int)
Convertanarraytoadifferenttype
>>>a.sum()
Array-wisesum
>>>a.min()
Array-wiseminimumvalue
>>>b.max(axis=0)
Maximumvalueofanarrayrow
>>>b.cumsum(axis=1)
Cumulativesumoftheelements
>>>a.mean()
Mean
>>>b.median()
Median
>>>a.corrcoef()
Correlationcoefficient
>>>np.std(b)
Standarddeviation
TransposingArray
>>>i=np.transpose(b)
>>>i.T
PermutearraydimensionsPermutearraydimensions
ChangingArrayShape
>>>b.ravel()
>>>g.reshape(3,-2)
Flattenthearray
Reshape,butdon’tchangedata
Adding/RemovingElements
>>>h.resize((2,6))
Returnanewarraywithshape(2,6)
>>>np.append(h,g)
Appenditemstoanarray
>>>np.insert(a,1,5)
Insertitemsinanarray
>>>np.delete(a,[1])
Deleteitemsfromanarray
CombiningArrays
>>>np.concatenate((a,d),axis=0)
Concatenatearrays
array([1,2,3,10,15,20])
>>>np.vstack((a,b))
Stackarraysvertically(row-wise)
array([[1.,2.,3.],
[1.5,2.,3.],
[4.,5.,6.]])
>>>np.r_[e,f]
Stackarraysvertically(row-wise)
>>>np.hstack((e,f))
array([[7.,7.,1.,0.],
Stackarrayshorizontally(column-wise)
[7.,7.,0.,1.]])
>>>np.column_stack((a,d))
Createstackedcolumn-wisearrays
array([[1,10],
[2,15],
[3,20]])
>>>np.c_[a,d]
Createstackedcolumn-wisearrays
SplittingArrays
>>>np.hsplit(a,3)
[array([1]),array([2]),array([3])]
>>>np.vsplit(c,2)
[array([[[1.5,2.,1.],
[4.,5.,6.]]]),
array([[[3.,2.,3.],
[4.,5.,6.]]])]
Splitthearrayhorizontallyatthe3rdindex
Splitthearrayverticallyatthe2ndindex
DataCamp
LearnPythonforDataScienceInteractively
>>>h=a.view()
>>>np.copy(a)
>>>h=a.copy()
CreateaviewofthearraywiththesamedataCreateacopyofthearray
Createadeepcopyofthearray
>>>64
Signed64-bitintegertypes
>>>np.float32
Standarddouble-precisionfloatingpoint
>>>plex
Complexnumbersrepresentedby128floats
>>>np.bool
BooleantypestoringTRUEandFALSEvalues
>>>np.object
Pythonobjecttype
>>>np.string_
Fixed-lengthstringtype
>>>np.unicode_
Fixed-lengthunicodetype
PythonForDataScienceCheatSheet
NumPyBasics
ArrayMathematics
ArithmeticOperations
Comparison
AggregateFunctions
Element-wisecomparison
Element-wisecomparisonArray-wisecomparison
>>>a==b
array([[False,True,True],
[False,False,False]],dtype=bool)
>>>a<2
array([True,False,False],dtype=bool)
>>>np.array_equal(a,b)
Subtraction
SubtractionAddition
AdditionDivision
DivisionMultiplication
MultiplicationExponentiationSquareroot
PrintsinesofanarrayElement-wisecosine
Element-wisenaturallogarithmDotproduct
>>>np.divide(a,b)
>>>a*b
array([[1.5, 4., 9.],
[4.,10.,18.]])
>>>np.multiply(a,b)
>>>np.exp(b)
>>>np.sqrt(b)
>>>np.sin(a)
>>>np.cos(b)
>>>np.log(a)
>>>e.dot(f)
array([[7.,7.],
[7.,7.]])
],
]])
>>>g=a-b
array([[-0.5,0.,0.],
[-3.,-3.,-3.]])
>>>np.subtract(a,b)
>>>b+a
array([[2.5,4.,6.],
[5.,7.,9.]])
>>>np.add(b,a)
>>>a/b
array([[0.66666667,1. ,1.
[0.25 ,0.4 ,0.5
LearnPythonforDataScienceInteractivelyat
www.DataC
AskingForHelp
>>>(np.ndarray.dtype)
1.5
4
2
3
5
6
1.523
4
5
6
1.5
2
3
4
5
6
PythonForDataScienceCheatSheet
SciPy-LinearAlgebra
LearnMorePythonforDataScienceInteractivelyat
LinearAlgebra
You’llusethelinalgandsparsemodules.Notethatscipy.linalgcontainsandexpandsonnumpy.linalg.
>>>fromscipyimportlinalg,sparse
MatrixFunctions
SciPy
TheSciPylibraryisoneofthecorepackagesforscientificcomputingthatprovidesmathematicalalgorithmsandconveniencefunctionsbuiltontheNumPyextensionofPython.
Addition
>>>np.add(A,D)
Subtraction
>>>np.subtract(A,D)
Division
>>>np.divide(A,D)
Multiplication
>>>np.multiply(D,A)
>>>np.dot(A,D)
>>>np.vdot(A,D)
>>>np.inner(A,D)
>>>np.outer(A,D)
>>>np.tensordot(A,D)
>>>np.kron(A,D)
ExponentialFunctions
>>>linalg.expm(A)
>>>linalg.expm2(A)
>>>linalg.expm3(D)
AdditionSubtractionDivision
MultiplicationDotproduct
Vectordotproduct
InnerproductOuterproductTensordotproductKroneckerproduct
Matrixexponential
Matrixexponential(TaylorSeries)
Matrixexponential(eigenvalue
decomposition)
LogarithmFunction
>>>linalg.logm(A)
TrigonometricTunctions
>>>linalg.sinm(D)
>>>linalg.cosm(D)
>>>linalg.tanm(A)
HyperbolicTrigonometricFunctions
>>>linalg.sinhm(D)
>>>linalg.coshm(D)
>>>linalg.tanhm(A)
MatrixSignFunction
>>>np.sigm(A)
MatrixSquareRoot
>>>linalg.sqrtm(A)
ArbitraryFunctions
>>>linalg.funm(A,lambdax:x*x)
Matrixlogarithm
MatrixsineMatrixcosineMatrixtangent
HypberbolicmatrixsineHyperbolicmatrixcosineHyperbolicmatrixtangent
MatrixsignfunctionMatrixsquarerootEvaluatematrixfunction
CreatingMatrices
AlsoseeNumPy
>>>A=np.matrix(np.random.random((2,2)))
>>>B=np.asmatrix(b)
>>>C=np.mat(np.random.random((10,5)))
>>>D=np.mat([[3,4],[5,6]])
InverseInverse
Tranposematrix
ConjugatetranspositionTrace
Frobeniusnorm
L1norm(maxcolumnsum)Linfnorm(maxrowsum)
MatrixrankDeterminant
SolverfordensematricesSolverfordensematrices
Least-squaressolutiontolinearmatrixequation
Computethepseudo-inverseofamatrix(least-squaressolver)
Computethepseudo-inverseofamatrix(SVD)
Inverse
>>>A.I
>>>linalg.inv(A)
>>>A.T
>>>A.H
>>>np.trace(A)
Norm
>>>linalg.norm(A)
>>>linalg.norm(A,1)
>>>linalg.norm(A,np.inf)
Rank
>>>np.linalg.matrix_rank(C)
Determinant
>>>linalg.det(A)
Solvinglinearproblems
>>>linalg.solve(A,b)
>>>E=np.mat(a).T
>>>linalg.lstsq(D,E)
Generalizedinverse
>>>linalg.pinv(C)
>>>linalg.pinv2(C)
>>>np.mgrid[0:5,0:5]
Createadensemeshgrid
>>>np.ogrid[0:2,0:2]
Createanopenmeshgrid
>>>np.r_[[3,[0]*5,-1:1:10j]
Stackarraysvertically(row-wise)
>>>np.c_[b,c]
Createstackedcolumn-wisearrays
BasicMatrixRoutines
InteractingWithNumPy
AlsoseeNumPy
IndexTricks
ShapeManipulation
Polynomials
VectorizingFunctions
TypeHandling
OtherUsefulFunctions
ReturntheangleofthecomplexargumentCreateanarrayofevenlyspacedvalues
(numberofsamples)
Unwrap
Createanarrayofevenlyspacedvalues(logscale)Returnvaluesfromalistofarraysdependingonconditions
Factorial
CombineNthingstakenatktimeWeightsforNp-pointcentralderivative
Findthen-thderivativeofafunctionatapoint
>>>np.angle(b,deg=True)
>>>g=np.linspace(0,np.pi,num=5)
>>>g[3:]+=np.pi
>>>np.unwrap(g)
>>>np.logspace(0,10,3)
>>>np.select([c<4],[c*2])
>>>misc.factorial(a)
>>>b(10,3,exact=True)
>>>misc.central_diff_weights(3)
>>>misc.derivative(myfunc,1.0)
ReturntherealpartofthearrayelementsReturntheimaginarypartofthearrayelementsReturnarealarrayifcomplexpartscloseto0Castobjecttoadatatype
>>>np.real(c)
>>>np.imag(c)
>>>np.real_if_close(c,tol=1000)
>>>np.cast['f'](np.pi)
Vectorizefunctions
>>>defmyfunc(a):
ifa<0:returna*2
else:
returna/2
>>>np.vectorize(myfunc)
Createapolynomialobject
>>>fromnumpyimportpoly1d
>>>p=poly1d([3,4,5])
>>>importnumpyasnp
>>>a=np.array([1,2,3])
>>>b=np.array([(1+5j,2j,3j),(4j,5j,6j)])
>>>c=np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]])
Createa2X2identitymatrixCreatea2x2identitymatrix
CompressedSparseRowmatrixCompressedSparseColumnmatrixDictionaryOfKeysmatrix
Sparsematrixtofullmatrix
Identifysparsematrix
>>>F=np.eye(3,k=1)
>>>G=np.mat(np.identity(2))
>>>C[C>0.5]=0
>>>H=sparse.csr_matrix(C)
>>>I=sparse.csc_matrix(D)
>>>J=sparse.dok_matrix(A)
>>>E.todense()
>>>sparse.isspmatrix_csc(A)
>>>np.transpose(b)
Permutearraydimensions
>>>b.flatten()
Flattenthearray
>>>np.hstack((b,c))
Stackarrayshorizontally(column-wise)
>>>np.vstack((a,b))
Stackarraysvertically(row-wise)
>>>np.hsplit(c,2)
Splitthearrayhorizontallyatthe2ndindex
>>>np.vpslit(d,2)
Splitthearrayverticallyatthe2ndindex
CreatingSparseMatrices
Decompositions
InverseNorm
Solverforsparsematrices
Inverse
>>>sparse.linalg.inv(I)
Norm
>>>sparse.linalg.norm(I)
Solvinglinearproblems
>>>sparse.linalg.spsolve(H,I)
SolveordinaryorgeneralizedeigenvalueproblemforsquarematrixUnpackeigenvalues
FirsteigenvectorSecondeigenvectorUnpackeigenvalues
SingularValueDecomposition(SVD)ConstructsigmamatrixinSVD
LUDecomposition
EigenvaluesandEigenvectors
>>>la,v=linalg.eig(A)
>>>l1,l2=la
>>>v[:,0]
>>>v[:,1]
>>>linalg.eigvals(A)
SingularValueDecomposition
>>>U,s,Vh=linalg.svd(B)
>>>M,N=B.shape
>>>Sig=linalg.diagsvd(s,M,N)
LUDecomposition
>>>P,L,U=linalg.lu(C)
SparseMatrixRoutines
SparseMatrixFunctions
SparseMatrixDecompositions
Sparsematrixexponential
>>>sparse.linalg.expm(I)
DataCamp
LearnPythonforDataScienceInteractively
EigenvaluesandeigenvectorsSVD
>>>la,v=sparse.linalg.eigs(F,1)
>>>sparse.linalg.svds(H,2)
AskingForHelp
>>>help(scipy.linalg.diagsvd)
>>>(np.matrix)
PythonForDataScienceCheatSheet
PandasBasics
LearnPythonforDataScienceInteractivelyat
www.DataC
Pandas
ThePandaslibraryisbuiltonNumPyandprovideseasy-to-usedatastructuresanddataanalysistoolsforthePythonprogramminglanguage.
Usethefollowingimportconvention:
>>>importpandasaspd
PandasDataStructures
Series
a
3
b
-5
c
7
d
4
Aone-dimensionallabeledarraycapableofholdinganydatatype
AskingForHelp
>>>help(pd.Series.loc)
Selection
>>>s['b']
-5
>>>df[1:]
Country
India
Brazil
CapitalNewDelhiBrasília
Population1303171035
207847528
Getoneelement
GetsubsetofaDataFrame
Getting
ByPosition
>>>df.iloc([0],[0])
'Belgium'
>>>df.iat([0],[0])
'Belgium'
ByLabel
>>>df.loc([0],['Country'])
'Belgium'
>>>df.at([0],['Country'])
'Belgium'
ByLabel/Position
>>>df.ix[2]
Country Brazil
Capital BrasíliaPopulation207847528
>>>df.ix[:,'Capital']
Brussels
NewDelhi
Brasília
>>>df.ix[1,'Capital']
'NewDelhi'
BooleanIndexing
>>>s[~(s>1)]
>>>s[(s<-1)|(s>2)]
>>>df[df['Population']>1200000000]
Setting
>>>s['a']=6
Selecting,BooleanIndexing&Setting
AlsoseeNumPyArrays
Selectsinglevaluebyrow&column
Dropping
Dropvaluesfromrows(axis=0)Dropvaluesfromcolumns(axis=1)
>>>s.drop(['a','c'])
>>>df.drop('Country',axis=1)
Sort&Rank
SortbylabelsalonganaxisSortbythevaluesalonganaxisAssignrankstoentries
>>>df.sort_index()
>>>df.sort_values(by='Country')
>>>df.rank()
RetrievingSeries/DataFrameInformation
BasicInformation
>>>df.shape
(rows,columns)
>>>df.index
Describeindex
>>>df.columns
DescribeDataFramecolumns
>>>()
InfoonDataFrame
>>>df.count()
Numberofnon-NAvalues
>>>df.sum()
Sumofvalues
>>>df.cumsum()
Cummulativesumofvalues
>>>df.min()/df.max()
Minimum/maximumvalues
>>>df.idxmin()/df.idxmax()
Minimum/Maximumindexvalue
>>>df.describe()
Summarystatistics
>>>df.mean()
Meanofvalues
>>>df.median()
Medianofvalues
Summary
Selectsinglevaluebyrow&columnlabels
Index
>>>s=pd.Series([3,-5,7,4],index=['a','b','c','d'])
DataFrame
Country
Capital
Population
0
Belgium
Brussels
11190846
1
India
NewDelhi
1303171035
2
Brazil
Brasília
207847528
Columns
Applyfunction
Applyfunctionelement-wise
>>>f=lambdax:x*2
>>>df.apply(f)
>>>df.applymap(f)
ApplyingFunctions
Index
Atwo-dimensionallabeleddatastructurewithcolumnsofpotentiallydifferenttypes
Selectsinglerowofsubsetofrows
Selectasinglecolumnofsubsetofcolumns
DataAlignment
InternalDataAlignment
NAvaluesareintroducedintheindicesthatdon’toverlap:
Selectrowsandcolumns
>>>data={'Country':['Belgium','India','Brazil'],
'Capital':['Brussels','NewDelhi','Brasília'],'Population':[11190846,1303171035,207847528]}
>>>df=pd.DataFrame(data,
columns=['Country','Capital','Population'])
I/O
>>>pd.read_csv('file.csv',header=None,nrows=5)
>>>df.to_csv('myDataFrame.csv')
ReadandWritetoCSV
ReadandWritetoExcel
Seriesswherevalueisnot>1
swherevalueis<-1or>2
UsefiltertoadjustDataFrame
SetindexaofSeriessto6
ReadandWritetoSQLQueryorDatabaseTable
>>>fromsqlalchemyimportcreate_engine
>>>engine=create_engine('sqlite:///:memory:')
>>>pd.read_sql("SELECT*FROMmy_table;",engine)
>>>pd.read_sql_table('my_table',engine)
>>>pd.read_sql_query("SELECT*FROMmy_table;",engine)
>>>pd.read_excel('file.xlsx')
>>>pd.to_excel('dir/myDataFrame.xlsx',sheet_name='Sheet1')
Readmultiplesheetsfromthesamefile
>>>xlsx=pd.ExcelFile('file.xls')
>>>df=pd.read_excel(xlsx,'Sheet1')
read_sql()isaconveniencewrapperaroundread_sql_table()and
read_sql_query()
>>>s3=pd.Series([7,-2,3],index=['a','c','d'])
>>>s+s3
a 10.0
b NaN
c 5.0
d 7.0
ArithmeticOperationswithFillMethods
Youcanalsodotheinternaldataalignmentyourselfwiththehelpofthefillmethods:
>>>s.add(s3,fill_value=0)
a
10.0
b
-5.0
c
5.0
d
7.0
>>>s.sub(s3,fill_value=2)
>>>s.div(s3,fill_value=4)
>>>s.mul(s3,fill_value=3)
>>>pd.to_sql('myDf',engine)
DataCamp
LearnPythonforDataScienceInteractively
LoadingTheData
AlsoseeNumPy&Pandas
YourdataneedstobenumericandstoredasNumPyarraysorSciPysparsematrices.Othertypesthatareconvertibletonumericarrays,suchasPandasDataFrame,arealsoacceptable.
>>>importnumpyasnp
>>>X=np.random.random((10,5))
>>>y=np.array(['M','M','F','F','M','F','M','M','F','F','F'])
>>>X[X<0.7]=0
CreateYourModel
SupervisedLearningEstimators
UnsupervisedLearningEstimators
PrincipalComponentAnalysis(PCA)
>>>fromsklearn.decompositionimportPCA
>>>pca=PCA(n_components=0.95)
KMeans
>>>fromsklearn.clusterimportKMeans
>>>k_means=KMeans(n_clusters=3,random_state=0)
LinearRegression
>>>fromsklearn.linear_modelimportLinearRegression
>>>lr=LinearRegression(normalize=True)
SupportVectorMachines(SVM)
>>>fromsklearn.svmimportSVC
>>>svc=SVC(kernel='linear')
NaiveBayes
>>>fromsklearn.naive_bayesimportGaussianNB
>>>gnb=GaussianNB()
KNN
>>>fromsklearnimportneighbors
>>>knn=neighbors.KNeighborsClassifier(n_neighbors=5)
ModelFitting
Fitthemodeltothedata
Fittodata,thentransformit
Fitthemodeltothedata
Supervisedlearning
>>>lr.fit(X,y)
>>>knn.fit(X_train,y_train)
>>>svc.fit(X_train,y_train)
UnsupervisedLearning
>>>k_means.fit(X_train)
>>>pca_model=pca.fit_transform(X_train)
TuneYourModel
Prediction
PredictlabelsPredictlabels
Estimateprobabilityofalabel
Predictlabelsinclusteringalgos
SupervisedEstimators
>>>y_pred=svc.predict(np.random.random((2,5)))
>>>y_pred=lr.predict(X_test)
>>>y_pred=knn.predict_proba(X_test)
UnsupervisedEstimators
>>>y_pred=k_means.predict(X_test)
EvaluateYourModel’sPerformance
ClassificationMetrics
RegressionMetrics
ClusteringMetrics
Cross-Validation
>>>fromsklearn.cross_validationimportcross_val_score
>>>print(cross_val_score(knn,X_train,y_train,cv=4))
>>>print(cross_val_score(lr,X,y,cv=2))
AdjustedRandIndex
>>>fromsklearn.metricsimportadjusted_rand_score
>>>adjusted_rand_score(y_true,y_pred)
Homogeneity
>>>fromsklearn.metricsimporthomogeneity_score
>>>homogeneity_score(y_true,y_pred)
V-measure
>>>fromsklearn.metricsimportv_measure_score
>>>metrics.v_measure_score(y_true,y_pred)
MeanAbsoluteError
>>>fromsklearn.metricsimportmean_absolute_error
>>>y_true=[3,-0.5,2]
>>>mean_absolute_error(y_true,y_pred)
MeanSquaredError
>>>fromsklearn.metricsimportmean_squared_error
>>>mean_squared_error(y_test,y_pred)
R²Score
>>>fromsklearn.metricsimportr2_score
>>>r2_score(y_true,y_pred)
Standardization
EncodingCategoricalFeatures
Normalization
ImputingMissingValues
Binarization
GeneratingPolynomialFeatures
PreprocessingTheData
>>>fromsklearn.preprocessingimportPolynomialFeatures
>>>poly=PolynomialFeatures(5)
>>>poly.fit_transform(X)
>>>fromsklearn.preprocessingimportBinarizer
>>>binarizer=Binarizer(threshold=0.0).fit(X)
>>>binary_X=binarizer.transform(X)
>>>fromsklearn.preprocessingimportImputer
>>>imp=Imputer(missing_values=0,strategy='mean',axis=0)
>>>imp.fit_transform(X_train)
>>>fromsklearn.preprocessingimportNormalizer
>>>scaler=Normalizer().fit(X_train)
>>>normalized_X=scaler.transform(X_train)
>>>normalized_X_test=scaler.transform(X_test)
>>>fromsklearn.preprocessingimportLabelEncoder
>>>enc=LabelEncoder()
>>>y=enc.fit_transform(y)
>>>fromsklearn.preprocessingimportStandardScaler
>>>scaler=StandardScaler().fit(X_train)
>>>standardized_X=scaler.transform(X_train)
>>>standardized_X_test=scaler.transform(X_test)
TrainingAndTestData
>>>fromsklearn.model_selectionimporttrain_test_split
>>>X_train,X_test,y_train,y_test=train_test_split(X,
y,random_state=0)
PythonForDataScienceCheatSheet
AccuracyScore
>>>knn.score(X_test,y_test)
Estimatorscoremethod
>>>fromsklearn.metricsimportaccuracy_score
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 电子商务平台服务与使用协议
- 人家服务合同范本
- 中班社会领域活动方案
- 房屋双方买卖合同
- 车辆挂靠经营协议书
- 交通物流配送优化方案
- 班车租赁合同集锦
- 智能制造模具研发投资合同
- 工程机械施工协议书
- 新能源材料研发投资合同
- 《中国居民膳食指南》课件
- 银行柜面业务操作流程手册
- 烧烤配方出售合同范例
- 妇科手术麻醉
- Unit1RelationshipsLesson2HowDoWeLikeTeachers'Feedback课件高中英语北师大版选择性
- 库存管理规划
- 灌篮高手培训课件
- 小学生心理健康讲座5
- 贵州省房屋建筑和市政工程标准监理电子招标文件(2023年版)
- 高级职业培训师(三级)职业资格鉴定考试题及答案
- 静脉留置针操作常见的并发症及处理
评论
0/150
提交评论