版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
PythonForDataScienceCheatSheet
PythonBasics
LearnMorePythonforDataScienceInteractivelyat
VariablesandDataTypes
NumpyArrays
AlsoseeLists
SelectingNumpyArrayElements
Indexstartsat0
NumpyArrayOperations
NumpyArrayFunctions
DataCamp
LearnPythonforDataScienceInteractively
my_2darray[rows,columns]
Selectitemsatindex0and1
Selectitematindex1
Subset
>>>my_array[1]
2
Slice
>>>my_array[0:2]
array([1,2])
Subset2DNumpyarrays
>>>my_2darray[:,0]
array([1,4])
>>>my_array>3
array([False,False,False,True],dtype=bool)
>>>my_array*2
array([2,4,6,8])
>>>my_array+np.array([5,6,7,8])
array([6,8,10,12])
>>>my_list=[1,2,3,4]
>>>my_array=np.array(my_list)
>>>my_2darray=np.array([[1,2,3],[4,5,6]])
VariableAssignment
Lists
Selectitematindex1Select3rdlastitem
Selectitemsatindex1and2Selectitemsafterindex0Selectitemsbeforeindex3Copymy_list
my_list[list][itemOfList]
Subset
>>>my_list[1]
>>>my_list[-3]
Slice
>>>my_list[1:3]
>>>my_list[1:]
>>>my_list[:3]
>>>my_list[:]
SubsetListsofLists
>>>my_list2[1][0]
>>>my_list2[1][:2]
>>>a='is'
>>>b='nice'
>>>my_list=['my','list',a,b]
>>>my_list2=[[4,5,6,7],[3,4,5,6]]
SelectingListElements
AlsoseeNumPyArrays
Libraries
Importlibraries
>>>importnumpy
>>>importnumpyasnp
Selectiveimport
>>>frommathimportpi
InstallPython
Dataanalysis
Machinelearning
Scientificcomputing
2Dplotting
Leadingopendatascienceplatform FreeIDEthatisincluded CreateandsharepoweredbyPython withAnaconda documentswithlivecode,
visualizations,text,...
Indexstartsat0
>>>x=5
>>>x5
CalculationsWithVariables
>>>x+2
Sumoftwovariables
7
>>>x-2
Subtractionoftwovariables
3
>>>x*2
Multiplicationoftwovariables
10
>>>x**2
Exponentiationofavariable
25
>>>x%2
Remainderofavariable
1
>>>x/float(2)
Divisionofavariable
2.5
GettheindexofanitemCountanitem
AppendanitematatimeRemoveanitem
RemoveanitemReversethelistAppendanitemRemoveanitemInsertanitemSortthelist
>>>my_list.index(a)
>>>my_list.count(a)
>>>my_list.append('!')
>>>my_list.remove('!')
>>>del(my_list[0:1])
>>>my_list.reverse()
>>>my_list.extend('!')
>>>my_list.pop(-1)
>>>my_list.insert(0,'!')
>>>my_list.sort()
str()
'5','3.45','True'
Variablestostrings
int()
5,3,1
Variablestointegers
float()
5.0,1.0
Variablestofloats
bool()
True,True,True
Variablestobooleans
TypesandTypeConversion
ListOperations
>>>my_list+my_list
['my','list','is','nice','my','list','is','nice']
>>>my_list*2
['my','list','is','nice','my','list','is','nice']
>>>my_list2>4
True
ListMethods
AskingForHelp
>>>help(str)
>>>my_string='thisStringIsAwesome'
>>>my_string
'thisStringIsAwesome'
Strings
>>>my_string*2
'thisStringIsAwesomethisStringIsAwesome'
>>>my_string+'Innit'
'thisStringIsAwesomeInnit'
>>>'m'inmy_string
True
StringOperations
StringOperations
>>>my_string[3]
>>>my_string[4:9]
StringMethods
Indexstartsat0
>>>my_array.shape
>>>np.append(other_array)
GetthedimensionsofthearrayAppenditemstoanarray
>>>np.insert(my_array,1,5)
Insertitemsinanarray
>>>np.delete(my_array,[1])
Deleteitemsinanarray
>>>np.mean(my_array)
Meanofthearray
>>>np.median(my_array)
Medianofthearray
>>>my_array.corrcoef()
Correlationcoefficient
>>>np.std(my_array)
Standarddeviation
StringtouppercaseStringtolowercaseCountStringelementsReplaceStringelementsStripwhitespaces
>>>my_string.upper()
>>>my_string.lower()
>>>my_string.count('w')
>>>my_string.replace('e','i')
>>>my_string.strip()
PythonForDataScienceCheatSheet
JupyterNotebook
LearnMorePythonforDataScienceInteractivelyat
www.DataC
WorkingwithDifferentProgrammingLanguages
Kernelsprovidecomputationandcommunicationwithfront-endinterfaceslikethenotebooks.Therearethreemainkernels:
IRkernel IJulia
InstallingJupyterNotebookwillautomaticallyinstalltheIPythonkernel.
Widgets
Notebookwidgetsprovidetheabilitytovisualizeandcontrolchangesinyourdata,oftenasacontrollikeaslider,textbox,etc.
YoucanusethemtobuildinteractiveGUIsforyournotebooksortosynchronizestatefulandstatelessinformationbetweenPythonandJavaScript.
Saving/LoadingNotebooks
Createnewnotebook
Makeacopyofthecurrentnotebook
Savecurrentnotebookandrecordcheckpoint
Previewoftheprintednotebook
Closenotebook&stoprunninganyscripts
Openanexistingnotebook
Renamenotebook
Revertnotebooktoapreviouscheckpoint
Downloadnotebookas
IPythonnotebook
Python
HTML
Markdown
reST
Restartkernel
Restartkernel&runallcells
Restartkernel&runallcells
CommandMode:
Interruptkernel
Interruptkernel&clearalloutput
Connectbacktoaremotenotebook
Runotherinstalledkernels
Downloadserializedstateofallwidgetmodelsinuse
Savenotebookwithinteractivewidgets
Embedcurrentwidgets
15
13 14
WritingCodeAndText
LaTeX
1 2 3 4 5 67 8910 11 12
Codeandtextareencapsulatedby3basiccelltypes:markdowncells,codecells,andrawNBConvertcells.
EditCells
EditMode:
Saveandcheckpoint
Insertcellbelow
Interruptkernel
Restartkernel
Cutcurrentlyselectedcellstoclipboard
Pastecellsfromclipboardabovecurrentcell
Pastecellsfrom
Copycellsfromclipboardtocurrentcursorposition
Pastecellsfromclipboardbelowcurrentcell
ExecutingCells
Runselectedcell(s) Runcurrentcellsdownandcreateanewone
below
Cutcell
Copycell(s)
Pastecell(s)below
Movecellup
Movecelldown
Runcurrentcell
AskingForHelp
Displaycharacteristics
Opencommandpalette
Currentkernel
Kernelstatus
Logoutfromnotebookserver
clipboardontopofcurrentcel
Revert“DeleteCells”
invocation
Mergecurrentcellwiththeoneabove
Movecurrentcellup
Adjustmetadataunderlyingthecurrentnotebook
Removecellattachments
Pasteattachmentsofcurrentcell
InsertCells
Addnewcellabovethecurrentone
Deletecurrentcells
Splitupacellfromcurrentcursorposition
Mergecurrentcellwiththeonebelow
Movecurrentcelldown
Findandreplaceinselectedcells
Copyattachmentsofcurrentcell
Insertimageinselectedcells
Addnewcellbelowthecurrentone
Runcurrentcellsdownandcreateanewoneabove
Runallcellsabovethecurrentcell
Changethecelltypeofcurrentcell
toggle,togglescrollingandclearalloutput
ViewCells
ToggledisplayofJupyterlogoandfilename
Togglelinenumbersincells
Runallcells
Runallcellsbelowthecurrentcell
toggle,togglescrollingandclearcurrentoutputs
Toggledisplayoftoolbar
Toggledisplayofcellactionicons:
None
Editmetadata
Rawcellformat
Slideshow
Attachments
Tags
WalkthroughaUItour
Editthebuilt-inkeyboardshortcuts
Descriptionofmarkdownavailableinnotebook
PythonhelptopicsNumPyhelptopicsMatplotlibhelptopics
Pandashelptopics
DataCamp
Listofbuilt-inkeyboardshortcuts
Notebookhelptopics
InformationonunofficialJupyterNotebookextensions
IPythonhelptopicsSciPyhelptopicsSymPyhelptopics
AboutJupyterNotebook
LearnPythonforDataScienceInteractively
NumPy
TheNumPylibraryisthecorelibraryforscientificcomputinginPython.Itprovidesahigh-performancemultidimensionalarrayobject,andtoolsforworkingwiththesearrays.
Usethefollowingimportconvention:
>>>importnumpyasnp
NumPyArrays
2
1Darray
2Darray
axis1
axis0
3Darray
axis2
axis1
axis0
CreatingArrays
InitialPlaceholders
I/O
Saving&LoadingOnDisk
Saving&LoadingTextFiles
DataTypes
CreateanarrayofzerosCreateanarrayofonesCreateanarrayofevenlyspacedvalues(stepvalue)
Createanarrayofevenly
spacedvalues(numberofsamples)
CreateaconstantarrayCreatea2X2identitymatrix
CreateanarraywithrandomvaluesCreateanemptyarray
>>>np.zeros((3,4))
>>>np.ones((2,3,4),dtype=16)
>>>d=np.arange(10,25,5)
>>>np.linspace(0,2,9)
>>>e=np.full((2,2),7)
>>>f=np.eye(2)
>>>np.random.random((2,2))
>>>np.empty((3,2))
>>>np.loadtxt("myfile.txt")
>>>np.genfromtxt("my_file.csv",delimiter=',')
>>>np.savetxt("myarray.txt",a,delimiter="")
>>>np.save('my_array',a)
>>>np.savez('array.npz',a,b)
>>>np.load('my_array.npy')
>>>a=np.array([1,2,3])
>>>b=np.array([(1.5,2,3),(4,5,6)],dtype=float)
>>>c=np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]],
dtype=float)
1 2
3
Subsetting,Slicing,Indexing
AlsoseeLists
Subsetting
>>>a[2]
3
>>>b[1,2]
6.0
Slicing
>>>a[0:2]
array([1,2])
>>>b[0:2,1]
array([2.,5.])
>>>b[:1]
array([[1.5,2.,3.]])
>>>c[1,...]
array([[[3.,2.,1.],
[4.,5.,6.]]])
>>>a[::-1]
array([3,2,1])
BooleanIndexing
>>>a[a<2]
array([1])
12
3
1.52
4
5
3
6
Selecttheelementatthe2ndindex
Selecttheelementatrow0column2(equivalenttob[1][2])
123 Selectitemsatindex0and1
Selectitemsatrows0and1incolumn1
Selectallitemsatrow0(equivalenttob[0:1,:])
Sameas[1,:,:]
Reversedarraya
123 Selectelementsfromalessthan2
FancyIndexing
>>>b[[1,0,1,0],[0,1,2,0]] Selectelements(1,0),(0,1),(1,2)and(0,0)
array([4.,2.,6.,1.5])
>>>b[[1,0,1,0]][:,[0,1,2,0]] Selectasubsetofthematrix’srows
array([[4.,5.,6.,4.], andcolumns
[1.5,2.,3.,1.5],
[4.,5.,6.,4.],
[1.5,2.,3.,1.5]])
ArrayManipulation
CopyingArrays
SortingArrays
Sortanarray
Sorttheelementsofanarray'saxis
>>>a.sort()
>>>c.sort(axis=0)
InspectingYourArray
>>>a.shape
Arraydimensions
>>>len(a)
Lengthofarray
>>>b.ndim
Numberofarraydimensions
>>>e.size
Numberofarrayelements
>>>b.dtype
Datatypeofarrayelements
>>>
Nameofdatatype
>>>b.astype(int)
Convertanarraytoadifferenttype
>>>a.sum()
Array-wisesum
>>>a.min()
Array-wiseminimumvalue
>>>b.max(axis=0)
Maximumvalueofanarrayrow
>>>b.cumsum(axis=1)
Cumulativesumoftheelements
>>>a.mean()
Mean
>>>b.median()
Median
>>>a.corrcoef()
Correlationcoefficient
>>>np.std(b)
Standarddeviation
TransposingArray
>>>i=np.transpose(b)
>>>i.T
PermutearraydimensionsPermutearraydimensions
ChangingArrayShape
>>>b.ravel()
>>>g.reshape(3,-2)
Flattenthearray
Reshape,butdon’tchangedata
Adding/RemovingElements
>>>h.resize((2,6))
Returnanewarraywithshape(2,6)
>>>np.append(h,g)
Appenditemstoanarray
>>>np.insert(a,1,5)
Insertitemsinanarray
>>>np.delete(a,[1])
Deleteitemsfromanarray
CombiningArrays
>>>np.concatenate((a,d),axis=0)
Concatenatearrays
array([1,2,3,10,15,20])
>>>np.vstack((a,b))
Stackarraysvertically(row-wise)
array([[1.,2.,3.],
[1.5,2.,3.],
[4.,5.,6.]])
>>>np.r_[e,f]
Stackarraysvertically(row-wise)
>>>np.hstack((e,f))
array([[7.,7.,1.,0.],
Stackarrayshorizontally(column-wise)
[7.,7.,0.,1.]])
>>>np.column_stack((a,d))
Createstackedcolumn-wisearrays
array([[1,10],
[2,15],
[3,20]])
>>>np.c_[a,d]
Createstackedcolumn-wisearrays
SplittingArrays
>>>np.hsplit(a,3)
[array([1]),array([2]),array([3])]
>>>np.vsplit(c,2)
[array([[[1.5,2.,1.],
[4.,5.,6.]]]),
array([[[3.,2.,3.],
[4.,5.,6.]]])]
Splitthearrayhorizontallyatthe3rdindex
Splitthearrayverticallyatthe2ndindex
DataCamp
LearnPythonforDataScienceInteractively
>>>h=a.view()
>>>np.copy(a)
>>>h=a.copy()
CreateaviewofthearraywiththesamedataCreateacopyofthearray
Createadeepcopyofthearray
>>>64
Signed64-bitintegertypes
>>>np.float32
Standarddouble-precisionfloatingpoint
>>>plex
Complexnumbersrepresentedby128floats
>>>np.bool
BooleantypestoringTRUEandFALSEvalues
>>>np.object
Pythonobjecttype
>>>np.string_
Fixed-lengthstringtype
>>>np.unicode_
Fixed-lengthunicodetype
PythonForDataScienceCheatSheet
NumPyBasics
ArrayMathematics
ArithmeticOperations
Comparison
AggregateFunctions
Element-wisecomparison
Element-wisecomparisonArray-wisecomparison
>>>a==b
array([[False,True,True],
[False,False,False]],dtype=bool)
>>>a<2
array([True,False,False],dtype=bool)
>>>np.array_equal(a,b)
Subtraction
SubtractionAddition
AdditionDivision
DivisionMultiplication
MultiplicationExponentiationSquareroot
PrintsinesofanarrayElement-wisecosine
Element-wisenaturallogarithmDotproduct
>>>np.divide(a,b)
>>>a*b
array([[1.5, 4., 9.],
[4.,10.,18.]])
>>>np.multiply(a,b)
>>>np.exp(b)
>>>np.sqrt(b)
>>>np.sin(a)
>>>np.cos(b)
>>>np.log(a)
>>>e.dot(f)
array([[7.,7.],
[7.,7.]])
],
]])
>>>g=a-b
array([[-0.5,0.,0.],
[-3.,-3.,-3.]])
>>>np.subtract(a,b)
>>>b+a
array([[2.5,4.,6.],
[5.,7.,9.]])
>>>np.add(b,a)
>>>a/b
array([[0.66666667,1. ,1.
[0.25 ,0.4 ,0.5
LearnPythonforDataScienceInteractivelyat
www.DataC
AskingForHelp
>>>(np.ndarray.dtype)
1.5
4
2
3
5
6
1.523
4
5
6
1.5
2
3
4
5
6
PythonForDataScienceCheatSheet
SciPy-LinearAlgebra
LearnMorePythonforDataScienceInteractivelyat
LinearAlgebra
You’llusethelinalgandsparsemodules.Notethatscipy.linalgcontainsandexpandsonnumpy.linalg.
>>>fromscipyimportlinalg,sparse
MatrixFunctions
SciPy
TheSciPylibraryisoneofthecorepackagesforscientificcomputingthatprovidesmathematicalalgorithmsandconveniencefunctionsbuiltontheNumPyextensionofPython.
Addition
>>>np.add(A,D)
Subtraction
>>>np.subtract(A,D)
Division
>>>np.divide(A,D)
Multiplication
>>>np.multiply(D,A)
>>>np.dot(A,D)
>>>np.vdot(A,D)
>>>np.inner(A,D)
>>>np.outer(A,D)
>>>np.tensordot(A,D)
>>>np.kron(A,D)
ExponentialFunctions
>>>linalg.expm(A)
>>>linalg.expm2(A)
>>>linalg.expm3(D)
AdditionSubtractionDivision
MultiplicationDotproduct
Vectordotproduct
InnerproductOuterproductTensordotproductKroneckerproduct
Matrixexponential
Matrixexponential(TaylorSeries)
Matrixexponential(eigenvalue
decomposition)
LogarithmFunction
>>>linalg.logm(A)
TrigonometricTunctions
>>>linalg.sinm(D)
>>>linalg.cosm(D)
>>>linalg.tanm(A)
HyperbolicTrigonometricFunctions
>>>linalg.sinhm(D)
>>>linalg.coshm(D)
>>>linalg.tanhm(A)
MatrixSignFunction
>>>np.sigm(A)
MatrixSquareRoot
>>>linalg.sqrtm(A)
ArbitraryFunctions
>>>linalg.funm(A,lambdax:x*x)
Matrixlogarithm
MatrixsineMatrixcosineMatrixtangent
HypberbolicmatrixsineHyperbolicmatrixcosineHyperbolicmatrixtangent
MatrixsignfunctionMatrixsquarerootEvaluatematrixfunction
CreatingMatrices
AlsoseeNumPy
>>>A=np.matrix(np.random.random((2,2)))
>>>B=np.asmatrix(b)
>>>C=np.mat(np.random.random((10,5)))
>>>D=np.mat([[3,4],[5,6]])
InverseInverse
Tranposematrix
ConjugatetranspositionTrace
Frobeniusnorm
L1norm(maxcolumnsum)Linfnorm(maxrowsum)
MatrixrankDeterminant
SolverfordensematricesSolverfordensematrices
Least-squaressolutiontolinearmatrixequation
Computethepseudo-inverseofamatrix(least-squaressolver)
Computethepseudo-inverseofamatrix(SVD)
Inverse
>>>A.I
>>>linalg.inv(A)
>>>A.T
>>>A.H
>>>np.trace(A)
Norm
>>>linalg.norm(A)
>>>linalg.norm(A,1)
>>>linalg.norm(A,np.inf)
Rank
>>>np.linalg.matrix_rank(C)
Determinant
>>>linalg.det(A)
Solvinglinearproblems
>>>linalg.solve(A,b)
>>>E=np.mat(a).T
>>>linalg.lstsq(D,E)
Generalizedinverse
>>>linalg.pinv(C)
>>>linalg.pinv2(C)
>>>np.mgrid[0:5,0:5]
Createadensemeshgrid
>>>np.ogrid[0:2,0:2]
Createanopenmeshgrid
>>>np.r_[[3,[0]*5,-1:1:10j]
Stackarraysvertically(row-wise)
>>>np.c_[b,c]
Createstackedcolumn-wisearrays
BasicMatrixRoutines
InteractingWithNumPy
AlsoseeNumPy
IndexTricks
ShapeManipulation
Polynomials
VectorizingFunctions
TypeHandling
OtherUsefulFunctions
ReturntheangleofthecomplexargumentCreateanarrayofevenlyspacedvalues
(numberofsamples)
Unwrap
Createanarrayofevenlyspacedvalues(logscale)Returnvaluesfromalistofarraysdependingonconditions
Factorial
CombineNthingstakenatktimeWeightsforNp-pointcentralderivative
Findthen-thderivativeofafunctionatapoint
>>>np.angle(b,deg=True)
>>>g=np.linspace(0,np.pi,num=5)
>>>g[3:]+=np.pi
>>>np.unwrap(g)
>>>np.logspace(0,10,3)
>>>np.select([c<4],[c*2])
>>>misc.factorial(a)
>>>b(10,3,exact=True)
>>>misc.central_diff_weights(3)
>>>misc.derivative(myfunc,1.0)
ReturntherealpartofthearrayelementsReturntheimaginarypartofthearrayelementsReturnarealarrayifcomplexpartscloseto0Castobjecttoadatatype
>>>np.real(c)
>>>np.imag(c)
>>>np.real_if_close(c,tol=1000)
>>>np.cast['f'](np.pi)
Vectorizefunctions
>>>defmyfunc(a):
ifa<0:returna*2
else:
returna/2
>>>np.vectorize(myfunc)
Createapolynomialobject
>>>fromnumpyimportpoly1d
>>>p=poly1d([3,4,5])
>>>importnumpyasnp
>>>a=np.array([1,2,3])
>>>b=np.array([(1+5j,2j,3j),(4j,5j,6j)])
>>>c=np.array([[(1.5,2,3),(4,5,6)],[(3,2,1),(4,5,6)]])
Createa2X2identitymatrixCreatea2x2identitymatrix
CompressedSparseRowmatrixCompressedSparseColumnmatrixDictionaryOfKeysmatrix
Sparsematrixtofullmatrix
Identifysparsematrix
>>>F=np.eye(3,k=1)
>>>G=np.mat(np.identity(2))
>>>C[C>0.5]=0
>>>H=sparse.csr_matrix(C)
>>>I=sparse.csc_matrix(D)
>>>J=sparse.dok_matrix(A)
>>>E.todense()
>>>sparse.isspmatrix_csc(A)
>>>np.transpose(b)
Permutearraydimensions
>>>b.flatten()
Flattenthearray
>>>np.hstack((b,c))
Stackarrayshorizontally(column-wise)
>>>np.vstack((a,b))
Stackarraysvertically(row-wise)
>>>np.hsplit(c,2)
Splitthearrayhorizontallyatthe2ndindex
>>>np.vpslit(d,2)
Splitthearrayverticallyatthe2ndindex
CreatingSparseMatrices
Decompositions
InverseNorm
Solverforsparsematrices
Inverse
>>>sparse.linalg.inv(I)
Norm
>>>sparse.linalg.norm(I)
Solvinglinearproblems
>>>sparse.linalg.spsolve(H,I)
SolveordinaryorgeneralizedeigenvalueproblemforsquarematrixUnpackeigenvalues
FirsteigenvectorSecondeigenvectorUnpackeigenvalues
SingularValueDecomposition(SVD)ConstructsigmamatrixinSVD
LUDecomposition
EigenvaluesandEigenvectors
>>>la,v=linalg.eig(A)
>>>l1,l2=la
>>>v[:,0]
>>>v[:,1]
>>>linalg.eigvals(A)
SingularValueDecomposition
>>>U,s,Vh=linalg.svd(B)
>>>M,N=B.shape
>>>Sig=linalg.diagsvd(s,M,N)
LUDecomposition
>>>P,L,U=linalg.lu(C)
SparseMatrixRoutines
SparseMatrixFunctions
SparseMatrixDecompositions
Sparsematrixexponential
>>>sparse.linalg.expm(I)
DataCamp
LearnPythonforDataScienceInteractively
EigenvaluesandeigenvectorsSVD
>>>la,v=sparse.linalg.eigs(F,1)
>>>sparse.linalg.svds(H,2)
AskingForHelp
>>>help(scipy.linalg.diagsvd)
>>>(np.matrix)
PythonForDataScienceCheatSheet
PandasBasics
LearnPythonforDataScienceInteractivelyat
www.DataC
Pandas
ThePandaslibraryisbuiltonNumPyandprovideseasy-to-usedatastructuresanddataanalysistoolsforthePythonprogramminglanguage.
Usethefollowingimportconvention:
>>>importpandasaspd
PandasDataStructures
Series
a
3
b
-5
c
7
d
4
Aone-dimensionallabeledarraycapableofholdinganydatatype
AskingForHelp
>>>help(pd.Series.loc)
Selection
>>>s['b']
-5
>>>df[1:]
Country
India
Brazil
CapitalNewDelhiBrasília
Population1303171035
207847528
Getoneelement
GetsubsetofaDataFrame
Getting
ByPosition
>>>df.iloc([0],[0])
'Belgium'
>>>df.iat([0],[0])
'Belgium'
ByLabel
>>>df.loc([0],['Country'])
'Belgium'
>>>df.at([0],['Country'])
'Belgium'
ByLabel/Position
>>>df.ix[2]
Country Brazil
Capital BrasíliaPopulation207847528
>>>df.ix[:,'Capital']
Brussels
NewDelhi
Brasília
>>>df.ix[1,'Capital']
'NewDelhi'
BooleanIndexing
>>>s[~(s>1)]
>>>s[(s<-1)|(s>2)]
>>>df[df['Population']>1200000000]
Setting
>>>s['a']=6
Selecting,BooleanIndexing&Setting
AlsoseeNumPyArrays
Selectsinglevaluebyrow&column
Dropping
Dropvaluesfromrows(axis=0)Dropvaluesfromcolumns(axis=1)
>>>s.drop(['a','c'])
>>>df.drop('Country',axis=1)
Sort&Rank
SortbylabelsalonganaxisSortbythevaluesalonganaxisAssignrankstoentries
>>>df.sort_index()
>>>df.sort_values(by='Country')
>>>df.rank()
RetrievingSeries/DataFrameInformation
BasicInformation
>>>df.shape
(rows,columns)
>>>df.index
Describeindex
>>>df.columns
DescribeDataFramecolumns
>>>()
InfoonDataFrame
>>>df.count()
Numberofnon-NAvalues
>>>df.sum()
Sumofvalues
>>>df.cumsum()
Cummulativesumofvalues
>>>df.min()/df.max()
Minimum/maximumvalues
>>>df.idxmin()/df.idxmax()
Minimum/Maximumindexvalue
>>>df.describe()
Summarystatistics
>>>df.mean()
Meanofvalues
>>>df.median()
Medianofvalues
Summary
Selectsinglevaluebyrow&columnlabels
Index
>>>s=pd.Series([3,-5,7,4],index=['a','b','c','d'])
DataFrame
Country
Capital
Population
0
Belgium
Brussels
11190846
1
India
NewDelhi
1303171035
2
Brazil
Brasília
207847528
Columns
Applyfunction
Applyfunctionelement-wise
>>>f=lambdax:x*2
>>>df.apply(f)
>>>df.applymap(f)
ApplyingFunctions
Index
Atwo-dimensionallabeleddatastructurewithcolumnsofpotentiallydifferenttypes
Selectsinglerowofsubsetofrows
Selectasinglecolumnofsubsetofcolumns
DataAlignment
InternalDataAlignment
NAvaluesareintroducedintheindicesthatdon’toverlap:
Selectrowsandcolumns
>>>data={'Country':['Belgium','India','Brazil'],
'Capital':['Brussels','NewDelhi','Brasília'],'Population':[11190846,1303171035,207847528]}
>>>df=pd.DataFrame(data,
columns=['Country','Capital','Population'])
I/O
>>>pd.read_csv('file.csv',header=None,nrows=5)
>>>df.to_csv('myDataFrame.csv')
ReadandWritetoCSV
ReadandWritetoExcel
Seriesswherevalueisnot>1
swherevalueis<-1or>2
UsefiltertoadjustDataFrame
SetindexaofSeriessto6
ReadandWritetoSQLQueryorDatabaseTable
>>>fromsqlalchemyimportcreate_engine
>>>engine=create_engine('sqlite:///:memory:')
>>>pd.read_sql("SELECT*FROMmy_table;",engine)
>>>pd.read_sql_table('my_table',engine)
>>>pd.read_sql_query("SELECT*FROMmy_table;",engine)
>>>pd.read_excel('file.xlsx')
>>>pd.to_excel('dir/myDataFrame.xlsx',sheet_name='Sheet1')
Readmultiplesheetsfromthesamefile
>>>xlsx=pd.ExcelFile('file.xls')
>>>df=pd.read_excel(xlsx,'Sheet1')
read_sql()isaconveniencewrapperaroundread_sql_table()and
read_sql_query()
>>>s3=pd.Series([7,-2,3],index=['a','c','d'])
>>>s+s3
a 10.0
b NaN
c 5.0
d 7.0
ArithmeticOperationswithFillMethods
Youcanalsodotheinternaldataalignmentyourselfwiththehelpofthefillmethods:
>>>s.add(s3,fill_value=0)
a
10.0
b
-5.0
c
5.0
d
7.0
>>>s.sub(s3,fill_value=2)
>>>s.div(s3,fill_value=4)
>>>s.mul(s3,fill_value=3)
>>>pd.to_sql('myDf',engine)
DataCamp
LearnPythonforDataScienceInteractively
LoadingTheData
AlsoseeNumPy&Pandas
YourdataneedstobenumericandstoredasNumPyarraysorSciPysparsematrices.Othertypesthatareconvertibletonumericarrays,suchasPandasDataFrame,arealsoacceptable.
>>>importnumpyasnp
>>>X=np.random.random((10,5))
>>>y=np.array(['M','M','F','F','M','F','M','M','F','F','F'])
>>>X[X<0.7]=0
CreateYourModel
SupervisedLearningEstimators
UnsupervisedLearningEstimators
PrincipalComponentAnalysis(PCA)
>>>fromsklearn.decompositionimportPCA
>>>pca=PCA(n_components=0.95)
KMeans
>>>fromsklearn.clusterimportKMeans
>>>k_means=KMeans(n_clusters=3,random_state=0)
LinearRegression
>>>fromsklearn.linear_modelimportLinearRegression
>>>lr=LinearRegression(normalize=True)
SupportVectorMachines(SVM)
>>>fromsklearn.svmimportSVC
>>>svc=SVC(kernel='linear')
NaiveBayes
>>>fromsklearn.naive_bayesimportGaussianNB
>>>gnb=GaussianNB()
KNN
>>>fromsklearnimportneighbors
>>>knn=neighbors.KNeighborsClassifier(n_neighbors=5)
ModelFitting
Fitthemodeltothedata
Fittodata,thentransformit
Fitthemodeltothedata
Supervisedlearning
>>>lr.fit(X,y)
>>>knn.fit(X_train,y_train)
>>>svc.fit(X_train,y_train)
UnsupervisedLearning
>>>k_means.fit(X_train)
>>>pca_model=pca.fit_transform(X_train)
TuneYourModel
Prediction
PredictlabelsPredictlabels
Estimateprobabilityofalabel
Predictlabelsinclusteringalgos
SupervisedEstimators
>>>y_pred=svc.predict(np.random.random((2,5)))
>>>y_pred=lr.predict(X_test)
>>>y_pred=knn.predict_proba(X_test)
UnsupervisedEstimators
>>>y_pred=k_means.predict(X_test)
EvaluateYourModel’sPerformance
ClassificationMetrics
RegressionMetrics
ClusteringMetrics
Cross-Validation
>>>fromsklearn.cross_validationimportcross_val_score
>>>print(cross_val_score(knn,X_train,y_train,cv=4))
>>>print(cross_val_score(lr,X,y,cv=2))
AdjustedRandIndex
>>>fromsklearn.metricsimportadjusted_rand_score
>>>adjusted_rand_score(y_true,y_pred)
Homogeneity
>>>fromsklearn.metricsimporthomogeneity_score
>>>homogeneity_score(y_true,y_pred)
V-measure
>>>fromsklearn.metricsimportv_measure_score
>>>metrics.v_measure_score(y_true,y_pred)
MeanAbsoluteError
>>>fromsklearn.metricsimportmean_absolute_error
>>>y_true=[3,-0.5,2]
>>>mean_absolute_error(y_true,y_pred)
MeanSquaredError
>>>fromsklearn.metricsimportmean_squared_error
>>>mean_squared_error(y_test,y_pred)
R²Score
>>>fromsklearn.metricsimportr2_score
>>>r2_score(y_true,y_pred)
Standardization
EncodingCategoricalFeatures
Normalization
ImputingMissingValues
Binarization
GeneratingPolynomialFeatures
PreprocessingTheData
>>>fromsklearn.preprocessingimportPolynomialFeatures
>>>poly=PolynomialFeatures(5)
>>>poly.fit_transform(X)
>>>fromsklearn.preprocessingimportBinarizer
>>>binarizer=Binarizer(threshold=0.0).fit(X)
>>>binary_X=binarizer.transform(X)
>>>fromsklearn.preprocessingimportImputer
>>>imp=Imputer(missing_values=0,strategy='mean',axis=0)
>>>imp.fit_transform(X_train)
>>>fromsklearn.preprocessingimportNormalizer
>>>scaler=Normalizer().fit(X_train)
>>>normalized_X=scaler.transform(X_train)
>>>normalized_X_test=scaler.transform(X_test)
>>>fromsklearn.preprocessingimportLabelEncoder
>>>enc=LabelEncoder()
>>>y=enc.fit_transform(y)
>>>fromsklearn.preprocessingimportStandardScaler
>>>scaler=StandardScaler().fit(X_train)
>>>standardized_X=scaler.transform(X_train)
>>>standardized_X_test=scaler.transform(X_test)
TrainingAndTestData
>>>fromsklearn.model_selectionimporttrain_test_split
>>>X_train,X_test,y_train,y_test=train_test_split(X,
y,random_state=0)
PythonForDataScienceCheatSheet
AccuracyScore
>>>knn.score(X_test,y_test)
Estimatorscoremethod
>>>fromsklearn.metricsimportaccuracy_score
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2026北京城乡旅游汽车出租有限责任公司招聘40人备考题库及答案详解(历年真题)
- 2026上海复旦大学计算力学与人工智能交叉研究院(筹)招聘专任研究员2人备考题库及答案详解(历年真题)
- 2026广西贵港市电子商务促进中心招募就业见习人员2人备考题库带答案详解(达标题)
- 2026内蒙古地质矿产集团有限公司社会招聘65人备考题库附答案详解(基础题)
- 2026山东大学微生物技术研究院张友明教授团队招聘1人备考题库及答案详解1套
- 2026广东东莞市应急管理局第一次招聘编外聘用人员1人备考题库含答案详解(a卷)
- 2026内蒙古锡林郭勒盟多伦县第一批招募公益性岗位工作人员9人备考题库附参考答案详解(基础题)
- 2026山东青岛市人力资源集团有限公司招聘14人备考题库附答案详解(培优b卷)
- 2026上半年贵州事业单位联考遵义市红花岗区招聘291人备考题库含答案详解(预热题)
- 2026中国贸促会直属单位招聘工作人员10人备考题库含答案详解(基础题)
- GB/T 45891-2025肥料和土壤调理剂肥料原料中腐植酸和疏水性黄腐酸含量的测定
- DB54T 0496-2025 退化高寒草原免耕补播技术规程
- 住建局窗口管理办法
- 2025年离婚抖音作品离婚协议书
- 新时代教育者核心素养与使命担当
- 2024年新高考Ⅰ卷数学真题解题技巧(1题2-4解)和考前变式训练(原卷版)
- 加气站气瓶充装质量保证体系手册2024版
- 2025年九江职业大学高职单招职业技能测试近5年常考版参考题库含答案解析
- 上海市重点建设项目社会稳定风险评估报告编制指南
- 专题03绕某点旋转90度求坐标
- 《6.2.2 平面向量的数量积》考点讲解复习与同步训练
评论
0/150
提交评论