Professional English for Statistics, Chap1: Overview of Probability and Statistics

Chapter 1. Overview and Descriptive Statistics

Weiqi Luo (骆伟祺)
School of Software, Sun Yat-Sen University
Email: weiqi.luo@
Office: #A313

Textbook:
Jay L. Devore, Probability and Statistics for Engineering and the Sciences (6th Edition), China Machine Press, 2011.

References:
1. Miller and Freund, "Probability and Statistics for Engineers" (7th Edition), Publishing House of Electronics Industry, 2005.
2. 盛骤、谢式千、潘承毅, 《概率论与数理统计》 (Probability Theory and Mathematical Statistics), 4th Edition, Higher Education Press, 2008.
3. Kai Lai Chung, "A Course in Probability Theory" (3rd Edition), China Machine Press, 2010.

MATLAB
A powerful software package with various toolboxes, including:
  Statistics Toolbox
  Image Processing Toolbox
  Signal Processing Toolbox
  Robust Control Toolbox
  Curve Fitting Toolbox
  Fuzzy Logic Toolbox

Prerequisite Courses
  SE-101 Advanced Mathematics
  SE-103 Linear Algebra

Successive Courses
  SE-328 Digital Signal Processing
  SE-343 Digital Image Processing
  SE-352 Information Security
  Pattern Recognition & Machine Learning, etc.

Uncertainty
Uncertainty can be assessed informally using language such as "it is unlikely" or "probably".

What is Uncertainty?
The science of probability grew out of the study of gambling in the 17th century.
Probability measures uncertainty formally and quantitatively. It is the mathematical language of uncertainty.
Statistics extracts useful information from uncertain data and provides the basis for making decisions or choosing actions.

Why Study Probability & Statistics?

Applications
  Weather forecasting
  Medical treatment, e.g. the relationship between smoking and lung cancer
  Birthday Paradox (from Wikipedia)
  Benford's Law / First Digit Law (from Wikipedia)
  Accounting forensics, multimedia forensics, …
  Time series analysis
  Economic forecasting, sales forecasting, budgetary analysis, stock market analysis, process and quality control, inventory studies, etc.
  More interesting applications in real life: /v_playlist/f1486775o1p0.html

Chapter 1: Overview & Descriptive Statistics
  1.1 Populations, Samples, and Processes
  1.2 Pictorial and Tabular Methods in Descriptive Statistics
  1.3 Measures of Location
  1.4 Measures of Variability

1.1 Populations, Samples, and Processes

Population
An investigation will typically focus on a well-defined collection of objects (units). A population is the set of all objects of interest in a particular study.

Variables
A variable is any characteristic whose value (categorical or numerical) may change from one object to another in the population.

Examples of populations, objects and variables

Population | Unit/Object | Variables/Characteristics
All students currently in the class | Student | Height; weight; hours of work per week; right/left-handed
All printed circuit boards manufactured during a month | Board | Type of defects; number of defects; location of defects
All campus fast food restaurants | Restaurant | Number of employees; seating capacity; hiring/not hiring
All books in library | Book | Replacement cost; frequency of checkout; repairs needed

Sample
A subset of the population.

According to the number of variables under investigation, the data are
  Univariate: a single variable, e.g. the type of transmission, automatic or manual, on cars
  Bivariate: two variables, e.g. the height and weight of the students
  Multivariate: more than two variables, e.g. systolic blood pressure, diastolic blood pressure and serum cholesterol level for each patient

Descriptive statistics
An investigator who has collected data may wish simply to summarize and describe important features of the data. This entails using methods from descriptive statistics:
  Graphical methods (Sec. 1.2), e.g. stem-and-leaf displays, dotplots and histograms
  Numerical summary measures (Sec. 1.3, 1.4), e.g. means, standard deviations and correlation coefficients

Example 1.1
Here is a data set consisting of observations on x = O-ring temperature for each test firing or actual launch of the shuttle rocket engine:

84 49 61 40 83 67 45 66 70 69 80 58
68 60 67 72 73 70 57 63 70 78 52 67
53 67 75 61 70 81 76 79 75 76 58 31

[Figure: normalized histogram of the temperatures. Horizontal axis: class boundaries at 25, 35, 45, 55, 65, 75, 85. Vertical axis: percentage of the temperatures located in each bin, e.g. the bin [25, 35], with ticks at 10%, 20%, 30%, 40%.]

Inferential statistics
Use sample information to draw some type of conclusion (make an inference of some sort) about the population.
  Point estimation: Chapter 6
  Hypothesis testing: Chapter 8
  Estimation by confidence interval: Chapter 7
  …

Probability & Statistics

[Figure: a population and a sample. Deductive reasoning (probability) goes from the population to the sample; inductive reasoning (inferential statistics) goes from the sample to the population.]
The mathematical language is "probability".

Collecting Data
If data are not properly collected, an investigator may not be able to answer the questions under consideration with a reasonable degree of confidence.

Methods for collecting data
  Random sampling: any particular subset of the specified size has the same chance of being selected.
  Stratified sampling: entails separating the population units into non-overlapping groups and taking a sample from each one.

Descriptive Statistics
  Visual techniques (Sec. 1.2): stem-and-leaf displays, dotplots, histograms
  Numerical summary measures (Sec. 1.3 & 1.4): measures of location, measures of variability

Notation

Sample size: the number of observations in a single sample will often be denoted by n.
Given a data set consisting of n observations on some variable x, the individual observations will be denoted by $x_1, x_2, x_3, \ldots, x_n$.

1.2 Pictorial and Tabular Methods in Descriptive Statistics

Stem-and-Leaf Displays
Suppose we have a numerical data set $x_1, x_2, x_3, \ldots, x_n$ for which each $x_i$ consists of at least two digits.

Steps for constructing a stem-and-leaf display:
1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value.
4. Indicate the units for stems and leaves someplace in the display.

Example
Observations: 16%, 33%, 64%, 37%, 31%, …

Stem-and-leaf display
Stem | Leaf
   1 | 6
   3 | 3 7 1   [or 3 | 1 3 7]
   6 | 4
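
The construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `stem_and_leaf` and `render` are made up for this example, the observations are assumed to be non-negative integers, the stem is everything above the ones digit, and stems with no observations are simply omitted, as in the short example above.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit and above) to its sorted list of leaves (ones digit)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 37 -> stem 3, leaf 7
        stems[stem].append(leaf)
    return dict(stems)

def render(display):
    """Format the display as text, one row per stem, followed by a units legend."""
    rows = ["Stem | Leaf"]
    for stem in sorted(display):
        leaves = " ".join(str(leaf) for leaf in display[stem])
        rows.append(f"{stem:>4} | {leaves}")
    rows.append("Stem: tens digit; Leaf: ones digit")
    return "\n".join(rows)

# The five percentages listed in the example (the elided ones are not guessed).
observations = [16, 33, 64, 37, 31]
display = stem_and_leaf(observations)
print(render(display))
```

Sorting the values first yields the ordered variant of the row for stem 3 (1 3 7); dropping the `sorted` call would keep the leaves in the order in which the observations were recorded.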

Stem: tens digit; Leaf: ones digit

Example 1.5
The following figure shows a stem-and-leaf display of 140 values (colleges) of x = the percentage of undergraduate students who are binge drinkers.

0 | 4
1 | 1345678889
2 | 1223456666777889999
3 | 011223334455666677777888899999
4 | 1112222233444455666666777888888999
5 | 00111222233455666667777888899
6 | 01111244455666778
Stem: tens digit; Leaf: ones digit

A stem-and-leaf display conveys information about the following aspects of the data:
  Identification of a typical or representative value
  Extent of spread about the typical value
  Presence of any gaps in the data
  Extent of symmetry in the distribution of values
  Number and location of peaks
  Presence of any outlying values

Example 1.6

64 | 35 64 33 70
65 | 26 27 06 83
66 | 05 94 14
67 | 90 70 00 98 70 45 13
68 | 90 70 73 50
69 | 00 27 36 04
70 | 51 05 11 40 50 22
71 | 31 69 68 05 13 65
72 | 80 09
Stem: thousands and hundreds digits; Leaf: tens and ones digits

6 | 435 464 433 470 … 904
7 | 051 005 011 040 … 209
Stem: thousands digit; Leaf: hundreds, tens and ones digits

Example 1.7 (repeated stems)

5H | 5
5L | 242330
4H | 768896
4L | 21421414444
3H | 9696656
Stem: tens digit; Leaf: ones digit

is equivalent to

5 | 2423305
4 | 21421414444768896
3 | 9696656
Stem: tens digit; Leaf: ones digit

Note: on an L row the leaves are 0, 1, 2, 3 or 4; on an H row the leaves are 5, 6, 7, 8 or 9.

Dotplot
A dotplot is a suitable summary when the data set is reasonably small or there are relatively few distinct data values.
Each observation is represented by a dot above the corresponding location on a horizontal measurement scale.
When a value occurs more than once, there is a dot for each occurrence, and these dots are stacked vertically.
As with a stem-and-leaf display, a dotplot gives information about location, spread, extremes and gaps.

Example 1.8
[Figure: dotplot of the observations below on a horizontal scale with ticks at 30, 40, 50, 60, 70, 80.]
84 49 61 40 83 67 45 66 70 69 80 58
68 60 67 72 73 70 57 63 70 78 52 67
53 67 75 61 70 81 76 79 75 76 58 31

Histogram
Types of variables:
  Discrete variable: a variable is discrete if its set of possible values either is finite or else can be listed in an infinite sequence.
  Continuous variable: a variable is continuous if its possible values consist of an entire interval on the number line.

Constructing a histogram for discrete data
Three steps:
1. Determine the frequency (or relative frequency) of each x value.
2. Mark possible x values on a horizontal scale.
3. Above each value, draw a rectangle whose height is the frequency (or relative frequency) of that value.

Example
Suppose that our data set consists of 200 observations on x = the number of major defects in a new car of a certain type. If 70 of these x values are 1, then
  frequency of the x value 1: 70
  relative frequency of the x value 1: 70/200 = 0.35
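
The first step for discrete data, tabulating the frequency and relative frequency of each x value, might look like the following Python sketch. The data set is hypothetical: only the fact that 70 of the 200 observations equal 1 comes from the example, and the remaining counts, along with the name `frequency_table`, are assumptions made for illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Return {x: (frequency, relative frequency)} for each distinct x value."""
    counts = Counter(observations)
    n = len(observations)
    return {x: (counts[x], counts[x] / n) for x in sorted(counts)}

# Hypothetical sample of 200 cars; only "70 observations equal 1" is taken from the example.
defects = [0] * 50 + [1] * 70 + [2] * 45 + [3] * 25 + [4] * 10

table = frequency_table(defects)
for x, (freq, rel) in table.items():
    # A crude text histogram: bar length is proportional to the relative frequency.
    bar = "#" * round(rel * 40)
    print(f"x={x}  freq={freq:3d}  rel={rel:.3f}  {bar}")
```

Because the relative frequencies are frequencies divided by the same n, a histogram drawn from either column has the same shape; only the vertical scale differs.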

Example 1.9
[Figure: histogram with horizontal axis ticks at 0, 10, 20 and vertical axis ticks at 0.05, 0.10.]

Continuous Case
p. 17. Suppose that we have 50 observations on x = fuel efficiency of an automobile (mpg), the smallest of which is 27.8 and the largest of which is 31.4.
Class boundaries: 27.5, 28.0, 28.5, 29.0, 29.5, 30.0, 30.5, 31.0, 31.5
Class intervals (continuous to discrete): equal or unequal width

Constructing a histogram for continuous data: equal (or unequal) class widths
Similar to the discrete case. Make sure that:
  class width × rectangle height (density) = relative frequency of the class

Typical histogram shapes
[Figure: symmetric unimodal, bimodal, positively skewed, negatively skewed.]

Multivariate data
The above-mentioned techniques have been exclusively for situations in which each observation in a data set is either a single number or a single category. Please refer to Chapters 11-14 for analyzing multivariate data sets.

Homework
Ex. 11, Ex. 14, Ex. 20, Ex. 26

The Mean

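Sample mean: the sample mean $\bar{x}$ of observations $x_1, x_2, \ldots, x_n$ is given by
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Sample median: the sample median $\tilde{x}$ is obtained by first ordering the n observations from smallest to largest. Then
  $\tilde{x}$ = the single middle value, i.e. the $\left(\frac{n+1}{2}\right)$th ordered value, if n is odd
  $\tilde{x}$ = the average of the two middle values, i.e. of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th ordered values, if n is even

1.3 Measures of Location

Example 1.13 (Sample mean)
x1 = 16.1   x2 = 9.6    x3 = 24.9   x4 = 20.4   x5 = 12.7   x6 = 21.2   x7 = 30.2
x8 = 25.8   x9 = 18.5   x10 = 10.3  x11 = 25.3  x12 = 14.0  x13 = 27.1  x14 = 45.0
x15 = 23.3  x16 = 24.2  x17 = 14.6  x18 = 8.9   x19 = 32.4  x20 = 11.8  x21 = 28.5

0H | 96 89
1L | 27 03 40 46 18
1H | 61 85
2L | 49 04 12 33 42
2H | 58 53 71 85
3L | 02 24
3H |
4L |
4H | 50
Stem: tens digit; Leaf: ones and tenths digits
[Figure: the same data on a horizontal scale with ticks at 10, 20, 30, 40; an outlying value is marked.]

Example 1.14 (Median)
x1 = 15.2  x2 = 9.3  x3 = 7.6   x4 = 11.9   x5 = 10.4  x6 = 9.7
x7 = 20.4  x8 = 9.4  x9 = 11.5  x10 = 16.2  x11 = 9.4  x12 = 8.3
The list of ordered values is
7.6 8.3 9.3 9.4 9.4 9.7 10.4 11.5 11.9 15.2 16.2 20.4
n = 12 is even, so the sample median is the average of the sixth and seventh ordered values: (9.7 + 10.4)/2 = 10.05.
Note: the sample mean here is 139.3/12 ≈ 11.61.

Three different shapes for a population distribution
[Figure: three density curves.]
  Symmetric unimodal: $\mu = \tilde{\mu}$
  Positively skewed: $\tilde{\mu} < \mu$
  Negatively skewed: $\mu < \tilde{\mu}$
$\mu$: population mean; $\tilde{\mu}$: population median
Why?

Other Measures of Location
  Quartiles
  Percentiles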
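
As a cross-check of Example 1.14, the sample mean and sample median defined in this section can be computed with a short Python sketch. The function names `sample_mean` and `sample_median` are chosen for this illustration only; the data are the twelve observations of the example.

```python
def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)

def sample_median(xs):
    """Middle ordered value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

example_1_14 = [15.2, 9.3, 7.6, 11.9, 10.4, 9.7, 20.4, 9.4, 11.5, 16.2, 9.4, 8.3]

mean_value = sample_mean(example_1_14)
median_value = sample_median(example_1_14)
print(f"mean = {mean_value:.2f}, median = {median_value:.2f}")
```

The mean is noticeably larger than the median for these data, because the few large observations pull the mean upward while the median depends only on the middle of the ordered list.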

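[Figure: ordered data divided by the median and the quartiles; percentiles divide the data into 1% portions.]

Trimmed Means
A trimmed mean is a compromise between the sample mean and the sample median. A 10% trimmed mean, for example, would be computed by eliminating the smallest 10% and the largest 10% of the sample and then averaging what is left over.
[Figure: ordered sample in which the smallest 10% and the largest 10%, which may be outlying data, are removed before the sample mean is taken.]

Example 1.15
612 623 666 744 883 898 964 970 983 1003
1016 1022 1029 1058 1085 1088 1122 1135 1197 1201
[Figure: the data on a scale from 600 to 1200, marking the sample median $\tilde{x}$, the sample mean $\bar{x}$ and the 10% trimmed mean $\bar{x}_{tr(10)}$; the trimmed observations at both ends are labeled "Removal".]
Note: trimming proportion: 5% to 25%.

Homework
Ex. 34, Ex. 36, Ex. 40

1.4 Measures of Variability

Time error for three types of watches, 9 observations for each type
[Figure: dotplots of the time errors of watch types 1, 2 and 3 on a scale from -20 to +20.]
Q: Which type is the best? And why?

The Range
The difference between the largest and smallest sample values. Referring to the previous example, types 1 and 2 have identical ranges; however, there is much less variability in the second sample than in the first.

Deviations from the mean
Measure 1: the deviations from the mean are $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$. Then for all cases
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
so the deviations cannot simply be summed or averaged to measure variability.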
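
The two points just made, that equal ranges can hide very different variability and that the deviations from the mean always sum to zero, can be checked numerically. The sketch below uses two hypothetical samples of nine time errors each; the actual watch data are not recoverable from the slides, so these values and the function names are assumptions for illustration.

```python
def sample_range(xs):
    """Difference between the largest and smallest sample values."""
    return max(xs) - min(xs)

def deviations_from_mean(xs):
    """List of x_i minus the sample mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]

# Hypothetical time errors for two watch types (made up for this sketch).
type_1 = [-20, -15, -10, -5, 0, 5, 10, 15, 20]
type_2 = [-20, -3, -2, -1, 0, 1, 2, 3, 20]

for name, sample in (("type 1", type_1), ("type 2", type_2)):
    devs = deviations_from_mean(sample)
    print(
        name,
        "range:", sample_range(sample),
        "sum of deviations:", sum(devs),
        "sum of squared deviations:", sum(d * d for d in devs),
    )
```

Both samples have the same range and a zero sum of deviations, yet the sums of squared deviations differ, which is one reason squared deviations are a natural next candidate for measuring spread.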

Sample variance
The sample variance, denoted by $s^2$, is given by
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The sample standard deviation, denoted by $s$, is the positive square root of the variance: $s = \sqrt{s^2}$.

Q1:vs.

Q2: $n - 1$ vs. $n$
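
One way to explore Q2 is to enumerate every possible sample and average the two candidate statistics. The sketch below does this for the three-element population {1, 2, 3} considered in this section, drawing samples of size two with repetition. Using exact fractions instead of floating point is a choice made here for clarity, and the helper name `variance` is hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]
N = len(population)

# Population mean and population variance (divisor N).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by the given divisor."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples with repetition

avg_mean = sum(Fraction(sum(s), n) for s in samples) / len(samples)
avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print("population mean:", mu, " population variance:", sigma2)
print("average sample mean:", avg_mean)
print("average s^2 with divisor n:", avg_div_n)
print("average s^2 with divisor n - 1:", avg_div_n_minus_1)
```

Averaged over all samples, the divisor n - 1 reproduces the population variance while the divisor n falls short of it, which is the sense in which n - 1 gives the better estimate.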

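Example 1.16

$x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
0.684 | -0.9841 | 0.9685
2.54  |  0.8719 | 0.7602
0.924 | -0.7441 | 0.5537
3.13  |  1.4619 | 2.1372
1.038 | -0.6301 | 0.3970
0.598 | -1.0701 | 1.1451
0.483 | -1.1851 | 1.4045
3.52  |  1.8519 | 3.4295
1.285 | -0.3831 | 0.1468
2.65  |  0.9819 | 0.9641
1.497 | -0.1711 | 0.0293

Population variance
We will use $\sigma^2$ to denote the population variance and $\sigma$ to denote the population standard deviation. When the population is finite and consists of N values,
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

Consider a population with just 3 elements {1, 2, 3}.
The mean of the population is $\mu = \frac{1 + 2 + 3}{3} = 2$
and the variance is $\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}$
Suppose all we can take is a sample of 2 elements, drawn with repetition, to learn about the population. We would like the sample to accurately estimate the mean and variance values of the population.

Possible samples of size two | Sample mean $\bar{x}$ | $s^2$ using $n = 2$ | $s^2$ using $n - 1 = 1$
(1, 1) | 1   | 0/2 = 0    | 0/1 = 0
(2, 2) | 2   | 0/2 = 0    | 0/1 = 0
(3, 3) | 3   | 0/2 = 0    | 0/1 = 0
(1, 2) | 1.5 | .5/2 = .25 | .5/1 = .5
(2, 1) | 1.5 | .5/2 = .25 | .5/1 = .5
(1, 3) | 2   | 2/2 = 1.0  | 2/1 = 2
(3, 1) | 2   | 2/2 = 1.0  | 2/1 = 2
(2, 3) | 2.5 | .5/2 = .25 | .5/1 = .5
(3, 2) | 2.5 | .5/2 = .25 | .5/1 = .5
Average of sample statistics | 2 | 1/3 | 2/3

Dividing by $n - 1$ gives the better estimate: its average over all samples, 2/3, equals the population variance, whereas dividing by $n$ gives 1/3.

An alternative expression for the numerator of $s^2$
$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$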
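
The numerator of $s^2$ can be computed either from its definition, the sum of squared deviations from the mean, or from the standard shortcut form (sum of squares minus the squared sum divided by n). The Python sketch below compares the two on the eleven observations of Example 1.16 and then on the same observations shifted by a large constant. The shift value and the function names are assumptions chosen to illustrate that the shortcut form can lose precision in floating-point arithmetic.

```python
def numerator_definition(xs):
    """Sum of squared deviations from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def numerator_shortcut(xs):
    """Sum of squares minus (sum squared) / n; algebraically equal to the definition."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

example_1_16 = [0.684, 2.54, 0.924, 3.13, 1.038, 0.598, 0.483, 3.52, 1.285, 2.65, 1.497]

by_definition = numerator_definition(example_1_16)
by_shortcut = numerator_shortcut(example_1_16)
sample_variance = by_definition / (len(example_1_16) - 1)
print(by_definition, by_shortcut, sample_variance)

# Adding a constant leaves the deviations unchanged, but the shortcut form now
# subtracts two huge, nearly equal numbers and may lose most of its digits.
shifted = [x + 1e8 for x in example_1_16]
print(numerator_definition(shifted), numerator_shortcut(shifted))
```

On the original data the two expressions agree to many digits; on the shifted data the definitional form stays close to the same value, while the shortcut result should not be trusted.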

If $y_1 = x_1 + c, y_2 = x_2 + c, \ldots, y_n = x_n + c$, then $s_y^2 = s_x^2$.
If $y_1 = c x_1, y_2 = c x_2, \ldots, y_n = c x_n$, then $s_y^2 = c^2 s_x^2$ and $s_y = |c| s_x$,
where $s_x^2$ is the sample variance of the x's and $s_y^2$ is the sample variance of the y's.
Be careful of rounding errors when using the two different expressions for the numerator of $s^2$.

Boxplots
Describe several of a data set's most prominent features: center; spread; extent and nature of any departure from symmetry; identification of "outliers", observations that li
