




Chapter 1. Overview and Descriptive Statistics

Weiqi Luo (骆伟祺)
School of Software, Sun Yat-Sen University
Email: weiqi.luo@
Office: #A313

Textbook:
Jay L. Devore, Probability and Statistics for Engineering and the Sciences (6th Edition), China Machine Press, 2011

References:
1. Miller and Freund, "Probability and Statistics for Engineers" (7th Edition), Publishing House of Electronics Industry, 2005.
2. 盛骤, 谢式千, 潘承毅 (Sheng Zhou, Xie Shiqian, Pan Chengyi), 《概率论与数理统计》 (Probability Theory and Mathematical Statistics), 4th Edition, Higher Education Press, 2008.
3. Kai Lai Chung, "A Course in Probability Theory" (3rd Edition), China Machine Press, 2010.

MATLAB
A powerful software package with various toolboxes, including:
- Statistics Toolbox
- Image Processing Toolbox
- Signal Processing Toolbox
- Robust Control Toolbox
- Curve Fitting Toolbox
- Fuzzy Logic Toolbox
- …

Prerequisite Courses
- SE-101 Advanced Mathematics
- SE-103 Linear Algebra

Successive Courses
- SE-328 Digital Signal Processing
- SE-343 Digital Image Processing
- SE-352 Information Security
- Pattern Recognition & Machine Learning, etc.

What is Uncertainty?
Uncertainty can be assessed informally using language such as "it is unlikely" or "probably".
The science of probability grew out of gambling in the 17th century.
Probability measures uncertainty formally and quantitatively. It is the mathematical language of uncertainty.
Statistics extracts useful information from uncertain data and provides the basis for making decisions or choosing actions.

Why Study Probability & Statistics?

Applications
- Weather forecasting
- Medical treatment, e.g. the relationship between smoking and lung cancer
- Birthday Paradox (from Wikipedia)
- Benford's Law / First Digit Law (from Wikipedia): accounting forensics, multimedia forensics, …
- Time series analysis: economic forecasting, sales forecasting, budgetary analysis, stock market analysis, process and quality control, inventory studies, etc.

More interesting applications in real life:
/v_playlist/f1486775o1p0.html

Chapter 1: Overview & Descriptive Statistics
1.1. Populations, Samples, and Processes
1.2. Pictorial and Tabular Methods in Descriptive Statistics
1.3. Measures of Location
1.4. Measures of Variability

1.1. Populations, Samples, and Processes

Population
An investigation will typically focus on a well-defined collection of objects (units). A population is the set of all objects of interest in a particular study.

Variables
A variable is any characteristic whose value (categorical or numerical) may change from one object to another in the population.

Examples of populations, objects, and variables

Population | Unit/Object | Variables/Characteristics
All students currently in the class | Student | Height; weight; hours of work per week; right/left-handed
All printed circuit boards manufactured during a month | Board | Type of defects; number of defects; location of defects
All campus fast food restaurants | Restaurant | Number of employees; seating capacity; hiring/not hiring
All books in library | Book | Replacement cost; frequency of checkout; repairs needed

Sample
A subset of the population.

According to the number of variables under investigation, the data are:
- Univariate: a single variable, e.g. the type of transmission (automatic or manual) on cars
- Bivariate: two variables, e.g. the height and weight of the students
- Multivariate: more than two variables, e.g. systolic blood pressure, diastolic blood pressure, and serum cholesterol level for each patient

Descriptive statistics
An investigator who has collected data may wish simply to summarize and describe important features of the data. This entails using methods from descriptive statistics:
- Graphical methods (Sec. 1.2), e.g. stem-and-leaf displays, dotplots, and histograms
- Numerical summary measures (Sec. 1.3, 1.4), e.g. means, standard deviations, and correlation coefficients

Example 1.1.
Here are data consisting of observations on x = O-ring temperature for each test firing or actual launch of the shuttle rocket engine:

84 49 61 40 83 67 45 66 70 69 80 58
68 60 67 72 73 70 57 63 70 78 52 67
53 67 75 61 70 81 76 79 75 76 58 31

[Figure: normalized histogram of the O-ring temperatures. Horizontal axis: temperature, with class boundaries 25, 35, 45, 55, 65, 75, 85. Vertical axis: relative frequency, 10% to 40%. The height of each bar is the percentage of the temperatures located in that bin, e.g. [25, 35].]

Inferential statistics
Use sample information to draw some type of conclusion (make an inference of some sort) about the population.
- Point estimation: Chapter 6
- Hypothesis testing: Chapter 8
- Estimation by confidence interval: Chapter 7
- …

Probability & Statistics
[Figure: population and sample. Reasoning from the population to a sample is deductive (probability); reasoning from a sample to the population is inductive (inferential statistics).]
The mathematical language is "probability".

Collecting Data
If data are not properly collected, an investigator may not be able to answer the questions under consideration with a reasonable degree of confidence.
Methods for collecting data:
- Random sampling: any particular subset of the specified size has the same chance of being selected.
- Stratified sampling: entails separating the population units into non-overlapping groups and taking a sample from each one.

Descriptive Statistics
- Visual techniques (Sec. 1.2): stem-and-leaf displays, dotplots, histograms
- Numerical summary measures (Sec. 1.3 & 1.4): measures of location, measures of variability

1.2 Pictorial and Tabular Methods in Descriptive Statistics

Notation
- Sample size: the number of observations in a single sample will often be denoted by n.
- Given a data set consisting of n observations on some variable x, the individual observations will be denoted by $x_1, x_2, x_3, \ldots, x_n$.

Stem-and-Leaf Displays
Suppose we have a numerical data set $x_1, x_2, x_3, \ldots, x_n$ for which each $x_i$ consists of at least two digits.
Steps for constructing a stem-and-leaf display:
1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value.
4. Indicate the units for stems and leaves someplace in the display.

Example
Observations: 16%, 33%, 64%, 37%, 31%, …

Stem-and-leaf display:
Stem | Leaf
1 | 6
3 | 3 7 1   [or 3 | 1 3 7]
6 | 4
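
The construction steps can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names `stem_and_leaf` and `render` are made up here, the observations are assumed to be non-negative integers with the ones digit as the leaf, and only stems that actually occur are listed.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens and higher digits) to its leaves (ones digits).

    Leaves are kept in the order in which the observations arrive.
    """
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 37 -> stem 3, leaf 7
        display[stem].append(leaf)
    return dict(sorted(display.items()))


def render(display, sort_leaves=False):
    """Return the display as text lines of the form 'stem | leaves'."""
    lines = []
    for stem, leaves in display.items():
        shown = sorted(leaves) if sort_leaves else leaves
        lines.append(f"{stem} | {''.join(str(leaf) for leaf in shown)}")
    return lines


observations = [16, 33, 64, 37, 31]  # the five percentages listed in the example
display = stem_and_leaf(observations)
for line in render(display):
    print(line)
```

Passing `sort_leaves=True` to `render` gives the ordered variant of a row (3 | 137 instead of 3 | 371).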

(In the example display above, the stem is the tens digit and the leaf is the ones digit.)

Example 1.5
The following figure shows a stem-and-leaf display of 140 values (colleges) of x = the percentage of undergraduate students who are binge drinkers.

0 | 4
1 | 1345678889
2 | 1223456666777889999
3 | 011223334455666677777888899999
4 | 1112222233444455666666777888888999
5 | 00111222233455666667777888899
6 | 01111244455666778
Stem: tens digit
Leaf: ones digit

A stem-and-leaf display conveys information about the following aspects of the data:
- Identification of a typical or representative value
- Extent of spread about the typical value
- Presence of any gaps in the data
- Extent of symmetry in the distribution of values
- Number and location of peaks
- Presence of any outlying values

Example 1.6

64 | 35 64 33 70
65 | 26 27 06 83
66 | 05 94 14
67 | 90 70 00 98 70 45 13
68 | 90 70 73 50
69 | 00 27 36 04
70 | 51 05 11 40 50 22
71 | 31 69 68 05 13 65
72 | 80 09
Stem: thousands and hundreds digits
Leaf: tens and ones digits

6 | 435 464 433 470 … 904
7 | 051 005 011 040 … 209
Stem: thousands digit
Leaf: hundreds, tens, and ones digits

Example 1.7 (repeated stems)

5H | 5
5L | 242330
4H | 768896
4L | 21421414444
3H | 9696656
Stem: tens digit
Leaf: ones digit

This is equivalent to the display without repeated stems:

5 | 2423305
4 | 21421414444768896
3 | 9696656
Stem: tens digit
Leaf: ones digit

Note: on an L row the leaves are 0, 1, 2, 3, or 4; on an H row the leaves are 5, 6, 7, 8, or 9.

Dotplot
- A dotplot is useful when the data set is reasonably small or there are relatively few distinct data values.
- Each observation is represented by a dot above the corresponding location on a horizontal measurement scale.
- When a value occurs more than once, there is a dot for each occurrence, and these dots are stacked vertically.
- As with a stem-and-leaf display, a dotplot gives information about location, spread, extremes, and gaps.

Example 1.8
[Figure: dotplot of the O-ring temperature data on a horizontal scale from 30 to 80.]

84 49 61 40 83 67 45 66 70 69 80 58
68 60 67 72 73 70 57 63 70 78 52 67
53 67 75 61 70 81 76 79 75 76 58 31

Histogram
Types of variables:
- Discrete variable: a variable is discrete if its set of possible values either is finite or else can be listed in an infinite sequence.
- Continuous variable: a variable is continuous if its possible values consist of an entire interval on the number line.

Constructing a Histogram for Discrete Data
Three steps:
1. Determine the frequency (or relative frequency) of each x value.
2. Mark possible x values on a horizontal scale.
3. Above each value, draw a rectangle whose height is the frequency (or relative frequency) of that value.

Example
Suppose that our data set consists of 200 observations on x = the number of major defects in a new car of a certain type. If 70 of these x values are 1, then
- frequency of the x value 1: 70
- relative frequency of the x value 1: 70/200 = 0.35
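
As a rough sketch of step 1, the frequencies and relative frequencies of a discrete data set can be tabulated as below. Only the fact that 70 of the 200 values equal 1 comes from the example; the remaining counts in `assumed_counts` are invented for illustration.

```python
from collections import Counter

# Hypothetical data: only "70 of the 200 values equal 1" comes from the example;
# the other counts are made up so that the total is 200.
assumed_counts = {0: 50, 1: 70, 2: 40, 3: 25, 4: 15}
observations = [x for x, count in assumed_counts.items() for _ in range(count)]

n = len(observations)
frequency = Counter(observations)  # how often each x value occurs
relative_frequency = {x: frequency[x] / n for x in sorted(frequency)}

for x, rel in relative_frequency.items():
    print(f"x = {x}: frequency {frequency[x]:3d}, relative frequency {rel:.3f}")
```

The relative frequencies are the rectangle heights of the histogram and sum to 1 (up to rounding).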

Example 1.9
[Figure: histogram for Example 1.9; axis ticks at 0, 10, 20 and at 0.05, 0.10.]

Continuous Case
(p. 17) Suppose that we have 50 observations on x = fuel efficiency of an automobile (mpg), the smallest of which is 27.8 and the largest of which is 31.4.
Class boundaries: 27.5, 28.0, 28.5, 29.0, 29.5, 30.0, 30.5, 31.0, 31.5
Class intervals: continuous → discrete; equal or unequal width.

Constructing a Histogram for Continuous Data: Equal (or Unequal) Class Widths
Similar to the discrete case. Make sure that:
class width × rectangle height (density) = relative frequency of the class

Typical Histogram Shapes
- Symmetric unimodal
- Bimodal
- Positively skewed
- Negatively skewed

Multivariate Data
The techniques above apply exclusively to situations in which each observation in a data set is either a single number or a single category. Please refer to Chapters 11-14 for analyzing multivariate data sets.

Homework
Ex. 11, Ex. 14, Ex. 20, Ex. 26

1.3 Measures of Location

The Mean
Sample mean: the sample mean of observations $x_1, x_2, \ldots, x_n$ is given by
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Sample median: the sample median is obtained by first ordering the n observations from smallest to largest. Then
$$\tilde{x} = \begin{cases} \text{the } \left(\frac{n+1}{2}\right)\text{th ordered value}, & n \text{ odd} \\ \text{the average of the } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2}+1\right)\text{th ordered values}, & n \text{ even} \end{cases}$$

Example 1.13 (Sample mean)

x1 = 16.1, x2 = 9.6, x3 = 24.9, x4 = 20.4, x5 = 12.7, x6 = 21.2, x7 = 30.2
x8 = 25.8, x9 = 18.5, x10 = 10.3, x11 = 25.3, x12 = 14.0, x13 = 27.1, x14 = 45.0
x15 = 23.3, x16 = 24.2, x17 = 14.6, x18 = 8.9, x19 = 32.4, x20 = 11.8, x21 = 28.5

The sample mean is $\bar{x} = 444.8/21 \approx 21.18$.

0H | 96 89
1L | 27 03 40 46 18
1H | 61 85
2L | 49 04 12 33 42
2H | 58 53 71 85
3L | 02 24
3H |
4L |
4H | 50
Stem: tens digit (L: ones digit 0 to 4, H: ones digit 5 to 9)
Leaf: ones and tenths digits

[Figure: the data on a horizontal scale from 10 to 40; the observation 45.0 is marked as an outlying value.]

Example 1.14 (Median)

x1 = 15.2, x2 = 9.3, x3 = 7.6, x4 = 11.9, x5 = 10.4, x6 = 9.7
x7 = 20.4, x8 = 9.4, x9 = 11.5, x10 = 16.2, x11 = 9.4, x12 = 8.3

The list of ordered values is
7.6 8.3 9.3 9.4 9.4 9.7 10.4 11.5 11.9 15.2 16.2 20.4
n = 12 is even, so the sample median is (9.7 + 10.4)/2 = 10.05.
Note: the sample mean here is 139.3/12 ≈ 11.61.

Three different shapes for a population distribution
- Symmetric unimodal: $\mu = \tilde{\mu}$
- Positively skewed: $\tilde{\mu} < \mu$
- Negatively skewed: $\mu < \tilde{\mu}$
Here $\mu$ is the population mean and $\tilde{\mu}$ is the population median. Why?

Other Measures of Location
- Quartiles
- Percentiles
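
The figures quoted in Example 1.14 can be rechecked with a short script. This is a minimal sketch, not the textbook's procedure: the median follows the ordering rule stated above, and the quartiles come from the standard-library function `statistics.quantiles`, whose default interpolation is only one of several conventions in use.

```python
import statistics

# Observations from Example 1.14
x = [15.2, 9.3, 7.6, 11.9, 10.4, 9.7, 20.4, 9.4, 11.5, 16.2, 9.4, 8.3]

ordered = sorted(x)
n = len(ordered)

# Median by the definition: the middle value (n odd) or the
# average of the two middle values (n even)
if n % 2 == 1:
    median = ordered[n // 2]
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

mean = sum(x) / n

# Quartiles split the ordered data into four parts; the exact cut points
# depend on the interpolation convention chosen.
q1, q2, q3 = statistics.quantiles(x, n=4)

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"quartiles = {q1:.2f}, {q2:.2f}, {q3:.2f}")
```

The large observation 20.4 pulls the mean above the median, which is the pattern expected for positively skewed data.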
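
[Figure: quartiles and percentiles. The ordered data are divided by the median, the quartiles, and the percentiles (1% each); the most extreme values may be outlying data.]

Trimmed Means
A trimmed mean is a compromise between the sample mean and the sample median. A 10% trimmed mean, for example, would be computed by eliminating the smallest 10% and the largest 10% of the sample and then averaging what is left over.
[Figure: ordered sample with the smallest 10% and the largest 10% cut off; the sample mean is taken over the remaining values.]

Example 1.15

612 623 666 744 883 898 964 970 983 1003
1016 1022 1029 1058 1085 1088 1122 1135 1197 1201

[Figure: dotplot of the data on a scale from 600 to 1200, marking the median $\tilde{x}$, the mean $\bar{x}$, and the 10% trimmed mean $\bar{x}_{tr(10)}$; the observations removed at each end are indicated.]

Note: trimming proportion: 5% to 25%.

Homework
Ex. 34, Ex. 36, Ex. 40

1.4 Measures of Variability

[Figure: time errors for three types of watches, 9 observations for each type, plotted on a scale from -20 to +20.]
Q: Which type is the best? And why?

The Range
The range is the difference between the largest and smallest sample values. In the previous example, types 1 and 2 have identical ranges; however, there is much less variability in the second sample than in the first.

Deviations from the mean
Measure 1: the deviations $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$. For all cases, however,
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

Sample variance
The sample variance, denoted by $s^2$, is given by
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{S_{xx}}{n-1}$$
The sample standard deviation, denoted by $s$, is the square root of the variance: $s = \sqrt{s^2}$.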
Q1:vs.
Q2: n - 1 vs. n
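
The range, the deviations from the mean, and the sample variance can be computed directly from their definitions. The sketch below reuses the Example 1.14 observations purely as convenient input (any numeric sample would do) and also evaluates the same numerator with divisor n, so the two choices in Q2 can be compared.

```python
import math
import statistics

# Observations reused from Example 1.14 (any numeric sample would do)
x = [15.2, 9.3, 7.6, 11.9, 10.4, 9.7, 20.4, 9.4, 11.5, 16.2, 9.4, 8.3]
n = len(x)
mean = sum(x) / n

sample_range = max(x) - min(x)
deviations = [xi - mean for xi in x]   # these sum to zero, up to rounding
sxx = sum(d * d for d in deviations)   # numerator of the sample variance

s2 = sxx / (n - 1)                     # sample variance, divisor n - 1
s = math.sqrt(s2)                      # sample standard deviation
s2_divisor_n = sxx / n                 # same numerator with divisor n

print(f"range = {sample_range:.1f}")
print(f"sum of deviations = {sum(deviations):.2e}")
print(f"s^2 (divisor n - 1) = {s2:.4f}, s = {s:.4f}")
print(f"divisor n instead   = {s2_divisor_n:.4f}")
```

The standard-library functions `statistics.variance` and `statistics.pvariance` use the divisors n - 1 and n respectively, so they can serve as a cross-check.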
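
Example 1.16

$x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
0.684 | -0.9841 | 0.9685
2.54 | 0.8719 | 0.7602
0.924 | -0.7441 | 0.5537
3.13 | 1.4619 | 2.1372
1.038 | -0.6301 | 0.3970
0.598 | -1.0701 | 1.1451
0.483 | -1.1851 | 1.4045
3.52 | 1.8519 | 3.4295
1.285 | -0.3831 | 0.1468
2.65 | 0.9819 | 0.9641
1.497 | -0.1711 | 0.0293

Population variance
We will use $\sigma^2$ to denote the population variance and $\sigma$ to denote the population standard deviation. When the population is finite and consists of N values,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

Consider a population with just 3 elements {1, 2, 3}.
The mean of the population is
$$\mu = \frac{1 + 2 + 3}{3} = 2$$
and the variance is
$$\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}$$
Suppose all we can take is a sample of 2 elements, drawn with replacement, to learn about the population. We would like the sample to accurately estimate the mean and variance of the population.

Possible samples of size two | Sample mean $\bar{x}$ | $s^2$ using n = 2 | $s^2$ using n - 1 = 1
(1, 1) | 1 | 0/2 = 0 | 0/1 = 0
(2, 2) | 2 | 0/2 = 0 | 0/1 = 0
(3, 3) | 3 | 0/2 = 0 | 0/1 = 0
(1, 2) | 1.5 | .5/2 = .25 | .5/1 = .5
(2, 1) | 1.5 | .5/2 = .25 | .5/1 = .5
(1, 3) | 2 | 2/2 = 1.0 | 2/1 = 2
(3, 1) | 2 | 2/2 = 1.0 | 2/1 = 2
(2, 3) | 2.5 | .5/2 = .25 | .5/1 = .5
(3, 2) | 2.5 | .5/2 = .25 | .5/1 = .5
Average of sample statistics | 2 | 1/3 | 2/3 (better estimate!)

An alternative expression for the numerator of $s^2$:
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$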
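
Returning to the population {1, 2, 3}: the table of all size-two samples can be reproduced by brute-force enumeration. The following is a small sketch with exact fractions; the helper name `squared_deviation_sum` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / N  # population variance

# All ordered samples of size 2 drawn with replacement: (1, 1), (1, 2), ..., (3, 3)
samples = list(product(population, repeat=2))


def squared_deviation_sum(sample):
    """Sum of squared deviations from the sample mean (the numerator of s^2)."""
    mean = Fraction(sum(sample), len(sample))
    return sum((Fraction(v) - mean) ** 2 for v in sample)


n = 2
count = len(samples)
avg_mean = sum(Fraction(sum(s), n) for s in samples) / count
avg_s2_divisor_n = sum(squared_deviation_sum(s) / n for s in samples) / count
avg_s2_divisor_n_minus_1 = sum(squared_deviation_sum(s) / (n - 1) for s in samples) / count

print("population mean:", mu, " population variance:", sigma2)
print("average sample mean:", avg_mean)
print("average s^2 with divisor n:", avg_s2_divisor_n)
print("average s^2 with divisor n - 1:", avg_s2_divisor_n_minus_1)
```

Averaged over all nine equally likely samples, the divisor n - 1 recovers the population variance exactly, while the divisor n falls short of it. That is the sense in which n - 1 gives the better estimate.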

- If $y_1 = x_1 + c, y_2 = x_2 + c, \ldots, y_n = x_n + c$, then $s_y^2 = s_x^2$.
- If $y_1 = c x_1, y_2 = c x_2, \ldots, y_n = c x_n$, then $s_y^2 = c^2 s_x^2$ and $s_y = |c| s_x$,
where $s_x^2$ is the sample variance of the x's and $s_y^2$ is the sample variance of the y's.

Be careful of rounding errors when using the two different expressions for the numerator of $s^2$.

Boxplots
Boxplots describe several of a data set's most prominent features:
- center
- spread
- extent and nature of any departure from symmetry
- identification of "outliers", observations that li…