Professional English for Statistics: Complete Course Slides
Unit 1 Statistics

1.1 What is Statistics?

1.1.1 Meanings of Statistics

Plural Sense: In the plural sense, the word statistics refers to numerical facts and figures collected in a systematic manner with a definite purpose in any field of study.

Singular Sense: In the singular sense, it refers to the science comprising the methods used in the collection, analysis, interpretation and presentation of numerical data. These methods are used to draw conclusions about population parameters.

Plural of the Word "Statistic": The word statistics is also used as the plural of the word "statistic", which refers to a numerical quantity such as the mean, median or variance, calculated from sample values.

1.1.2 Definition of Statistics

Statistics in both the singular and the plural sense has been combined in the following definition, which is accepted as the modern definition of statistics: "Statistics are the numerical statements of facts capable of analysis and interpretation, and the science of statistics is the study of the principles and the methods applied in collecting, presenting, analyzing and interpreting numerical data in any field of inquiry."

1.1.3 Types of Statistics

The field of statistics can be roughly subdivided into two areas: descriptive statistics and inferential statistics. "Descriptive statistics" is what most people think of when they hear the word statistics. It includes the collection, presentation, and description of sample data. The term "inferential statistics" refers to the technique of interpreting the values resulting from the descriptive techniques and making decisions and drawing conclusions about the population.

1.1.4 Applications of Statistics

1.2 The Language of Statistics

1.2.1 Population and Sample

The population is the complete collection of individuals or objects that are of interest to the sample collector. The population of concern must be carefully defined and is considered fully defined only when its membership list of elements is specified. The set of "all students who have ever attended a U.S. college" is an example of a well-defined population. Large populations are difficult to study; therefore, it is customary to select a sample, or a subset of a population, and study the data in the sample. A sample consists of the individuals, objects, or measurements selected from the population by the sample collector.

1.2.2 Kinds of Variables

There are basically two kinds of variables: (1) qualitative variables result in information that describes or categorizes an element of a population, and (2) quantitative variables result in information that quantifies an element of a population.

Qualitative variables may be characterized as nominal or ordinal. A nominal variable is a qualitative variable that characterizes (or describes, or names) an element of a population. An ordinal variable is a qualitative variable that incorporates an ordered position, or ranking.

Quantitative variables can be subdivided into two classifications: discrete variables and continuous variables. A discrete variable is a quantitative variable that can assume a countable number of values. A continuous variable is a quantitative variable that can assume an uncountable number of values.

1.3 Measurability and Variability

One of the primary objectives of statistical analysis is measuring variability. For example, in the study of quality control, measuring variability is absolutely essential. Controlling (or reducing) the variability in a manufacturing process is a field all its own, namely, statistical process control.

1.4 Data Collection

It is important to obtain "good data" because the inferences ultimately made will be based on the statistics obtained from these data.

1.4.1 The Data Collection Process
The collection of data for statistical analysis is an involved process and includes the following steps:

1. Define the objectives of the survey or study.
2. Define the variable and the population of interest.
3. Define the data collection and data measuring schemes.
4. Collect your sample.
5. Review the sampling process upon completion of collection.

1.4.2 Sampling Frame and Elements

Ideally, the sampling frame should be identical to the population, with every element of the population included once and only once. Once a representative sampling frame has been established, we proceed with selecting the sample elements from the sampling frame. This selection process is called the sample design. There are many different types of sample designs; however, they all fit into two categories: judgment samples and probability samples.

Judgment samples are samples that are selected on the basis of being judged "typical." Probability samples are samples in which the elements to be selected are drawn on the basis of probability.

Figure: sampling frame = population

1.5 Single-Stage Methods

Single-stage sampling is a sample design in which the elements of the sampling frame are treated equally and there is no subdividing or partitioning of the frame.

1.5.1 Simple Random Sample

A sample selected in such a way that every element in the population or sampling frame has an equal probability of being chosen. Equivalently, all samples of size n have an equal chance of being selected.

1.5.2 Systematic Sample

A sample in which every kth item of the sampling frame is selected, starting from a first element, which is randomly selected from the first k elements.

1.6 Multistage Methods

Multistage random sampling is a sample design in which the elements of the sampling frame are subdivided and the sample is chosen in more than one stage. Multistage sampling designs often start by dividing a very large population into subpopulations on the basis of some characteristic. These subpopulations are called strata. These smaller, easier-to-work-with strata can then be sampled separately.

One such sample design is the stratified random sampling method. A stratified random sample results when the population, or sampling frame, is subdivided into various strata, usually some already occurring natural subdivision, and then a subsample is drawn from each of these strata.

A cluster sample is another multistage design. A cluster sample is obtained by stratifying the population, or sampling frame, and then selecting some or all of the items from some, but not all, of the strata.

1.7 Types of Statistical Study

In an observational study, researchers observe or measure characteristics of the sample members but do not attempt to influence or modify these characteristics. In an experiment, researchers apply a treatment to some or all of the sample members and then look to see whether the treatment has any effects. In an experiment, it is important for the treatment and control groups to be selected randomly and to be alike in all respects except for the treatment.

For experiments involving people, using a treatment and a control group might not be enough to get reliable results, because participants can be affected simply by knowing that they received the treatment. However, as long as the participants don't know whether they are in the treatment or control group (that is, whether they got the real pills or the placebo), any effect arising from psychological factors, known as a placebo effect, should affect both groups equally.

In statistical terminology, the practice of keeping people in the dark about who is in the treatment group and who is in the control group is called blinding. A single-blind experiment is one in which the participants don't know which group they belong to, but the experimenters (the people administering the treatment) do know. Sometimes, a single-blind experiment can still be unreliable if the experimenters can subtly influence outcomes. This type of problem can be avoided by making the experiment double-blind, which means neither the participants nor the experimenters know who belongs to each group.

1.8 The Process of a Statistical Study

Basic Steps in a Statistical Study:

1. State the goal of your study precisely. That is, determine the population you want to study and exactly what you'd like to learn about it.
2. Choose a representative sample from the population.
3. Collect raw data from the sample and summarize these data by finding sample statistics of interest.
4. Use the sample statistics to infer the population parameters.
5. Draw conclusions: Determine what you learned and whether you achieved your goal.

Unit 2 Descriptive Analysis of Single-Variable Data

2.1 Graphs, Pareto Diagrams, and Stem-and-Leaf Displays

2.1.1 Qualitative Data

Circle graphs (pie diagrams): Graphs that show the amount of data belonging to each category as a proportional part of a circle.

Bar graphs: Graphs that show the amount of data belonging to each category as a proportionally sized rectangular area.

Pareto diagram: A bar graph with the bars arranged from the most numerous category to the least numerous category. It includes a line graph displaying the cumulative percentages and counts for the bars.

2.1.2 Quantitative Data

Distribution: The pattern of variability displayed by the data of a variable. The distribution displays the frequency of each value of the variable.

Two popular methods for displaying the distribution of quantitative data are the dotplot and the stem-and-leaf display.

Dotplot display: Displays the data of a sample by representing each data value with a dot positioned along a scale. This scale can be either horizontal or
vertical. The frequency of the values is represented along the other scale.

Stem-and-leaf display: A display of the data of a sample using the actual digits that make up the data values. Each numerical value is divided into two parts: The leading digit(s) becomes the stem, and the trailing digit(s) becomes the leaf. The stems are located along the main axis, and a leaf for each data value is located so as to display the distribution of the data.

2.2 Frequency Distributions and Histograms

2.2.1 Frequency Distribution

Frequency: The number of times the value x occurs in the sample.

Frequency distribution: A listing, often expressed in chart form, which pairs values of a variable with their frequency.

Basic Guidelines for Constructing a Grouped Frequency Distribution:

1. Each class should be of the same width.
2. Classes (sometimes called bins) should be set up so that they do not overlap and so that each data value belongs to exactly one class.
3. For the examples and exercises associated with this textbook, 5 to 12 classes are most desirable, since all samples contain fewer than 125 data values. (The square root of n is a reasonable guideline for the number of classes with samples of fewer than 125 data.)
4. Use a system that takes advantage of a number pattern to guarantee accuracy. (This is demonstrated below.)
5. When it is convenient, an even-numbered class width is often advantageous.

2.2.2 Histograms

One way statisticians visually represent frequency counts of a quantitative variable is to use a bar graph called a histogram.
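The class frequencies that a histogram displays come from a grouped frequency distribution. The following is a minimal sketch of the grouping guidelines in 2.2.1, assuming whole-number data; the `scores` values and the function name `grouped_frequency` are hypothetical and used only for illustration.

```python
import math

def grouped_frequency(data, num_classes=None):
    """Group whole-number data into equal-width, non-overlapping classes.

    Returns a list of (lower, upper, count) tuples; each class covers
    lower <= x < upper, so every value belongs to exactly one class.
    """
    n = len(data)
    if num_classes is None:
        # Square-root-of-n guideline for the number of classes.
        num_classes = max(1, round(math.sqrt(n)))
    low, high = min(data), max(data)
    # Whole-number width chosen so the classes cover the entire range.
    width = math.floor((high - low) / num_classes) + 1
    counts = [0] * num_classes
    for x in data:
        counts[int((x - low) // width)] += 1
    return [(low + i * width, low + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical sample of 20 exam scores.
scores = [52, 58, 61, 64, 67, 68, 70, 71, 73, 75,
          76, 78, 80, 82, 85, 88, 91, 94, 97, 99]
classes = grouped_frequency(scores)
for lower, upper, count in classes:
    print(f"{lower} <= x < {upper}: {count}")
```

With 20 values the square-root guideline suggests 4 classes, and the computed class width here is 12, an even number.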

A histogram is made up of three components:

1. A title, which identifies the population or sample of concern.
2. A vertical scale, which identifies the frequencies in the various classes.
3. A horizontal scale, which identifies the variable x. Values for the class boundaries or class midpoints may be labeled along the x-axis. Use whichever method of labeling the axis best presents the variable.

Class midpoint (class mark): The numerical value that is exactly in the middle of each class.

Histogram: A bar graph that represents a frequency distribution of a quantitative variable.

Briefly, the terms used to describe histograms are as follows:

Symmetrical: Both sides of this distribution are identical (halves are mirror images).
Normal (triangular): A symmetrical distribution that is mounded up about the mean and becomes sparse at the extremes. (Additional properties are discussed later.)
Uniform (rectangular): Every value appears with equal frequency.
Skewed: One tail is stretched out longer than the other. The direction of skewness is on the side of the longer tail.
J-shaped: There is no tail on the side of the class with the highest frequency.
Bimodal: The two most populous classes are separated by one or more classes. This situation often implies that two populations are being sampled.

Notes:

1. The mode is the value of the data that occurs with the greatest frequency. (Mode will be discussed in Unit 2.3.)
2. The modal class is the class with the highest frequency.
3. A bimodal distribution has two high-frequency classes separated by classes with lower frequencies. It is not necessary for the two high frequencies to be the same.

2.2.3 Cumulative Frequency Distribution and Ogives

Another way to express a frequency distribution is to use a cumulative frequency distribution to pair cumulative frequencies with values of the variable.

Cumulative frequency distribution: A frequency distribution that pairs cumulative frequencies with values of the variable.

Ogive: A line graph of a cumulative frequency or cumulative relative frequency distribution. An ogive has the following three components:

1. A title, which identifies the population or sample.
2. A vertical scale, which identifies either the cumulative frequencies or the cumulative relative frequencies. (Figure 2.11 shows an ogive with cumulative relative frequencies.)
3. A horizontal scale, which identifies the upper class boundaries. Until the upper boundary of a class has been reached, you cannot be sure you have accumulated all the data in that class. Therefore, the horizontal scale for an ogive is always based on the upper class boundaries.

2.3 Measures of Central Tendency

2.3.1 Finding the Mean

Note that the population mean, μ (lowercase mu, Greek alphabet), is the mean of all x values for the entire population.

FYI: The mean is the middle point by weight.

2.3.2 Finding the Median

Finding the median involves three basic steps. First, you need to rank the data. Then you determine the depth of the median. The depth (number of positions from either end), or position, of the median is determined by the formula:

$d(\tilde{x}) = \frac{n + 1}{2}$

The median's depth (or position) is found by adding the position numbers of the smallest data (1) and the largest data (n) and dividing the sum by 2 (n is the number of pieces of data). Finally, you must determine the value of the median.

FYI: The value of $d(\tilde{x})$ is the depth of the median, NOT the value of the median.
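The three steps above can be sketched in a few lines. This is a minimal illustration, assuming the depth is (n + 1)/2 as the text describes; the function names `median_depth` and `median_by_depth` are hypothetical.

```python
def median_depth(n):
    """Depth (position) of the median in n ranked values: (1 + n) / 2."""
    return (n + 1) / 2

def median_by_depth(data):
    """Rank the data, find the depth of the median, then read off its value."""
    ranked = sorted(data)                      # step 1: rank the data
    depth = median_depth(len(ranked))          # step 2: depth of the median
    lower = ranked[int(depth) - 1]             # value at the whole-number position
    if depth == int(depth):
        return lower                           # odd n: the median is a data value
    upper = ranked[int(depth)]                 # even n: depth ends in .5
    return (lower + upper) / 2                 # average the two middle values

odd_sample = [3, 3, 5, 6, 8]        # n = 5, depth 3
even_sample = [6, 7, 8, 9, 9, 10]   # n = 6, depth 3.5
print(median_depth(5), median_by_depth(odd_sample))
print(median_depth(6), median_by_depth(even_sample))
```

Note that the depth (3 or 3.5) is only a position in the ranked list; the median itself is the value found at that position.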

2.3.3 Finding the Mode

The mode is the value of x that occurs most frequently. In the set of data we used to find the median for odd n, {3, 3, 5, 6, 8}, the mode is 3.

2.3.4 Finding the Midrange

The number exactly midway between a lowest data value L and a highest data value H is called the midrange:

$\text{midrange} = \frac{L + H}{2}$

Summary: The four measures of central tendency represent four different methods of describing the middle. These four values may be the same, but more likely they will be different. For the sample data set {6, 7, 8, 9, 9, 10}, the mean is 8.2, the median is 8.5, the mode is 9, and the midrange is 8.

2.4 Measures of Dispersion

The measures of dispersion include the range, variance, and standard deviation. The simplest measure of dispersion is the range, which is the difference in value between the highest data value (H) and the lowest data value (L):

$\text{range} = H - L$
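The summary values for {6, 7, 8, 9, 9, 10}, together with the range, can be checked with Python's standard `statistics` module. This is a small sketch for verification only; the variable names are illustrative.

```python
import statistics

data = [6, 7, 8, 9, 9, 10]

mean = statistics.mean(data)       # sum of the values divided by n (49 / 6)
median = statistics.median(data)   # average of the two middle values
mode = statistics.mode(data)       # most frequent value

low, high = min(data), max(data)
midrange = (low + high) / 2        # midway between L and H
data_range = high - low            # H - L

print(f"mean = {mean:.1f}, median = {median}, mode = {mode}")
print(f"midrange = {midrange}, range = {data_range}")
```

The mean is reported to one decimal place to match the rounding used in the summary.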

2.5 Measures of Position

Quartiles and percentiles are two of the most popular measures of position. Other measures of position include midquartiles, 5-number summaries, and standard scores, or z-scores.

2.5.1 Quartiles

Quartiles: Values of the variable that divide the ranked data into quarters; each set of data has three quartiles.

2.5.2 Percentiles

Percentiles: Values of the variable that divide a set of ranked data into 100 equal subsets; each set of data has 99 percentiles.

The procedure for determining the value of any kth percentile (or quartile) involves four basic steps:

2.5.3 Other Measures of Position

Midquartile: The numerical value midway between the first quartile and the third quartile.

5-Number summary: The presentation of 5 numbers that give a statistical summary of a data set: the smallest value in the data set, the first quartile, the median, the third quartile, and the largest value in the data set.

Interquartile range: The difference between the first and third quartiles. It is the range of the middle 50% of the data.

Box-and-whiskers display: A graphic representation of the 5-number summary.
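The measures of position defined above can be sketched as follows. Several percentile conventions exist, and the rule used here is an assumption rather than necessarily the procedure this text intends: rank the data, compute nk/100, average that position and the next if it is a whole number, and otherwise round the position up. The data values and function names are hypothetical.

```python
def percentile(data, k):
    """kth percentile (0 < k < 100, k a whole number) by a depth-based rule."""
    ranked = sorted(data)
    n = len(ranked)
    position, remainder = divmod(n * k, 100)
    if remainder == 0:
        # nk/100 is a whole number: average that value and the next one.
        return (ranked[position - 1] + ranked[position]) / 2
    # Otherwise round the position up to the next whole number
    # (1-based position + 1 is index `position` in 0-based terms).
    return ranked[position]

def five_number_summary(data):
    """Smallest value, first quartile, median, third quartile, largest value."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))

sample = [2, 4, 5, 7, 8, 10, 12, 15]   # hypothetical data, n = 8
low, q1, med, q3, high = five_number_summary(sample)
iqr = q3 - q1                   # interquartile range: middle 50% of the data
midquartile = (q1 + q3) / 2     # midway between Q1 and Q3
print(low, q1, med, q3, high, iqr, midquartile)
```

Library routines such as `statistics.quantiles` use different interpolation rules by default, so their results may differ slightly from this sketch.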

2.6 Interpreting and Understanding Standard Deviation

2.6.1 The Empirical Rule and Testing for Normality

Empirical Rule: If a variable is normally distributed, then within one standard deviation of the mean there will be approximately 68% of the data; within two standard deviations of the mean there will be approximately 95% of the data; and within three standard deviations of the mean there will be approximately 99.7% of the data.

The empirical rule can be used to determine whether or not a set of data is approximately normally distributed.

2.6.2 Chebyshev's Theorem

Chebyshev's Theorem: The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any positive number greater than 1. For example, with k = 2, at least $1 - \frac{1}{2^2} = 0.75$, or 75%, of the data lie within two standard deviations of the mean.

Figure: Chebyshev's Theorem with k = 2
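Chebyshev's bound can be compared with the proportion actually observed in a data set. The sketch below is illustrative only: it assumes the population standard deviation (`statistics.pstdev`) so that the bound applies exactly to the data as given, and it reuses the sample {6, 7, 8, 9, 9, 10} from Unit 2.3. The function names are hypothetical.

```python
import statistics

def chebyshev_bound(k):
    """Minimum proportion of any distribution within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

def proportion_within(data, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)   # population standard deviation (assumption)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)

data = [6, 7, 8, 9, 9, 10]
bound = chebyshev_bound(2)            # 1 - 1/4
observed = proportion_within(data, 2)
print(f"Chebyshev guarantees at least {bound:.0%}; observed {observed:.0%}")
```

Chebyshev's bound holds for any distribution, which is why it is looser than the roughly 95% that the empirical rule gives for normally distributed data at k = 2.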

Unit 3 Descriptive Analysis of Bivariate Data

3.1 Bivariate Data

Not all sample data can be graphically displayed with one variable. To graphically display and numerically describe sample data that involve two paired variables, we need to use bivariate data.

Bivariate data: The values of two different variables that are obtained from the same population element.

3.1.1 Two Qualitative Variables

When bivariate data result from two qualitative (attribute or categorical) variables, the data are often arranged on a cross-tabulation or contingency table. To see how this works, let's use information on gender and college major.

Thirty students from our college were randomly identified and classified according to two variables: gender (M/F) and major (liberal arts, business administration, technology). These 30 bivariate data can be summarized on a 2 × 3 cross-tabulation table, where the two rows represent the two genders, male and female, and the three columns represent the three major categories of liberal arts (LA), business administration (BA), and technology (T).

Percentages Based on the Grand Total (entire sample): The frequencies in the contingency table shown in Table 3.3 can easily be converted to percentages of the grand total by dividing each frequency by the grand total and multiplying the result by 100. For example, 6 becomes 20%:

$\frac{6}{30} \times 100 = 20\%$
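Converting a contingency table to percentages of the grand total, of row totals, or of column totals can be sketched as below. Table 3.3 is not reproduced in this text, so the cell counts are assumed values chosen to agree with the percentages quoted in this section (30 students, 60% male, 30% technology majors); they are illustrative, not the original table.

```python
# Assumed cell counts (hypothetical stand-in for Table 3.3).
table = {
    "M": {"LA": 5, "BA": 6, "T": 7},
    "F": {"LA": 6, "BA": 4, "T": 2},
}
majors = ["LA", "BA", "T"]

def percent(part, whole):
    """Divide a frequency by a total and multiply the result by 100."""
    return 100 * part / whole

grand_total = sum(sum(row.values()) for row in table.values())
row_totals = {g: sum(row.values()) for g, row in table.items()}
col_totals = {m: sum(table[g][m] for g in table) for m in majors}

# Each cell as a percentage of the grand total, its row total, its column total.
pct_of_total = {g: {m: percent(table[g][m], grand_total) for m in majors} for g in table}
pct_of_row = {g: {m: percent(table[g][m], row_totals[g]) for m in majors} for g in table}
pct_of_col = {g: {m: percent(table[g][m], col_totals[m]) for m in majors} for g in table}

print(pct_of_total["F"]["LA"])        # a cell count of 6 out of 30
print(round(pct_of_row["M"]["LA"]))   # male liberal arts as a share of males
print(round(pct_of_col["M"]["LA"]))   # male liberal arts as a share of LA majors
```

The same cell gives three different percentages depending on which total is used as the base, so a table of percentages should always state its base.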

From the table of percentages of the grand total (see Table 3.4), we can easily see that 60% of the sample were male, 40% were female, 30% were technology majors, and so on. Table 3.4 and Figure 3.1 show the distribution of male liberal arts students, female liberal arts students, male business administration students, and so on, relative to the entire sample.

Percentages Based on Row Totals: The frequencies in the same contingency table, Table 3.3, can be expressed as percentages of the row totals (or gender) by dividing each row entry by that row's total and multiplying the result by 100. Table 3.5 is based on row totals. From Table 3.5 we see that 28% of the male students were majoring in liberal arts, whereas 50% of the female students were majoring in liberal arts.

Percentages Based on Column Totals: The frequencies in the contingency table, Table 3.3, can be expressed as percentages of the column totals (or major) by dividing each column entry by that column's total and multiplying the result by 100. Table 3.6 is based on column totals. From Table 3.6 we see that 45% of the liberal arts students were male, whereas 55% of the liberal arts students were female.

3.1.2 One Qualitative and One Quantitative Variable

When bivariate data result from one qualitative and one quantitative variable, the quantitative values are viewed as separate samples, each set identified by levels of the qualitative variable. Each sample is described using the techniques from Unit 2, and the results are displayed side by side for easy comparison.

To see how a side-by-side comparison works, let's use the example of stopping distance. The distance required to stop a 3000-pound automobile on wet pavement was measured to compare the stopping capabilities of three tire tread designs (see Table 3.7). Tires of each design were tested repeatedly on the same automobile on a controlled patch of wet pavement.

The design of the tread is a qualitative variable with three levels of response, and the stopping distance is a quantitative variable. The distribution of the stopping distances for tread design A is to be compared with the distribution of stopping distances for each of the other tread designs. This comparison may be made with both numerical and graphic techniques. Some of the available options are shown in Figure 3.2, Table 3.8, and Table 3.9.

3.1.3 Two Quantitative Variables

When the bivariate data are the result of two quantitative variables, it is customary to express the data mathematically as ordered pairs (x, y), where x is the input variable (sometimes called the independent variable) and y is the output variable (sometimes called the dependent variable). The data are said to be ordered because one value, x, is always written first. They are called paired because for each x value, there is a corresponding y value from the same source.

Scatter diagram (or scatterplot): A plot of all the ordered pairs of bivariate data on a coordinate axis system.

To illustrate, let's work with data from Mr. Chamberlain's physical fitness course in which several fitness scores were taken. The following sample contains the numbers of push-ups and sit-ups done by 10 randomly selected students:

(27, 30) (22, 26) (15, 25) (35, 42) (30, 38) (52, 40) (35, 32) (55, 54) (40, 50) (40, 43)

Table 3.10 shows these sample data, and Figure 3.3 shows a scatter diagram of the data. The scatter diagram from Mr. Chamberlain's physical fitness course shows a definite pattern. Note that as the number of push-ups increased, so did the number of sit-ups.

3.2 Linear Correlation

The primary purpose of linear correlation analysis is to measure the strength of a linear relationship between two variables.

3.2.1 Calculating the Linear Correlation Coefficient, r

The coefficient of linear correlation, r, is the numerical measure of the strength of the linear relationship between two variables. The coefficient reflects the consistency of the effect that a change in one variable has on the other.

Definition Formula:

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$

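Computational Formula:

$r = \frac{SS(xy)}{\sqrt{SS(x)\, SS(y)}}$, where $SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$, $SS(y) = \sum y^2 - \frac{(\sum y)^2}{n}$, and $SS(xy) = \sum xy - \frac{\sum x \sum y}{n}$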
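As a worked illustration, the sketch below computes r for the ten (push-ups, sit-ups) pairs from Mr. Chamberlain's course. It uses the standard sum-of-squares form of Pearson's coefficient, r = SS(xy) / sqrt(SS(x) · SS(y)), which is assumed here to be the computational formula intended; the function names are hypothetical.

```python
import math

# (push-ups, sit-ups) for the 10 randomly selected students
pairs = [(27, 30), (22, 26), (15, 25), (35, 42), (30, 38),
         (52, 40), (35, 32), (55, 54), (40, 50), (40, 43)]

def sums_of_squares(pairs):
    """Return SS(x), SS(y) and SS(xy) from the raw sums."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    ss_x = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
    ss_y = sum(y * y for _, y in pairs) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in pairs) - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy

def linear_correlation(pairs):
    """Pearson's coefficient of linear correlation, r."""
    ss_x, ss_y, ss_xy = sums_of_squares(pairs)
    return ss_xy / math.sqrt(ss_x * ss_y)

r = linear_correlation(pairs)
print(f"r = {r:.2f}")
```

A value of r close to +1 agrees with the upward pattern seen in the scatter diagram: students who did more push-ups also tended to do more sit-ups.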

3.2.2 Causation and Lurking Variables

Lurking variable: A variable that is not included in a study but has an effect on the variables of the study and makes it appear that those variables are related.

Here are some pitfalls to avoid:

In a direct cause-and-effect relationship, an increase (or decrease) in one variable causes an increase (or decrease) in another. Suppose there is a strong positive correlation between weight and height. Does an increase in weight cause an increase in height? Not necessarily. Or to put it another way, does a decrease in weight cause a decrease in height? Many other possible variables are involved, such as gender, age, and body type. These other variables are called lurking variables.

In the feature on page 59, a negative correlation existed between the percentage of students who received free or reduced-price lunches and the percentage of students who passed the reading proficiency test. Shall we hold back on the free lunches so that more students pass the reading test? A third variable is the motivation for this relationship, namely, poverty level.

Don't reason from correlation to cause: Just because all people who move to the city get old doesn't mean that the city causes aging. The city may be a factor, but you can't base your argument on the correlation.

Observed and Predicted Values of y

The Line of Best Fit

Notes:

1. Remember to keep at least three extra decimal places while doing the calculations to ensure an accurate answer.
2. When rounding off the calculated values of b0 and b1, always keep at least two significant digits in the final answer.
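The rounding notes can be seen in practice by fitting a line of best fit to the push-up and sit-up data from Unit 3.1.3. The sketch assumes the standard least-squares formulas, b1 = SS(xy) / SS(x) and b0 = (Σy − b1·Σx) / n, since the formulas themselves do not appear in this text; the function names are hypothetical.

```python
# (push-ups, sit-ups) for the 10 students in the fitness course example
pairs = [(27, 30), (22, 26), (15, 25), (35, 42), (30, 38),
         (52, 40), (35, 32), (55, 54), (40, 50), (40, 43)]

def line_of_best_fit(pairs):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    ss_x = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in pairs) - sum_x * sum_y / n
    b1 = ss_xy / ss_x                 # slope
    b0 = (sum_y - b1 * sum_x) / n     # y-intercept
    return b0, b1

def predict(x, b0, b1):
    """Predicted value of y for a given x."""
    return b0 + b1 * x

# Keep full precision during the calculation; round only when reporting.
b0, b1 = line_of_best_fit(pairs)
print(f"y-hat = {b0:.1f} + {b1:.2f}x")
print(f"predicted sit-ups for 40 push-ups: {predict(40, b0, b1):.1f}")
```

The x value of 40 used for the prediction lies inside the sampled range of push-ups (15 to 55).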

The equation should be used only within the sample domain of the input variable. We know the data demonstrate a linear trend within the domain of the x data, but we do not know what the trend is outside this interval. Hence, predictions can be very dangerous outside the domain of the x data.

On occasion, you might wish to use the line of best fit to estimate values outside the domain interval of the sample. This can be done, but you should do it with caution and only for values close to the domain interval.

If the sample was taken in 2006, do not expect the results to have been valid in 1929 or to hold in 2010.

Unit 4 Introduction to Probability

4.1 Sample Spaces, Events and Sets

4.1.1 Introduction

Since statistics involves the collection and interpretation of data, we must first know how to understand, display and summarize large amounts of quantitative information before undertaking a more sophisticated analysis.

Examples

Survival of cancer patients: A cancer patient wants to know the probability that he will survive for at least 5 years. By collecting data on survival rates of people in a similar situation, it is possible to obtain an empirical estimate of survival rates. We cannot know whether or not the patient will survive, or even know exactly what the probability of survival is. However, we can estimate the proportion of patients who survive from data.

4.1.2 Sample Spaces

There are lots of phenomena in nature, like tossing a coin or tossing a die, whose outcomes cannot be predicted with certainty in advance, but the set of all the possible outcomes is known. These are what we call random phenomena or random experiments. Probability theory is concerned with such random phenomena or random experiments.

Consider a random experiment. The set of all the possible outcomes is called the sample space of the experiment and is usually denoted by S. Any subset E of the sample space S is called an event. The sample space is chosen so that exactly one outcome will occur. The size of the sample space is finite, countably infinite or uncountably infinite.
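For a finite experiment, the sample space S and an event E can be written out directly as sets. The sketch below is a hypothetical illustration that combines the two examples mentioned above (tossing a coin and tossing a die) into one experiment; the variable names are not from the text.

```python
from itertools import product

coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

# Sample space S: every possible (coin, die) outcome; exactly one will occur.
S = set(product(coin, die))

# Event E: a subset of S, here "heads together with an even number".
E = {(c, d) for (c, d) in S if c == "H" and d % 2 == 0}

print(len(S))        # number of outcomes in the sample space
print(sorted(E))     # the outcomes that make up the event
print(E <= S)        # an event is always a subset of the sample space
```

This sample space is finite; the set of all positive integers would be an example of a countably infinite one, and an interval of real numbers an uncountably infinite one.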

4.1.3 Events

A subset of the sample space (a collection of possible outcomes) is known as an event. Events may be classified into four types: the null event is t
