Statistical Inference Courseware, Part 1

Topic 3: Point Estimation
  • Optimal Criterion of Estimation
  • Maximum Likelihood Estimation
  • Uniformly Minimum Variance Unbiased Estimate
  • Method of Moments
  • OLS and Regularized Estimation
  • Kernel Density Estimation

Section 3: Uniformly Minimum Variance Unbiased Estimate
  • Definition of UMVUE (uniformly minimum variance unbiased estimate)
  • Methods to construct UMVUE
  • Cramér-Rao inequality

UMVUE
  • Only consider the case where an unbiased estimator exists.
  • The smaller the variance, the better the estimator.
  • If an unbiased estimator has the smallest variance among all unbiased estimators, it is called the UMVUE.

UMVUE (definition)

Methods to construct UMVUE
  • Zero unbiased estimate method
  • Sufficient and complete statistic method
  • Cramér-Rao inequality

Lemma 1
  • Remark 1. Lemma 1 provides a method to improve an unbiased estimate.
  • Remark 2. If a UMVUE exists, it must be a function of the sufficient statistic.
  • Proof of Lemma 1
  • Example 15: Is h(T) a UMVUE?

Zero Unbiased Estimate Method
  • Remark 1. l(X) is an “unbiased estimate” of 0.
  • Remark 2. In fact, the condition in Theorem 7 is both sufficient and necessary.
  • Remark 3. The method can be used to verify whether a given statistic is a UMVUE, but it cannot be used to construct one.
  • Example 15 (continued), Example 4 (continued), Example 10 (continued), Remark

Sufficient and Complete Statistic Method
  • Examples: Example 15 (continued), Example 16, Example 17

Cramér-Rao Inequality
  • Provides a lower bound (the C-R lower bound) for the variance of an unbiased estimator.
  • Proved by C. R. Rao in 1945 and H. Cramér in 1946.
  • Drawback: the variance of the UMVUE can be larger than the C-R lower bound.
  • Used in the definitions of efficiency, effective estimation, and Fisher information.

C-R Regularity Distribution Family

Single Parameter C-R Inequality
  • Fisher Information
  • Proof

Remark
  • If an unbiased estimator attains the C-R lower bound of variance, then it is the UMVUE.
  • The regularity conditions hold for the exponential family.
  • The variance of the UMVUE can be larger than the C-R lower bound.
  • Equality holds only if the distribution family is an exponential family.
  • Even for an exponential family, only a few unbiased estimates attain the C-R lower bound.

Examples 18 to 21

C-R Inequality for Multidimensional Parameters
  • Example 22

Efficiency and Effective Estimation
  • Remark. To talk about an effective estimate, the distribution should satisfy the regularity conditions required in Theorem 9.

Section 4: Method of Moments
  • Definition and examples
  • Properties

Method of Moments
  • Example 23, Remark, Example 24, Example 25, Example 26
  • Properties

Section 5: Ordinary Least Squares (OLS) and Regularized Estimate

  • Linear regression model
  • OLS estimate
  • Regularized estimate (Ridge estimate)

Linear Regression Model

Example 27: Ames Housing Data
  • Information from the Ames Assessor's Office used in computing assessed values for individual residential properties sold in Ames, Iowa from 2006 to 2010
  • 2000 observations, 82 variables
  • 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables, 2 ID

Linear Regression Model
  • Intercept
  • Population slopes
  • Random error
  • Basic Assumptions
  • MLE under Normal Assumption

Simple Linear Regression: Some Concepts
  • Population regression equation (line)
  • Samples or data
  • Sample regression equation (line)
  • Fitted values
  • Residuals

Example: Little Women
  • “Little Women” (Berkeley course on Data Science)
  • Each row: one chapter
  • Goal: predict the number of characters based on the number of periods
  • Little Women (r = 0.92)
  • Regression Line

Mean Squared Error (Residual Sum of Squares)
  • What we need now is one overall measure of the rough size of the errors (residuals).
  • Some errors are likely to be positive and others negative.
  • To avoid cancellation when measuring the rough size of the errors, we take the mean of the squared errors rather than the mean of the errors themselves.
  • Calculate the squared error (or residual sum of squares)

Ordinary Least Squares (OLS)
  • OLS: minimizing the MSE
  • The Least Squares Line (unique)
  • Why squared error? Explicit formula, easy to compute
  • Least Absolute Deviation

OLS for Multivariate Regression
  • Example: Ames Housing Data
    lm(SalePrice ~ Gr_Liv_Area + Lot_Area + Full_Bath + Bedroom_AbvGr + Central_Air, data = HousePrice_train)
  • Properties of OLS

Multicollinearity

Multicollinearity: Sources
  • If all predictors are orthogonal, then multicollinearity is not a problem.
  • Four primary sources:
    1. The data collection method employed, e.g., house size vs. electricity consumption
    2. Constraints on the model or in the population, e.g., family income (x1) = salary (x2) + bonus and other income (x3); dummy variables: red + blue + green
    3. Model specification, e.g., polynomial terms
    4. An over-defined model: high-dimensional, # predictors > # observations

Longley's Economic Regression Data
  • A macroeconomic data set that provides a well-known example of a highly collinear regression
  • J. W. Longley (1967). An appraisal of least-squares programs from the point of view of the user. Journal of the American Statistical Association 62, 819–841.
  • 7 economic variables, observed yearly from 1947 to 1962 (n = 16):
    GNP.deflator (GNP implicit price deflator)
    GNP (gross national product)
    Unemployed (number of unemployed)
    Armed.Forces (number of people in the armed forces)
    Population (noninstitutionalized population aged 14 and over)
    Year (year)
    Employed (number of people employed)
  • Multicollinearity

Ridge Regression
  • Hoerl and Kennard (1970)
  • Goal: improve the estimation and prediction accuracy of OLS when there is multicollinearity (an ill-conditioned design matrix, e.g., too many predictors)
  • Bias-variance trade-off (OLS is the UMVLUE)
  • The ridge estimate is the solution of a convex optimization problem.
  • Tuning parameter
  • Ridge coefficient paths (solution paths)
  • Figure: Ridge coefficient path for Longley's economic regression data (x-axis: lambda; y-axis: coefficient; curves: GNP, Unemployed, Armed.Forces, Population, Year, Employed)
  • Cross Validation

More on Training and Testing
  • Ideally, we would separate our available data into training and test sets.
  • Of course, this is not always possible, especially if we have only a few observations.
  • We hope to come up with the best-trained algorithm (estimate) that will stand up to the test.
  • How can we try to find the best-trained algorithm?
  • Example, Remark

High Dimensional Data
  • High-dimensional statistics and sparse modeling have been active research areas for the last two decades.
  • “High-dimensional” refers to the situation where the number of parameters (or covariates) is comparable to or much larger than the sample size.
  • Information technology, bioinformatics, astronomy, ...
  • fMRI (functional magnetic resonance imaging): a long-term interdisciplinary project by the Gallant Neuroscience Lab and Prof. Bin Yu's group at UC Berkeley that studies primate visual pathways
  • For a particular voxel (2 × 2 × 2.5 millimeters) in a hum
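
The ridge estimate, its tuning parameter, and the cross-validation step listed under Ridge Regression can be sketched in base R. This is a minimal illustration under stated assumptions, not the course's own code. It uses R's built-in longley data with GNP.deflator as the response (the slides do not say which variable is the response, although the coefficient-path figure lists the other six). It computes the standard closed form (X'X + lambda I)^{-1} X'y on standardized predictors and uses a hypothetical lambda grid. The helper names ridge_coef and loocv_mse are invented for this example.

```r
# Sketch: ridge vs. OLS on R's built-in longley data (base R only).
# Assumption: GNP.deflator is the response; the slides do not state the model.
data(longley)
y <- longley$GNP.deflator
X <- as.matrix(longley[, setdiff(names(longley), "GNP.deflator")])

# Standardize the predictors and center the response so that the penalty
# treats all coefficients alike and the intercept is left unpenalized.
Xs <- scale(X)
yc <- y - mean(y)

# Closed-form ridge estimate (X'X + lambda * I)^{-1} X'y.
# lambda = 0 reproduces OLS on the standardized predictors.
ridge_coef <- function(Xs, yc, lambda) {
  p <- ncol(Xs)
  drop(solve(crossprod(Xs) + lambda * diag(p), crossprod(Xs, yc)))
}

lambdas <- c(0, 0.01, 0.1, 1, 10)  # hypothetical grid, not from the slides
path <- sapply(lambdas, function(l) ridge_coef(Xs, yc, l))
colnames(path) <- paste0("lambda=", lambdas)
print(round(path, 3))              # one column per lambda: the solution path

# A large condition number of the design matrix signals multicollinearity.
print(kappa(Xs, exact = TRUE))

# Leave-one-out cross-validation: refit without observation i (re-standardizing
# on the training rows only), predict observation i, and average squared errors.
loocv_mse <- function(X, y, lambda) {
  errs <- sapply(seq_along(y), function(i) {
    mu   <- colMeans(X[-i, , drop = FALSE])
    s    <- apply(X[-i, , drop = FALSE], 2, sd)
    Xtr  <- scale(X[-i, , drop = FALSE], center = mu, scale = s)
    ybar <- mean(y[-i])
    b    <- ridge_coef(Xtr, y[-i] - ybar, lambda)
    xte  <- (X[i, ] - mu) / s
    y[i] - (ybar + sum(xte * b))
  })
  mean(errs^2)
}

cv <- sapply(lambdas, function(l) loocv_mse(X, y, l))
names(cv) <- colnames(path)
print(cv)
print(names(which.min(cv)))        # lambda with the smallest LOOCV error
```

With only 16 observations, leave-one-out is used here in place of a training/test split. A K-fold split or a finer lambda grid would follow the same pattern.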
