ECMT5001 Principles of Econometrics
Lecture 10, Semester 2 2008
Dr Deborah O'Mara

Dummy Variables

Dummy Variables: Why use them?
- When we want to include qualitative/categorical variables in our model as independent or predictor variables
- Also a very useful way to include skewed data and attitudinal data in a regression equation
- Examples:
  - Unemployed/employed
  - Gender (M/F)
  - Promotion in store/no promotion in store
  - Season of year
  - Whether there is a celebrity on the cover of a magazine
  - Type of car

Dummy Variables: Binary Variables
- The most common type of dummy variable uses indicator coding
- Coded as a binary variable
- Binary variables only take values of 0 or 1
- Also called a dichotomous variable
- Defined as:
  D = 1 if some event occurs or a characteristic is present
  D = 0 otherwise (non-occurrence, not present)

Dummy Variables: examples of indicator coding
- D = 1 if person is employed; 0 otherwise
- D = 1 if person is female; 0 otherwise
- D = 1 if house is owned; 0 otherwise
- D = 1 if year is a war year; 0 otherwise
- D = 1 if government is Labor; 0 otherwise
- D = 1 if person holds an opinion; 0 if they do not hold that opinion
- D = 1 if the unit of analysis (e.g. postcode) has an employment rate of more than 40%; 0 if it has an employment rate of 40% or less
Dummy Variables: Coding
- Consider the variable gender, which can take two values: Male (M) and Female (F)
- Assign one category to 0 and the other category to 1:
  Gender = 0 if Male
  Gender = 1 if Female
- The choice of which category is assigned 0 or 1 does not matter, but it will affect the interpretation
- The dummy needs to apply (take the value 1) for between 15% and 85% of the sample; otherwise there is insufficient variation to warrant analysis

Dummy Variables: Coding (sample of 10 people)

  Observation (i)   Gender   Gen_D
  1                 M        0
  2                 M        0
  3                 F        1
  4                 M        0
  5                 F        1
  6                 F        1
  7                 M        0
  8                 F        1
  9                 F        1
  10                F        1

  Gender = gender variable; Gen_D = gender variable coded as a dummy variable

Using Dummy Variables in Regressions
- Once the categorical variable is recoded into a numerical (0/1) variable it can be used in a regression framework
- The dummy enters the regression equation in the same way as any other quantitative X variable
- We generally label the dummy variable D rather than X to emphasise that it is a binary variable
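The recoding step can be sketched in a few lines of Python. This is only an illustration using the ten-person sample from the table; the names `genders`, `gen_d` and `enough_variation` are made up for this sketch, and the 15% to 85% check simply restates the rule of thumb given in the coding slide.

```python
# Gender of the 10 people in the sample table, in observation order
genders = ["M", "M", "F", "M", "F", "F", "M", "F", "F", "F"]

# Indicator coding: 1 if female, 0 if male (male is the base category)
gen_d = [1 if g == "F" else 0 for g in genders]

# Share of the sample for which the dummy equals 1
share = sum(gen_d) / len(gen_d)

# Rule of thumb from the slide: the dummy should apply to 15%-85% of the sample
enough_variation = 0.15 <= share <= 0.85

print(gen_d, share, enough_variation)
```

Swapping the coding (1 for male, 0 for female) would give the same share check by symmetry; only the sign and interpretation of the dummy coefficient would change.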
Regression model with a Dummy Variable
- Regression with a single dummy variable D and an ordinary X variable:
  Yi = B1 + B2*Xi + B3*Di + ui
- Therefore if Di = 0 we have
  Yi = B1 + B2*Xi + ui
- And if Di = 1 we have
  Yi = (B1 + B3) + B2*Xi + ui

Dummy Variables in Regression
- The intercept differs depending on whether Di = 0 or 1
- The intercept term for the category assigned to 0 will be B1 (base category)
- The intercept term for the category assigned to 1 will be (B1 + B3)
- The coefficient of the dummy variable, B3, measures the difference in intercept terms between the two groups

Using Dummy Variables in Regression
- If we want to test whether there is a difference between the two categories we can conduct a standard t-test of H0: B3 = 0
- Note also that the slope coefficient for X, B2, is the same for both categories
- That is, the slope coefficient for the X variable is unaffected by D
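The claim that B3 is the gap between the two intercepts can be written out as a short derivation. The block below is a sketch that assumes the usual zero conditional mean of the error term, E(ui | Xi, Di) = 0, and takes K to be the number of estimated coefficients (3 in this model); neither assumption is stated explicitly on the slides.

```latex
\begin{align*}
E(Y_i \mid X_i, D_i = 1) - E(Y_i \mid X_i, D_i = 0)
  &= \bigl[(B_1 + B_3) + B_2 X_i\bigr] - \bigl[B_1 + B_2 X_i\bigr] = B_3, \\
t^* &= \frac{b_3 - 0}{\operatorname{se}(b_3)} \sim t_{n-K}
  \quad \text{under } H_0 : B_3 = 0 .
\end{align*}
```

The X term cancels in the first line, which is why the gap between the two groups is the same at every value of X.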
Example: Executive salaries
- The following table provides a hypothetical data set of 20 observations with the variables Executive salary ($000), Experience (years), and Gender
- Research question: Can salary be explained and predicted by years of experience and gender?

Example: Executive salaries

  Salary   Experience   Gender   D
  100      5            M        0
  120      6            F        1
  120      7            M        0
  130      6            M        0
  80       5            F        1
  80       4            F        1
  130      8            F        1
  70       4            M        0
  110      6            M        0
  60       3            M        0
  90       6            F        1
  100      5            M        0
  80       4            M        0
  110      7            F        1
  130      7            M        0
  120      8            M        0
  120      5            M        0
  140      9            M        0
  100      6            F        1
  120      8            F        1

Example: Executive salaries
- Our regression model is then:
  Yi = B1 + B2*Di + B3*Xi + ui
  where Yi = annual salary of executive
        Di = 0 if male
           = 1 if female
        Xi = years of experience
- Male is the base case

Example: Executive salaries
- The predicted value for a male executive (the base case) is given by
  E(Yi|Di=0,Xi) = Ŷi = b1 + b2*0 + b3*Xi = b1 + b3*Xi
- The predicted value for a female executive is given by
  E(Yi|Di=1,Xi) = Ŷi = b1 + b2*1 + b3*Xi = (b1 + b2) + b3*Xi

Example: Executive salaries
- The predicted value for a male executive (the base case) is
  Ŷi = 35.31 + 12.41*Xi
- The predicted value for a female executive is
  Ŷi = (35.31 - 9.12) + 12.41*Xi = 26.19 + 12.41*Xi
- Interpreting b2, we can say that on average females receive $9,120 less salary than males with the same level of experience

Example: Executive salaries
Figure: regression line for males and regression line for females. Note that both regression lines have the same slope!

Example: Executive salaries
- The dummy coefficient, b2, indicates a large discrepancy in salaries between genders
- This could simply be due to sample variation
- Conduct a t-test for the following hypotheses:
  H0: B2 = 0
  H1: B2 < 0
- NB: use a 1-sided alternative because we expected a priori that males might on average have higher salaries
- n = 20, K = 3, b2 = -9.121, se(b2) = 5.378
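The estimates and the standard error used in this test can be checked directly against the 20 observations. The sketch below fits the model by ordinary least squares in plain Python; the helper names `invert` and `ols` are illustrative only and are not taken from the lecture, which works from regression output.

```python
from math import sqrt

# (salary in $000, experience in years, D = 1 if female) for the 20 executives
DATA = [
    (100, 5, 0), (120, 6, 1), (120, 7, 0), (130, 6, 0), (80, 5, 1),
    (80, 4, 1), (130, 8, 1), (70, 4, 0), (110, 6, 0), (60, 3, 0),
    (90, 6, 1), (100, 5, 0), (80, 4, 0), (110, 7, 1), (130, 7, 0),
    (120, 8, 0), (120, 5, 0), (140, 9, 0), (100, 6, 1), (120, 8, 1),
]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y):
    """Return (coefficients, standard errors) for y regressed on the given rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bi * xi for bi, xi in zip(b, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)   # RSS / (n - K)
    se = [sqrt(s2 * inv[i][i]) for i in range(k)]
    return b, se


X = [(1.0, d, x) for _, x, d in DATA]   # columns: constant, dummy, experience
Y = [salary for salary, _, _ in DATA]
b, se = ols(X, Y)
t_dummy = b[1] / se[1]                  # t-statistic for H0: B2 = 0
print([round(v, 2) for v in b], round(se[1], 3), round(t_dummy, 3))
```

The column order (constant, dummy, experience) mirrors the model Yi = B1 + B2*Di + B3*Xi + ui, so `b[1]` and `se[1]` correspond to b2 and se(b2).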
Example: Executive salaries
- Rejection region: choose α = 0.05, so t(n-K, α) = t(17, 0.05) = 1.740
- Decision rule: reject H0 if t* < -1.740, else do not reject (or reject H0 if p-value/2 < 0.05)
- Test statistic:
  t* = (b2 - B2) / se(b2) = (-9.121 - 0) / 5.378 = -1.696
- Decision: since t* = -1.696 > -1.740, do not reject H0 (or since p-value/2 = 0.1081/2 = 0.054 > 0.05, do not reject H0)
- Conclusion: There is insufficient evidence to conclude that female salaries are on average lower than male salaries at the 5% level of significance

Dummy variable trap!
- It may be tempting to include a dummy variable for both males and females
- However, if a constant term is included in the model, this will lead to perfect collinearity between the explanatory variables and the regression estimation will fail
- This is referred to as the "Dummy Variable Trap"

Dummy variable trap!
- The constant term is simply an X variable which takes the value 1 for all observations
- Define a dummy variable DM taking the value 1 for male and 0 for female
- Similarly, define a dummy variable DF taking the value 1 for female and 0 for male

Dummy variable trap!

  Constant   DM   DF   Gender
  1          1    0    M
  1          1    0    M
  1          0    1    F
  1          0    1    F
  1          1    0    M

- If we include DM, DF and a constant in our regression equation, we have
  Constant = DM + DF

Dummy variable trap!
- That is, we have perfect collinearity between our X variables and the regression cannot be estimated
- Therefore, if you include a constant in your model you must not have a dummy variable for each category!

Dummy Variable trap in EViews
- If you do by mistake include a constant term as well as a dummy variable for each category, you will get an error message in EViews

Qualitative Variables with more than 2 categories
- What if we want to include a categorical variable with more than two categories?
- Example: Education level in 3 categories
  - Less than secondary education
  - Completed secondary education only
  - Completed tertiary education
Qualitative Variables with more than 2 categories
- When there are more than two categories we could assign a dummy variable to each category:
  D1 = 1 if less than secondary education
     = 0 otherwise
  D2 = 1 if completed secondary education only
     = 0 otherwise
  D3 = 1 if completed tertiary education
     = 0 otherwise
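Writing the three dummies out for a few individuals shows why they cannot all enter a regression alongside a constant. The sketch below is illustrative only: the level labels, the `dummies` helper and the four-person `sample` are invented for the example and do not come from the lecture data.

```python
# Hypothetical labels for the three education categories
LEVELS = ["less_than_secondary", "secondary_only", "tertiary"]


def dummies(value, levels=LEVELS, base=None):
    """Return one 0/1 indicator per level, omitting the base level if given."""
    return [int(value == level) for level in levels if level != base]


# A made-up sample of four individuals
sample = ["tertiary", "secondary_only", "less_than_secondary", "tertiary"]

# All three dummies: D1 + D2 + D3 equals 1 for every person, i.e. the constant
full = [dummies(v) for v in sample]
trap = all(sum(row) == 1 for row in full)

# Dropping the base category leaves (m - 1) = 2 dummies, D2 and D3
reduced = [dummies(v, base="less_than_secondary") for v in sample]

print(full, trap, reduced)
```

In `reduced`, a person in the base category is the one whose row is all zeros, which is how the omitted category is identified in the regression.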
Qualitative Variables with more than 2 categories
- To avoid the dummy variable trap, we cannot include all three dummy variables in a regression equation with a constant term
- If we have a categorical variable with m categories we should only include (m-1) dummy variables
- We therefore include two of the dummy variables for the 3 levels of education
- The category for which no dummy variable is included is then the base case

Example: Health Expenditure
- Suppose we want to regress the annual health care expenditure of an individual on the income and education of the individual
- Assume that we have the three education categories just described:
  - Less than secondary education
  - Completed secondary education only
  - Completed tertiary education

Example: Health Expenditure
- Regression model:
  Yi = B1 + B2*D2i + B3*D3i + B4*Xi + ui
  where Yi  = annual health expenditure
        D2i = 1 if completed secondary education only
            = 0 otherwise
        D3i = 1 if completed tertiary education
            = 0 otherwise
        Xi  = annual income

Example: Health Expenditure
- By not including D1, we are arbitrarily treating the less than secondary education category as the base case
- Therefore the intercept B1 will reflect the intercept for this category
- The differential intercepts B2 and B3 tell us by how much the intercepts of the other two categories differ from that of the base category

Example: Health Expenditure
- For the base category we have:
  E(Yi|Xi,D2=0,D3=0) = B1 + B4*Xi
- For the completed secondary education category we have:
  E(Yi|Xi,D2=1,D3=0) = (B1 + B2) + B4*Xi
- For the completed tertiary education category we have:
  E(Yi|Xi,D2=0,D3=1) = (B1 + B3) + B4*Xi

Example: Health Expenditure
Figure: health expenditure plotted against income, with separate lines for less than secondary, secondary and tertiary education; B1, B2 and B3 are marked on the intercepts.

Effect coding
- Used in marketing
- The base group is given a -1 code

  Method of travel   Relative frequency   Coding   D1 Car   D2 Walk
  Car                44%                  1        1        0
  Walk               37%                  -1       0        1
  Other              19%                  0        0        1
  TOTAL              100%

  Column groups: effects coding, indicator coding

Effect coding
- The coefficients therefore represent the differences for any group from the mean of all groups rather than from the omitted group
- Indicator and effects coding will give the same predictive results:
  - coefficient of determination
  - regression coefficients for continuous variables
- The interpretation of the coefficients will be different with indicator and effects coding
- Only indicator coding is considered in ECMT5001

Multiple qualitative variables
- Our regression analysis can easily be extended to handle more than one qualitative variable
- Returning to the executive salary example, let us now assume that, in addition to experience and gender, nationality of the executive is also important
- For simplicity let us assume that nationality has two categories: Australian, Non-Australian

Example: Executive salaries
- Including a dummy variable for nationality, our regression model is now
  Yi = B1 + B2*D1i + B3*D2i + B4*Xi + ui
  where Yi  = annual salary of executive
        Xi  = years of experience
        D1i = 0 if male
            = 1 if female
        D2i = 0 if non-Australian
            = 1 if Australian
- Note that each of the two qualitative variables, gender and nationality, has two categories and hence needs one dummy variable
- Also note that the base category is "non-Australian males"

Example: Executive salaries
- The estimated regression equation is:
  Ŷi = 27.32 - 4.4*D1i + 15.5*D2i + 12.0*Xi
- Mean salary for non-Australian males (D1=0, D2=0) is
  Smn = 27.32 + 12.0*Xi
- Mean salary for non-Australian females (D1=1, D2=0) is
  Sfn = (27.32 - 4.4) + 12.0*Xi = 22.92 + 12.0*Xi
- Mean salary for Australian males (D1=0, D2=1) is
  Sma = (27.32 + 15.5) + 12.0*Xi = 42.82 + 12.0*Xi
- Mean salary for Australian females (D1=1, D2=1) is
  Sfa = (27.32 - 4.4 + 15.5) + 12.0*Xi = 38.42 + 12.0*Xi

Interpreting the coefficients:
- B1: non-Australian males with no experience on average receive $27,316
- B2: holding all else constant, females on average receive $4,396 less than males
- B3: holding all else constant, Australians on average receive $15,503 more than non-Australians
- B4: holding all else constant, a 1 year increase in experience on average leads to a salary increase of $12,003
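The four group equations follow mechanically from switching the two dummies on and off in the estimated equation. The short sketch below reproduces the four intercepts using the rounded coefficients quoted in this example; the function name `mean_salary` is illustrative only.

```python
def mean_salary(experience, female, australian):
    """Predicted salary ($000) from the estimated equation with rounded coefficients."""
    return 27.32 - 4.4 * female + 15.5 * australian + 12.0 * experience


# Intercept (salary at zero experience) for each (female, australian) combination
intercepts = {
    (f, a): round(mean_salary(0, f, a), 2)
    for f in (0, 1)
    for a in (0, 1)
}

# The gender gap is the same at any level of experience and for either nationality
gap_at_5_years = round(mean_salary(5, 1, 0) - mean_salary(5, 0, 0), 2)

print(intercepts, gap_at_5_years)
```

Because the dummies only shift the intercept, all four lines share the slope of 12.0 on experience.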
Significance of coefficients:
- Considering p-values, the coefficients for Nationality (0.0017) and Experience (0.0000) have p-values less than 0.01 and are thus significant at the 1% level of significance
- The p-value for Gender (0.3133) indicates the coefficient is not significant even at the 10% level
Seasonality and Dummy Variables
- One common application for dummy variables is to allow for seasonality in time-series data
- For example, if we have quarterly data we may assign 3 dummy variables as:
  D1 = 1 if period is in Quarter 1, else 0
  D2 = 1 if period is in Quarter 2, else 0
  D3 = 1 if period is in Quarter 3, else 0
- Quarter 4 is the base case (or reference quarter)

Seasonality and Dummy Variables
- A common model for time series data is to estimate the level of a series, Yt, in period t, as the sum of a trend component and a seasonal component:
  Yt = Tt + St
- The trend component indicates the average level of the series in period t and the seasonal component allows for seasonal fluctuations around this trend

Seasonality and Dummy Variables
- In a regression framework we can model a seasonal time series as
  Yt = B1 + B2*t + B3*D1t + B4*D2t + B5*D3t + ut
- Note that the subscript t rather than i is commonly used for time series data, where each separate observation is a different time period

Seasonality and Dummy Variables
- The trend component here is
  Tt = B1 + B2*t
  Thus the trend of the series increases by the amount B2 each time period
- The predicted value of Yt is:
  Ŷt = b1 + b2*t + b3   if t is in Q1
     = b1 + b2*t + b4   if t is in Q2
     = b1 + b2*t + b5   if t is in Q3
     = b1 + b2*t        if t is in Q4

Seasonality and Dummy Variables
- We can test whether there is a trend in the series by testing the hypothesis H0: B2 = 0
- We can test whether seasonality is present in the series Yt by testing the hypothesis H0: B3 = B4 = B5 = 0
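Building the regressors for this model amounts to attaching a time index and three quarter indicators to each observation. The sketch below assumes periods are numbered t = 1, 2, 3, ... and that the quarter of the first period is known; the helper names `quarter_of` and `seasonal_row` are invented for the illustration.

```python
def quarter_of(t, first_quarter=1):
    """Quarter (1-4) of period t, where period t = 1 falls in `first_quarter`."""
    return (t - 1 + first_quarter - 1) % 4 + 1


def seasonal_row(t, first_quarter=1):
    """Regressor row [constant, trend, D1, D2, D3]; Quarter 4 is the base case."""
    q = quarter_of(t, first_quarter)
    return [1, t] + [int(q == k) for k in (1, 2, 3)]


# Two years of quarterly data starting in Quarter 1
rows = [seasonal_row(t) for t in range(1, 9)]
for row in rows:
    print(row)
```

Quarter 4 observations get zeros in all three dummy columns, so their fitted value is just the trend line b1 + b2*t.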
Example: Australian Retail Sales
- The file "RETAIL-SALES-AUS-5001.xls" contains 93 quarterly observations from 1983(1) to 2006(1), that is, March 1983 to March 2006
- Run a regression including trend and seasonality components and assess the regression output. Then interpret the seasonal coefficients, test them for statistical significance, and predict future sales for 2006(2).

Example: Australian Retail Sales
- OLS (t-statistics in brackets):
  Ŷt = 13965 + 413*t - 5168*D1t - 4785*D2t - 4574*D3t
       (24.6)  (52.6)  (-8.72)    (-7.99)    (-7.64)
  S = 2030.60, R2 = 0.97, F = 722.7
- High R2, all t-values highly significant, F-statistic very significant
- Conclude there is a good fit to the data

Interpreting seasonal coefficients
- The seasonal components tell us that, on average, holding all else constant, compared to sales in Q4:
  - Sales in Q1 are $5168 million less
  - Sales in Q2 are $4785 million less
  - Sales in Q3 are $4574 million less
- Clearly Q4 is a relatively good quarter for retail sales

Example: Australian Retail Sales
- Testing seasonal components for significance:
  H0: B3 = B4 = B5 = 0
  H1: At least one seasonal coefficient is not equal to zero
- Distribution: assuming all regression assumptions hold and ut ~ N(0, σ^2),
  F* = [(RSS_R - RSS_U) / J] / [RSS_U / (n - K)] ~ F(J, n-K)
- Rejection region: choose α = 0.05, so F(3, 88, 0.05) ≈ 2.71
- Decision rule: reject H0 if F* > 2.71, otherwise do not reject
- Test Statistic
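The two remaining tasks in this example, the F-test for seasonality and the forecast for 2006(2), can be sketched as follows. The forecast assumes the time index starts at t = 1 in 1983(1), so that 2006(2) is period t = 94; that numbering is an assumption, as is the use of the rounded coefficients from the fitted equation. The restricted and unrestricted RSS values are not given in the text, so `f_statistic` is shown only as a function. Names such as `COEF` and `predict_sales` are illustrative.

```python
# Rounded coefficients from the fitted retail sales equation ($ million)
COEF = {"const": 13965, "trend": 413, "Q1": -5168, "Q2": -4785, "Q3": -4574}


def predict_sales(t, quarter):
    """Fitted retail sales ($ million) for period t falling in the given quarter."""
    seasonal = COEF.get(f"Q{quarter}", 0)   # Quarter 4 is the base case: adds 0
    return COEF["const"] + COEF["trend"] * t + seasonal


def f_statistic(rss_r, rss_u, j, n, k):
    """F* = [(RSS_R - RSS_U) / J] / [RSS_U / (n - K)] for J linear restrictions."""
    return ((rss_r - rss_u) / j) / (rss_u / (n - k))


# Assumed numbering: t = 1 is 1983(1), so 2006(2) is t = 94 and falls in Quarter 2
forecast_2006q2 = predict_sales(94, 2)
print(forecast_2006q2)
```

For the seasonality test in this example the arguments would be J = 3, n = 93 and K = 5, with RSS_R taken from the regression that omits the three seasonal dummies and RSS_U from the full model.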