ECMT 5001 Year 2008 Semester 2: ECMT 5001 Lecture 10

ECMT5001 Principles of Econometrics
Lecture 10
Semester 2 2008
Dr Deborah OMara

Dummy Variables

Dummy Variables: Why use them?
- When we want to include qualitative/categorical variables in our model as independent or predictor variables
- Also a very useful way to include skewed data and attitudinal data in a regression equation
- Examples:
    Unemployed/employed
    Gender M/F
    Promotion in store/No promotion in store
    Season of year
    Whether there is a celebrity on the cover of a magazine
    Type of car

Dummy Variables: Binary Variables
- The most common type of dummy variable coding is indicator coding
- Coded as a binary variable
- Binary variables only take values of 0 or 1
- Also called a dichotomous variable
- Defined as:
    D = 1 if some event occurs or a characteristic is present
    D = 0 otherwise (non-occurrence, characteristic not present)

Dummy Variables: examples of indicator coding
    D = 1 if person is employed; 0 otherwise
    D = 1 if person is female; 0 otherwise
    D = 1 if house is owned; 0 otherwise
    D = 1 if year is a war year; 0 otherwise
    D = 1 if government is Labor; 0 otherwise
    D = 1 if the person holds an opinion; 0 if they do not hold that opinion
    D = 1 if the unit of analysis (e.g. postcode) has an employment rate of more than 40%; 0 if it has an employment rate of 40% or less

Dummy Variables: Coding
- Consider the variable gender, which can take two values: Male (M) and Female (F)
- Assign one category to 0 and the other category to 1:
    Gender = 0 if Male
    Gender = 1 if Female
- The choice of which category is assigned to 0 and which to 1 does not matter, but it will affect interpretation
- The dummy needs to be relevant to (take the value 1 for) between 15% and 85% of the sample; otherwise there is insufficient variation to warrant analysis

Dummy Variables: Coding (sample of 10 people)

Observation (i)   Gender   Gen_D
1                 M        0
2                 M        0
3                 F        1
4                 M        0
5                 F        1
6                 F        1
7                 M        0
8                 F        1
9                 F        1
10                F        1

Gender = gender variable; Gen_D = gender variable coded as a dummy variable

Using Dummy Variables in Regressions
- Once the categorical variable is recoded into a numerical (0/1) variable it can be utilised in a regression framework
- The dummy enters the regression equation in the same way as any other quantitative X variable
- We generally label the dummy variable D rather than X to emphasise that it is a binary variable

Regression model with a Dummy Variable
- Regression with a single dummy variable D and a normal X variable:
    Yi = B1 + B2Xi + B3Di + ui
  Therefore if Di = 0 we have
    Yi = B1 + B2Xi + ui
  and if Di = 1 we have
    Yi = (B1 + B3) + B2Xi + ui

Dummy Variables in Regression
- The intercept differs depending on whether Di = 0 or 1
- The intercept term for the category assigned to 0 will be B1 (base category)
- The intercept term for the category assigned to 1 will be (B1 + B3)
- The coefficient of the dummy variable, B3, measures the difference in intercept terms between the two groups
- If we want to test whether there is a difference between the two categories we can conduct a standard t-test of H0: B3 = 0
- Note also that the slope coefficient for X, B2, is the same for both categories
- That is, the slope coefficient for the X variable is unaffected by D
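A minimal Python sketch of the coding step and the intercept shift described above. It recodes the Gender column from the sample of 10 people into the 0/1 dummy Gen_D, checks the 15% to 85% guideline, and confirms that the gap between the two groups equals B3 at every value of X. The function names and the coefficient values 10, 2 and -3 are illustrative assumptions, not values from the lecture.

```python
# Gender column for the sample of 10 people in the coding table.
genders = ["M", "M", "F", "M", "F", "F", "M", "F", "F", "F"]


def indicator(values, one_category):
    """Return a 0/1 dummy: 1 where the value equals one_category, else 0."""
    return [1 if v == one_category else 0 for v in values]


def expected_y(x, d, B1, B2, B3):
    """E(Y | X, D) for the model Y = B1 + B2*X + B3*D + u."""
    return B1 + B2 * x + B3 * d


gen_d = indicator(genders, "F")  # Male = 0 (base category), Female = 1
share = sum(gen_d) / len(gen_d)  # proportion of the sample coded 1

# Hypothetical coefficients, chosen only to illustrate the algebra.
B1, B2, B3 = 10.0, 2.0, -3.0
gap_at_x1 = expected_y(1, 1, B1, B2, B3) - expected_y(1, 0, B1, B2, B3)
gap_at_x5 = expected_y(5, 1, B1, B2, B3) - expected_y(5, 0, B1, B2, B3)

print(gen_d)
print(share, 0.15 <= share <= 0.85)
print(gap_at_x1, gap_at_x5)  # both equal B3: the lines are parallel
```

Because D only shifts the intercept, the gap does not depend on X; a gap that varies with X would need a different model from the one on these slides.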

Example: Executive salaries
- The table below provides a hypothetical data set of 20 observations with the variables Executive salary ($000), Experience (years) and Gender
- Research Question: Can salary be explained and predicted by years of experience and gender?

Salary   Experience   Gender   D      Salary   Experience   Gender   D
100      5            M        0      120      6            F        1
120      7            M        0      130      6            M        0
80       5            F        1      80       4            F        1
130      8            F        1      70       4            M        0
110      6            M        0      60       3            M        0
90       6            F        1      100      5            M        0
80       4            M        0      110      7            F        1
130      7            M        0      120      8            M        0
120      5            M        0      140      9            M        0
100      6            F        1      120      8            F        1

- Our regression model is then:
    Yi = B1 + B2*Di + B3*Xi + ui
- where
    Yi = annual salary of executive
    Di = 0 if male
       = 1 if female
    Xi = years of experience
- Male is the base case
- The predicted value for a male executive (the base case) is given by
    E(Yi|Di=0,Xi) = Ŷi = b1 + b2*0 + b3*Xi = b1 + b3*Xi
- The predicted value for a female executive is given by
    E(Yi|Di=1,Xi) = Ŷi = b1 + b2*1 + b3*Xi = (b1 + b2) + b3*Xi
- With the estimated coefficients, the predicted value for a male executive (the base case) is
    Ŷi = 35.31 + 12.41*Xi
- The predicted value for a female executive is
    Ŷi = (35.31 - 9.12) + 12.41*Xi = 26.19 + 12.41*Xi
- Interpreting b2, we can say that on average females receive $9,120 less salary than males with the same level of experience

[Figure: regression line for males and regression line for females]
Note that both regression lines have the same slope!

- The dummy coefficient, b2, indicates a large discrepancy in salaries between genders
- This could simply be due to sample variation
- Conduct a t-test for the following hypotheses:
    H0: B2 = 0
    H1: B2 < 0
- NB: use a 1-sided alternative because we expected a priori that males might on average have higher salaries
- n = 20, K = 3, b2 = -9.121, se(b2) = 5.378
- Rejection Region: choose α = 0.05
    tn-K,α = t17,0.05 = 1.740
- Decision Rule: reject H0 if t* < -1.740, else do not reject (or reject H0 if p-value/2 < 0.05)
- Test statistic:
    t* = (b2 - B2)/se(b2) = (-9.121 - 0)/5.378 = -1.696
- Decision: since t* = -1.696 > -1.740, do not reject H0 (or since p-value/2 = 0.1081/2 = 0.054 > 0.05, do not reject H0)
- Conclusion: There is insufficient evidence to conclude that female salaries are on average lower than male salaries at the 5% level of significance

Dummy variable trap!
- It may be tempting to include a dummy variable for both males and females
- However, if a constant term is included in the model, this will lead to perfect collinearity between the explanatory variables and the regression estimation will fail
- This is referred to as the “Dummy Variable Trap”
- The constant term is simply an X variable which takes the value 1 for all observations
- Define a dummy variable DM taking the value 1 for male and 0 for female
- Similarly, define a dummy variable DF taking the value 1 for female and 0 for male

Constant   DM   DF   Gender
1          1    0    M
1          1    0    M
1          0    1    F
1          0    1    F
1          1    0    M

- If we include DM, DF and a constant in our regression equation, we have
    Constant = DM + DF
- That is, we have perfect collinearity between our X variables and the regression cannot be estimated
- Therefore, if you include a constant in your model you must not have a dummy variable for each category!

Dummy Variable trap in EViews
- If you mistakenly include a constant term as well as a dummy variable for each category, you will get an error message in EViews

[Screenshot: EViews error message]

Qualitative Variables with more than 2 categories
- What if we want to include a categorical variable with more than two categories?
- Example: Education level in 3 categories
    Less than secondary education
    Completed secondary education only
    Completed tertiary education
- When there are more than two categories we could assign a dummy variable to each category:
    D1 = 1 if less than secondary education
       = 0 otherwise
    D2 = 1 if completed secondary education only
       = 0 otherwise
    D3 = 1 if completed tertiary education
       = 0 otherwise
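A minimal Python sketch that re-estimates the executive salary regression from the 20 observations listed earlier and then walks into the dummy variable trap on purpose. It solves the normal equations with a small Gauss-Jordan routine from the standard library only; the helper names `invert` and `ols` and the singularity tolerance are assumptions made for this illustration, and the approach is not how EViews computes its estimates.

```python
# Executive salary data: (salary in $000, experience in years, D), D = 1 if female.
DATA = [
    (100, 5, 0), (120, 6, 1), (120, 7, 0), (130, 6, 0), (80, 5, 1),
    (80, 4, 1), (130, 8, 1), (70, 4, 0), (110, 6, 0), (60, 3, 0),
    (90, 6, 1), (100, 5, 0), (80, 4, 0), (110, 7, 1), (130, 7, 0),
    (120, 8, 0), (120, 5, 0), (140, 9, 0), (100, 6, 1), (120, 8, 1),
]


def invert(a, tol=1e-9):
    """Gauss-Jordan inverse of a square matrix; returns None if it is singular."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            return None  # no usable pivot: columns are linearly dependent
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, y):
    """Return (coefficients, standard errors), or None under perfect collinearity."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    inv = invert(xtx)
    if inv is None:
        return None
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
              for r, yi in zip(X, y))
    s2 = rss / (n - k)  # estimated error variance
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se


y = [s for s, _, _ in DATA]
X_ok = [[1, d, x] for _, x, d in DATA]           # constant, D (female), experience
X_trap = [[1, 1 - d, d, x] for _, x, d in DATA]  # constant, DM, DF, experience

b, se = ols(X_ok, y)
t_star = b[1] / se[1]  # t-statistic for H0: B2 = 0
print([round(v, 2) for v in b], round(se[1], 3), round(t_star, 3))
print(ols(X_trap, y))  # None: Constant = DM + DF, so X'X cannot be inverted
```

Dropping either DM or DF (or the constant) removes the exact linear dependence, which is why the model with a constant and a single gender dummy can be estimated.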

- To avoid the dummy variable trap, we cannot include all three dummy variables in a regression equation with a constant term
- If we have a categorical variable with m categories we should only include (m-1) dummy variables, to avoid the dummy variable trap
- We therefore include two of the dummy variables for the 3 levels of education
- The category for which no dummy variable is included is then the base case

Example: Health Expenditure
- Suppose we want to regress the annual health care expenditure of an individual on the income and education of the individual
- Assume that we have the three education categories just described:
    Less than secondary education
    Completed secondary education only
    Completed tertiary education
- Regression Model:
    Yi = B1 + B2D2i + B3D3i + B4Xi + ui
  where
    Yi = annual health expenditure
    D2 = 1 if completed secondary education only
       = 0 otherwise
    D3 = 1 if completed tertiary education
       = 0 otherwise
    Xi = annual income
- By not including D1, we are arbitrarily treating the less than secondary education category as the base case
- Therefore the intercept B1 will reflect the intercept for this category
- The differential intercepts B2 and B3 tell us by how much the intercepts of the other two categories differ from that of the base category
- For the base category we have:
    E(Yi|Xi,D2=0,D3=0) = B1 + B4*Xi
- For the completed secondary education category we have:
    E(Yi|Xi,D2=1,D3=0) = (B1 + B2) + B4*Xi
- For the completed tertiary education category we have:
    E(Yi|Xi,D2=0,D3=1) = (B1 + B3) + B4*Xi

[Figure: Health Expenditure plotted against Income, with lines labelled Less than Secondary Education, Secondary education and Tertiary Education, and B1, B2 and B3 marked]

Effect coding
- Used in marketing
- The base group is given a -1 code

Method of travel   Relative Frequency   Coding   D1 CAR   D2 Walk
Car                44%                  1        1        0
Walk               37%                  -1       0        1
Other              19%                  0        0        1
TOTAL              100%

(Column annotations on the slide: Effects coding, Indicator coding)

- The coefficients therefore represent the differences for any group from the mean of all groups rather than from the omitted group
- Indicator and effects coding will give the same:
    predictive results
    coefficient of determination
    regression coefficients for continuous variables
- The interpretation of the coefficients will be different with indicator and effects coding
- Only indicator coding is considered in ECMT5001

Multiple qualitative variables
- Our regression analysis can easily be extended to handle more than one qualitative variable
- Returning to the executive salary example, let us now assume that, in addition to experience and gender, nationality of the executive is also important
- For simplicity let us assume that nationality has two categories: Australian, Non-Australian

Example: Executive salaries
- Including a dummy variable for nationality, our regression model is now
    Yi = B1 + B2D1i + B3D2i + B4Xi + ui
  where
    Yi = annual salary of executive
    Xi = years of experience
    D1i = 0 if male
        = 1 if female
    D2i = 0 if non-Australian
        = 1 if Australian
- Note that each of the two qualitative variables, gender and nationality, has two categories and hence needs one dummy variable
- Also note that the base category is “non-Australian males”
- The estimated regression equation is:
    Ŷi = 27.32 - 4.4*D1i + 15.5*D2i + 12.0*Xi
- Mean salary for non-Australian males (D1=0, D2=0) is
    Smn = 27.32 + 12.0*Xi
- Mean salary for non-Australian females (D1=1, D2=0) is
    Sfn = (27.32 - 4.4) + 12.0*Xi = 22.92 + 12.0*Xi
- Mean salary for Australian males (D1=0, D2=1) is
    Sma = (27.32 + 15.5) + 12.0*Xi = 42.82 + 12.0*Xi
- Mean salary for Australian females (D1=1, D2=1) is
    Sfa = (27.32 - 4.4 + 15.5) + 12.0*Xi = 38.42 + 12.0*Xi

Interpreting the coefficients:
    B1: non-Australian males with no experience on average receive $27,316
    B2: holding all else constant, females on average receive $4,396 less than males
    B3: holding all else constant, Australians on average receive $15,503 more than non-Australians
    B4: holding all else constant, a 1 year increase in experience on average leads to a salary increase of $12,003

Significance of coefficients:
- The p-values of the coefficients for Nationality (0.0017) and Experience (0.0000) are less than 0.01, so these coefficients are significant at the 1% level of significance
- The p-value for Gender (0.3133) indicates the coefficient is not significant even at the 10% level
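The four group equations above can be reproduced from the single estimated equation by switching the two dummies on and off. The following Python sketch does this with the rounded coefficients reported on the slide; the function name `predicted_salary` is an assumption made for the illustration.

```python
# Rounded estimates from the slide, in $000: constant, D1 (female),
# D2 (Australian), experience.
B1, B2, B3, B4 = 27.32, -4.4, 15.5, 12.0


def predicted_salary(experience, female, australian):
    """Fitted salary for D1 = female (0/1) and D2 = australian (0/1)."""
    return B1 + B2 * female + B3 * australian + B4 * experience


# Intercept of each group's line, keyed by (D1, D2).
intercepts = {
    (d1, d2): round(predicted_salary(0, d1, d2), 2)
    for d1 in (0, 1)
    for d2 in (0, 1)
}
print(intercepts)

# The gender gap is the same for both nationalities and at any experience level.
gap_non_aus = predicted_salary(5, 1, 0) - predicted_salary(5, 0, 0)
gap_aus = predicted_salary(9, 1, 1) - predicted_salary(9, 0, 1)
print(round(gap_non_aus, 2), round(gap_aus, 2))
```

With two dummies and no interaction term, the model has four parallel lines but only three free intercept parameters, so the intercept for Australian females is fixed once B1, B2 and B3 are known.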

Seasonality and Dummy Variables
- One common application for dummy variables is to allow for seasonality in time-series data
- For example, if we have quarterly data we may assign 3 dummy variables as:
    D1 = 1 if period is in Quarter 1, else 0
    D2 = 1 if period is in Quarter 2, else 0
    D3 = 1 if period is in Quarter 3, else 0
- Quarter 4 is the base case (or reference quarter)
- A common model for time series data is to estimate the level of a series, Yt, in period t, as the sum of a trend component and a seasonal component:
    Yt = Tt + St
- The trend component indicates the average level of the series in period t and the seasonal component allows for seasonal fluctuations around this trend
- In a regression framework we can model a seasonal time series as
    Yt = B1 + B2t + B3D1t + B4D2t + B5D3t + ut
- Note that the subscript t rather than i is commonly used for time series data, where each separate observation is a different time period
- The trend component here is
    Tt = B1 + B2t
  Thus the trend of the series increases by the amount B2 each time period
- The predicted value of Yt is:
    Ŷt = b1 + b2t + b3   if t is in Q1
    Ŷt = b1 + b2t + b4   if t is in Q2
    Ŷt = b1 + b2t + b5   if t is in Q3
    Ŷt = b1 + b2t        if t is in Q4
- We can test whether there is a trend in the series by testing the hypothesis H0: B2 = 0
- We can test whether seasonality is present in the series Yt by testing the hypothesis H0: B3 = B4 = B5 = 0
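A short Python sketch of how the regressors for the seasonal model above could be built for quarterly data. It assumes that t = 1 is the first observation and that the quarter of that first observation is known; the function names and the coefficient values in the usage lines are hypothetical.

```python
def seasonal_row(t, first_quarter=1):
    """Regressors [1, t, D1, D2, D3] for period t (t = 1 is the first observation).

    first_quarter is the quarter (1 to 4) in which the first observation falls.
    Quarter 4 is the base case, so all three dummies are 0 in Q4.
    """
    quarter = (first_quarter - 1 + t - 1) % 4 + 1
    return [1, t] + [1 if quarter == q else 0 for q in (1, 2, 3)]


def predict(t, b, first_quarter=1):
    """Predicted Yt = b1 + b2*t + b3*D1t + b4*D2t + b5*D3t."""
    return sum(bj * xj for bj, xj in zip(b, seasonal_row(t, first_quarter)))


# First two years of a series that starts in Quarter 1.
for t in range(1, 9):
    print(seasonal_row(t))

# Hypothetical coefficients b1..b5, used only to show the seasonal shifts.
b = [10, 1, 2, 3, 4]
print(predict(1, b), predict(4, b))
```

Only three dummies are created for four quarters, in line with the (m-1) rule, so the constant stays in the model without causing perfect collinearity.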

Example: Australian Retail Sales
- The file “RETAIL-SALES-AUS-5001.xls” contains 93 quarterly observations from 1983(1) to 2006(1), that is, March 1983 to March 2006
- Run a regression including trend and seasonality components and assess the regression output. Then interpret the seasonal coefficients, test them for statistical significance, and predict future sales for 2006(2).
- OLS (t-statistics in brackets):
    Ŷt = 13965  + 413t   - 5168D1  - 4785D2  - 4574D3
         (24.6)   (52.6)   (-8.72)   (-7.99)   (-7.64)
    S = 2030.60   R2 = 0.97   F = 722.7
- High R2, all t-values highly significant, F-statistic very significant
- Conclude there is a good fit to the data

Interpreting seasonal coefficients
- The seasonal components tell us that, on average, holding all else constant, compared to sales in Q4:
    Sales in Q1 are $5168 million less
    Sales in Q2 are $4785 million less
    Sales in Q3 are $4574 million less
- Clearly Q4 is a relatively good quarter for retail sales

Example: Australian Retail Sales
- Testing seasonal components for significance:
    H0: B3 = B4 = B5 = 0
    H1: At least one seasonal coefficient not equal to zero
- Distribution: assuming all regression assumptions hold and ut ~ N(0, σ²),
    F* = [(RSSR - RSSU)/J] / [RSSU/(n - K)] ~ FJ,n-K
- Rejection Region: choose α = 0.05
    F3,88,0.05 ≈ 2.71
- Decision Rule: reject H0 if F* > 2.71, otherwise do not reject
- Test Stati
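The retail sales example asks for a forecast for 2006(2) and uses an F-test on the seasonal coefficients. The Python sketch below shows both calculations under stated assumptions: the coefficients are the rounded OLS estimates reported in the example, the time index is assumed to run from t = 1 for 1983(1) so that 2006(2) is t = 94, and the function names are illustrative. The restricted and unrestricted residual sums of squares are not given in the text available here, so `f_statistic` is defined but not evaluated on the retail data.

```python
# Rounded OLS estimates: constant, trend, Q1, Q2, Q3 (sales in $ million).
B = [13965, 413, -5168, -4785, -4574]


def forecast(t, quarter):
    """Predicted sales for period t falling in the given quarter (1 to 4)."""
    dummies = [1 if quarter == q else 0 for q in (1, 2, 3)]  # Q4 is the base case
    return B[0] + B[1] * t + sum(bq * dq for bq, dq in zip(B[2:], dummies))


def f_statistic(rss_r, rss_u, j, n, k):
    """F* = [(RSS_R - RSS_U) / J] / [RSS_U / (n - K)]."""
    return ((rss_r - rss_u) / j) / (rss_u / (n - k))


# Assumption: t = 1 is 1983(1), so observation 93 is 2006(1) and 2006(2) is t = 94.
sales_2006_q2 = forecast(94, 2)
print(sales_2006_q2)  # in $ million, subject to rounding in the coefficients
```

For the seasonality test in this example, `f_statistic` would be called with J = 3 restrictions, n = 93 observations and K = 5 estimated coefficients, giving n - K = 88 denominator degrees of freedom.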
