Applied Statistics: Chi-Square Test (PPT courseware)

This week's lecture covers:
  Analysing categorical (nominal) data
  Chi-square test of differences between proportions
  Chi-square test of independence

SPSS one-sample nonparametric tests: chi-square test of the population distribution

(1) Purpose: to infer from sample data whether the distribution of the population differs significantly from a known distribution (a goodness-of-fit test). It applies to statistical inference on categorical data.

(2) Basic hypothesis: H0: the population distribution does not differ significantly from the theoretical distribution.

(3) Basic method: from the known population proportions, compute the expected frequency of each category in the sample, then measure the gap between the observed and expected frequencies, i.e. compute the chi-square value. If P is greater than α, H0 cannot be rejected.

(4) Basic steps in SPSS:
  Menu: Analyze - Nonparametric Tests - Chi-Square
  Move the variable to be tested into the Test Variable List box.
  Set the range of case values to be tested (Expected Range):
    Get from data: all cases
    Use specified range: a user-defined range of cases
  Specify the expected frequencies (Expected Values):
    All categories equal: every category has the same proportion
    Values: user-defined proportions

Categorical variables
  Variables that describe categories of entities.
  We deal with them all the time in statistics.
  Making comparisons among variables: for example, whether consumers prefer a particular brand of a product over competing brands.
  Checking whether there is a relationship between two categorical variables: for example, gender and preference for a product, i.e. whether the preference for a product is independent of gender.

Chi-square test for differences between proportions
  This test involves nominal data produced by a multinomial experiment, which is a generalisation of a binomial experiment.
  It tests the null hypothesis that data in the target population follow a particular probability distribution.

Example 1
We might test whether consumers are indifferent to which of four materials (glass, plastic, steel or aluminium) is used to make soft drink containers. The null hypothesis is that they are indifferent, i.e. that equal numbers prefer glass, plastic, steel and aluminium.

Example 1: data
Let $p_G$ be the probability that an individual selected at random will nominate glass as his or her preference if required to make a choice. Define $p_P$ (plastic), $p_S$ (steel) and $p_A$ (aluminium) similarly.

Hypotheses:
  H0: $p_G = p_P = p_S = p_A = 0.25$
  HA: at least one $p_i \neq 0.25$
The alternative is that at least one material is more preferred (or less preferred) than the others.

Example 1 (cont.)
Procedure: select a random sample of, say, 100 consumers and determine their preferences. Under the null hypothesis we expect 25 consumers to nominate glass, 25 plastic, 25 steel and 25 aluminium. These are the expected frequencies, $E_i = n p_i$. We compare them with the sample results, the observed frequencies $O_i$. If $O_i \approx E_i$ for every group, the data are consistent with H0 and we do not reject it.

Example 1 (cont.): chi-square
We require a test statistic to decide whether the difference is large enough to reject the null hypothesis. We use chi-square with $G - 1$ degrees of freedom, where $G$ is the number of groups:

$$\chi^2_{G-1} = \sum_{i=1}^{G} \frac{(O_i - E_i)^2}{E_i}$$

Suppose that in our example 39 prefer glass, 16 prefer plastic, 20 prefer steel and 25 prefer aluminium. Recall that the expected frequencies were all 25.

$$\chi^2_3 = \frac{(39-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} = 12.08$$

Obtain the critical value of chi-square
At the 5% significance level with 3 d.f. the critical value is $\chi^2_3 = 7.82$ (Table E4, page 742, Berenson et al. 2013), i.e. there is only a 5 percent chance that $\chi^2_3 > 7.82$ if H0 is true.

Comparison of chi-square values
$\chi^2_3 = 12.08 > 7.82$, so we reject H0.

Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities $p_i$ differs from 0.25. The sample results indicate that the materials are not equally preferred by consumers in the target population. Thus the preferences for at least two materials differ.

Chi-square test using SPSS
Example: suppose that we want to test whether or not customers have a colour preference for packaging. Three different colours, Blue, Green and Purple, are considered.

The null hypothesis is that they do not have a colour preference. Use Analyze / Nonparametric Tests / Chi-Square. The default is that the probabilities are equal.

Main display colour

            Observed N   Expected N   Residual
  Blue          26          30.0        -4.0
  Green         37          30.0         7.0
  Purple        27          30.0        -3.0
  Total         90

Observed N: the numbers of consumers actually choosing particular colours.
Expected N: the numbers of consumers expected to choose particular colours if the null is true.
Residual: the two are different, but different enough to reject the null?

Test Statistics

                   Main Display Colour
  Chi-Square (a)        2.467
  df                    2
  Asymp. Sig.           .291

  a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.

The first row is the chi-square statistic. The degrees of freedom are the number of groups minus 1. Check the Asymp. Sig. value to test H0.

We cannot reject the null (H0) that all three colours are equally preferred, because Sig. = 0.291 > 0.05.

Conclusion: at the 5% significance level there is insufficient evidence to conclude that consumers in the target population prefer any of the three packaging colours over the others.
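Both goodness-of-fit examples above (the container materials and the packaging colours) can be checked by hand with a few lines of code. The following is a minimal sketch in plain Python, not the SPSS procedure itself. The function name `chi_square_gof` and the variable names are illustrative assumptions. The p-value line relies on the closed form for the chi-square upper tail, which holds only for exactly 2 degrees of freedom.

```python
import math


def chi_square_gof(observed, proportions=None):
    """Goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i, with E_i = n * p_i.

    Returns (statistic, degrees of freedom, expected frequencies).
    With proportions=None all categories are assumed equally likely.
    """
    n = sum(observed)
    groups = len(observed)
    if proportions is None:
        proportions = [1.0 / groups] * groups
    expected = [n * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, groups - 1, expected


# Example 1: glass, plastic, steel, aluminium (n = 100).
stat_materials, df_materials, _ = chi_square_gof([39, 16, 20, 25])
# 7.82 is the 5% critical value for 3 d.f. quoted in the text.
reject_materials = stat_materials > 7.82

# Packaging colours: blue, green, purple (n = 90).
stat_colours, df_colours, expected_colours = chi_square_gof([26, 37, 27])
# For exactly 2 d.f. the upper-tail probability has the closed form exp(-x / 2);
# other d.f. values need a chi-square table or a statistics library.
p_colours = math.exp(-stat_colours / 2)

print(f"materials: chi-square = {stat_materials:.2f}, df = {df_materials}, "
      f"reject H0: {reject_materials}")
print(f"colours:   chi-square = {stat_colours:.3f}, df = {df_colours}, "
      f"p = {p_colours:.3f}")
```

The statistic for the colours and its p-value should agree with the Chi-Square and Asymp. Sig. rows of the SPSS output to three decimals. Passing a list of proportions that sums to 1 corresponds to the user-defined Values option rather than All categories equal.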

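Tests of independence
Chi-squared test of a contingency table
This test addresses two different problem objectives:
  Are two nominal variables related?
  Are there differences among two or more populations of nominal variables?

Consider the following three features: height in centimetres, weight in kilograms and colour of eyes. Whilst some people are tall and thin, on average taller people weigh more than shorter people, so weight and height are not independent. It seems unlikely that people with blue eyes weigh more, on average, than people with brown eyes, so weight and eye colour are almost certainly independent.

Frequency analysis under cross-classification
Purpose: to understand how the data are distributed across the levels of different variables.
  Example: is academic performance related to gender? (two variables)
  Example: are occupation, gender and fondness for shopping related? (three variables)
Main steps of the analysis:
  Produce the cross-tabulation (contingency table).
  Analyse the relationship between the variables in the table.

Producing the cross-tabulation
What is a contingency table? In the slide's example layout, job title is the row variable (senior engineer, engineer, assistant engineer, technician, total), income is the column variable (high, middle, low, in persons), region is the control variable, and the cells hold frequencies.

Basic steps in SPSS:
(1) Menu: Analyze - Descriptive Statistics - Crosstabs
(2) Move one variable into the Row box as the row variable.
(3) Move one variable into the Column box as the column variable.
(4) Optionally move one or more variables into the Layer box as control variables. For control variables in the same layer the numbers of levels add; for control variables in different layers the numbers of levels multiply.
(5) Choose whether to display a bar chart for each group (Display clustered bar charts).

Further computation: the Cells option selects which percentages are printed in the frequency table.
  Row: row percentages (Row pct)
  Column: column percentages (Col pct)
  Total: total percentages (Tot pct)

Analysing the relationship between the variables in the table
Purpose: to use the contingency table to test whether the row and column variables are independent.
Method: the chi-square test, which measures association in qualitative data.

Chi-square test: two cross-tabulations of age by wage income

            Low   Middle   High
  Young     400      0       0
  Middle      0    500       0
  Old         0      0     600

            Low   Middle   High
  Young       0      0     500
  Middle      0    600       0
  Old       400      0       0

Basic steps of the chi-square test
(1) H0: the row and column variables are unrelated, i.e. independent of each other.
(2) Construct the chi-square statistic. The statistic follows a chi-square distribution with $(r-1)(c-1)$ degrees of freedom:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

In the SPSS output:
  Count: the observed (actual) frequency
  Expected count: the expected frequency, which reflects how the data would be distributed if H0 were true
  Residual: observed frequency minus expected frequency

Grades by gender

              Excellent   Good   Fair   Pass   Total
  Male            10        5      5      3      23
  Female           8       12      4      1      25
  Total           18       17      9      4      48
  Percent (%)     37.5     35.4   18.8    8.3   100

Smoking and lung cancer

                No lung cancer   Lung cancer   Total
  Non-smokers        7775             42        7817
  Smokers            2099             49        2148
  Total              9874             91        9965

The same data can be displayed in three ways: 1. a contingency table; 2. a three-dimensional column chart; 3. a two-dimensional bar chart.

[Figures: three-dimensional column chart and two-dimensional bar chart of lung cancer status by smoking status; vertical axis from 0 to 8000.]

The three-dimensional column chart shows clearly the relative sizes of the individual frequencies.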
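The smoking and lung cancer table above can also be tested formally rather than judged by eye. The following is a minimal sketch in plain Python, assuming the Pearson chi-square statistic without a continuity correction. The function name `chi_square_2x2` is hypothetical, and the closed-form p-value used here holds only for exactly 1 degree of freedom.

```python
import math

# Observed counts from the smoking table above.
#              (no lung cancer, lung cancer)
non_smokers = (7775, 42)
smokers = (2099, 49)


def chi_square_2x2(row1, row2):
    """Pearson chi-square for a 2x2 table, without continuity correction.

    Uses the shortcut n * (a*d - b*c)^2 / (R1 * R2 * C1 * C2), which is
    algebraically equal to summing (O - E)^2 / E over the four cells.
    """
    a, b = row1
    c, d = row2
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    numerator = n * (a * d - b * c) ** 2
    denominator = row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    return numerator / denominator


rate_non_smokers = non_smokers[1] / sum(non_smokers)
rate_smokers = smokers[1] / sum(smokers)
statistic = chi_square_2x2(non_smokers, smokers)
# For exactly 1 d.f. the upper-tail probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"lung cancer rate, non-smokers: {rate_non_smokers:.4f}")
print(f"lung cancer rate, smokers:     {rate_smokers:.4f}")
print(f"chi-square = {statistic:.3f} on 1 d.f., p = {p_value:.2e}")
```

A 2x2 table has (2-1)(2-1) = 1 degree of freedom, so the statistic is compared with the usual tabulated 5% critical value of 3.84. That value is a standard table entry and is not quoted in the slides.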

Charts give a quick visual judgement of whether two categorical variables are related. The two-dimensional bar chart shows that lung cancer is proportionally more common among smokers (49 of 2148, about 2.3%) than among non-smokers (42 of 7817, about 0.5%).

Tests of independence (cont.)
Example 2
Suppose we interviewed 400 people and asked them which of three age groups they are in (under 25, 25 to 60, and over 60). We also asked for their response to the statement that "All imports of automobiles should be banned in order to protect the local industry" (agree, no view either way, disagree).

Age group by attitude towards banning imports

              Agree   No view   Disagree   Total
  Under 25      19       53        25        97
  25 - 60       46       94        47       187
  Over 60       30       56        30       116
  Total         95      203       102       400

Example 2 (cont.)
Null hypothesis: the answers to the two questions are independent.
Under the null, by the multiplication rule for independent events:
  Prob[over 60 and agree] = Prob[over 60] × Prob[agree]
  Expected frequency = Prob[over 60] × Prob[agree] × sample size

Short-cut for expected frequencies, where $R_i$ is the total of row $i$, $C_j$ the total of column $j$ and $n$ the sample size:

$$E_{ij} = \frac{R_i}{n} \cdot \frac{C_j}{n} \cdot n = \frac{R_i C_j}{n}$$

Procedure
We set up a cross-tabulation showing the observed frequencies of answers to the two questions. We calculate the expected frequencies.

Test
Our test is based on a comparison of the observed and expected frequencies.

[SPSS output: Age group * Attitude to ban imports crosstabulation, with Count and Expected Count for each cell; the cell values are not included in this text.]

Calculation of the expected frequency for agree and over 60: 95 × 116 / 400 = 27.55.

The count (observed) and the expected count are different, but different enough to reject the null?

Chi-squared test for independence

$$\chi^2_{(r-1)(c-1)} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Rationale: if $O_{ij} \approx E_{ij}$ in every cell, the data are consistent with H0 and we do not reject it.
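The expected-frequency short-cut and the test statistic for Example 2 can be worked through in code. This is a minimal sketch in plain Python under stated assumptions. The function names `expected_counts` and `chi_square_independence` are illustrative. The critical value 9.49 for 4 degrees of freedom is the usual tabulated 5% value and is not taken from the slides.

```python
def expected_counts(table):
    """Expected frequencies under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected frequencies)."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Example 2: rows are age groups (under 25, 25-60, over 60);
# columns are attitudes (agree, no view, disagree).
observed = [
    [19, 53, 25],
    [46, 94, 47],
    [30, 56, 30],
]

statistic, df, expected = chi_square_independence(observed)
# 9.49 is the usual tabulated 5% critical value for 4 d.f.; it is not
# quoted in the slides and is an assumption of this sketch.
reject_h0 = statistic > 9.49

print(f"expected count, over 60 and agree: {expected[2][0]:.2f}")
print(f"chi-square = {statistic:.3f} on {df} d.f., reject H0 at 5%: {reject_h0}")
```

The first printed value reproduces the slide's calculation 95 × 116 / 400. The same function accepts any table with r rows and c columns, so it can be reused for the grades-by-gender table, although small expected counts in that table would make the chi-square approximation less reliable.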
