Statistics in Linguistics

Syllabus

Assessment
30% classroom participation
70% final exam

Course Books
Li, S. 2001. Statistics in Language Research (语言研究中的统计学). Xi'an: Xi'an Jiaotong University Press.
Butler, C. 1985. Statistics in Linguistics. New York: Basil Blackwell.
Woods, A., Fletcher, P. & Hughes, A. 1986. Statistics in Language Studies. Cambridge: Cambridge University Press.

References
Brown, J. D. 1988. Understanding Research in Second Language Learning. Cambridge: Cambridge University Press.
Brown, J. D. 2002. Doing Second Language Research. Oxford: Oxford University Press.
Hatch, E. & Lazaraton, A. 1991. The Research Manual: Design and Statistics for Applied Linguistics. New York: Newbury House Publishers.
Hinton, P. 2004. Statistics Explained. London: Routledge.
Muijs, D. 2004. Doing Quantitative Research in Education with SPSS. London: Sage Publications.

Statistical Software
SPSS (Statistical Package for the Social Sciences)
Excel

1. Introduction

Need for statistics in linguistics

Reasons
1) In language study, many kinds of work require the collection and analysis of quantitative data. Statistics turns raw data into meaningful information for decision-making.
2) In terms of research methods, articles of an argumentative or speculative nature tend to be less convincing. The overall proportion of "personal experience and opinion" articles in four journals fell from 32% in 1978 to 12% in 1997 (Gao Yihong et al., 1999).
3) Some knowledge of statistics helps us understand academic articles better. Quantitative research is on the rise. In Modern Foreign Languages, quantitative studies first appeared only in the mid-1980s, usually accounted for 20%-30% of articles in the 1990s, and have reached 50% since the journal was redesigned with issue 3 of 1997. In Foreign Language Teaching and Research, the proportion hovered around 10% in the 1980s and was mostly between 30% and 40% in the 1990s (from Gao Yihong et al., 1999).

2. Describing Variables

Populations, Samples, and Random Sampling

Population: any collection of entities, of whatever kind, that is the object of investigation (Butler, C. Statistics in Linguistics, 1985).
  Finite population: the number of entities is fixed and countable.
  Infinite population: the number of entities is potentially infinite, at least in theory.
Sample: entities selected from a population for investigation.
Sampling: the process of selecting (drawing) samples from the population concerned.
Li Shaoshan (2008) defines the terms in the same way: a population is any collection of individuals, or target group, that is the object of study, and a sample is the part of the population drawn for study.

Reasons for sampling
1) For infinite populations, exhaustive investigation is impossible; for finite populations with a very large number of entities, exhaustive investigation is possible in theory but impracticable.
2) Sampling cuts down the labor, time and cost involved in obtaining data.
3) Sampling minimizes the errors that are easily made when processing large amounts of statistical data.
By sampling, we hope that the results obtained from the sample will be generalisable to the population.

Random sampling: every unit (entity) in the population has an equal chance of being represented in the sample, i.e. every individual in the population has the same probability of being selected.
Population → sampling frame → random sampling

Methods of random sampling
Simple random sampling: drawing lots; a table of random digits; random digits generated by computer (e.g. Excel)
  Variation: systematic (quasi-random) sampling
    Variation within systematic sampling: block sampling
Stratified random sampling: proportional stratified random sampling; disproportional stratified random sampling
Multi-stage sampling

Systematic (quasi-random) sampling, a variation of simple random sampling: the first unit in the sample is selected by truly random methods, and the other units are then taken at equal intervals throughout the numbered population, the interval being chosen to give the desired number of units in the final sample (interval = population size / sample size). This is not truly random, since the second and subsequent units in the sample are not selected independently of the first unit. This does not matter seriously if there is no periodicity in the population (if the numbered list repeats a pattern whose period matches the interval, every selected unit may fall at the same point in that pattern).
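The two selection procedures just described, simple random sampling and systematic (quasi-random) sampling, can be sketched in Python. This is a minimal illustration rather than part of the course materials: Python's standard `random` module stands in for a table of random digits or Excel, the population is a hypothetical numbered list of 7,300 students (the figure used in Homework 1), and the function names are made up for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sampling: every unit has the same chance of selection.
    Units are drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Systematic (quasi-random) sampling: only the first unit is chosen at
    random; the rest follow at a fixed interval k = N // n."""
    rng = random.Random(seed)
    k = len(population) // n            # interval = population size / sample size
    start = rng.randrange(k)            # random start within the first interval
    return [population[start + i * k] for i in range(n)]

# Hypothetical population: students numbered 1 to 7300
population = list(range(1, 7301))
srs = simple_random_sample(population, 100, seed=42)
sys_sample = systematic_sample(population, 100, seed=42)

print(sorted(srs)[:5])   # five of the randomly chosen student numbers
print(sys_sample[:5])    # consecutive units are exactly 73 apart
```

The fixed gap of 73 between consecutive units in `sys_sample` makes visible why the later units are not selected independently of the first. Note also that with integer division, when N is not a multiple of n the last N mod n units can never be chosen; a stricter sketch would treat the list as circular.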

Block sampling, a variation within systematic sampling: the position of the beginning of the sample is determined randomly, but the next N items are then taken, where N is the size of the sample. The true randomness of such a sample is not guaranteed.

It is important to realize that the selection of a sample by methods designed to achieve true randomness does not guarantee that the sample arrived at will indeed be representative of the population.
Probability (Butler, C. Statistics in Linguistics, 1985: 7)

Stratified random sampling (stratum, plural strata; to stratify): if the various subgroups (strata) in the population are known, random sampling can be undertaken within each stratum, and the resulting subsamples can be combined to give an overall sample.
  Proportional stratified random sampling: the proportion of each subsample in the overall sample is equal to the proportion of its stratum in the population. It ensures that a stratum forming only a small proportion of the population is still represented in the sample, and allows comparisons to be made between the subsamples.
  Disproportional stratified random sampling: the proportion of each subsample in the overall sample is not equal to the proportion of its stratum in the population. It provides the optimal conditions for comparisons to be made between strata (for example, by drawing subsamples of equal size).

Multi-stage sampling: sampling is undertaken in successive stages, the sample at one stage being treated as the population for the next stage.
Example: at present, English majors in China are offered at more than 1,000 institutions of higher education, including junior colleges, the largest of which admit over 1,000 students a year (Wang Jinsheng & Zhu Lihui, 2008: 45). Of these, more than 600 offer undergraduate English programmes, and over 200 of those are at science and engineering universities (Qin Xiubai, 2006).

Population parameter vs. sample statistic
Parameter: a property of a population, conventionally symbolized by Greek letters (Li Shaoshan, 2008).
Statistic: a property of a sample, conventionally symbolized by Roman (English) letters; a sample statistic serves as an estimate of the corresponding population parameter.

                       Population parameter   Sample statistic
  Mean                 μ                      x̄
  Variance             σ²                     s²
  Standard deviation   σ                      s
  Correlation          ρ                      r

Homework 1
1. Define the following terms, with an example: 1) population 2) sample
2. Why should we be content with a sample for our study in most cases?
3. What is random sampling? Why should we select a sample randomly?
4. Suppose a school has 15 departments, each with 20 intact classes of about 25 students, 7,300 students in all. A sample of 100 students is to be selected.
  1) Which sampling method would be best? Why? Describe the specific steps.
  2) How would you sample if the sample must contain equal numbers of male and female students?
  3) Use a table of random numbers to select the sample from the 7,300 students by simple random sampling.

Variables
Variable: an attribute of a person, a piece of text, or an object, which varies from person to person, text to text, object to object, or from time to time. In statistics, variables refer to measurable attributes, as these typically vary over time or between individuals (from Wikipedia).

In a research project, we may wish to look at levels within a variable. For example, we might want to know how well ESL students are able to do some task. If the study compares the performance of ESL students who are foreign students with that of immigrant students, then the variable is the circumstances in which the ESL students learn English, and it has two levels. If the study compares the performance of students from different geographic areas, then the variable is geographic area, with levels such as South American, European, Middle Eastern and Asian, so that comparisons among these groups of ESL students can be made; the variable then has four levels. Or we might want to know whether there is a difference in performance between advanced, intermediate and beginner ESL students; the variable is then proficiency level, which has three levels.
Is it right to say, "The variable is ESL student. That variable may be divided into levels for the purposes of the study"?

Classification of variables
1) According to the function a variable has in a study: independent variable & dependent variable
2) According to the level of measurement: nominal variable, ordinal variable, interval variable & ratio variable
3) According to whether the data obtained on a variable are continuous or not: continuous variable & discrete (discontinuous) variable

According to the function a variable has in a study
Independent variable (IV): a variable that the investigator deliberately manipulates, i.e. a variable the investigator can vary.
Dependent variable (DV): a variable whose response to the IV the investigator is measuring.

According to the level of measurement
Nominal variable: purely qualitative, not quantitative. Entities may be the same or different, but not "more" or "less". This is naming, not measurement, and the values have no arithmetic value, e.g. sex, social status, mother tongue, marital status.
Ordinal variable: the values of the entities are ranked on a scale of "more" or "less"; the value of one entity may be more or less, higher or lower than that of another. This is ordering: it cannot tell us the size of the difference, e.g. very impolite / impolite / polite / very polite; ratings of "The lessons are boring" on a scale from 1 to 9 (these have arithmetic value, but the value is not precise); class rank: 1, 2, 8, 9.
Interval variable: there are truly equal intervals between points on the scale of "more" or "less", e.g. test scores: 95, 90, 70, 65; temperatures.
Ratio variable: there is an absolute zero in the variable, and ratios may be taken, e.g. height, time, distance, but not temperature in degrees Celsius (absolute zero is -273.15 °C). Ratio variables are seldom used in linguistic studies.

It is very important for investigators to know which type of variable they are dealing with, because different statistical procedures are appropriate for different types of variable. The level of measurement of a variable influences the choice of a measure of central tendency and variability, and the choice of procedures for hypothesis testing: parametric tests are generally used for ratio and interval variables, and non-parametric tests for nominal and ordinal variables.

Example: changing the level of measurement of English final-exam scores
  Interval: test scores
  Ordinal: students' ranks
  Nominal: grouping (high group, low group)

According to whether the data obtained on a variable are continuous or not
Continuous variable: may take any value within a given range. Between any two observed values of a continuous variable there is an infinite number of possible values; the variable is divisible into an infinite number of fractional parts, e.g. test scores, the time spent uttering a sentence, people…
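The English final-exam example, in which interval-level scores are recast as ranks (ordinal) and as high/low groups (nominal), can be sketched with Python's standard `statistics` module. The four scores are borrowed from the interval-variable example (95, 90, 70, 65); the student labels, the median cut-off for the high/low split, and the function names are assumptions for illustration only. Each level is paired with its conventional measure of central tendency: the mean for interval data, the median for ordinal data, and the mode for nominal data.

```python
import statistics

# Hypothetical class of four students; scores taken from the interval example.
scores = {"A": 95, "B": 90, "C": 70, "D": 65}

def to_ranks(scores):
    """Interval -> ordinal: rank 1 is the highest score (ties are not handled)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {student: rank for rank, student in enumerate(ordered, start=1)}

def to_groups(scores):
    """Interval -> nominal: 'high' if at or above the median score, else 'low'.
    The median cut-off is an assumption made for this sketch."""
    cut = statistics.median(scores.values())
    return {s: ("high" if v >= cut else "low") for s, v in scores.items()}

ranks = to_ranks(scores)
groups = to_groups(scores)

# Conventional measure of central tendency for each level of measurement
mean_score = statistics.mean(list(scores.values()))         # interval: mean
median_rank = statistics.median(list(ranks.values()))       # ordinal: median
modal_groups = statistics.multimode(list(groups.values()))  # nominal: mode(s)

print(ranks)    # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
print(groups)   # {'A': 'high', 'B': 'high', 'C': 'low', 'D': 'low'}
print(mean_score, median_rank, modal_groups)  # 80 2.5 ['high', 'low']
```

The ranks are evenly spaced even though the score gaps are 5, 20 and 5 points, which is exactly the information lost in moving from an interval to an ordinal variable. Because a median split halves an even-sized class, both groups are equally frequent here, so `multimode` returns both categories.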


