Criteria of Tests

Validity
Reliability
Power/Difficulty
Discrimination
Practicality
Backwash effects

Validity

The validity of a test is the extent to which it measures what it is supposed to measure and nothing else. Put another way, validity is whether a test examines what its designer intends it to examine, and to what degree it does so.
Discuss the following items:

"Is photography an art or a science?" Discuss.
"The mind is its own place, and in itself can make a Heaven of Hell, a Hell of Heaven." (Milton) Discuss.

Use the following words in sentences: courageous, choosy, acceptable, complicated, etc.
A. John is a very courageous boy.
B. John, the captain of our team, is courageous.
C. I have a courageous father.

Factors of validity

Face validity
Content validity
Construct validity
Empirical validity
    Concurrent validity
    Predictive validity

Face validity

If a test item looks right to other testers, teachers, moderators, and testees, it can be described as having at least face validity. Face validity is the surface credibility of a test, or its acceptability to the public. Zou Shen: a test has face validity when it appears to test the intended skill or ability. (Testing pronunciation and intonation with a written paper gives low face validity.)

Content validity

A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. Content validity asks whether the test covers what the syllabus specifies, or how far the test items represent the objectives they are meant to measure.
(1) Is the content of the test related to its objective or purpose?
(2) Are the test items representative?
(3) Is the content appropriate or suitable for the testees?

Construct validity

If a test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning. Construct validity asks whether the test rests on a sound view of language, including views of language learning and language use. "Construct" here does not mean the layout of the paper or the arrangement of the items; it means the theoretical basis of the whole test.

Empirical validity

This validity is obtained by comparing the results of the test with the results of some criterion measure. Empirical (statistical) validity can be divided into concurrent validity and predictive validity.

Concurrent validity

The results of the test are compared with the results of some criterion measure such as:
an existing test, known or believed to be valid, given at the same time; or
teachers' ratings or any other such form of independent assessment given at the same time.
The results obtained by either method are measures of the test's concurrent validity in respect of the particular criterion used. In other words, concurrent validity is established when the test and the criterion are administered at about the same time.

For example, if end-of-term examination scores are compared with scores on a CET-4 (College English Test, Band 4) that has just been taken, and the two sets of scores are similar, the end-of-term test has high concurrent validity. (This presupposes that CET-4 itself is highly valid.)

Predictive validity

The results of the test are compared with the results of some criterion measure such as:
the subsequent performance of the testees on a certain task measured by some valid test; or
teachers' ratings or any other such form of independent assessment given later.
The results obtained by either method are measures of the test's predictive validity in respect of the particular criterion used. In other words, predictive validity concerns the degree to which a test can predict the testees' future performance or success.

Reliability

A test is said to be reliable if it is consistent in its measurements. Reliability is the dependability and stability of test results. For example, if the same paper is given to the same group of students two or more times and the results agree closely, the test has high reliability.

Methods of verifying test reliability

Test/retest method
Split-half method
Parallel forms method

Test/retest method

This method is to re-administer the same test after a lapse of time. It is often impracticable, since certain students will benefit more than others from familiarity with the type and format of the test. Moreover, in addition to changes in performance resulting from the memory factor, personal factors such as motivation and differential maturation will also account for differences in the performances of certain students.

Split-half method

This method estimates a different kind of reliability from that estimated by the test/retest procedure. It is based on the principle that, if an accurate measuring instrument were broken into two equal parts, the measurements obtained with one part would correspond exactly to those obtained with the other.

Parallel forms method

This method is to administer parallel forms of the test to the same group. It assumes that two similar versions of a particular test can be constructed: such tests must be identical in the nature of their sampling, difficulty, length, rubrics, etc. Only after a full statistical analysis of the tests and all the items contained in them can the tests safely be regarded as parallel. If the correlation between the two tests is high, then the tests can be termed reliable.

Factors affecting test reliability

Number of items
Nature of the items
Item discrimination
Distribution of scores
Item difficulty
Objectivity of scoring
Time allowed for the test

Power/Difficulty

Difficulty is how hard or easy each item in a test is. To judge the quality of a paper, besides its reliability and validity, we also need to examine the index of difficulty (facility value) of its items.

Calculating the difficulty value

Item difficulty is usually denoted by P, which is the proportion of testees who answer the item correctly. Suppose there are 10 testees and 8 of them answer an item correctly; the difficulty value of that item is P = 8 / 10 = 0.8.

Formula for subjectively scored items

Suppose the full mark for a writing task is 20 and the mean score of all testees on that task is 16; the difficulty value of the task is P = 16 / 20 = 0.8, that is, the mean score divided by the full mark.

[Figure: normal distribution curve]

Discrimination

The discrimination of a test is its capability to discriminate among the different candidates and to reflect the differences in the performance of the individuals in the group. Item discrimination is the degree to which an item distinguishes between testees of different ability.

Methods of calculating item discrimination

Formula method
Point-biserial correlation
Biserial correlation

Practicality

A good test is practical. It stays within financial limitations and time constraints, and it is easy to administer, score, and interpret. Practicality is whether a test is convenient to use and feasible to carry out.

Factors affecting practicality

the length of time available for the administration of the test
the answer sheet and the stationery used
the test situation
the necessary e…
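
The difficulty, discrimination, and reliability measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the notes' own procedure. The two difficulty functions follow the P values defined in the text. The notes name a "formula method" for discrimination without giving the formula, so the upper-group minus lower-group index used here, and its 27% group size, are assumptions taken from common item-analysis practice. The Pearson correlation is likewise an assumed choice for comparing two sets of scores, as in the test/retest and parallel forms methods. All function names and sample data are hypothetical.

```python
from math import sqrt
from statistics import mean


def difficulty_objective(num_correct, num_testees):
    """P for an objectively scored item: proportion of testees answering correctly."""
    return num_correct / num_testees


def difficulty_subjective(scores, full_mark):
    """P for a subjectively scored item: mean score divided by the full mark."""
    return mean(scores) / full_mark


def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Assumed formula: D = P_upper - P_lower.

    Testees are ranked by total score; the top and bottom `fraction`
    of the group form the upper and lower groups (27% is a convention,
    not something the notes specify).
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


def pearson_r(xs, ys):
    """Pearson correlation between two sets of scores from the same testees."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# The two worked examples from the notes.
print(difficulty_objective(8, 10))                       # 0.8
print(difficulty_subjective([14, 16, 18, 15, 17], 20))   # 0.8

# Hypothetical data: 10 testees, 8 of whom answer the item correctly.
total_scores = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
item_correct = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]
print(discrimination_index(item_correct, total_scores))

# Hypothetical first and second administrations of the same test.
first_sitting = [52, 61, 70, 74, 88]
second_sitting = [55, 60, 72, 78, 85]
print(pearson_r(first_sitting, second_sitting))
```

A value of D near 1 means that strong testees tend to answer the item correctly while weak testees do not. A correlation near 1 between the two sittings suggests consistent measurement, subject to the memory and motivation caveats noted under the test/retest method.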