Validity and validation methods

Workshop flow
- The construct of MKT
  - Gain familiarity with the construct of MKT
  - Examine available MKT instruments in the field
- Assessment design
  - Gain familiarity with the evidence-centered design approach
  - Begin to design a framework for your own assessment
- Assessment development
  - Begin to create your own assessment items in line with your framework
- Assessment validation
  - Learn basic tools for how to refine and validate an assessment
- Plan next steps for using assessments

Assessment development process
[Figure: process diagram with the stages domain modeling (design pattern); define test specs; domain analysis; define item template; define item specs; develop pool of items; collect/analyze validity data; refine items; assemble test; document technical info.]

Validity: the cardinal virtue of assessment
- The degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. (Mislevy, Steinberg, and Almond, 2003)
- Validation is a process of accumulating evidence to provide a scientifically sound validity argument to support the intended interpretation of test scores. (Standards for Educational and Psychological Testing; AERA / APA / NCME, 1999)

Jargon note: two kinds of “evidence”

Assessment reliability
The extent to which an instrument yields consistent, stable, and uniform results over repeated administrations under the same conditions each time.
Figure obtained from the website: http:/
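One common way to put a number on "consistent results over repeated administrations" is a test-retest correlation. The sketch below is only an illustration under stated assumptions: the function name `pearson_r` and the two score lists are hypothetical, and the workshop materials do not prescribe this particular statistic.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of total scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary in both administrations")
    return cov / sqrt(var_a * var_b)


# Hypothetical total scores for the same six teachers on two administrations.
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {test_retest_r:.2f}")
```

A value near 1 would suggest that teachers keep roughly the same rank order across administrations; how high is "high enough" depends on the intended use of the scores.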
Steps of item validation

| Step | Method |
|---|---|
| 1. Expert panel review (formative) | Alignment and ratings of items |
| 2. Feasibility of items | Think-alouds |
| 3. Field testing | Testing with a large sample |
| 4. Expert panel review (summative) | Alignment and ratings of items |

Items are refined iteratively across these steps.

1. Expert panel review (formative)
- Are the items aligned with:
  - The test specifications?
  - Content covered in the curriculum?
  - State or national standards?
- Is the complexity level aligned with intended use (e.g., target population, grade level)?
- Are the item prompts and rubrics aligned?

2. Feasibility of items (think-alouds)
- Does the item make sense to the teacher?
- Does the item elicit the cognitive processes intended?
- Can the item be completed in the available time?
- Can respondents use the diagrams, charts, and tables as intended?
- Is the language clear?
- Are there differences in approaches by experts and novices (or teachers exposed or not to the relevant instruction)?

SimCalc example: think-alouds
Proportional reasoning problem #3
- Expected: proportional reasoning, $\frac{3.5\ \text{white}}{3\ \text{dark}} = \frac{x\ \text{white}}{5\ \text{dark}}$
- Found: just draw the bars!

Conducting think-alouds
Sample
- N: you learn the most in the first 3-6
- Who: experts and novices; low, medium, and high achievers; varying in proficiency in English
Data capture and analysis
- Data can be extremely rich
- Analyzed with varying levels of detail
- Often sufficient to do real-time note-taking
- Videotaping can be helpful
- Document problems with item clarity (language, graphics)
- Response processes: what strategies are they using?

3. Field testing
Item-level concerns
- Are there ceiling or floor effects?
- What is the range of responses we can expect from a variety of teachers?
- Is the amount of variation in responses sufficient to support statistical analysis?
- What is the distribution of responses across distracters?
- Do the items discriminate among teachers performing at different levels?
Assessment-level concerns
- Are there biases among subgroups?
- Does the assessment have high internal reliability?
- What is the factor structure of the test?

Key item statistic: percent correct
What percent of people get it correct?
- Gives us a sense of:
  - The item difficulty
  - The range of responses
- Alerts you to potential problems:
  - Floor = roughly 0-10%
  - Ceiling = roughly 85-100%

[Figure: SimCalc example, exploratory results for item #20. Count of teachers who chose each distracter (1-5), by ability level (1-4, quartiles of total test score); n = 179.]

[Figure: SimCalc example, exploratory results for item #43. Count of teachers who chose each distracter (1-5) or skipped, by ability level (1-4); n = 179.]

SimCalc example: exploratory results for item #6

| Response | Count |
|---|---|
| Correct (12) | 160 (70%) |
| Additive error (8) | 42 (18%) |
| Other | 20 (9%) |
| Skip | 8 (3%) |

Conducting a field test
- Test under conditions as close to “real” as possible:
  - Analogous population of teachers
  - Administration conditions
  - Formatting
  - Scoring
- Gather and use demographic data
- Determine sample size based on:
  - The number of teachers you can get
  - The kinds of statistical analyses you decide to conduct (e.g., 5-10 respondents per item for fancy statistics)
- Can use simple and fancy statistics

Field testing with teachers by mail
- Purchasing national mailing lists of teachers: http:/ http:/
- Best-practices mailing sequence (Cook et al., 2000):
  1. An introductory postcard announcing that a survey will be sent
  2. About a week later, a packet containing the survey
  3. About two weeks later, a reminder postcard
  4. About two weeks later, a second packet containing the survey and a reminder letter
  5. About three weeks later, a third appeal postcard
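The percent-correct screen described under field testing can be sketched in a few lines. This is a minimal illustration, not the workshop's own tooling: the function names and the exact cutoffs (`FLOOR_MAX`, `CEILING_MIN`) are assumptions that simply encode the "roughly 0-10%" and "roughly 85-100%" ranges, and the counts are the ones reported for SimCalc item #6.

```python
# Assumed cutoffs, encoding the "roughly" stated floor and ceiling ranges.
FLOOR_MAX = 0.10    # percent correct at or below this suggests a floor effect
CEILING_MIN = 0.85  # percent correct at or above this suggests a ceiling effect


def response_shares(counts):
    """Convert response counts into shares of all respondents."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    return {response: n / total for response, n in counts.items()}


def flag_difficulty(p_correct):
    """Label an item as 'floor', 'ceiling', or 'ok' from its percent correct."""
    if p_correct <= FLOOR_MAX:
        return "floor"
    if p_correct >= CEILING_MIN:
        return "ceiling"
    return "ok"


# Response counts reported for SimCalc item #6.
item_6_counts = {
    "correct (12)": 160,
    "additive error (8)": 42,
    "other": 20,
    "skip": 8,
}

item_6_shares = response_shares(item_6_counts)
item_6_flag = flag_difficulty(item_6_shares["correct (12)"])

for response, share in item_6_shares.items():
    print(f"{response:20s} {share:6.1%}")
print("difficulty flag:", item_6_flag)
```

The same share table also shows how responses spread across distracters, which is the other item-level question raised for field testing.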
4. Expert panel review (summative)
- Similar questions as in step 1 (formative review)
- Same or different panel of experts
- Ratings and alignment collected after items are fully refined
- Results of the summative expert panel review provide evidence of alignment of items with standards/curriculum, content validity, and grade-level appropriateness
- This could be reported in technical documentation
Creating a validity argument
- Integrates all evidence into a coherent account of the degree to which existing evidence and theory support the intended interpretation of test scores

For a sound validity argument, at minimum, pay attention to:

| Sources of evidence | Procedures |
|---|---|
| 1. Test content | Conduct alignment of items to standards/curriculum by content experts |
| 2. Response processes | Have at least one or two teachers do think-alouds; administer test to at least one group |
| 3. Relationships to other variables | If possible, conduct one or more of the following: conduct instructional sensitivity study; correlate with existing measures; correlate with construct-irrelevant variables |
| 4. Internal structure | Establish internal reliability (alpha); assess inter-scorer reliability, if there is a rubric |
| 5. Consequences of testing | Be aware of the limitations of your test; do not go beyond its intended purposes and its intended role on your project |

Activity #5: Conduct think-aloud
- Break into groups of 3 and select roles:
  - 1 interviewer
  - 1 interviewee
  - 1 observer to complete the observation recording sheet
- Select a set of 2 items
- Conduct think-alouds. Interviewer and observers take notes on the form in the protocol.
- Repeat two more times, switching roles, with new items.
- Revise your own items.
- Afterward, we will have a discussion about:
  - Insights about development of assessment items
  - Questions and challenges
Be the observer for your own items!

Activity #5: Think-aloud pointers
- Find out how long problems take to do
- Uncover issues of item clarity and level of difficulty
- Derive a model of the knowledge and thinking that the students engage when solving each problem
- In observation notes, describe:
  - How problems are solved, focusing on the underlying knowledge, skills, and structures of item performance
  - Actions, thought processes, and strategies
- Interviewers should prompt the teacher to keep talking
- Ask clarifying questions about what teachers are saying (but not as scaffolding)
- Interviewers should not help teachers in any way during the interview (e.g., no hints, tips, or scaffolding). Be sure to avoid unintentional hints, such as being more encouraging when answers are correct.
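For the internal-structure evidence listed in the validity argument (internal reliability, alpha), Cronbach's alpha can be computed from a respondents-by-items score matrix. The sketch below is an illustration under stated assumptions: the function name `cronbach_alpha` and the five-teacher, four-item data set are hypothetical, and sample variances are used throughout.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent needs a score for every item")
    # Sample variance of each item (column) and of the total scores (row sums).
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) scores: five teachers by four items.
field_test_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(field_test_scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

With a sample this small the estimate is very unstable; the field-testing guidance on sample size applies before alpha is interpreted.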
Some useful references: validation
- AERA, APA, & NCME (1999). Standards for educational and psychological testing. Washington, DC: AERA.
- Baxter, G. P., Shavelson, R. J., Herman, S. J., Brown, K. A., & Valadez, J. R. (1993). Mathematics performance assessment: Technical quality and diverse student impact. Journal for Research in Mathematics Education, 24(3), 190-216.
- Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443-507). Washington, DC: American Council on Education.
- Hoag, R. D., Meginbir, L., Khan, Y., & Weatherall, D. (1985). A multitrait-multimethod analysis of the Preschool Behavior Questionnaire. Journal of Abnormal Child Psychology, 13, 119-127.
- Mehta, P. D., Foorman, B. R., Branum-Martin, L., & Taylor, W. P. (2005). Literacy as a unidimensional multilevel construct: Validation, sources of influence, and implications in a longitudinal study in grades 1 to 4. Scientific Studies of Reading, 9, 85-116.
- Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103).
- Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13-23.
- Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
- Tremblay, R. E., Vitaro, F., Gagnon, C., Piche, C. & Royer, N. (1992). A prosocial scale