




Contents

Introduction
1 Reliability
2 Validity
2.1 Construct Validity
2.2 Content Validity
2.2.1 Test Syllabus
2.2.2 What ability has been assessed?
2.2.3 What topics have been covered?
3 Authenticity, Interactiveness and Practicality
3.1 Authenticity
3.2 Interactiveness
3.3 Practicality
4 Impact
Conclusion
References
Appendix 1 Listening Subtest A (December 2005)
Appendix 2 Listening Subtest B (June 2009)

Introduction

College English Test Band 4 (CET-4) in China is administered by the National College English Testing Committee on behalf of the Higher Education Department of the Ministry of Education. The test has been held twice a year since 1987. Its test-takers are undergraduates in China majoring in any discipline except English; usually they are second-year university students who have completed the College English courses Bands 1 to 4. The purpose of the test is to provide an objective evaluation of college students' overall English proficiency and of the English teaching quality of universities, so as to exert a positive impact on college English education in China.

However, CET has long been accused of mainly examining grammar and vocabulary instead of focusing on communicative ability. The biggest criticism against CET is that it produces students who are good only at paper-based tests and have an unsatisfactory command of English in practical use. Facing such criticism, from 1996 onwards the CET committee carried out a series of reforms to attach more importance to students' productive skills in the assessment, such as introducing the CET Spoken English Test, employing new response formats, reporting the average graded scores, and launching the web-based CET. The most recent reform took place in 2005, resulting in a new CET test across the country in 2006. The changes were made in various aspects: increasing the weighting of listening from 20% to 35%; removing the vocabulary and structure section; introducing new test content, such as skimming and scanning and translation; introducing new test formats, such as banked cloze, true/false and short answer questions; introducing an online marking system for subjective items; and changing the scoring system.

This paper will mainly centre on the reform made in the listening section by comparing two listening tests, one administered in December 2005 (Test A) and one in June 2009 (Test B), that is, before and after the reform. The evaluation will be based on Bachman and Palmer's (1996) framework of test usefulness, which is defined as "a function of several different qualities, all of which contribute in unique but interrelated ways to the overall usefulness of a test" (1996: 18). The qualities contributing to test usefulness are reliability, construct validity, authenticity, interactiveness, impact and practicality. The two listening tests will be evaluated and compared on the basis of this framework.

1. Reliability

In this section, the instrument-related reliability of both tests will be examined. Since all questions in Test A are objective multiple-choice items while Test B also contains compound dictation, assessor-related reliability will be discussed only for Test B.

In Test A, there are 10 short conversations and 3 short passages, which together comprise 20 multiple-choice questions. In Test B, there are 8 short conversations, 2 long conversations, 3 short passages and 1 passage for compound dictation, composing all together 25 multiple-choice questions and 11 blanks. The introduction of long conversations and compound dictation is a striking change in the listening section.
The longer conversations can incorporate meaning negotiation and discourse features into the texts, thus engaging test-takers in a broader context of listening.

In the compound dictation section, 8 blanks are required to be filled with the exact words students have just heard, and 3 others can be filled with either the exact words or the main points in the students' own words. For the former 8 blanks the missing information is just one word, whereas the other 3 blanks miss longer clauses. Apparently, the introduction of dictation minimizes the chance of guessing and cheating, so in this respect Test B has higher instrument-related reliability than Test A. However, filling in exact words might impose a potential danger of only testing learners' short-term memory and spelling, which is mitigated by the following 3 larger chunks of missing information. The examiners are trained and standardized to accept any semantically acceptable form, so any answer that demonstrates understanding of the dictated utterance is awarded a mark. We can therefore say that assessor-related reliability can be assured in Test B. Moreover, the missing clauses are long enough, and the time allowance short enough, to make it almost impossible for any test-taker to memorize every exact word of the clauses. The increased amount of missing information places more demand on learners' working memory and linguistic knowledge to replace forgotten words. The use of long conversations and dictation, as well as the training of examiners, makes Test B more reliable in evaluating the decoding of linguistic information and the interpretation of meaning in a wider communicative context.

2. Validity

The most important question in all test evaluation is: does the test test what it is supposed to test? Henning (1987) defines validity as the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. Validity can be established in a number of different ways, for instance by building content validity, criterion validity, construct validity, consequential validity and face validity. In this section, we will only discuss the two most essential types of validity of the two tests: construct validity and content validity.

2.1 Construct Validity

Gronlund (1985: 58) describes construct validation as measuring "how well test performance can be interpreted as a meaningful measure of some characteristic or quality." In this case, the two listening tests intend to measure test-takers' listening competence, so they should be developed to reflect the quality of the listening comprehension process. Weir (2005) proposes a componential model of the listening process which includes both linguistic and psycholinguistic elements. In this model, the comprehension process is categorized into executive resources and executive processing. With respect to executive resources, the listener mainly draws on language knowledge to relate language to concepts; this knowledge is further divided into grammatical knowledge, discoursal knowledge, pragmatic knowledge and sociolinguistic knowledge. Table 1 shows a comparison of the two tests regarding the employment of language knowledge in the multiple-choice section.
Table 1. Language Knowledge Involved in the Two Tests

Language Knowledge | Test A Item No. | % | Test B Item No. | %
Grammatical | 1, 3, 5, 7, 9, 11, 12, 13, 15, 16, 17, 18, 19 | 65 | 2, 9, 10, 11, 13, 16, 17, 18, 21, 22 | 40
Discoursal | 2, 8, 14 | 15 | 5, 12, 19, 20 | 16
Pragmatic | 10, 20 | 10 | 1, 4, 8, 25 | 16
Sociolinguistic | 4, 6 | 10 | 3, 6, 7, 14, 15, 23, 24 | 28

As can be seen from the table above, less weight is attached to grammatical knowledge, dropping from 65% in Test A to 40% in Test B. At the same time, greater emphasis is put on pragmatic and sociolinguistic knowledge, especially the latter, which increases from 10% to 28%. Even where questions in Test B still require grammatical knowledge, they mainly test the understanding of larger units of language, such as "the right quantity of manuals" in question 13. In comparison, the majority of questions in Test A examine the understanding of particular key words or phrases, such as "swings" in question 5, "spare rooms" in question 7, "life threatening" in question 11 and "hostile" in question 19.

Apart from multiple choice, Test B also has a compound dictation section. The first 8 blanks are to be filled with the key words of the sentence; the chance of guessing is eliminated, and spelling and grammar are tested through productive tasks. The other 3 blanks have to be filled with the concluding sentences of the passage, which require candidates' global comprehension skills.

We can say that Test B demands a higher level of linguistic competence of test-takers, because the units of understanding rise from single words to bigger chunks of language, the knowledge needed for comprehension extends from basic grammatical and discoursal knowledge to complex pragmatic and sociolinguistic knowledge, and the degree of comprehension deepens from individually focused items to a rather global understanding of texts.

To test listening, we want to measure how well candidates form a clear concept in memory for every referent used by the speaker, and the referent could convey grammatical cues, rhetorical structures, indirect speech acts, pragmatic implications and cultural references. Therefore, to establish construct validity by reflecting all these components of the listening process, we need a well-balanced listening test. Comparatively speaking, Test B demonstrates a more holistic arrangement of assessment constructs.

2.2 Content Validity

Content validity is the representativeness or sampling adequacy of the content (the substance, the matter, the topics) of a measuring instrument (Alderson, 1995). It is concerned with both the representativeness of the test tasks themselves and the broader administrative setting.

2.2.1 Test Syllabus

According to the National Syllabus (2006), the objective of college English is to develop students' ability to use English in a well-rounded way, especially in listening and speaking, so that in their future studies, careers and social interactions they will be able to communicate effectively. The requirement for listening comprehension is to understand classroom English, daily conversation, lectures on general topics, and most English programmes broadcast at a slow speed. The CET-4 listening comprehension section intends to assess students' ability to acquire oral information, specifically comprehending main ideas and important facts, comprehending implied meaning, and comprehending listening materials with the aid of language features.
The content of these two subtests will be analyzed by comparing them with the statements in the test syllabus.

2.2.2 What ability has been assessed?

Based on the test syllabus, the listening test intends to examine students' ability to acquire information from audio messages. Table 2 below indicates what abilities have been assessed in the two subtests.

Table 2. Abilities Assessed in the Two Tests

Language Ability | Test A Item No. | % | Test B Item No. | %
Important Facts | 11, 12, 13, 14, 15, 16, 17, 18, 19 | 45 | 6, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 20, 21, 22, 24, 26-33 | 64
Main Ideas | 3, 6, 8, 20 | 20 | 3, 7, 13, 19, 23, 34-36 | 22
Implied Meaning | 1, 2, 4, 5, 7, 9, 10 | 35 | 1, 2, 4, 5, 25 | 14

From a broader view, the weighting of the listening subtest increased from 20% to 35% of the whole CET test, which coheres with the current National Syllabus and its special emphasis on developing students' listening and speaking skills. Besides, as shown in the table, owing to the reduced number of short conversations and the increased number of long conversations and short passages, less emphasis is attached to inferring implied meaning, while obtaining important facts weighs much more in Test B than in Test A. This is consistent with the current social environment, in which the ability to pick out key information effectively and efficiently is vital to success.

2.2.3 What topics have been covered?

In addition to the types of listening skills being tested, the topics of the speech events also determine content validity. According to the test syllabus, the requirement for listening comprehension is to understand classroom English, daily interaction, lectures on general topics and slow-speed English programmes. The following table shows how well the two subtests represent this requirement.

Table 3. Topics Covered

Topics | Test A Item No. | Test B Item No.
Study & Campus | 3, 4, 6, 8, 10 | none
Other Daily Interaction | 1, 2, 5, 7, 9 | 1, 2, 3, 4, 5, 6, 7, 8, 9-12, 13-15
Research Findings | 11-13 | 16-18, 26-36
General Topics | 14-16, 17-20 | 19-21, 22-25

It is striking that in Test A half of the short conversations are about study or campus life, whereas there are none in Test B. If the information given by the speaker is outside the candidates' experience, the cognitive complexity involved in the listening process soars. Test-takers of CET are mainly second-year university students, so they are likely to be familiar with campus or classroom English. In this sense, the difficulty of Test B is higher than that of Test A. In fact, there are a number of speech events in Test B that are unlikely to be familiar to university students, such as complaining to a company's staff about problems with a purchased commodity in one of the long conversations. By preventing test-takers from drawing on ready-made or pre-packaged solutions, we can ensure that it is only listening ability, rather than background knowledge, that is assessed.

There is not much difference between the two tests in the short-passage part, which has a good balance between passages on research findings and passages on general topics. However, it is worth noting that the newly added compound dictation in Test B is about a new research finding, and that writing down key words and summarizing concluding sentences also greatly raises the difficulty of the listening test.
Through measuring how well candidates obtain useful information from abundant messages by employing listening skills alone, it is possible to test whether they can use English in a well-rounded way.

3 Authenticity, Interactiveness and Practicality

3.1 Authenticity

The listening material is based on standard American English or standard British English, and the speed of speaking is approximately 130 words per minute. In both Test A and Test B, the listening materials are read by native speakers from pre-written scripts rather than being recordings of naturally occurring conversations. Therefore, the two tests are not authentic in terms of authentic language. In fact, there are few instances in these two sets of listening materials of the natural linguistic features unique to listening, such as hesitations, ellipsis, false starts, pauses and variable speeds. In this case, we will only analyze the representativeness of the psychological features unique to listening. Rost (2002) describes a set of psychological characteristics representative of oral English: negotiating mode (interacting with the speaker to clarify and expand meaning), constructive mode (working out a meaning relevant to the situation) and transformative mode (influencing the speaker's ideas). The following table shows how well the two tests reflect these psychological features in the short-conversation part.

Table 4. Psychological Features

Psychological Feature | Test A Item No. | % | Test B Item No. | %
Negotiating Mode | 1, 3, 4, 5, 6 | 50 | 1, 3, 8 | 37.5
Constructive Mode | 10 | 10 | 2, 6, 7 | 37.5
Transformative Mode | 2, 7, 8, 9 | 40 | 4, 5 | 25

As can be seen, Test B calls considerably more attention to the constructive feature of the listening process, which involves noticing what is said and reframing the speaker's message in a way that is relevant to the listener himself. This feature goes one step further than negotiating meaning, which occupied half of the short conversations in Test A, because listeners are required not only to respond to and understand what the speaker has said, but also to work out why the speaker is talking to them. The comprehension of constructive interaction reflects new demands for creativity in the classroom and the workplace. Generally speaking, the short conversations in the two tests are authentic in terms of the psychological features unique to listening, since they all demonstrate the interactiveness of listening activities in one way or another. In the two long conversations in Test B, we see more examples of the linguistic features of spoken language, such as tag questions, back-channeling, fillers, hedging and simpler sentence structures, as well as a combination of several psychological features. Therefore, the introduction of long conversations into the listening test greatly increases the authenticity of the listening materials.

3.2 Interactiveness

Interactiveness is another important element in the test usefulness framework proposed by Bachman and Palmer, referring to "the extent and type of involvement of the test taker's individual characteristics in accomplishing a test task" (1996: 25). The characteristics most relevant for language testing include candidates' language ability, topical knowledge and affective schemata. Referring back to our earlier discussion of the language knowledge required by the tasks (Table 1), Test B involves understanding larger chunks of language and demands more complex pragmatic and sociolinguistic knowledge to complete the tasks.
From this perspective, test-takers have more opportunity to make use of their personal linguistic experience and social perceptions in accomplishing Test B than Test A. Regarding topical knowledge, both tests presuppose an appropriate area of knowledge, with general topics occupying the overwhelming majority. However, as discussed above, Test B places candidates in a number of situations that are not familiar to university students. Consequently, fewer pre-packaged solutions are available in accomplishing Test B, which in turn demands higher listening ability to find the answers to the questions. The involvement of candidates' affective schemata is the hardest to measure in the absence of information about their opinion of the test, but judging from the introduction of a variety of listening tasks and n