Standardized Tests
April 6, 2008
Assigned Readings
Schultz, K.S. & Whitney, D.J. (2005) Measurement Theory in Action. Thousand Oaks, CA: Sage Publications. Ch. 11-13, pp. 171-213. e-reserve Dehaene, S., Izard, V., Pica, P. & Spelke, E. (2006) Core knowledge of geometry in an Amazonian indigene group. Science 311 (5759), 381-384.
Class Preparation
This week you will complete some exercises before coming to class. Please bring the completed exercises to class with you; I will want you to submit them for my review. Look for your name below to see which exercises are your responsibility.
Exercise 11.1 on pages 184-185 of Schultz & Whitney -- Everyone. Complete prior to class.
Exercise 11.2 on pages 185-186 of Schultz & Whitney. You will be assigned to complete one of four tests. Each individual should complete the exercise ALONE. This is NOT a group exercise. Complete the exercise prior to class and be prepared to discuss your decisions in class.
Exercise 11.3 on page 187 of Schultz & Whitney. In-class exercise.
Exercise 12.1 on pages 207-211 of Schultz & Whitney. First, complete the computations required using the website indicated by the authors (Part 1 of the exercise). Follow the instructions exactly for entering the data. Bring your output to class. You will complete Part 2 of the exercise in class.
Scoring Exercise. Everyone should complete both methods of test scoring (Angoff & Nedelsky) on the test scoring sheet.
For the Angoff method: Enter in decimal form the proportion (from 0.00 to 1.00) of "reasonably competent undergraduate students in FYC" that you think should be able to answer the question correctly. E.g., if you think a question is easy, you might enter 0.90. If you think a question is difficult, you might enter 0.20.
For the Nedelsky method: There are four possible answers for each question. As is typical of multiple choice questions, three of the four are incorrect "distractor" responses. Some distractors are very obvious. For example, suppose I ask a multiple choice question about who was president of the United States in 1990, and the choices are George Bush, Abraham Lincoln, Bill Clinton and Barack Obama. Many people can easily eliminate Abraham Lincoln and Barack Obama as possible correct answers. This means that the chance that a respondent will guess the right answer goes from 1 in 4 (0.25) to 1 in 2 (0.50). On the other hand, if my choices are George W. Bush, Bill Clinton, Jimmy Carter and George H.W. Bush, it's a lot harder to eliminate choices. In this case, for many people, the probability of guessing the correct answer may remain at 1 in 4 (0.25).
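The arithmetic in the president example can be sketched in a few lines of Python. This is only an illustration: the function names and the three sample Angoff ratings are hypothetical, and summing the per-item values into a passing score is a common convention assumed here rather than something specified on this page.

```python
def nedelsky_value(n_options: int, n_eliminated: int) -> float:
    """Chance of guessing correctly once obvious distractors are ruled out."""
    remaining = n_options - n_eliminated
    if not 1 <= remaining <= n_options:
        raise ValueError("eliminations must leave between 1 and n_options choices")
    return 1 / remaining


def cut_score(item_values: list[float]) -> float:
    """Sum per-item values into an expected number-correct score (assumed convention)."""
    return sum(item_values)


# President example: Lincoln and Obama are easy to rule out, so 2 of 4 choices remain.
easy_item = nedelsky_value(4, 2)   # 1 in 2
# Four plausible presidents: nothing is easy to rule out, so all 4 choices remain.
hard_item = nedelsky_value(4, 0)   # 1 in 4

# Hypothetical Angoff ratings for a three-question test (easy, difficult, moderate).
angoff_ratings = [0.90, 0.20, 0.60]

print(f"Nedelsky values: {easy_item:.2f}, {hard_item:.2f}")
print(f"Nedelsky cut score for these two items: {cut_score([easy_item, hard_item]):.2f}")
print(f"Angoff cut score for three items: {cut_score(angoff_ratings):.2f}")
```

Under these assumptions, each method yields the number of questions a reasonably competent student would be expected to answer correctly; a panel of raters would typically average such totals.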
For each item on the test, indicate 0.25, 0.33, 0.50 or 1.00:
0.25 if you think it's hard to eliminate any of the responses as clearly an incorrect distractor,
0.33 if you think most FYC students could eliminate one distractor easily,
0.50 if you think most FYC students could eliminate two distractors easily, and
1.00 if you think most FYC students could eliminate three distractors easily.
Return your completed test scoring sheet to me by 5:00 p.m. on Friday, April 4. We'll complete the exercise in class.
Additional Readings
Boodoo, G.M. (1998) Addressing cultural context in the development of performance-based assessments and computer-adaptive testing: preliminary validity considerations. Journal of Negro Education 67 (3), 211-219.
Downing, S.M. (2003) Item response theory: applications of modern test theory in medical education. Medical Education 37 (8), 739-745.
Downing, S.M. & Haladyna, T.M. (2004) Validity threats: overcoming interference with proposed interpretations of assessment data. Medical Education 38 (3), 327-333.
Hambleton, R.K. & Patsula, L. (1998) Adapting tests for use in multiple languages and cultures. Social Indicators Research 45, 153-171.
Messick, S. (1995) Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues & Practice 14 (4), 5-8.
Van de Vijver, F. & Hambleton, R.K. (1996) Translating tests: some practical guidelines. European Psychologist 1 (2), 89-99.
Due Today
None
Additional Resources
Sommer, R. & Sommer, B. (2002) A Practical Guide to Behavioral Research. Tools and Techniques. New York: Oxford University Press. Read Ch. 16, pp. 224-233.
How to find (and buy, mostly) just about any test ever developed.
Computer Assisted Assessment Center. This site provides a lot of detail about question development. It includes specific sections on many different question formats -- multiple choice, true/false, short answer, etc.
The Faculty Development Center at the University of Pittsburgh has a good discussion of how to write and grade essay questions and tests.
Kehoe, J. (1995) Basic item analysis for multiple-choice tests. Practical Assessment, Research & Evaluation 4 (10). Retrieved February 8, 2006 from http://PAREonline.net/getvn.asp?v=4&n=10
Zurawski, R.M. (1998) Making the most of exams. Procedures for item analysis. National Teaching & Learning Forum 7 (6). Retrieved February 8, 2006 from http://www.ntlf.com/html/pi/9811/exams_1.htm
The Scoring Office of Michigan State University provides an excellent discussion of item analysis.
Free, on-line software from California State University for computing item analysis statistics.
The Professional Testing Corporation provides a brief but excellent discussion of different approaches to establishing passing scores on tests.
Research Articles
Bailey, A.J. (2006) What kind of assessment for what kind of geography? Advanced placement human geography. The Professional Geographer 58 (1), 70-77.
Cahan, S. (2001) Schooling and the norming of intelligence test scores. Educational Measurement: Issues & Practice 19 (3), 26-33.
Impara, J.C. & Plake, B.S. (1997) Standard setting: an alternative approach. Journal of Educational Measurement 34 (4), 353-366.
Jordan, E.R., Atkins, S., van Niekerk, A. & Seedat, M. (2005) The development of an instrument measuring unintentional injuries in young children in low-income settings to serve as an evaluation tool for a childhood home injury prevention program. Journal of Safety Research 36 (3), 269-280.
Kritikos, V., Krass, I. et al. (2005) The validity and reliability of two asthma knowledge questionnaires. Journal of Asthma 42 (9), 795-801.
Landa, R.J. (2005) Assessment of social communication skills in preschoolers. Mental Retardation and Developmental Disabilities Research Reviews 11 (3), 247-252.
LeFevre, J., Smith-Chant, B.L., Fast, L., Skwarchuk, S. et al. (2005) What counts as knowing? The development of conceptual and procedural knowledge of counting from kindergarten through Grade 2. Journal of Experimental Child Psychology 93 (4), 285-303.
Lumley, T. & O'Sullivan, B. (2005) The effect of test-taker gender, audience and topic on task performance in tape-mediated assessment of speaking. Language Testing 55, 415-437.
Singh-Manoux, A., Richards, M. & Marmot, M. (2005) Socioeconomic position across the lifecourse: how does it relate to cognitive function in mid-life?
Sireci, S.G., Scarpati, S.E. & Li, S. (2005) Test accommodations for students with disabilities: an analysis of the interaction hypothesis. Review of Educational Research 75 (4), 457-490.
Stiggins, R.J. (2001) The unfulfilled promise of classroom assessment. Educational Measurement: Issues & Practice 20 (3), 5-15.