How do we assess whether students are learning what we think they’re learning, and how do we determine what their most common misconceptions are?
See also Item response theory, Pedagogy, Statistical misconceptions, Think-aloud interviews, Cognitive task analysis.
Morrison, J. A., & Lederman, N. G. (2003). Science teachers’ diagnosis and understanding of students’ preconceptions. Science Education, 87(6), 849–867. doi:10.1002/sce.10092
Interviews and classroom observations of four “exemplary” high school science teachers. All acknowledged that understanding student misconceptions is important, but used no formal tool to do so; all tried to ask questions in class, but several teachers essentially ignored the answers to the questions and didn’t do anything with the misconceptions revealed by incorrect answers. The transcripts of classroom discussions, particularly Bill’s, are interesting.
Klymkowsky, M. W., Taylor, L. B., Spindler, S. R., & Garvin-Doxas, R. K. (2006). Two-dimensional, implicit confidence tests as a tool for recognizing student misconceptions. Journal of College Science Teaching, 36(3), 44–48.
Suggests using “two-dimensional tests” to make multiple-choice exams more useful for assessing conceptual understanding. Have students indicate confidence in their answers along with the answers, and assign points based on both confidence and correctness, giving them incentive to admit when they have no idea what the answer is (lest they get negative points for picking incorrectly with confidence). When they do pick incorrectly with confidence, you know there’s something systematically wrong with their understanding. Works best when the distracter answers are well-chosen to represent real misunderstandings.
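A minimal sketch of how such a two-dimensional score might be computed and used to flag confidently held wrong answers. The point values, the two confidence levels, and the names `POINTS`, `score`, and `confident_errors` are assumptions made for illustration, not the scheme used in the paper.

```python
from collections import Counter

# Hypothetical point values (an assumption, not taken from the paper):
# a confident wrong answer costs points, so guessing with false
# confidence is discouraged and admitting uncertainty is safe.
POINTS = {
    (True, "high"): 2,
    (True, "low"): 1,
    (False, "low"): 0,
    (False, "high"): -1,
}


def score(correct, confidence):
    """Return the points for one response given correctness and confidence."""
    return POINTS[(correct, confidence)]


def confident_errors(responses):
    """Count wrong answers chosen with high confidence, per (question, choice).

    Distractors that accumulate many confident picks are candidates for
    systematic misconceptions rather than random guessing.
    """
    counts = Counter()
    for r in responses:
        if not r["correct"] and r["confidence"] == "high":
            counts[(r["question"], r["choice"])] += 1
    return counts


# Made-up responses to a single question, for illustration only.
responses = [
    {"question": "Q1", "choice": "B", "correct": False, "confidence": "high"},
    {"question": "Q1", "choice": "B", "correct": False, "confidence": "high"},
    {"question": "Q1", "choice": "A", "correct": True, "confidence": "low"},
    {"question": "Q1", "choice": "C", "correct": False, "confidence": "low"},
]

total = sum(score(r["correct"], r["confidence"]) for r in responses)
flagged = confident_errors(responses).most_common(1)
```

In this toy data, distractor B on Q1 is the only option picked wrongly with high confidence, so it is the one an instructor would look at first.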
James, M. C., & Willoughby, S. (2011). Listening to student conversations during clicker questions: What you have not heard might surprise you! American Journal of Physics, 79(1), 123–132. doi:10.1119/1.3488097
Took classes where students are encouraged to discuss clicker questions before selecting their answers, and recorded the discussions. 12.5% of discussions revealed “unanticipated student ideas” not represented by any of the wrong answers in the questions, including fairly serious misconceptions (“When they say mass, they don’t mean gas because gas would not have any mass because it’s not solid”). Students also frequently arrived at answers using “cues that were tangential or irrelevant to the concepts highlighted in answer alternatives”, such as words the instructor used recently or cues in the question phrasing. Suggests not grading purely for correctness, lest students just passively accept whatever a classmate says as the right answer.
Bowen, C. W. (1994). Think-aloud methods in chemistry education: Understanding student thinking. Journal of Chemical Education, 71(3), 184. doi:10.1021/ed071p184
Practical discussion of how to use think-aloud interviews to assess student understanding, by having students solve open-ended questions while explaining their reasoning aloud. The method seems to be fairly widely used. See also Think-aloud interviews.
Wren, D., & Barbera, J. (2013). Gathering evidence for validity during the design, development, and qualitative evaluation of thermochemistry concept inventory items. Journal of Chemical Education, 90(12), 1590–1601. doi:10.1021/ed400384g
Practical discussion on designing concept inventory questions, checking validity, using think-aloud interviews to improve questions, and designing distracter answers.
Adams, W. K., & Wieman, C. E. (2011). Development and validation of instruments to measure learning of expert-like thinking. International Journal of Science Education, 33(9), 1289–1312. doi:10.1080/09500693.2010.512369
In-depth systematic discussion on strategies for selecting topics to include on a concept inventory, interviewing faculty, conducting think-aloud interviews with students, validating questions, and selecting questions. Includes a bit of psychology about picking questions that convince instructors both that their teaching works (students improve dramatically on some questions) and that it can be improved (students don’t improve on others that the instructor thought were easy). Also gives tips on administering the resulting test to students.
Wilson, M., & Scalise, K. (2006). Assessment to improve learning in higher education: The BEAR Assessment System. Higher Education, 52(4), 635–663. doi:10.1007/s10734-004-7263-y
Describes an integrated method for developing assessments that are used as the course progresses, not administered in one shot. Involves breaking down the course concepts, figuring out what constitutes expert thinking in each topic, building assessment questions (including open-ended and written questions), and embedding these questions in the course so instructors get regular feedback about student learning.
Jorion, N., Gane, B. D., James, K., Schroeder, L., DiBello, L. V., & Pellegrino, J. W. (2015). An analytic framework for evaluating the validity of concept inventory claims. Journal of Engineering Education, 104(4), 454–496. doi:10.1002/jee.20104
A comprehensive demonstration of the analyses one might do to validate a concept inventory psychometrically, using three inventories as examples (the Concept Assessment Tool for Statics, the Statistics Concept Inventory, and the Dynamics Concept Inventory). Shows how to use item response theory, factor analyses, and other data analysis to evaluate whether a concept inventory can be used to provide reliable evidence of student misconceptions or student learning.
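As a rough illustration of the kind of item-level analysis involved, the sketch below computes two classical item statistics (difficulty and item-rest discrimination) and evaluates a two-parameter logistic (2PL) item response function. The function names, the toy score matrix, and the choice of the 2PL form are assumptions for illustration; they are not claimed to match the specific analyses in the paper.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL item response function: probability of a correct answer for a
    student of ability theta on an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_difficulty(item_scores):
    """Classical difficulty: proportion of students answering correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def item_discrimination(matrix, j):
    """Correlation between item j and the total score on all other items.

    matrix holds one row per student and one 0/1 column per item.
    """
    item = [row[j] for row in matrix]
    rest = [sum(row) - row[j] for row in matrix]
    return pearson(item, rest)


# Made-up 0/1 score matrix: 4 students by 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

difficulty_item0 = item_difficulty([row[0] for row in scores])
discrimination_item0 = item_discrimination(scores, 0)
```

Items with very low or negative discrimination are the ones usually examined first, since students who do well on the rest of the inventory are not more likely to answer them correctly.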
Wilcox, B. R., Zwickl, B. M., Hobbs, R. D., Aiken, J. M., Welch, N. M., & Lewandowski, H. J. (2016). Alternative model for administration and analysis of research-based assessments. Physical Review Physics Education Research, 12(1), 010139. doi:10.1103/physrevphyseducres.12.010139
Discusses implementation of a centralized assessment system, where course instructors at any institution can easily sign up, get a unique link to have their students take an assessment, automatically get summary reports on their class’s performance, and compare their results against a dataset of results at many institutions. This makes it easy for many instructors to adopt an assessment, instead of it only being used at one or two institutions.