See also Pedagogy.
Student course evaluations are a standard feature of college instruction. Faculty know they suffer from various biases – particularly selection bias, since only the particularly angry or amazingly happy students tend to fill out their evaluations. The usual advice is to take evaluations with some skepticism, and to understand that they are only useful for identifying broad trends, not for ranking individual instructors.
However, there turn out to be other biases and issues beyond selection bias.
Mitchell, K. M. W., & Martin, J. (2018). Gender bias in student evaluations. PS: Political Science & Politics, 1–5. doi:10.1017/s104909651800001x
Compares course evaluations between two sections of an online course, one taught by a man and the other by a woman. The “lectures, assignments, and content were exactly the same in all sections”; the only difference was grading and interaction with the instructor. Student evaluations rated the female instructor worse, but also rated the course itself and its technology lower, even though they were identical. Free-form comments on RateMyProfessors also showed gender bias, focusing on the female instructor’s personality and appearance more than for the male instructor.
The paper opens with a truly horrifying student email to the female instructor: “I want you personally to know I have hated every day in your course, and if I wasn’t forced to take this, I never would have. Anytime you mention this course to anyone who has ever taken it, they automatically know that you are a horrific teacher, and that they will hate every day in your class.”
MacNell, L., Driscoll, A., & Hunt, A. N. (2014). What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Education, 40(4), 291–303. doi:10.1007/s10755-014-9313-4
A clever experiment, also with an online course. The course was split into discussion groups, each taught by either a male or female assistant instructor. “Each instructor was responsible for grading the work of students in their group and interacting with those students on course discussion boards. Each assistant instructor taught one of their groups under their own identity and the second group under the other assistant instructor’s identity.” They found “there is a significant difference in how students rated the perceived male and female instructors, but not the actual male and female instructors.” But comparing by perceived identity, “the male identity received significantly higher scores on professionalism, promptness, fairness, respectfulness, enthusiasm, giving praise, and the student ratings index.” (These differences were about half a point on a 0-5 Likert scale.) They “contend that female instructors are expected to exhibit such traits and therefore are not rewarded when they do so, while male instructors are perceived as going above and beyond expectations when they exhibit these traits.”
From Pedagogy we learn that students find it difficult to evaluate their own learning and often do not develop expert-like thinking in a course. Are students even able to give course evaluations that accurately reflect how well the course taught them?
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42. doi:10.1016/j.stueduc.2016.08.007
“Our up-to-date meta-analysis of all multisection studies revealed no significant correlations between the SET ratings and learning.”
This contrasts with previous studies and meta-analyses, which did find correlations. The authors suggest that the “small-to-moderate SET/learning correlations may be an artifact of small sample sizes of most of the primary studies and small sample bias.” (See Statistical power and underpowered research.)
Uttl, B., White, C. A., & Morin, A. (2013). The numbers tell it all: Students don’t like numbers! PLoS ONE, 8(12), e83443. doi:10.1371/journal.pone.0083443
Surveying freshmen in intro psychology classes before they took any quantitative courses, “the mean interest in statistics courses was nearly 6 SDs below the mean interest in non quantitative courses. Moreover, women were less interested in taking quantitative courses than men.” Ouch. Suggests that judging faculty teaching quantitative courses by the same student evaluation standards is a bad idea, since student interest in the courses is so dramatically different. Also, “the lack of interest in quantitative and research methods courses among undergraduate students also threatens the very existence of psychology as well as other fields as a science.”