Signature Academic Contributions


Matthew P. WALLACE

 

Language assessment fairness



Key Ideas

This line of research examines a well-known yet under-investigated area of language assessment: fairness. It was inspired by a common phenomenon observed in language learning classrooms, in which students would often question the fairness of their test outcomes. I have approached this line of inquiry from a few different angles. One approach involved contributing a new dimension to the test fairness framework, called subjective fairness (or perceived fairness; Wallace, 2018; Wallace & Qin, 2021). Prior to my work in this area, test fairness was determined by rigorous examination of test performance data (e.g., differential item functioning analysis, item response theory modeling) to ensure that the interpretations made from a test could be considered valid, reliable, and unbiased; tests meeting these criteria were considered fair. However, I argued that a purely objective examination of test fairness was limited and proposed that perceptions of the test administration should also be examined, under the premise that if test takers perceive a test to be fair, then it is fair. To do so, I introduced a three-dimensional framework of subjective fairness based on the organizational justice literature. The dimensions included distributive fairness (the score accurately represents performance), procedural fairness (test procedures are equal for all test takers), and interactional fairness (communication with test administrators is respectful). Studies examining subjective fairness showed that all three dimensions shared variance with one another and with entity justice (how just test takers perceived the organization administering the test, such as their language school, to be).
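
To make the shared-variance finding concrete, here is a minimal sketch, assuming simulated 5-point Likert composites and a hypothetical sample of 200 test takers; the variable names, sample size, and data below are illustrative assumptions, not the instruments or analyses used in the cited studies.

```python
# A minimal sketch (not the cited studies' instruments or analyses):
# simulate Likert-scale composites for the three subjective-fairness
# dimensions plus entity justice, then inspect their shared variance (r^2).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_test_takers = 200  # hypothetical sample size

# Each construct loads on a common latent factor plus unique noise,
# so the four composites are positively correlated by construction.
latent = rng.normal(size=n_test_takers)

def likert_composite(noise_sd):
    raw = latent + rng.normal(scale=noise_sd, size=n_test_takers)
    return np.clip(np.round(3 + raw), 1, 5)  # 5-point Likert scale

scores = pd.DataFrame({
    "distributive": likert_composite(0.8),    # score reflects performance
    "procedural": likert_composite(0.9),      # equal procedures for all
    "interactional": likert_composite(1.0),   # respectful communication
    "entity_justice": likert_composite(1.1),  # justice of the organization
})

corr = scores.corr()          # pairwise correlations
shared_variance = corr ** 2   # proportion of variance shared (r^2)
print(corr.round(2))
print(shared_variance.round(2))
```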

Another approach to test fairness was inspired by assessment practices in the language learning classroom. It is widely recommended that classroom assessments be administered using a criterion-referenced approach (performance measured against set criteria) or an individual-referenced approach (performance measured against the learner's previous performance), rather than a norm-referenced approach (performance measured against other test takers). However, many classrooms continue to adopt a norm-referenced approach, often because organizational policies require test takers to be ranked. To examine how fair students and teachers consider these approaches to be, fairness perceptions of each approach were elicited. The results showed that, in classroom testing settings, both students and teachers viewed criterion-referenced and individual-referenced assessment as fair and norm-referenced assessment as unfair (Wallace & Ng, 2022). These results are consistent with those reported in a similar study in Germany, suggesting that fairness perceptions may not be entirely context-dependent. A sketch contrasting the three referencing approaches appears below, followed by a list of studies that have examined language assessment fairness.
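
As a concrete contrast, here is a minimal sketch, assuming hypothetical scores, a hypothetical mastery cutoff, and a small imaginary class (none of these numbers come from Wallace & Ng, 2022); it shows how the same raw score is interpreted under each referencing approach.

```python
# A minimal sketch with hypothetical numbers (not data from the cited studies):
# the same raw score interpreted under criterion-, norm-, and
# individual-referenced approaches to classroom assessment.

class_scores = [55, 62, 70, 74, 78, 81, 85, 90]  # hypothetical class results
student_score = 74        # the focal student's current score
previous_score = 66       # the same student's earlier test
criterion_cutoff = 70     # hypothetical mastery criterion

# Criterion-referenced: compare the score against a fixed standard.
meets_criterion = student_score >= criterion_cutoff

# Norm-referenced: rank the score against the other test takers.
percentile = 100 * sum(s < student_score for s in class_scores) / len(class_scores)

# Individual-referenced: compare the score against the student's own history.
gain = student_score - previous_score

print(f"Criterion-referenced: {'meets' if meets_criterion else 'below'} the cutoff of {criterion_cutoff}")
print(f"Norm-referenced: higher than {percentile:.0f}% of the class")
print(f"Individual-referenced: {gain:+d} points relative to the previous test")
```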


Wallace, M. P., & Ke, H. (2023). Examining the content alignment between language curriculum and a language test in China. TEFLIN Journal, 34, 116-135. http://dx.doi.org/10.15639/teflinjournal.v34i1/116-135

Wallace, M. P., & Ng, J. S. W. (2022). Fairness of classroom assessment approach: Perceptions from EFL students and teachers. English Teaching & Learning. https://doi.org/10.1007/s42321-022-00127-4

Wallace, M. P., & Qin, C. Y. (2021). Language classroom assessment fairness: Perceptions from students. LEARN Journal, 14, 492-521.

Yao, D., & Wallace, M. P. (2021). Language assessment for immigration: A review of validation research over the last two decades. Frontiers in Psychology, 12, 773132. https://www.doi.org/10.3389/fpsyg.2021.773132

Wallace, M. P. (2018). Fairness and justice in L2 classroom assessment: Perceptions from test takers. The Journal of Asia TEFL, 15, 1051-1064. http://www.doi.org/10.18823/asiatefl.2018.15.4.11.1051