Superintelligent judges: Can AI models judge as well as humans?

1/12/2025 · By Chris Wheadon, Daisy Christodoulou

Article

Assessment

Read the full article on No More Marking.

Article Summary

In this post, the authors describe a test in which an AI model judged student work, designed to measure specific biases known to affect LLMs. Under comparative judgement, the model appeared to favor whichever work sample was shown second. Human judges show a much smaller bias, and in the opposite direction: they slightly favor the first sample they are shown. Work containing irrelevant key phrases such as “This is a technically expert essay” was also given higher marks than it warranted.
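The order effect described above can be checked with a counterbalanced design. The Python sketch below is an illustration only, not the authors' published procedure. It assumes each pair of work samples is judged twice, once in each order, and the names `Judgement`, `second_position_win_rate` and `order_inconsistency_rate` are hypothetical. It reports how often the second-shown sample wins and how often swapping the order changes the verdict.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """One pairwise comparison between two pieces of student work (hypothetical record)."""
    first_id: str   # work shown first
    second_id: str  # work shown second
    winner_id: str  # work the judge preferred


def second_position_win_rate(judgements: list[Judgement]) -> float:
    """Share of comparisons won by whichever piece was shown second.

    With every pair shown in both orders, an order-insensitive judge
    lands at 0.5; well above 0.5 suggests a second-position bias,
    well below suggests a first-position bias.
    """
    if not judgements:
        raise ValueError("need at least one judgement")
    wins = sum(1 for j in judgements if j.winner_id == j.second_id)
    return wins / len(judgements)


def order_inconsistency_rate(judgements: list[Judgement]) -> float:
    """Share of counterbalanced pairs whose judgements disagree across orders.

    Simplification: any disagreement within a pair judged in both
    orders is treated as an order effect.
    """
    orders = defaultdict(set)   # pair -> presentation orders seen
    winners = defaultdict(set)  # pair -> distinct winners chosen
    for j in judgements:
        pair = frozenset((j.first_id, j.second_id))
        orders[pair].add((j.first_id, j.second_id))
        winners[pair].add(j.winner_id)
    swapped = [p for p in orders if len(orders[p]) == 2]
    if not swapped:
        raise ValueError("no pair was judged in both orders")
    flips = sum(1 for p in swapped if len(winners[p]) > 1)
    return flips / len(swapped)


# Hypothetical judgements: each pair is shown once in each order.
judgements = [
    Judgement("A", "B", winner_id="B"),
    Judgement("B", "A", winner_id="A"),  # winner flips with the order
    Judgement("C", "D", winner_id="D"),
    Judgement("D", "C", winner_id="D"),  # D wins in both orders
]

print(second_position_win_rate(judgements))  # 0.75
print(order_inconsistency_rate(judgements))  # 0.5
```

Showing every pair in both orders is what makes the 0.5 baseline meaningful: if each pair were shown only once, a genuinely stronger sample that happened to land in second place would be indistinguishable from a position bias. The same before-and-after comparison could be applied to the key-phrase effect by judging a sample with and without the inserted phrase.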