Superintelligent judges: Can AI models judge as well as humans?

By Chris Wheadon, Daisy Christodoulou
Article
Assessment
Read the full article on No More Marking

Article Summary

In this post, the authors describe a test in which an AI model evaluates student work, designed to measure specific biases known to occur with LLMs. Under comparative judgement, the model appeared to be biased in favor of whichever work sample was shown second. Human judges show a much smaller bias, in favor of the first sample they are shown. Work that contained unrelated key phrases, such as “This is a technically expert essay”, was also given higher marks than warranted.
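
This summary does not describe the authors' exact procedure. As a rough sketch of how a position bias of this kind could be measured, the Python below judges every pair in both orders and reports how often the second-shown sample wins. The function `second_position_rate` and the two stand-in judges are hypothetical names invented for this illustration; no real model is called.

```python
from typing import Callable, Sequence, Tuple

# A judge takes (first_shown, second_shown) and returns 0 if it prefers the
# first-shown sample, or 1 if it prefers the second-shown sample.
Judge = Callable[[str, str], int]


def second_position_rate(judge: Judge, pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of judgements won by the second-shown sample.

    Each pair is judged in both orders, so every sample appears second exactly
    as often as it appears first. A judge that decides on content alone
    (and never ties) scores 0.5; values above 0.5 suggest a bias towards
    the second position.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    second_wins = 0
    for a, b in pairs:
        second_wins += judge(a, b)  # b is shown second
        second_wins += judge(b, a)  # a is shown second
    return second_wins / (2 * len(pairs))


# Hypothetical stand-in judges, for illustration only.
def longer_is_better(first: str, second: str) -> int:
    # Decides purely on content (length), ignoring position.
    return 1 if len(second) > len(first) else 0


def always_second(first: str, second: str) -> int:
    # Maximally position-biased: always picks the second-shown sample.
    return 1


pairs = [("short essay", "a much longer essay"), ("abc", "abcdef")]
unbiased_rate = second_position_rate(longer_is_better, pairs)
biased_rate = second_position_rate(always_second, pairs)
```

Judging each pair in both orders is a design choice that separates position from quality: a sample's real merit cancels out across the two presentations, so any departure from 0.5 can be attributed to order.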