Back in 2021,
there was a lovely evaluation paper covering:
Automatically identifying label errors
Improving scores' reliability
Finding examples' difficulty
Active learning
https://aclanthology.org/2021.acl-long.346/
@par @hoyle
#machinelearning #evaluation #IRT #LLM #deepRead