Seriously bad data in Google's GoEmotions dataset (58K Reddit comments categorized by affect): surgehq.ai//blog/30-percent-of, via news.ycombinator.com/item?id=3

Opinions in the post and comments vary on why the categorization was so inaccurate: lack of context, farming the labeling out to poorly paid workers in countries where people are less likely to be familiar with the specific idioms used in the comments, or maybe just that it's a hard problem.
