Seriously bad data in Google's GoEmotions dataset (58K Reddit comments categorized by affect):, via

Opinions in the post and comments vary on why the categorization was so inaccurate: lack of context, farming the labeling out to poorly paid workers in countries where people are less likely to be familiar with the specific idioms used in the comments, or maybe just that it's a hard problem.
