Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

This research paper evaluates taxonomic biases in language models such as GPT-3.

The main findings are:

  • GPT-3-based quality filters don’t correlate with human-set levels of quality or factuality.
  • GPT-3-based quality filters discriminate against writing styles that fall outside the styles dominant on the web (news and wiki).

No clear solutions for fixing these problems are presented.


