Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
This research paper evaluates the language ideologies embedded in the quality filters used to select training data for language models such as GPT-3.
The main findings are:
- GPT-3-style quality filters don’t correlate with human-assigned judgments of quality or factuality.
- GPT-3-style quality filters discriminate against writing styles outside those dominant on the web (e.g., news and Wikipedia).
No clear solutions for fixing these problems are presented.