The paper was written at a time when NLP practitioners were producing ever-bigger language models (LMs), both in parameter count and in training-data size, and pushing up the top scores on benchmarks.
Environmental Risks
Large LMs consume substantial resources: training a single BERT-base model on GPUs was estimated to use as much energy as a trans-American flight.
Marginalized communities are doubly punished: they are least likely to benefit from LMs, e....