RoBERTa: A Robustly Optimized BERT Pretraining Approach

Better hyperparameter and design decisions are all you need: BERT was significantly undertrained, and with carefully optimized pretraining it matches or exceeds the performance of approaches published after it.
Foundational Models
Author: Imad Dabbura

Published: November 10, 2021

#nlp #llm
