Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Tags: LLM, NLP, DL

Author: Imad Dabbura

Published: May 2, 2025
