Facebook AI’s NormFormer Employs Extra Normalization to Significantly Improve Transformer Pretraining

Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models, achieving GPT3-Large (1.3B) ze...

By · · 1 min read

Source: syncedreview.com

Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models, achieving GPT3-Large (1.3B) zero-shot performance 60 percent faster and improving fine-tuned GLUE performance by 1.9 percent.