
ALiBi: The Simple Bias Adjustment That Enables Transformers to Handle Long Contexts Without Losing Track
Consider the problem of extending a model's context window for tasks like analyzing lengthy legal transcripts that can reach 100,000 tokens. Various approaches might be tried, such as scaling rotary position embeddings (RoPE).
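To make the idea in the title concrete, here is a minimal sketch of the bias ALiBi adds to attention scores. The helper names are illustrative; the slopes follow the geometric sequence 2^(-8k/n) described in the ALiBi paper for power-of-two head counts, and each head penalizes attention to a key in proportion to its distance from the query.

```python
import numpy as np

def alibi_slopes(num_heads):
    # Per-head slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    # (the schedule from the ALiBi paper for power-of-two head counts).
    start = 2 ** (-8.0 / num_heads)
    return np.array([start ** (k + 1) for k in range(num_heads)])

def alibi_bias(num_heads, seq_len):
    # bias[h, i, j] = -slope_h * (i - j) for j <= i; future positions
    # are masked to -inf (causal attention). Shape: (heads, seq, seq).
    slopes = alibi_slopes(num_heads)
    pos = np.arange(seq_len)
    dist = pos[:, None] - pos[None, :]           # query index i minus key index j
    bias = -slopes[:, None, None] * dist[None]   # linear distance penalty per head
    return np.where(dist[None] < 0, -np.inf, bias)

# The bias is simply added to the raw attention logits before softmax,
# replacing learned or rotary position embeddings entirely.
```

Because the penalty is a fixed linear function of distance rather than a learned embedding, the same bias formula extrapolates to sequence lengths never seen in training, which is the property the rest of this article explores.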


