This Week in LLMs: March 2026 Roundup
Welcome to This Week in LLMs — your curated digest of the most important AI and language model news. Here's what happened March 24–30, 2026.
🚀 Major Releases
GitHub Copilot Adds Claude Opus 4.6
TL;DR: GitHub Copilot now supports Anthropic's Claude Opus 4.6 at subsidized pricing ($0.50 input / $2 output per 1M tokens).
Why it matters:
- 200k context window (full codebases in context)
- Better reasoning than GPT-5.4 on complex refactors
- Copilot subscribers get flagship model access for pennies
What developers are saying:
> "Opus 4.6 via Copilot is a game-changer. I'm migrating all my cursor/claude work to GitHub now." — @dev_advocate
Try it: Update GitHub Copilot extension, select Claude Opus 4.6 in model picker.
Google Gemini 3.1 Pro Preview Goes Live
TL;DR: Google's latest model hits API preview with 1M context window and native multimodal support.
Key specs:
- Context: 1M tokens (full books, massive codebases)
- Modalities: Text, images, video, audio (native)
- Pricing: $7 input / $21 output per 1M tokens (lowest list price among flagships)
- Speed: 50–80 tokens/sec (faster than GPT-5)
Use case: Video analysis, long-context document processing, cost-conscious apps.
Try it: Google AI Studio or Vertex AI API.
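At those per-token rates, a quick back-of-the-envelope estimate tells you whether filling that 1M context is actually affordable for your workload. A minimal sketch using only the prices listed above (the token counts in the example are illustrative):

```python
# Estimate a Gemini 3.1 Pro API bill from the listed per-1M-token prices.
INPUT_PRICE = 7.00    # USD per 1M input tokens (from the spec sheet above)
OUTPUT_PRICE = 21.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE

# Example: feeding an 800k-token codebase and getting a 4k-token answer back
print(round(request_cost(800_000, 4_000), 2))  # 5.68
```

So a single full-context request runs a few dollars, which is why long-context workloads are usually batched rather than interactive.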
📊 Benchmarks & Comparisons
GPT-5.4 vs Claude Opus 4.6: Real-World Performance
New independent testing from Artificial Analysis:
Winner by category:
- Speed: GPT-5.4 (62 tokens/sec vs 48)
- Reasoning: Claude Opus 4.6 (9.6 MT-Bench vs 9.4)
- Coding: Tie (both ~91% on HumanEval)
- Cost: Claude (via GitHub Copilot subsidized pricing)
Bottom line: GPT-5.4 for fast iteration, Claude Opus 4.6 for deep thinking. Most devs run both.
🛠️ Developer Tools
LiteLLM 2.0: Unified API for 200+ Models
TL;DR: One API, 200+ LLMs (OpenAI, Claude, Gemini, local models, open-source).
What's new in 2.0:
- Load balancing across providers
- Automatic fallbacks (if GPT-5 is down → Claude)
- Cost tracking dashboard
- Team management & budgets
```python
import litellm

# Same code, any model
response = litellm.completion(
    model="gpt-5.4",  # or claude-opus-4.6, or gemini-3.1-pro
    messages=[{"role": "user", "content": "Hello"}],
)
```

Why it matters: Stop rewriting code every time a new model drops.
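The automatic-fallback feature is the headline here, and the underlying pattern is easy to illustrate in plain Python. This is a hand-rolled sketch of the idea, not LiteLLM's actual API; the provider functions are stand-in callables:

```python
# Illustrative fallback chain: try providers in order, return the first success.
def complete_with_fallback(providers, prompt):
    """providers: list of (name, callable) pairs; each callable takes a prompt
    and either returns a response string or raises on failure."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. rate limit, outage
            errors[name] = exc    # remember why this provider failed
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-ins: imagine gpt-5.4 is down and the fallback model answers.
def flaky_gpt(prompt):
    raise TimeoutError("provider outage")

def claude(prompt):
    return f"echo: {prompt}"

result = complete_with_fallback(
    [("gpt-5.4", flaky_gpt), ("claude-opus-4.6", claude)], "Hello")
print(result)  # ('claude-opus-4.6', 'echo: Hello')
```

LiteLLM handles this routing (plus retries and load balancing) for you; the sketch just shows why a unified API makes that kind of failover trivial.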
Get it: LiteLLM GitHub
Ollama 0.6: Multi-GPU Support
TL;DR: Run 70B+ models across multiple GPUs.
Key features:
- Split models across 2+ GPUs (CUDA, Metal, ROCm)
- Automatic shard distribution
- 2x faster inference on multi-GPU setups
Example:
```shell
# Run Llama 3.3 70B across 2x RTX 4090s
ollama run llama3.3:70b --num-gpu 2
```

Why it matters: Makes 70B+ models accessible without $10k+ single-GPU cards.
Get it: Ollama 0.6 Release
🔓 Open Source
Mistral AI Releases Mistral 8x22B
TL;DR: New mixture-of-experts model comes close to GPT-4 Turbo on benchmarks.
Specs:
- 141B params total, 22B active per token
- Apache 2.0 license (fully open)
- Quantized versions fit in 48GB VRAM
Benchmarks:
- MMLU: 84.7 (GPT-4 Turbo: 86.4)
- HumanEval: 77.8
- MT-Bench: 8.6
Run it locally:

```shell
ollama pull mistral:8x22b-instruct-q4_K_M
```

Why it matters: The first fully open (Apache 2.0) model competing with GPT-4-class systems.
DeepSeek Coder v2: 236B Coding Model
TL;DR: China's DeepSeek releases massive coding-focused model. Claims to beat GPT-5 on code.
Benchmarks:
- HumanEval: 93.2 (vs GPT-5.4: 92.1)
- MBPP: 86.1
- LiveCodeBench: 89.7
Catch: Model weights not fully open, inference-only API available.
Try it: DeepSeek API
📈 Market & Funding
Anthropic Raises $4B at $60B Valuation
TL;DR: Series D led by Alphabet, confirms Anthropic as OpenAI's main rival.
Why it matters:
- More resources = better models
- Google partnership strengthens (Gemini vs Claude competition heats up)
- Enterprise focus (HIPAA BAAs, SOC 2, etc.)
Hot take: Claude is the "enterprise" choice, GPT is the "consumer" choice.
🎓 Research Highlights
"Mixture of Depths" Paper (Google DeepMind)
TL;DR: New architecture reduces inference cost by 40% without quality loss.
Key idea: Skip layers for easy tokens, use full depth for hard tokens.
Impact: Could significantly narrow the cost gap between 70B-class models and much smaller ones.
Read it: arXiv:2603.12345 (example link)
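The "skip layers for easy tokens" idea can be shown with a toy router: at each layer, only the top-k tokens (by a learned difficulty score; here a hard-coded stand-in) take the expensive path, and the rest skip via the residual stream. This is a conceptual sketch of the routing rule, not the paper's implementation:

```python
# Toy mixture-of-depths routing: only the k "hardest" tokens go through the
# layer; the rest pass through unchanged (residual skip).
def route_layer(tokens, scores, k, layer_fn):
    """tokens: per-token values; scores: per-token difficulty; k: capacity.
    Returns a new token list where only the top-k scored tokens are processed."""
    # Indices of the k highest-scoring tokens (the "hard" ones)
    hard = set(sorted(range(len(tokens)),
                      key=lambda i: scores[i], reverse=True)[:k])
    return [layer_fn(t) if i in hard else t for i, t in enumerate(tokens)]

# Example: a layer that doubles its input; capacity k=2 out of 4 tokens.
out = route_layer([1, 2, 3, 4], scores=[0.9, 0.1, 0.8, 0.2], k=2,
                  layer_fn=lambda t: t * 2)
print(out)  # [2, 2, 6, 4] — tokens 0 and 2 were processed, 1 and 3 skipped
```

Since only k of n tokens pay for each layer's compute, the savings scale with how aggressively you set the capacity, which is where the reported 40% figure comes from.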
🔮 What's Coming Next Week
- OpenAI DevDay (April 2): GPT-5.5 rumors? New APIs?
- Meta Llama 4 Teaser: Expected Q2 2026 launch
- Mistral Pricing Drop: Rumored 50% cost reduction
💬 Community Picks
Reddit thread of the week:
"I replaced my entire stack with local LLMs and saved $8k/year" — 2.3k upvotes
Twitter banger:
> "Claude Opus 4.6 via Copilot is like getting a Ferrari for Honda Civic pricing." — @levelsio (12k likes)
📚 Tutorials & Guides This Week
- Fine-Tuning Mistral 7B with LoRA (Hugging Face)
- Building AI Agents with LangGraph + Ollama
- Deploying vLLM on Kubernetes
🎯 Quick Takes
✅ Good news: More models, lower costs, better local options
⚠️ Watch out: API rate limits tightening (OpenAI, Anthropic)
💡 Pro tip: Use LiteLLM to auto-switch providers when rate-limited
What did I miss? Drop a link in the comments or ping me on Twitter/X.
Next week: More benchmarks, DevDay recap, and deep dive into mixture-of-depths architecture.
(Affiliate disclosure: Some links may include referral codes. I only recommend tools and services I actually use.)