# Research Index — SSMs vs Transformers
> Central hub for all research notes. Updated continuously.
> **[[../STEERING.md]]** — project steering document
> Last updated: 2026-05-04 01:01 PDT
## Notes Inventory
| Note | Lines | Summary | Status |
|------|-------|---------|--------|
| [[transformers-basics]] | 544 | History, 5 analogies, QKV, positional encoding, products, O(n²), 22 footnotes | ✅ |
| [[ssm-basics]] | 669 | Kalman→HiPPO→S4→Mamba-2, 5 metaphors, 3 modes, ABCD, 34 footnotes | ✅ |
| [[historical-narrative]] | ~140 | Story arc: RNN→LSTM→Transformer→SSM with timeline, quotes, context | ✅ |
| [[sequence-processing-comparison]] | 324 | Parallel vs sequential, filing cabinet vs notebook, LRA benchmarks | ✅ |
| [[computational-complexity]] | 266 | O(n²) handshake/O(n) scribe, GPU memory tables, real benchmarks | ✅ |
| [[strengths-and-weaknesses]] | 187 | Quick-reference trade-off table, Pareto frontier | ✅ |
| [[zoology-associative-recall]] | ~80 | Zoology paper: 82% quality gap = associative recall; 70M attn > 1.4B SSM | ✅ |
| [[analogies-and-intuitions]] | 406 | 15+ scored analogies (access/accuracy/memory ratings), master framework | ✅ |
| [[real-world-products]] | 404 | Models table, Falcon Mamba 7B benchmarks, NVIDIA 8B hybrid study | ✅ |
| [[hybrid-models]] | 315 | Jamba, Griffin, Hawk, RWKV-6, Zamba, RetNet, when-to-use table | ✅ |
| [[current-landscape-2025]] | 290+ | 2025–2026 debate: SSM case, Transformer countercase, Jamba, Griffin, xLSTM, Titans, **Mamba-3 (ICLR 2026)** | ✅ |
| [[applications-and-use-cases]] | ~110 | Genomics (HyenaDNA 1M tokens), audio, video, edge deployment | ✅ |
| [[teaching-best-practices]] | 368 | 4 pedagogy profiles, 10-step progression, anti-patterns | ✅ |
| [[anti-patterns]] | 211 | 10 documented failure modes with real examples | ✅ |
| [[diagrams-and-visuals]] | 930+ | 20 ASCII+Mermaid diagrams, 3 Mermaid ready-to-render, 4 designer specs | ✅ |
| [[efficient-transformers]] | 468 | FlashAttention, sparse/linear attn, RWKV, LRU, Griffin, convergence table | ✅ |
| [[key-facts-for-report]] | ~140 | All verified numbers, quotes, 13 key papers, LRA benchmark table (corrected) | ✅ |
| [[induction-heads-icl]] | ~150 | Induction heads, in-context learning mechanism, AR measurements, why SSMs struggle | ✅ |
| [[lra-benchmarks]] | ~160 | Full LRA table (S4: 86.09% avg vs Transformer: 54.39%), Path-X breakthrough analysis | ✅ |
| [[advanced-technical-topics]] | ~417 | Position encodings, ICL in SSMs, length generalisation, Vision SSMs (ViM, VMamba) | ✅ |
## Total Research: ~8,500 lines / ~450KB (20 notes + pre-assembly)
## Key Facts for Report
- **Zoology**: 82% of the Transformer–SSM quality gap traces to associative recall; a 70M attention model beats 1.4B Hyena
- **Falcon Mamba 7B**: 64.1 avg on the HF Open LLM Leaderboard — beats LLaMA-3-8B (62.6) and Mistral-7B (61.0)
- **NVIDIA hybrid**: +2.65 pts over a pure Transformer using a 43% Mamba-2 / 7% attention layer mix
- **HyenaDNA**: 1M-nucleotide context, 160× faster training, SOTA on 12/18 genomic benchmarks
- **Mamba-2**: 2–8× faster than Mamba-1 (SSD unification with attention)
- **Jamba**: 256K context on single 80GB GPU; 52B total / 12B active
- **Throughput**: Mamba-3B is 5× faster than a comparable Transformer at 2K context, 15× at 16K — see the cost sketch after this list
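
Why the gap widens with context length — a minimal NumPy sketch, not taken from any of the notes above: the shapes, the toy dynamics `A = 0.9·I`, and both function names are invented for illustration, not reference code from the Mamba or Transformer papers.

```python
import numpy as np

def attention(q, k, v):
    """Full self-attention: the (n, n) score matrix is why compute
    and memory both scale O(n^2) in sequence length n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n, n)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                      # row-wise softmax
    return w @ v                                       # (n, d)

def ssm_scan(x, A, B, C):
    """Linear SSM recurrence: a fixed-size state h is carried forward,
    so compute is O(n) and memory is O(1) in sequence length."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # n sequential steps
        h = A @ h + B * x_t       # update the compressed state
        ys.append(C @ h)          # read out one output
    return np.array(ys)

n, d, d_state = 1024, 64, 16
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((n, d))
y_attn = attention(q, k, v)                # materialises a 1024×1024 matrix
A = 0.9 * np.eye(d_state)                  # toy stable dynamics
B = C = np.ones(d_state)
y_ssm = ssm_scan(rng.standard_normal(n), A, B, C)   # carries only 16 numbers
```

The Python loop is only for intuition: real SSMs replace it with a hardware-aware parallel scan (the Mamba kernel), which keeps the O(n) cost without giving up training parallelism.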
## Key Analogies for Report (Top 5)
1. **Cocktail party** (attention) — accessibility ★★★★★ accuracy ★★★★☆
2. **Photographic memory vs running notes** (Transformer vs SSM memory) — ★★★★★ on all criteria
3. **Handshakes** (O(n²) complexity) — accessibility ★★★★★
4. **Jazz musician who decides what to remember** (Mamba selectivity) — ★★★★★
5. **Google Search (QKV)** — accuracy ★★★★★ (toy lookup sketch below)
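
A toy sketch of analogy 5, assuming nothing beyond softmax attention's standard QKV read-out; the data and the `soft_lookup` name are invented for illustration.

```python
import numpy as np

def soft_lookup(query, keys, values):
    """'Google Search' reading of QKV: score the query against every key,
    then return a relevance-weighted blend of the values — a softened
    version of an exact dictionary lookup."""
    scores = keys @ query / np.sqrt(len(query))   # one score per stored key
    w = np.exp(scores - scores.max())
    w /= w.sum()                                  # softmax over matches
    return w @ values                             # weighted average of values

# Three stored "pages": near-orthogonal keys, arbitrary 2-d values.
keys = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]])
values = np.array([[1.0, 0], [0, 1.0], [0.5, 0.5]])
print(soft_lookup(np.array([0.9, 0.1, 0.0, 0.0]), keys, values))
# -> weights tilt toward values[0], because the query best matches keys[0]
```

The softness is the point of the analogy: unlike a hash-map lookup, every value contributes a little, which is exactly the behaviour the Zoology associative-recall results probe.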
## Focus Areas for Final Report (Self-Generated)
1. **The Sequence Problem** — what are we solving?
2. **RNN→LSTM→Transformer** — the historical arc
3. **Transformers: Attention** — QKV, parallel, quadratic, KV cache
4. **SSMs: State** — compression, HiPPO→Mamba, selective memory
5. **5 Lenses of Difference** — memory, speed, recall, inference, training
6. **The Zoology Surprise** — 82% quality gap from one capability
7. **Real-World Products** — what uses what, why
8. **Beyond Language: Genomics, Audio, Video** — SSM's killer apps
9. **Hybrid Models** — Jamba, Griffin, the 7% attention rule
10. **Where We're Headed** — SSD unification, Mamba-2, 2025 trajectory
## Drafts
- `./drafts/pre-skeleton.md` — being written by agent (ready ~01:20 AM)
## Cross-links
- Complexity: [[computational-complexity]], [[sequence-processing-comparison]]
- Visual intuitions: [[analogies-and-intuitions]], [[diagrams-and-visuals]]
- Concrete examples: [[real-world-products]], [[current-landscape-2025]], [[applications-and-use-cases]]
- Teaching guidance: [[teaching-best-practices]], [[anti-patterns]]
- Historical story: [[historical-narrative]]