# Research Index — SSMs vs Transformers

> Central hub for all research notes. Updated continuously.
> **[[../STEERING.md]]** — project steering document
> Last updated: 2026-05-04 01:01 PDT

## Notes Inventory

| Note | Lines | Summary | Status |
|------|-------|---------|--------|
| [[transformers-basics]] | 544 | History, 5 analogies, QKV, positional encoding, products, O(n²), 22 footnotes | ✅ |
| [[ssm-basics]] | 669 | Kalman→HiPPO→S4→Mamba-2, 5 metaphors, 3 modes, ABCD, 34 footnotes | ✅ |
| [[historical-narrative]] | ~140 | Story arc: RNN→LSTM→Transformer→SSM with timeline, quotes, context | ✅ |
| [[sequence-processing-comparison]] | 324 | Parallel vs sequential, filing cabinet vs notebook, LRA benchmarks | ✅ |
| [[computational-complexity]] | 266 | O(n²) handshake / O(n) scribe, GPU memory tables, real benchmarks | ✅ |
| [[strengths-and-weaknesses]] | 187 | Quick-reference trade-off table, Pareto frontier | ✅ |
| [[zoology-associative-recall]] | ~80 | Zoology paper: 82% of quality gap = associative recall; 70M attention > 1.4B SSM | ✅ |
| [[analogies-and-intuitions]] | 406 | 15+ scored analogies (access/accuracy/memory ratings), master framework | ✅ |
| [[real-world-products]] | 404 | Models table, Falcon Mamba 7B benchmarks, NVIDIA 8B hybrid study | ✅ |
| [[hybrid-models]] | 315 | Jamba, Griffin, Hawk, RWKV-6, Zamba, RetNet, when-to-use table | ✅ |
| [[current-landscape-2025]] | 290+ | 2025–2026 debate: SSM case, Transformer countercase, Jamba, Griffin, xLSTM, Titans, **Mamba-3 (ICLR 2026)** | ✅ |
| [[applications-and-use-cases]] | ~110 | Genomics (HyenaDNA, 1M tokens), audio, video, edge deployment | ✅ |
| [[teaching-best-practices]] | 368 | 4 pedagogy profiles, 10-step progression, anti-patterns | ✅ |
| [[anti-patterns]] | 211 | 10 documented failure modes with real examples | ✅ |
| [[diagrams-and-visuals]] | 930+ | 20 ASCII + Mermaid diagrams, 3 ready-to-render Mermaid, 4 designer specs | ✅ |
| [[efficient-transformers]] | 468 | FlashAttention, sparse/linear attention, RWKV, LRU, Griffin, convergence table | ✅ |
| [[key-facts-for-report]] | ~140 | All verified numbers, quotes, 13 key papers, LRA benchmark table (corrected) | ✅ |
| [[induction-heads-icl]] | ~150 | Induction heads, in-context learning mechanism, AR measurements, why SSMs struggle | ✅ |
| [[lra-benchmarks]] | ~160 | Full LRA table (S4: 86.09% avg vs Transformer: 54.39%), Path-X breakthrough analysis | ✅ |
| [[advanced-technical-topics]] | ~417 | Position encodings, ICL in SSMs, length generalisation, Vision SSMs (ViM, VMamba) | ✅ |

## Total Research: ~8,500 lines / ~450KB (20 notes + pre-assembly)

## Key Facts for Report

- **Zoology**: 82% of the Transformer–SSM quality gap = associative recall; a 70M attention model beats 1.4B Hyena
- **Falcon Mamba 7B**: 64.1 on the HF Open LLM Leaderboard — beats LLaMA-3-8B (62.6) and Mistral-7B (61.0)
- **NVIDIA hybrid**: +2.65 points over a pure Transformer with 43% Mamba-2 + 7% attention layers
- **HyenaDNA**: 1M-nucleotide context, 160× faster training, SOTA on 12/18 genomic benchmarks
- **Mamba-2**: 2–8× faster than Mamba-1 (SSD unification with attention)
- **Jamba**: 256K context on a single 80GB GPU; 52B total / 12B active parameters
- **Throughput**: Mamba-3B is 5× faster than a comparable Transformer at 2K context, 15× faster at 16K
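The throughput bullet follows from the complexity asymmetry: per layer, attention pays O(n·d²) for projections plus O(n²·d) for the pairwise score/value matmuls, while an SSM scan replaces the quadratic term with O(n·d·s) for a small state size s. A back-of-envelope cost model makes the widening gap concrete; the dimensions below are illustrative assumptions, not the published Mamba-3B configuration:

```python
# Back-of-envelope FLOP model: attention is O(n^2) in sequence length, an SSM
# scan is O(n). All constants are illustrative assumptions, not measured values.

def attention_cost(n: int, d: int = 2560) -> float:
    """Per-layer attention cost: QKV/output projections (~4*n*d^2 FLOPs)
    plus the pairwise score and value matmuls (~2*n^2*d FLOPs)."""
    return 4 * n * d**2 + 2 * n**2 * d

def ssm_scan_cost(n: int, d: int = 2560, s: int = 16) -> float:
    """Per-layer SSM cost: the same projections (~4*n*d^2 FLOPs) plus a
    recurrence over a small fixed-size state (~2*n*d*s FLOPs)."""
    return 4 * n * d**2 + 2 * n * d * s

for n in (2_048, 16_384):
    ratio = attention_cost(n) / ssm_scan_cost(n)
    print(f"n={n:>6}: attention/SSM FLOP ratio ~ {ratio:.1f}x")
```

The modelled ratio grows with n but sub-linearly, since both architectures share the O(n·d²) projection cost; the measured 5×/15× inference figures also reflect KV-cache memory traffic, which this FLOP-only sketch ignores.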
## Key Analogies for Report (Top 5)

1. **Cocktail party** (attention) — accessibility ★★★★★, accuracy ★★★★☆
2. **Photographic memory vs running notes** (Transformer vs SSM memory) — ★★★★★ on all three ratings
3. **Handshakes** (O(n²) complexity) — accessibility ★★★★★
4. **Jazz musician who decides what to remember** (Mamba selectivity) — ★★★★★
5. **Google Search (QKV)** — accuracy ★★★★★

## Focus Areas for Final Report (Self-Generated)

1. **The Sequence Problem** — what are we solving?
2. **RNN→LSTM→Transformer** — the historical arc
3. **Transformers: Attention** — QKV, parallel, quadratic, KV cache
4. **SSMs: State** — compression, HiPPO→Mamba, selective memory (toy sketch of both mechanisms at the end of this note)
5. **5 Lenses of Difference** — memory, speed, recall, inference, training
6. **The Zoology Surprise** — 82% of the quality gap from one capability
7. **Real-World Products** — what uses what, and why
8. **Beyond Language: Genomics, Audio, Video** — SSM's killer apps
9. **Hybrid Models** — Jamba, Griffin, the 7% attention rule
10. **Where We're Headed** — SSD unification, Mamba-2, the 2025 trajectory

## Drafts

- `./drafts/pre-skeleton.md` — being written by agent (ready ~01:20 AM)

## Cross-links

- Complexity: [[computational-complexity]], [[sequence-processing-comparison]]
- Visual intuitions: [[analogies-and-intuitions]], [[diagrams-and-visuals]]
- Concrete examples: [[real-world-products]], [[current-landscape-2025]], [[applications-and-use-cases]]
- Teaching guidance: [[teaching-best-practices]], [[anti-patterns]]
- Historical story: [[historical-narrative]]
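## Appendix: Toy Mechanism Sketch (Focus Areas 3–4)

A self-contained contrast of the two mechanisms named in focus areas 3 and 4. All shapes, weights, and the decay constant are illustrative assumptions (and the attention is non-causal for brevity); the point is purely structural: attention builds an n×n interaction table, while the SSM compresses history into a fixed-size state.

```python
# Toy single-head attention vs. a linear SSM recurrence. Didactic sketch only;
# not any specific published architecture.
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 6, 4, 3              # sequence length, model dim, SSM state size
x = rng.normal(size=(n, d))    # one toy input sequence

# --- Attention (focus area 3): every token attends to every token -> O(n^2)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)                       # (n, n) pairwise scores
e = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
weights = e / e.sum(-1, keepdims=True)
attn_out = weights @ V                              # each row mixes all tokens

# --- SSM (focus area 4): a fixed-size state carries the past -> O(n)
A = 0.9 * np.eye(s)            # state transition (simple decay)
B = rng.normal(size=(s, d))    # input projection into the state
C = rng.normal(size=(d, s))    # readout from the state
h = np.zeros(s)
ssm_out = []
for t in range(n):             # sequential scan: h_t = A h_{t-1} + B x_t
    h = A @ h + B @ x[t]
    ssm_out.append(C @ h)      # y_t = C h_t reads the compressed state
ssm_out = np.asarray(ssm_out)

print(attn_out.shape, ssm_out.shape)  # both (6, 4): same interface, different cost
```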