# Applications and Use Cases: Where Each Architecture Wins
> [[index]] · [[ssm-basics]] · [[transformers-basics]] · [[strengths-and-weaknesses]] · [[real-world-products]]
## The Key Insight
The architecture you choose should match the **nature of your data and task** (a toy codification of this checklist follows the list below):
- Does your task require exact recall of specific facts? → Transformer
- Does your task require processing very long sequences efficiently? → SSM
- Does your task require streaming/real-time inference? → SSM
- Both? → Hybrid
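These criteria are simple enough to state as code. The toy router below is only an illustrative codification of the checklist above; the function name, flags, and return strings are invented for this sketch, not taken from any library.

```python
# Toy architecture chooser codifying the checklist above.
# Illustrative only: the flags and return strings are invented for this sketch.
def suggest_architecture(needs_exact_recall: bool,
                         very_long_context: bool,
                         streaming: bool) -> str:
    if needs_exact_recall and (very_long_context or streaming):
        return "hybrid (SSM backbone with a few attention layers)"
    if needs_exact_recall:
        return "Transformer"
    if very_long_context or streaming:
        return "SSM"
    return "either (benchmark both)"

# Example: a task that needs exact recall *and* a very long context.
print(suggest_architecture(needs_exact_recall=True, very_long_context=True, streaming=False))
# -> hybrid (SSM backbone with a few attention layers)
```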
## Language Modeling
### Where Transformers Excel
| Task | Why Transformer Wins | Example |
|------|---------------------|---------|
| Fact lookup | Exact associative recall | "What is the capital of France?" |
| Code with distant references | Tracks variable defined 1000 lines ago | GitHub Copilot |
| Document QA | "What did paragraph 3 say about X?" | RAG systems |
| Few-shot learning | Reads and reuses examples in context | GPT-4 demos |
| Reasoning chains | Each step must attend to prior steps | Chain-of-thought |
### Where SSMs Are Competitive
| Task | Why SSM is Good Enough | Example |
|------|----------------------|---------|
| Long-context summarization | Gist is enough; no exact lookup | "What was this 100K-word document about?" |
| Chat with long history | Conversational gist > exact recall | Casual chat assistants |
| Token-free (byte) LMs | SSM efficiency makes long byte-level contexts tractable without subword tokenization | MambaByte[^1] |
| Streaming chat | Low latency, O(1) per token | Edge deployment |
## Genomics and Biology: SSM's Killer Application
> [!NOTE] Why Genomics is Perfect for SSMs
> DNA sequences are **enormously long** (the human genome is ~3 billion base pairs), and the modeling task leans on **local pattern matching** more than long-range exact recall. There is no genomic analogue of the "associative recall" lookup task: the key signal lives in local motifs, not in precise cross-references.
### HyenaDNA[^2]
- **Context**: Up to **1 million nucleotides** at single-nucleotide resolution
- **Compare**: Transformer-based genomic models previously topped out at 4,096 tokens (<0.001% of the human genome; see the quick check below)
- **Speed**: Trains up to 160× faster than a Transformer at matched sequence length
- **Quality**: SotA on 12/18 Nucleotide Transformer benchmarks; 7/8 GenomicBenchmarks (+10 pts avg)
- **Architecture**: Hyena (implicit long-convolution operator, closely related to SSMs)
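A quick back-of-envelope check, using the figures quoted above, makes the context-length gap concrete (illustrative arithmetic only):

```python
# Back-of-envelope check of the context-length figures above (illustrative arithmetic).
HUMAN_GENOME_BP = 3_000_000_000                  # ~3 billion base pairs

for context_nt in (4_096, 1_000_000):
    coverage = context_nt / HUMAN_GENOME_BP      # fraction of the genome in one window
    attn_entries = context_nt ** 2               # a dense attention matrix is n x n
    print(f"{context_nt:>9,} nt | genome coverage {coverage:.6%} | "
          f"attention entries {attn_entries:,}")

#     4,096 nt | genome coverage 0.000137% | attention entries 16,777,216
# 1,000,000 nt | genome coverage 0.033333% | attention entries 1,000,000,000,000
```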
### Evo[^3]
- Multi-modal DNA/RNA/protein model
- 7B parameters, trained on 300B genomic tokens
- Can generate functional gene sequences from scratch
### Caduceus
- Bidirectional Mamba for genomics
- Exploits SSM's efficiency for very long sequence genomic modeling
> [!TIP] The Genomics Metaphor for the Report
> Reading a DNA sequence is like reading a novel in a language you're learning — you need to track patterns and motifs as they flow past, not look up what was on page 42. SSMs are like a very attentive reader taking notes; Transformers would be taking a photo of every page.
## Audio and Speech
### Why Audio Suits SSMs
- Audio is a **continuous time-series signal**
- A second of audio at 44.1kHz = 44,100 samples
- Relevant patterns are local (phonemes) or medium-range (words)
- A natural fit for state compression: no exact lookup needed (see the sketch below)
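A minimal sketch of what "state compression" means in practice: the toy diagonal linear SSM below (plain NumPy; arbitrary dimensions, not S4 or Mamba) scans one second of raw audio sample by sample while its entire memory stays a fixed 64-element state vector.

```python
import numpy as np

# Toy diagonal linear SSM scanning one second of raw audio.
# Illustrative only: dimensions and dynamics are arbitrary; this is not S4 or Mamba.
SAMPLE_RATE = 44_100
STATE_DIM = 64

rng = np.random.default_rng(0)
audio = rng.standard_normal(SAMPLE_RATE)      # 1 s of synthetic waveform, 44,100 samples

A = np.full(STATE_DIM, 0.99)                  # diagonal state transition (slow decay)
B = 0.01 * rng.standard_normal(STATE_DIM)     # input projection
C = rng.standard_normal(STATE_DIM)            # readout

h = np.zeros(STATE_DIM)                       # the model's *entire* memory
outputs = np.empty_like(audio)
for t, x_t in enumerate(audio):               # O(n) scan, O(1) state per step
    h = A * h + B * x_t
    outputs[t] = C @ h

print(f"processed {audio.size:,} samples with a {h.nbytes}-byte state")
# processed 44,100 samples with a 512-byte state
```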
### Examples
| Model | Architecture | Application |
|-------|-------------|-------------|
| MambaAudio | Mamba | Audio classification, separation |
| Mamba TTS | Mamba | Text-to-speech synthesis |
| AudioMamba | Mamba | Audio event detection |
| S4 (original) | S4 | Raw-waveform Speech Commands; Long Range Arena (Path-X) |
> [!NOTE] The S4 Audio Achievement
> S4 was the first model to solve the hardest Long Range Arena task (Path-X) and set a new state of the art on raw-waveform Speech Commands classification (16,000 samples per one-second clip). Classifying raw audio means modeling sequences of tens of thousands of samples with no hand-crafted features; this is where SSMs first showed a clear edge over Transformers.
## Video Understanding
### VideoMamba[^4]
- SSM-based video understanding
- O(n) complexity over spatial + temporal dimensions
- Unlike ViT-based video models: no quadratic attention over all frame patches
### Why Video Suits SSMs
- Video = sequences of image frames = extremely long sequences
- A 1-minute video at 30fps = 1,800 frames
- Full Transformer attention over all patches of all frames: *catastrophically expensive* (quantified in the sketch below)
- SSMs track temporal state efficiently
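To put numbers on that claim, here is an illustrative token count for a 1-minute clip, assuming a ViT-style 14×14 = 196 patches per frame (the exact figures depend on resolution and patch size):

```python
# Illustrative token-count arithmetic for a 1-minute clip.
# Assumes ViT-style patching (14 x 14 = 196 patches per frame); numbers vary by model.
FPS = 30
SECONDS = 60
PATCHES_PER_FRAME = 196

frames = FPS * SECONDS                   # 1,800 frames
tokens = frames * PATCHES_PER_FRAME      # spatio-temporal tokens
attn_entries = tokens ** 2               # dense attention over every patch pair
ssm_steps = tokens                       # linear-time scan over the same tokens

print(f"tokens            : {tokens:,}")
print(f"attention entries : {attn_entries:,}")
print(f"SSM scan steps    : {ssm_steps:,}")
# tokens            : 352,800
# attention entries : 124,467,840,000
# SSM scan steps    : 352,800
```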
## Time Series and Signal Processing
| Domain | Model | Key Advantage |
|--------|-------|--------------|
| Forecasting | Mamba Time Series | Long horizon, efficient training |
| Medical signals (ECG, EEG) | SSM variants | Continuous signal modeling |
| Industrial IoT | Mamba | Streaming, low-latency prediction |
| Finance | SSMs (experimental) | Long history, efficient rolling window |
## On-Device and Edge Deployment
> [!TIP] Why SSMs Matter for Edge
> The key property: **O(1) memory per generation step** vs O(n) KV cache growth for Transformers.
>
> A smartphone cannot hold a 100K-token KV cache (multiple GB). An SSM carries only a fixed-size state (on the order of megabytes for a 7B-class model) regardless of context length. This is why SSMs may power **truly long-running AI assistants** on mobile devices; the rough numbers below illustrate the gap.
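The sketch below assumes a hypothetical 7B-class configuration with grouped-query attention and fp16 tensors; the shapes are assumptions for illustration, not measurements of any released model.

```python
# Back-of-envelope memory comparison at 100K context.
# Assumed 7B-class shapes in fp16; illustrative only, not a measurement of any model.
def kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                   context_len=100_000, bytes_per_elem=2):
    # Keys and values are cached for every past token in every layer.
    return 2 * n_layers * context_len * n_kv_heads * head_dim * bytes_per_elem

def ssm_state_bytes(n_layers=64, d_inner=8192, d_state=16, bytes_per_elem=2):
    # Fixed recurrent state per layer; its size does not depend on context length.
    return n_layers * d_inner * d_state * bytes_per_elem

print(f"Transformer KV cache @ 100K tokens : {kv_cache_bytes() / 1e9:.1f} GB")
print(f"SSM state at any context length    : {ssm_state_bytes() / 1e6:.1f} MB")
# Transformer KV cache @ 100K tokens : 13.1 GB
# SSM state at any context length    : 16.8 MB
```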
## Scientific and Specialized Domains
| Domain | SSM Application | Why |
|--------|----------------|-----|
| Drug discovery | Molecular sequence modeling | Very long SMILES strings |
| Protein folding | Sequence + structure SSMs | Long amino acid chains |
| Climate modeling | Long time series SSMs | Multi-year temporal data |
| Code generation (streaming) | Mamba for code | Token-by-token, long files |
## Where Transformers Will Likely Remain Dominant
1. **Complex reasoning tasks** — require exact tracking of logical steps
2. **Multi-document QA** — needs to reference specific passages
3. **Code with complex dependencies** — function calls, imports, distant variables
4. **Few-shot learning / in-context learning** — requires reading and reusing examples from context
5. **Large-scale pre-training** — the Transformer stack behind GPT-4 and Claude is proven and well-tooled
## The Competitive Landscape (2025)
```
GPT-4o, Claude 3.5, Gemini 1.5 → Pure Transformer (with efficient attention variants)
Falcon Mamba 7B, Mamba-2 → Pure SSM (competitive at 7B+ scale)
Jamba, Griffin, Zamba → Hybrid (SSM/recurrent layers interleaved with attention)
HyenaDNA, Evo → SSM for genomics
AudioMamba, VideoMamba → SSM for audio/video
RWKV-6 → Linear-attention RNN (Transformer/SSM middle ground)
```
[^1]: Wang et al. (2024). "MambaByte: Token-free Selective State Space Model." COLM 2024. arXiv:2401.13660.
[^2]: Nguyen et al. (2023). "HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution." NeurIPS 2023 (Spotlight). arXiv:2306.15794.
[^3]: Nguyen et al. (2024). "Sequence Modeling and Design from Molecular to Genome Scale with Evo." Arc Institute. bioRxiv.
[^4]: Li et al. (2024). "VideoMamba: State Space Model for Efficient Video Understanding." arXiv.