The Bandwidth Wall

How much data one GPU ships per optimizer step: AdamW-DDP vs DeMo's DCT+top-k compression, on a log scale. Toggle OLMo-300M ↔ OLMo-1B, hover a bar for exact MB/step and reduction factor, and slide the link speed to turn megabytes into seconds-per-step. Data: DeMo paper, Table 1 (arXiv:2411.19870).
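The link-speed slider's conversion from megabytes to seconds can be sketched in a few lines. This is a minimal illustration, not the artifact's actual source: the function names are hypothetical, the payload sizes are placeholders rather than the Table 1 figures, and it assumes decimal megabytes (10^6 bytes), a link speed in Mbit/s, and no latency or protocol overhead.

```python
def seconds_per_step(mb_per_step: float, link_mbps: float) -> float:
    """Ideal transfer time for one optimizer step's payload.

    Assumes decimal megabytes and a link speed in megabits per second,
    and ignores latency, protocol overhead, and compute/communication overlap.
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    # 1 megabyte = 8 megabits, so MB * 8 / (Mbit/s) gives seconds.
    return mb_per_step * 8.0 / link_mbps


def reduction_factor(baseline_mb: float, compressed_mb: float) -> float:
    """How many times smaller the compressed payload is than the baseline."""
    if compressed_mb <= 0:
        raise ValueError("compressed payload must be positive")
    return baseline_mb / compressed_mb


# Placeholder values for illustration only; NOT taken from the DeMo paper.
baseline_mb = 1000.0    # hypothetical uncompressed payload per step
compressed_mb = 10.0    # hypothetical compressed payload per step
link_mbps = 100.0       # hypothetical link speed

print(seconds_per_step(baseline_mb, link_mbps))      # 80.0
print(seconds_per_step(compressed_mb, link_mbps))    # 0.8
print(reduction_factor(baseline_mb, compressed_mb))  # 100.0
```

Because the conversion is linear in payload size, the reduction factor in MB/step carries over unchanged to seconds-per-step at any fixed link speed.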

chart · SVG · Machine Learning · Infrastructure · Jun 12, 2026


Data as of Jun 12, 2026.


Appears in

Machine Learning Blockchain Research

What the Blockchain Actually Does in Decentralized AI Training

The gradients never touch the chain. What Solana actually stores when Psyche trains a 36B model across 24 nodes, how TOPLOC audits untrusted GPUs in 258 bytes, and why the flagship 'decentralized' model still shipped from a 512-GPU cluster.

Jun 12, 2026 · 7 min read · interactive