How much data one GPU ships per optimizer step: AdamW-DDP vs DeMo's DCT+top-k compression, on a log scale. Toggle OLMo-300M ↔ OLMo-1B, hover a bar for exact MB/step and reduction factor, and slide the link speed to turn megabytes into seconds-per-step. Data: DeMo paper, Table 1 (arXiv:2411.19870).
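The link-speed slider's conversion from megabytes to seconds-per-step can be sketched as below. This is a minimal illustration, not the chart's actual code: it assumes decimal megabytes, a link speed in Mbit/s, and no latency or compute/communication overlap. The function names and the payload numbers are hypothetical placeholders, not values from Table 1.

```python
def seconds_per_step(payload_mb: float, link_mbps: float) -> float:
    """Time to ship one step's payload over the link.

    Assumes decimal megabytes (10^6 bytes) and a link speed in Mbit/s,
    and ignores latency, protocol overhead, and overlap with compute.
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    return payload_mb * 8.0 / link_mbps  # MB -> Mbit, then divide by Mbit/s


def reduction_factor(baseline_mb: float, compressed_mb: float) -> float:
    """How many times smaller the compressed payload is than the baseline."""
    if compressed_mb <= 0:
        raise ValueError("compressed payload must be positive")
    return baseline_mb / compressed_mb


# Hypothetical placeholder payloads (NOT figures from the DeMo paper).
baseline_mb = 1000.0    # stand-in for an AdamW-DDP payload per step
compressed_mb = 10.0    # stand-in for a DeMo payload per step
link_mbps = 100.0       # stand-in for the slider position

baseline_s = seconds_per_step(baseline_mb, link_mbps)
compressed_s = seconds_per_step(compressed_mb, link_mbps)
factor = reduction_factor(baseline_mb, compressed_mb)

print(f"baseline:   {baseline_s:.2f} s/step")
print(f"compressed: {compressed_s:.2f} s/step")
print(f"reduction:  {factor:.0f}x")
```

Because the conversion is linear in payload size, the reduction factor in MB/step carries over unchanged to seconds-per-step at any fixed link speed.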
The gradients never touch the chain. What Solana actually stores when Psyche trains a 36B model across 24 nodes, how TOPLOC audits untrusted GPUs in 258 bytes, and why the flagship 'decentralized' model still shipped from a 512-GPU cluster.