Activations, Not Parameters, Are the Memory Bottleneck in CNNs

Deep Learning, Efficient-ML
Author

Imad Dabbura

Published

April 26, 2025

In CNN training (and often inference), activations, not parameters, are the memory bottleneck. Early layers produce high-resolution feature maps (large activation memory, few parameters), while later layers have many channels but small spatial dimensions (many parameters, small activation memory).

Example:

Take a typical first convolutional layer: 64 filters of size 3×3 applied to a 224×224 RGB input. During training with batch size 32, its output feature maps must be kept around for backprop, which costs 32 × 64 × 224 × 224 × 4 bytes ≈ 400 MB of activations, while its weights (3 × 3 × 3 × 64 ≈ 1.7K parameters) need well under 1 MB!
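To see how the balance shifts across depth, here is a minimal sketch in plain Python. The layer shapes are assumed VGG-16-style blocks on a 224×224 input (an assumption for illustration, not taken from the post); it simply compares, per layer, the memory needed to store activations for backprop with the memory needed for the weights.

```python
# Minimal sketch: activation memory vs. parameter memory per conv layer.
# Assumes fp32, batch size 32, and VGG-16-style layer shapes (hypothetical example).

BATCH, BYTES = 32, 4  # batch size, bytes per fp32 value

# (name, in_channels, out_channels, kernel, output_h, output_w)
# Spatial sizes assume a 224x224 input with 2x2 max-pooling between blocks.
layers = [
    ("conv1_1",   3,  64, 3, 224, 224),
    ("conv2_1",  64, 128, 3, 112, 112),
    ("conv3_1", 128, 256, 3,  56,  56),
    ("conv4_1", 256, 512, 3,  28,  28),
    ("conv5_1", 512, 512, 3,  14,  14),
]

print(f"{'layer':<10}{'act (MB)':>12}{'params (MB)':>14}")
for name, c_in, c_out, k, h, w in layers:
    act_mb = BATCH * c_out * h * w * BYTES / 2**20               # output kept for backprop
    param_mb = (c_in * k * k * c_out + c_out) * BYTES / 2**20    # weights + biases
    print(f"{name:<10}{act_mb:>12.1f}{param_mb:>14.2f}")
```

Under these assumptions, the first block stores roughly 400 MB of activations against a few kilobytes of weights, while the deepest block flips the ratio: about 12 MB of activations against roughly 9 MB of weights.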