- AI = \(\frac{2MNK}{MK + KN + MN}\) FLOPs per element (a multiply-add counts as 2 FLOPs; divide by bytes per element for FLOP/byte). This assumes perfect reuse and is a theoretical upper bound, not what hardware actually achieves.
- Hidden Assumption: Infinite-cache model → every tile of (A, B, C) is loaded once and fully reused by all thread blocks.
- Edge Cases: If M=1 or N=1 → GEMM degenerates to GEMV → each element of the matrix operand is used only once → AI ≈ 2 FLOPs per element, i.e. ≲ 1 FLOP/byte, regardless of size → always memory-bound.
- GEMM becomes compute-bound only when all dimensions are reasonably large. NVIDIA guideline: any dimension \(\le 128\) → likely memory-bound.
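The infinite-cache AI formula above can be sketched numerically. A minimal example: compute AI for a large square GEMM and for the M=1 (GEMV) edge case, and compare each against a roofline ridge point. The peak FLOP/s and bandwidth numbers here are illustrative assumptions, not measurements of any specific GPU.

```python
def gemm_ai(M, N, K, bytes_per_elem=2):
    """Arithmetic intensity in FLOP/byte under the infinite-cache model:
    A (MxK), B (KxN), C (MxN) each moved to/from DRAM exactly once."""
    flops = 2 * M * N * K  # one multiply-add = 2 FLOPs
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)
    return flops / bytes_moved

# Hypothetical accelerator ridge point = peak FLOP/s / peak bytes/s.
# (Illustrative FP16 numbers: 312 TFLOP/s, 2 TB/s -> 156 FLOP/byte.)
ridge = 312e12 / 2.0e12

square = gemm_ai(4096, 4096, 4096)  # large square GEMM
gemv = gemm_ai(1, 4096, 4096)       # M = 1 degenerates to GEMV

print(f"square GEMM AI = {square:.1f} FLOP/B, compute-bound: {square > ridge}")
print(f"GEMV AI        = {gemv:.2f} FLOP/B, compute-bound: {gemv > ridge}")
```

The square case lands far above the ridge point (compute-bound in this model), while the GEMV case sits near 1 FLOP/byte, well below it, matching the edge-case note above.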