文学城论坛

on-chip SRAM AI ASIC: why an AI ASIC?

胡雪盐8 2025-12-14 10:22:46 ( reads)

An on-chip SRAM AI ASIC is an accelerator in which most of the working set (activations, partial sums, sometimes the weights themselves) stays in SRAM on the compute die itself, instead of being fetched from off-chip DRAM/HBM.


1. Latency dominance (especially LLM inference)

Each generated token must stream the model's weights (and a growing KV cache) through the compute units. On-chip SRAM serves those reads in nanoseconds at enormous aggregate bandwidth, while off-chip DRAM/HBM adds hundreds of nanoseconds of access latency and a hard bandwidth ceiling. For token-by-token inference, this difference dominates user-visible latency.
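A rough bound makes the point concrete: each decode step streams every weight byte through the compute units once, so tokens/sec is capped by memory bandwidth. The sketch below uses illustrative, assumed bandwidth numbers, not vendor specifications.

```python
# Per-token decode latency bound: every weight byte is streamed once
# per generated token, so time_per_token >= model_bytes / bandwidth.
# All numbers below are illustrative assumptions, not vendor specs.

def tokens_per_second(params_billion: float, bytes_per_param: float,
                      bandwidth_gb_s: float) -> float:
    model_bytes = params_billion * 1e9 * bytes_per_param
    seconds_per_token = model_bytes / (bandwidth_gb_s * 1e9)
    return 1.0 / seconds_per_token

# 7B-parameter model with INT8 (1-byte) weights
hbm = tokens_per_second(7, 1, 3_000)    # ~3 TB/s HBM stack (assumed)
sram = tokens_per_second(7, 1, 80_000)  # ~80 TB/s aggregate on-chip SRAM (assumed)
print(f"HBM-bound:  {hbm:.0f} tok/s")
print(f"SRAM-bound: {sram:.0f} tok/s")
```

Under these assumptions the SRAM-fed design is bandwidth-limited to roughly 25x more tokens per second, before any latency effects are counted.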

2. Energy efficiency

Approximate energy per access (orders of magnitude, in the spirit of Horowitz's widely cited ISSCC 2014 figures):

On-chip SRAM: roughly 1–10 pJ per 32-bit access
Off-chip DRAM: roughly 1–2 nJ per access, i.e. 100x or more

Moving a byte off-chip therefore costs about two orders of magnitude more energy than reading it from local SRAM. LLMs are often memory-energy limited, not compute-limited.

3. Deterministic performance

With no off-chip DRAM on the critical path, there are no cache misses, refresh stalls, or row-buffer conflicts; the compiler can schedule every access statically, so latency is repeatable cycle for cycle.

Typical on-chip SRAM capacities:

Chip class                  On-chip SRAM
Mobile NPU                  4–32 MB
Edge inference ASIC         32–128 MB
Datacenter inference ASIC   100–300 MB
Wafer-scale (Cerebras)      10s of GB
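Given the ranges above, a quick check tells you which chip class can hold a model's weights entirely on-die. The capacities below take the upper end of each row in the table; model sizes are illustrative.

```python
# Which chip classes can hold a model's weights entirely on-die?
# Capacities are the upper ends of the table above (wafer-scale
# approximated as 40 GB).

CAPACITY_MB = {
    "Mobile NPU": 32,
    "Edge inference ASIC": 128,
    "Datacenter inference ASIC": 300,
    "Wafer-scale (Cerebras)": 40_000,
}

def fits_on_die(params_million: float, bytes_per_param: float) -> list[str]:
    model_mb = params_million * 1e6 * bytes_per_param / 2**20
    return [c for c, cap in CAPACITY_MB.items() if cap >= model_mb]

print(fits_on_die(25, 1))     # small 25M INT8 model: fits everywhere
print(fits_on_die(7000, 1))   # 7B INT8 model: wafer-scale only
```

This is why datacenter-class SRAM ASICs often shard one LLM across many chips: a 7B INT8 model (~6.7 GB) dwarfs a single die's 100–300 MB.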


Famous examples (and what they optimized for)

Groq: LPU built around roughly 230 MB of on-chip SRAM per chip and a fully compiler-scheduled, deterministic dataflow; optimized for ultra-low-latency inference.

Google TPU v1–v3: systolic matrix units fed from large on-chip buffers (v1 had a 24 MB unified buffer); v1 optimized for inference throughput, v2/v3 added HBM to support training.

Cerebras: wafer-scale engine with tens of GB of on-chip SRAM (WSE-2: 40 GB), keeping weights and activations resident on the wafer; optimized for eliminating off-chip traffic entirely.


When on-chip SRAM AI ASICs are the right answer

Ultra-low latency LLM inference
Real-time systems (finance, robotics, telecom)
Edge or power-constrained environments
Predictable workloads with known model shapes
