
What does NVIDIA's inference context memory storage mean for NAND?

Citigroup's report points out that NVIDIA's newly launched AI inference context memory storage (ICMS) architecture is expected to significantly exacerbate the global NAND flash supply shortage. Each server requires an additional 1152TB of SSD NAND, which is expected to add demand equivalent to 2.8% and 9.3% of total global NAND demand in 2026 and 2027, respectively. This will not only drive up NAND prices but also create clear structural growth opportunities for leading memory chip manufacturers such as Samsung, SK Hynix, and Micron.
Citi believes the context memory storage technology NVIDIA has adopted for AI inference is likely to worsen the supply shortage in the NAND flash market.
According to the Wind Trading Desk, Citi's latest report points out that the inference context memory storage (ICMS) architecture launched by NVIDIA will significantly drive NAND flash demand, creating structural opportunities for memory chip manufacturers and potentially pushing NAND prices higher. The bank recommends closely monitoring shifts in supply-demand dynamics across the storage supply chain, as the relevant manufacturers are expected to keep benefiting from this round of demand growth.
NVIDIA announced that its Vera Rubin platform will adopt the ICMS architecture equipped with BlueField-4 chips, breaking through memory bottlenecks and enhancing AI inference performance by offloading the KV cache. Each server in this architecture requires an additional 1152TB of SSD NAND, and the report estimates this will add new demand equal to 2.8% and 9.3% of global NAND demand in 2026 and 2027, respectively. This move will further exacerbate the global NAND supply shortage while creating significant market opportunities for leading NAND suppliers such as Samsung Electronics, SK Hynix, SanDisk, Kioxia, and Micron Technology.
ICMS: A Solution to the Storage Bottleneck in AI Inference
The report points out that large-scale AI inference faces significant memory bottlenecks. The Transformer model's core memory optimization, the KV cache, avoids redundant computation by storing previously computed key-value pairs, which are then tiered by performance and capacity needs: active KV cache sits in GPU HBM (G1), transitional/overflow KV cache in system DRAM (G2), and warm KV cache on local SSD (G3).
To optimize this architecture, NVIDIA has launched the inference context memory storage (ICMS) solution. Rather than replacing the existing storage hierarchy, it adds a dedicated KV cache tier, G3.5, between local SSD (G3) and enterprise shared storage (G4). This tier efficiently converts cold KV context data from G4 into warm KV cache in G2 and works in coordination with HBM, significantly improving data transfer efficiency and overall AI inference performance.
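To make the tiering concrete, here is a minimal Python sketch of the hierarchy described above. The tier labels (G1, G2, G3, G3.5, G4) follow the report; the function names and the "state" categories are illustrative assumptions, not NVIDIA's actual scheduler or API.

```python
# A toy model of the KV-cache tiering described above. Tier labels come
# from the report; everything else is an illustrative assumption.

def place_kv_cache(state: str) -> str:
    """Map a KV-cache block's state to the tier that holds it."""
    return {
        "active":   "G1",   # GPU HBM: blocks read by the current decode step
        "overflow": "G2",   # system DRAM: transitional / spilled blocks
        "warm":     "G3",   # local SSD: recently used sessions
        "cold":     "G4",   # enterprise shared storage: long-idle context
    }[state]

def icms_promotion_path(src: str, dst: str) -> list[str]:
    """ICMS inserts a dedicated G3.5 staging tier, so cold context is
    promoted G4 -> G3.5 -> G2 rather than fetched straight into DRAM."""
    if (src, dst) == ("G4", "G2"):
        return ["G4", "G3.5", "G2"]
    return [src, dst]

print(place_kv_cache("cold"))            # G4
print(icms_promotion_path("G4", "G2"))   # ['G4', 'G3.5', 'G2']
```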
In terms of hardware implementation, the Vera Rubin platform uses 16TB TLC SSDs as the ICMS storage medium, combined with a KV cache manager and a topology-aware scheduling mechanism, targeting three performance breakthroughs: up to a 5x increase in tokens processed per second, up to a 5x improvement in energy efficiency, and lower latency. In terms of configuration, each server carries 72 GPUs, each paired with 16TB of dedicated ICMS NAND capacity, for a total NAND requirement of 1152TB per server.
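As a quick sanity check, the per-server total follows directly from the per-GPU allocation cited above (a back-of-the-envelope calculation using only the report's figures):

```python
# Per-server ICMS NAND from the configuration cited above:
# 72 GPUs per Vera Rubin server, 16 TB of dedicated TLC SSD per GPU.
gpus_per_server = 72
icms_tb_per_gpu = 16

nand_per_server_tb = gpus_per_server * icms_tb_per_gpu
print(f"ICMS NAND per server: {nand_per_server_tb} TB")  # 1152 TB
```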
NVIDIA's introduction of context memory storage technology in AI inference marks an important evolution in AI computing architecture. Unlike traditional training scenarios, the inference process relies on a large amount of context data storage and rapid retrieval capabilities. This shift in technological direction opens up new application scenarios for NAND flash memory and is expected to become an important growth point in demand following data centers and smartphones.
Clear Increase in NAND Demand, Supply Shortage Continues to Deepen
Based on its scenario analysis, Citigroup believes that large-scale deployment of the ICMS architecture will bring significant and well-defined demand growth to the global NAND market. The report forecasts Vera Rubin server shipments of 30,000 units in 2026, corresponding to NAND demand of 34.6 million TB (equivalent to 34.6 billion 8Gb units), or 2.8% of total global NAND demand that year. As AI inference demand is further released, Vera Rubin server shipments are expected to rise to 100,000 units in 2027, at which point ICMS-driven NAND demand would climb to 115.2 million TB (equivalent to 115.2 billion 8Gb units), lifting its share of total global NAND demand to 9.3%.
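These projections can be reproduced from the per-server figure. The sketch below assumes the conventional conversion of 1 TB = 8,000 Gb = 1,000 8Gb-equivalent units; the shipment counts and demand shares are Citi's estimates, not independent data:

```python
# Reproducing the report's demand math from its own inputs.
NAND_PER_SERVER_TB = 1152  # 72 GPUs x 16 TB, per the Vera Rubin spec above

for year, servers, share in [(2026, 30_000, 0.028), (2027, 100_000, 0.093)]:
    demand_tb = servers * NAND_PER_SERVER_TB
    eq_8gb = demand_tb * 1_000            # assumed 1 TB = 1,000 x 8Gb units
    implied_total_tb = demand_tb / share  # implied total global NAND demand
    print(f"{year}: {demand_tb / 1e6:.2f}M TB "
          f"({eq_8gb / 1e9:.1f}B 8Gb eq), "
          f"implied global total ~{implied_total_tb / 1e6:.0f}M TB")
# 2026: 34.56M TB (34.6B 8Gb eq), implied global total ~1234M TB
# 2027: 115.20M TB (115.2B 8Gb eq), implied global total ~1239M TB
```

Note that both years imply roughly the same total global NAND demand (about 1.23 billion TB), which is what makes the 2.8% and 9.3% shares internally consistent.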
The report also points out that the global NAND market is already tight: the explosive growth of the AI industry in recent years has driven a continuous rise in data storage demand, leaving the supply-demand balance of NAND, the core storage medium, quite fragile. The new demand from NVIDIA's ICMS architecture is both rigid and large in scale, and will directly disrupt the existing supply-demand pattern, further deepening the global NAND shortage.
Accelerated Upgrade of the NAND Market Driven by AI
Citigroup believes the launch of NVIDIA's ICMS architecture is not an isolated technical innovation but the natural result of the deepening integration of AI and the storage industry, a trend that will profoundly shape the future development of the NAND market. The report points out that as large-model inference scenarios keep expanding and compute scales keep growing, the performance, capacity, and energy efficiency of storage systems have become key determinants of the AI application experience, which will push NAND technology to iterate faster toward higher density, faster read/write speeds, and lower power consumption.
At the same time, the report predicts that innovative exploration of AI-native storage architectures will open new growth space for the NAND industry. Beyond the current ICMS architecture, more customized storage solutions targeting specific AI scenarios may emerge, continuously unlocking demand potential for NAND.
The report also mentions that the demand increase brought by the ICMS architecture will benefit not only NAND manufacturers but also the upstream supply chain, promoting the coordinated development of SSD manufacturing, storage controllers, and other related segments, and injecting new growth momentum into the entire semiconductor industry chain.
