Wallstreetcn
2024.02.04 02:07

Track Hyper | NVIDIA: In addition to Hynix, we also need Micron's HBM3E

2024, the era of HBM3E.

Author: Zhou Yuan, Wallstreetcn

The memory specifications of NVIDIA's next-generation AI accelerator card, the B100 (Blackwell architecture), will adopt HBM3E. Currently, only SK Hynix, Samsung Electronics, and Micron Tech can provide this type of memory.

Wallstreetcn has learned exclusively from multiple supply-chain sources that, almost at the same time as SK Hynix, Micron Tech has also become a memory supplier for the B100. Samsung Electronics has not yet entered the B100 memory supply chain.

According to previous supply chain news, the B100 will be launched in the second quarter of this year, half a year ahead of schedule.

What is HBM3E? Where does NVIDIA's B100 excel?

B100 Boosts HBM3E Production Capacity

The AI server boom has driven demand for AI accelerator cards, and high-bandwidth memory (HBM) has become a key DRAM component of AI accelerator cards (GPUs).

According to Gartner's forecast report, global HBM revenue is expected to reach approximately $2.005 billion in 2023 and to more than double to $4.976 billion by 2025, a growth of 148.2%.
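
As a quick sanity check, the sketch below recomputes the growth multiple and percentage implied by the two Gartner estimates quoted above; the dollar figures come from the article, and the code is purely illustrative.

```python
# Quick check of the HBM market figures quoted from Gartner above.
# The dollar values are taken from the article; this only recomputes the implied growth.

hbm_2023 = 2.005e9   # estimated 2023 HBM revenue, USD
hbm_2025 = 4.976e9   # projected 2025 HBM revenue, USD

multiple = hbm_2025 / hbm_2023
growth_pct = (multiple - 1) * 100

print(f"2025 is {multiple:.2f}x the 2023 estimate ({growth_pct:.1f}% growth)")
# -> 2025 is 2.48x the 2023 estimate (148.2% growth)
```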

Currently, SK Hynix, Samsung, and Micron Tech are the only three HBM suppliers in the world. Industry data shows that in the HBM market in 2022, SK Hynix accounted for 50% of the market share, Samsung accounted for 40%, and Micron Tech accounted for 10%.

The NVIDIA H200 is the first AI accelerator card to adopt the HBM3E memory specification. HBM3E raises GPU memory bandwidth from the H100's 3.35TB/s to 4.8TB/s, roughly 1.4 times, and increases total memory capacity from the H100's 80GB to 141GB, roughly 1.8 times.
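
For reference, the H200-versus-H100 multiples above follow directly from the bandwidth and capacity figures quoted in the article; a minimal sketch:

```python
# Recompute the H200 vs. H100 multiples from the figures in the article.
h100_bw, h200_bw = 3.35, 4.8    # memory bandwidth, TB/s
h100_cap, h200_cap = 80, 141    # memory capacity, GB

print(f"bandwidth: {h200_bw / h100_bw:.2f}x")   # ~1.43x, i.e. roughly 1.4 times
print(f"capacity:  {h200_cap / h100_cap:.2f}x") # ~1.76x, i.e. roughly 1.8 times
```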

According to previous supply chain news, NVIDIA's next-generation AI accelerator card, the B100, originally planned to be launched in the fourth quarter of 2024, will now be released in the second quarter of 2024.

Based on the limited information currently available, the B100 adopts TSMC's 3nm process and a more complex multi-chip module (MCM) design. NVIDIA holds over 80% of the market share in the AI GPU market, and if the B100 can be launched ahead of schedule, NVIDIA will further consolidate its leading position in the AI field.

Starting from the second half of 2023, SK Hynix, Samsung, and Micron Tech, the three HBM suppliers, have all begun testing HBM3E in sync. It is expected that mass production will be achieved from the first quarter of 2024.

Wallstreetcn has learned exclusively from multiple supply-chain sources that, like SK Hynix, Micron Tech has also become a memory supplier for NVIDIA's B100, providing HBM3E products.

Micron Tech CEO Sanjay Mehrotra revealed that HBM3E, designed specifically for AI and supercomputing, is expected to enter mass production in early 2024 and to generate hundreds of millions of dollars of revenue in fiscal 2024. Mehrotra also told analysts that "from January to December 2024, Micron Tech's HBM is estimated to be sold out."

Micron Tech's fourth fab in Taichung, Taiwan officially began operation in November 2023. Micron Tech stated that the Taichung fab will integrate advanced probing and packaging/testing capabilities to mass-produce HBM3E and other products, meeting growing demand from applications such as artificial intelligence, data centers, edge computing, and the cloud.

HBM, short for High Bandwidth Memory, is a new type of memory for CPUs/GPUs. Physically, it vertically stacks multiple DRAM dies and packages them together with the GPU, forming a large-capacity, wide-bus DRAM array.

In a typical AI accelerator card, the GPU die sits in the middle, with the stacked DRAM dies placed on either side of it.
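
The bandwidth advantage of this stacked, wide-bus arrangement can be illustrated with a back-of-the-envelope formula: per-stack bandwidth is roughly interface width times per-pin data rate, and total GPU bandwidth scales with the number of stacks. The pin rates and stack count below are nominal, illustrative assumptions, not figures from the article.

```python
# Back-of-the-envelope HBM bandwidth: width (bits) x per-pin rate (Gb/s) / 8 -> GB/s per stack.
# The per-pin rates and stack count are illustrative assumptions, not article figures.

def stack_bandwidth_gbs(width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s."""
    return width_bits * pin_rate_gbps / 8

hbm3  = stack_bandwidth_gbs(1024, 6.4)   # ~819 GB/s per stack at a nominal 6.4 Gb/s pin rate
hbm3e = stack_bandwidth_gbs(1024, 9.2)   # ~1178 GB/s per stack at a nominal 9.2 Gb/s pin rate

stacks = 6  # a high-end accelerator typically carries several stacks around the GPU die
print(f"HBM3  per stack: {hbm3:.0f} GB/s, {stacks} stacks: {hbm3 * stacks / 1000:.2f} TB/s")
print(f"HBM3E per stack: {hbm3e:.0f} GB/s, {stacks} stacks: {hbm3e * stacks / 1000:.2f} TB/s")
```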

A die is a bare chip: a small piece cut from the silicon wafer. It is the unit of the wafer before packaging, and the term usually refers to a piece that, after dicing and testing, is intact, stable, and of sufficient capacity.

A chip is a packaged die: the general term for a semiconductor product made by packaging a die that is intact, stable, and of sufficient capacity.

A wafer is the round slice of silicon used to make semiconductor devices. Wafers commonly come in 6-inch, 8-inch, and 12-inch sizes; various circuit structures can be fabricated on a wafer to produce integrated circuits with specific functions.

Unlike the Hopper/Ada architecture, the Blackwell architecture will be extended to data centers (IDC) and consumer-grade GPUs. The biggest technical change in the B100 is that the underlying architecture is likely to undergo significant adjustments: NVIDIA will adopt new packaging technology to separate GPU components into independent chips.

Although the specific number and configuration of chips have not been determined, the MCM design method will give NVIDIA greater flexibility in customizing chips.

There is currently not enough information to indicate which 3nm process the NVIDIA B100 will adopt. TSMC's options include the performance-enhanced N3P and the HPC-oriented N3X, as well as N3E, used by MediaTek, and N3B, used by Apple for the custom A17 Pro.

HBM3E Application and Micron Tech's Layout

Micron Tech is currently producing its Gen-2 HBM3E memory, with 8-high stacks of 24GB capacity. Its 12-high, 36GB stacks will begin sampling in the first quarter of 2024. This will be done in collaboration with the semiconductor foundry TSMC, with Micron's Gen-2 HBM3E to be used in AI and HPC design applications.

HBM3E is an extended version of HBM3, with a memory capacity of 144GB and a bandwidth of 1.15TB per second, equivalent to processing 230 full-HD movies of 5GB each in one second.
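
The movies-per-second comparison is simple division over the stated bandwidth; a short sketch using the article's figures:

```python
# The "full-HD movies per second" comparison is just bandwidth divided by file size.
bandwidth_tb_s = 1.15   # HBM3E bandwidth quoted above, TB/s
movie_gb = 5            # size of one full-HD movie, GB

movies_per_second = bandwidth_tb_s * 1000 / movie_gb
print(f"{movies_per_second:.0f} movies per second")  # -> 230
```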

As a faster and larger memory, HBM3E can accelerate generative AI and large language models, as well as advance scientific computing for HPC workloads.

On August 9, 2023, Jensen Huang unveiled the GH200 Grace Hopper superchip, which also marked the first appearance of HBM3E; the GH200 Grace Hopper thus became the world's first GPU equipped with HBM3E.

The GH200 pairs a 72-core Grace CPU with a 4PFLOPS Hopper GPU; the GPU's memory capacity is 1.7 times that of the H100 and its bandwidth 1.55 times that of the H100.

Compared to the H100, the dual-GH200 system has a total of 144 Grace CPU cores, 8PFLOPS of GPU compute performance, and 282GB of HBM3E memory, giving 3.5 times the memory capacity and 3 times the bandwidth of the H100. Including the LPDDR memory attached to the CPUs, a total of 1.2TB of fast memory is integrated.

NVIDIA pointed out that memory bandwidth is crucial for HPC applications as it enables faster data transfer and reduces complex processing bottlenecks. For memory-intensive HPC applications such as simulation, scientific research, and AI, the higher memory bandwidth of the H200 ensures efficient access and manipulation of data. Compared to CPUs, results can be obtained up to 110 times faster.

TrendForce's consulting research indicates that in 2023, the mainstream of the HBM market is HBM2e. NVIDIA A100/A800, AMD MI200, and most CSPs' self-developed acceleration chips are designed with this specification.

At the same time, to meet the evolving demand for AI accelerator chips, the manufacturers plan to launch new HBM3E products in 2024, and HBM3 and HBM3E are expected to become the market mainstream that year. Mainstream demand in 2023 is shifting from HBM2e to HBM3, with their shares estimated at approximately 50% and 39%, respectively.

As HBM3-based accelerator chips ramp up in volume, market demand will shift significantly toward HBM3 in 2024; HBM3 is expected to overtake HBM2e and account for 60% of the market. Benefiting from a higher average selling price (ASP), this will drive significant HBM revenue growth in 2024.

TrendForce previously estimated that the average selling price of HBM3 is much higher than that of HBM2e and HBM2, which will boost HBM makers' revenue and is expected to drive overall HBM revenue to $8.9 billion in 2024, a year-on-year increase of 127%.

According to Semiconductor-Digest's prediction, the global high-bandwidth memory market is expected to grow from $293 million in 2022 to $3.434 billion in 2031, with a compound annual growth rate of 31.3% during the forecast period from 2023 to 2031.
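
The compound annual growth rate cited from Semiconductor-Digest can be roughly reproduced from the two endpoints; the sketch below assumes the rate compounds over the nine years from 2022 to 2031, which is an interpretation of the stated forecast period.

```python
# Reproduce the quoted CAGR from the 2022 and 2031 market-size endpoints.
# Assumption: the rate compounds over the 9 years between 2022 and 2031.

start, end = 0.293e9, 3.434e9   # USD, from the article
years = 2031 - 2022

cagr = (end / start) ** (1 / years) - 1
print(f"implied CAGR: {cagr * 100:.1f}%")
# ~31.5%, close to the ~31.3% cited; the small gap likely reflects the exact base year used.
```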

Since HBM is packaged together with GPUs, the packaging of HBM is usually done by wafer foundries. HBM is likely to become the next-generation high-performance solution for DRAM, providing higher AI/ML performance and efficiency for rapidly developing data centers and cloud computing.

Although Micron Tech's share of the HBM market is only about 10%, the company is accelerating development of the newer HBM4 and HBM4E, moving faster than SK Hynix.

To improve memory transfer rates, the next-generation HBM4 may make more substantial changes to high-bandwidth memory technology, starting with a wider 2048-bit memory interface, which would enable significant advances at multiple levels.

Micron Tech stated in 2023 that "HBM Next" (i.e., HBM4) memory will be available around 2026, with each stack having a capacity ranging from 36GB to 64GB, and a peak bandwidth of 2TB/s or higher.
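
One way to read the HBM4 figures above is to back out the per-pin data rate implied by a 2TB/s stack on a 2048-bit interface; a minimal sketch:

```python
# Back out the per-pin data rate implied by the HBM4 figures quoted above.
peak_bw_tb_s = 2.0       # peak per-stack bandwidth, TB/s
interface_bits = 2048    # HBM4 interface width, bits

pin_rate_gbps = peak_bw_tb_s * 1000 * 8 / interface_bits
print(f"implied per-pin rate: {pin_rate_gbps:.2f} Gb/s")  # ~7.81 Gb/s
```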