Wallstreetcn
2024.04.01 10:42

Hyper Race | Samsung: Vowing to dethrone SK Hynix from the top spot in HBM

Samsung Electronics has set up a dedicated High Bandwidth Memory (HBM) team to expand production and contend for the leading position in the HBM field. HBM, which offers the highest memory bandwidth currently available and is widely used in AI processors, plays a crucial role in AI accelerator cards and is in heavy demand. Samsung misjudged the prospects of the HBM market in the past and is now determined to correct that mistake through the newly established team.

Author: Zhou Yuan / Wall Street News

From an industry perspective, GenAI (generative artificial intelligence) rests on two core components: the GPU and HBM. The latter provides the highest memory bandwidth available today, and for AI workloads a GPU's effective performance is determined less by its clock frequency than by the memory bandwidth feeding it.
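A simple roofline-style calculation makes the point. The peak-compute and bandwidth figures below are illustrative assumptions, not numbers from the article: when a workload reuses little data, the bandwidth term, not the compute term, sets the ceiling.

```python
# Roofline-style sketch: attainable throughput is capped by
# min(peak compute, memory bandwidth x arithmetic intensity).
# Both hardware figures are illustrative assumptions, not from the article.

PEAK_TFLOPS = 1000.0        # assumed peak compute of an AI accelerator (TFLOP/s)
HBM_BANDWIDTH_TBPS = 3.3    # assumed aggregate HBM bandwidth (TB/s)

def attainable_tflops(flops_per_byte: float) -> float:
    """Attainable TFLOP/s for a kernel with the given arithmetic intensity."""
    return min(PEAK_TFLOPS, HBM_BANDWIDTH_TBPS * flops_per_byte)

# Large-matrix multiplication reuses data heavily -> compute-bound.
print(attainable_tflops(500.0))   # 1000.0, limited by peak compute
# Token-by-token LLM inference streams the weights once -> memory-bound.
print(attainable_tflops(2.0))     # 6.6, limited by HBM bandwidth
```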

Leading GPU maker NVIDIA has posted startling market-value growth over the past year, yet its AI accelerator cards still depend on the HBM suppliers behind them. "Leadership in HBM is coming our way," said Kyung Kye-hyun, head of Samsung's semiconductor business.

Bandwidth only pays off alongside capacity: a large capacity served through a narrow interface still leaves the GPU starved for data. The highest-capacity HBM product at present is the HBM3E 12H that Samsung introduced in February this year, a 12-layer stack.

Recently, Samsung Electronics established a High Bandwidth Memory (HBM) team within its memory chip division to increase production. This is the second dedicated HBM team established by Samsung after setting up the HBM special task force in January this year. In 2019, Samsung Electronics misjudged the market prospects for HBM and therefore disbanded the HBM team at that time.

Now, Samsung Electronics is determined to correct this mistake and has high hopes for the newly established HBM team: to take the lead in the HBM field.

Memory bandwidth determines the performance of AI accelerator cards

The demand for GenAI applications brought by ChatGPT and Sora is changing the world.

This has stimulated huge demand for AI PCs, AI servers, AI phones, and AI processors. Most of these processors (including AMD and NVIDIA's compute GPUs, Intel's Gaudi, AWS's Inferentia and Trainium, and other dedicated processors and FPGAs) use HBM because HBM provides the highest memory bandwidth currently available.

Compared with GDDR6/GDDR6X or LPDDR5/LPDDR5X, HBM dominates bandwidth-intensive applications because a single HBM stack delivers up to 1.2 TB/s, a figure no commodity memory can match.
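For intuition, that per-stack figure falls out of the interface width multiplied by the per-pin data rate. The pin rates below are representative assumptions used only for illustration:

```python
# Peak bandwidth = interface width (bits) x per-pin data rate (Gbit/s) / 8.
# Pin rates below are representative assumptions for illustration only.

def bandwidth_gb_s(width_bits: int, pin_rate_gbit_s: float) -> float:
    """Peak bandwidth in GB/s for a memory interface."""
    return width_bits * pin_rate_gbit_s / 8

# One HBM3E stack: 1024-bit interface at ~9.6 Gbit/s per pin.
print(bandwidth_gb_s(1024, 9.6))   # 1228.8 GB/s, i.e. ~1.2 TB/s per stack
# One GDDR6X device: 32-bit interface at ~21 Gbit/s per pin.
print(bandwidth_gb_s(32, 21.0))    # 84.0 GB/s per device
```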

That outstanding performance comes at the price of high cost and technical difficulty: HBM is, in practice, a product of advanced packaging, which constrains supply and drives up cost.

The DRAM dies used in HBM are quite different from the typical DRAM ICs sold as commodity memory (such as DDR4 and DDR5). A memory maker must manufacture and test 8 or 12 DRAM dies, package them onto a pre-tested high-speed logic die, and then test the entire package. The process is both expensive and time-consuming. In addition, DRAM dies for HBM need a wide interface, which makes them physically larger and therefore more costly than conventional DRAM ICs.
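One way to see why assembly is costly: even with pre-tested, known-good dies, every bonding step in the stack has to succeed for the finished cube to be sellable, and those per-step yields multiply. The per-step yield below is a purely hypothetical figure chosen for illustration:

```python
# Even with known-good dies, each die-attach/bonding step in the stack has some
# yield, and the finished cube only sells if every step works.
# The 97% per-step yield is a hypothetical number used for illustration.

def assembled_stack_yield(per_step_yield: float, dram_dies: int) -> float:
    """Probability that every die-attach step onto the base logic die succeeds."""
    return per_step_yield ** dram_dies

for dies in (8, 12):
    print(f"{dies}-high stack: {assembled_stack_yield(0.97, dies):.2f}")
# 8-high stack: 0.78    12-high stack: 0.69
```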

As a result, ramping up HBM output to meet AI-server demand eats into the supply of every other type of DRAM.

Physically, a finished HBM product stacks many DRAM dies on top of one another and packages them alongside a GPU, creating a large-capacity, high-bit-width memory array.

In the physical layout of an AI accelerator card, the stacked HBM sits to the left and right of the GPU, which occupies the middle of the package.

HBM's cost has left an opening for commodity memory types such as DDR, GDDR, and LPDDR, which are also used in bandwidth-hungry applications including AI, HPC, graphics, and workstations. Micron Technology has said publicly that development of commodity memory technologies optimized for capacity and bandwidth is accelerating, because AI hardware companies have a clear demand for them.

Krishna Yalamanchi, a senior manager in Micron's compute and networking business unit, offered a view on HBM that sounds almost self-evident.

"HBM has great application prospects, with tremendous growth potential in the market," Yalamanchi said. "Currently, the application of HBM is mainly focused on areas requiring high bandwidth, high density, and low power consumption, such as AI and HPC. With more processors and platforms adopting HBM, this market is expected to grow rapidly."

The view is hardly novel, but it does represent Micron's position, and Micron is an industry giant, albeit one ranked behind Samsung and SK Hynix.

According to Gartner's forecast, demand for HBM is expected to surge from 123 million GB in 2022 to 972 million GB in 2027. This means that HBM demand is projected to increase from 0.5% of the overall DRAM market in 2022 to 1.6% in 2027.

Such growth is mainly due to the continuous acceleration of demand for HBM in standard AI and generative AI applications.

Gartner analysts believe that the overall market size of HBM will increase from $11 billion in 2022 to $52 billion in 2027, with HBM prices expected to decrease by 40% relative to 2022 levels.
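Taking the Gartner figures quoted in the two preceding paragraphs at face value, the implied per-GB price does fall by roughly 40% while volume grows nearly eightfold:

```python
# Cross-checking the Gartner figures quoted above (volumes in GB, revenue in USD).
volume_2022_gb, volume_2027_gb = 123e6, 972e6
revenue_2022_usd, revenue_2027_usd = 11e9, 52e9

price_2022 = revenue_2022_usd / volume_2022_gb   # ~$89 per GB
price_2027 = revenue_2027_usd / volume_2027_gb   # ~$53 per GB
print(f"volume growth: {volume_2027_gb / volume_2022_gb:.1f}x")       # ~7.9x
print(f"implied price decline: {1 - price_2027 / price_2022:.0%}")    # ~40%
```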

As technology advances and the demand for GenAI applications expands, the density of HBM stacks will also increase: from 16 GB in 2022 to 48GB in 2027.

Micron estimates that by 2026 it will be able to ship a 64GB HBM Next (HBM4, sixth generation) stack. The HBM3 (fourth generation) and HBM4 specifications allow 16-Hi stacks, so sixteen 32Gb devices can be combined into a 64GB HBM module.
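A quick capacity check, using the convention that die density is quoted in gigabits (Gb) while stack capacity is quoted in gigabytes (GB); the 12-high example is included only as a point of comparison:

```python
# Stack capacity (GB) = number of dies x per-die density (Gbit) / 8.
def stack_capacity_gb(dies: int, die_density_gbit: int) -> float:
    return dies * die_density_gbit / 8

print(stack_capacity_gb(12, 24))   # 36.0 GB -- a 12-high stack of 24Gbit dies
print(stack_capacity_gb(16, 32))   # 64.0 GB -- a 16-high stack of 32Gbit dies
```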

Samsung establishes a dual-track AI semiconductor strategy

HBM is so difficult and expensive to make that even the giants misjudged demand for it before ChatGPT appeared.

Samsung Electronics, currently ranked second in the HBM field, lags behind SK Hynix. This may be related to Samsung Electronics misjudging the prospects of HBM technology demand in 2019. That year, Samsung Electronics "unexpectedly" disbanded its HBM business and technology team.

To overtake its "friendly rival" SK Hynix and dominate the HBM market, Samsung Electronics set up the two HBM teams in January and March of this year. Some members come from the Device Solutions (DS) division, which is chiefly responsible for developing and selling DRAM and NAND flash; the teams are led by Hwang Sang-joon, Samsung's Executive Vice President and head of DRAM product and technology.

To catch up with and surpass SK Hynix, Samsung's HBM team plans to mass-produce HBM3E in the second half of this year and to follow with HBM4 in 2025.

It is worth noting that on April 1, Kyung Kye-hyun, head of Samsung Electronics' DS (Device Solutions) division, announced that to strengthen its competitiveness in AI the company has adopted a dual-track AI semiconductor strategy internally, focusing on both memory chips for AI and AI compute chips. The HBM team led by Hwang Sang-joon will also accelerate development of the Mach-2 AI inference chip.

Kyung Kye-hyun noted that market demand for the Mach-1 AI inference chip is growing, and some customers have said they want to use Mach-series chips to run inference on large models with more than 1,000B (one trillion) parameters. That trend has pushed Samsung Electronics to speed up development of the next-generation Mach-2 chip to meet urgent market demand for high-performance AI chips.

Mach-1 is still under development, with a prototype expected within the year. It takes the form of an SoC (system on chip) for AI inference acceleration and can ease the bottleneck between the GPU and HBM.

Mach-1 is positioned as a highly efficient AI inference chip. Samsung Electronics plans to deploy it between late 2024 and early 2025, and South Korean IT giant Naver is considering a large-scale purchase, with the deal expected to reach 1 trillion Korean won (about 741 million US dollars).

HBM3E is an extended version of HBM3. A single stack delivers roughly 1.15TB of bandwidth per second, equivalent to moving 230 full-HD movies of 5GB each in one second, and multi-stack configurations reach total capacities of up to 144GB. As a faster, larger memory, HBM3E accelerates generative AI and large language models while also advancing scientific computing workloads in HPC.
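The movie comparison is just the bandwidth figure restated:

```python
# 230 full-HD movies of 5 GB each moved in one second ~= the per-stack bandwidth.
movies, movie_size_gb = 230, 5
print(movies * movie_size_gb / 1000, "TB/s")   # 1.15 TB/s
```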

On August 9, 2023, NVIDIA CEO Jensen Huang unveiled the GH200 Grace Hopper superchip, marking the first appearance of HBM3E and making the GH200 the world's first HBM3E-equipped GPU platform. HBM3E is currently the best-performing DRAM for AI applications and represents the fifth generation of the technology: HBM (first), HBM2 (second), HBM2E (third), HBM3 (fourth), and HBM3E (fifth).

According to Kyung Kye-hyun, several customers interested in the next-generation HBM4 (sixth generation) memory are already engaging in joint development of customized versions with Samsung Electronics, though he did not disclose which companies are involved.

On March 26, at Memcon 2024, a gathering of global semiconductor manufacturers held in San Jose, California, Samsung Electronics said it expects its HBM output this year to grow 2.9-fold compared with 2023.