Track Hyper | Samsung: Vowing to dethrone SK Hynix from the top spot in HBM

The king once stumbled, but is willing to correct its mistakes

Author: Zhou Yuan / Wall Street News

From an industry perspective, GenAI (Generative Artificial Intelligence) has two core components: the GPU and HBM. The latter provides the highest memory bandwidth available today, and GPU performance is determined less by clock frequency than by memory bandwidth.

Leading GPU company NVIDIA has seen surprising growth in market value over the past year, but all of its AI accelerator cards still depend on HBM suppliers. Kyung Kye-hyun, head of Samsung's semiconductor business, said, "The leadership position of HBM is coming towards us."

Bandwidth matters in direct relation to capacity: if capacity is large but bandwidth is narrow, GPU performance suffers. Currently, the highest-capacity HBM product is the HBM3E 12H, introduced by Samsung in February this year, with a stack of up to 12 layers.

Recently, Samsung Electronics established a High Bandwidth Memory (HBM) team within its memory chip division to increase production. It is Samsung's second dedicated HBM team, following the special HBM task force set up in January this year. In 2019, Samsung Electronics misjudged the market prospects of HBM and disbanded its HBM team at the time.

Now, Samsung Electronics is determined to correct this mistake and has high hopes for the newly established HBM team: to take the lead in the field of HBM.

Memory bandwidth determines the performance of AI accelerator cards

The demand for GenAI applications brought by ChatGPT and Sora is changing the world.

This has stimulated a huge demand for AI PCs, AI servers, AI phones, and AI processors. Most of these processors (including AMD and NVIDIA's compute GPUs, Intel's Gaudi, AWS's Inferentia and Trainium, and other dedicated processors and FPGAs) use HBM because HBM provides the highest memory bandwidth currently available.

HBM is popular in bandwidth-intensive applications because each HBM stack delivers up to 1.2 TB/s, a bandwidth that no commodity memory, such as GDDR6/GDDR6X or LPDDR5/LPDDR5X, can match.
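
To make the bandwidth argument concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a purely memory-bound workload, so one pass over the data takes at least bytes divided by bandwidth. The 1.2 TB/s per-stack figure comes from the text; the stack count and working-set size are hypothetical values chosen only for illustration, and the function names are made up for this sketch.

```python
# Rough lower bound on the time needed to stream a working set from HBM once.
# Assumption: the workload is purely memory-bound (compute time is ignored).

PER_STACK_TBPS = 1.2  # per-stack bandwidth quoted in the article, in TB/s


def aggregate_bandwidth_tbps(num_stacks: int,
                             per_stack_tbps: float = PER_STACK_TBPS) -> float:
    """Total bandwidth if every stack delivers its peak rate at the same time."""
    return num_stacks * per_stack_tbps


def min_stream_time_ms(working_set_gb: float, bandwidth_tbps: float) -> float:
    """Minimum time in milliseconds to read working_set_gb once (1 TB = 1000 GB)."""
    seconds = working_set_gb / (bandwidth_tbps * 1000.0)
    return seconds * 1000.0


if __name__ == "__main__":
    stacks = 6            # hypothetical stack count, not from the article
    working_set_gb = 140  # hypothetical working set, not from the article
    bw = aggregate_bandwidth_tbps(stacks)
    print(f"aggregate bandwidth: {bw:.1f} TB/s")
    print(f"minimum time per pass: {min_stream_time_ms(working_set_gb, bw):.2f} ms")
```

Under these assumptions, halving the bandwidth doubles the minimum time per pass regardless of how fast the GPU is clocked.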

However, such outstanding performance comes at a high cost and with great technical difficulty. HBM today is essentially a product of advanced packaging, which limits supply and drives up costs.

The DRAM devices used for HBM are completely different from the typical DRAM ICs used for commodity memory (such as DDR4 and DDR5). Memory manufacturers must produce and test 8 or 12 DRAM devices, package them on top of a pre-tested high-speed logic layer, and then test the entire package. This process is both expensive and time-consuming. HBM DRAM devices must also have a wide interface, so they are physically larger and therefore more expensive than conventional DRAM ICs.

As a result, to meet the demands of AI servers, increasing the production of HBM memory will impact the supply scale of all types of DRAM.

From a physical structure perspective, a finished HBM product consists of many DRAM dies stacked together and packaged alongside a GPU, forming a large-capacity, wide-bus memory array.

In the physical layout of an AI accelerator card, the HBM stacks of DRAM dies sit on the left and right sides, with the GPU in the middle.

HBM's high cost has given a lifeline to commodity memory types such as DDR, GDDR, and LPDDR. These categories are also used for applications requiring high bandwidth, such as AI, HPC, graphics, and workstations. Micron Technology has stated that the development of commodity memory technologies optimized for capacity and bandwidth is accelerating, because AI hardware developers have a clear demand for them.

Krishna Yalamanchi, Senior Manager of Micron's Computing and Networking Business Unit, offers what seems an unremarkable view of HBM.

"HBM has great application prospects, and the market's future growth potential is enormous," Yalamanchi said. "Currently, the application of HBM is mainly focused on areas that require high bandwidth, high density, and low power consumption, such as AI and HPC. With more processors and platforms adopting HBM, this market is expected to grow rapidly."

This view may not be novel at present, but it actually represents Micron's perspective, and Micron is an industry giant, albeit ranked behind Samsung and SK Hynix.

According to Gartner's forecast, demand for HBM is expected to surge from 123 million GB in 2022 to 972 million GB in 2027. Over the same period, HBM's share of overall DRAM demand is projected to rise from 0.5% to 1.6%.

Such growth is mainly due to the continuous acceleration of demand for HBM in standard AI and generative AI applications.

Gartner analysts believe that the overall market size of HBM will increase from $11 billion in 2022 to $52 billion in 2027, and HBM prices are expected to decrease by 40% relative to 2022 levels.
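
The forecast figures above can be cross-checked with a few lines of arithmetic. The sketch below, written in Python, takes the demand and market-size numbers exactly as quoted in this article (it introduces no independent data) and derives the implied annual growth in demand and the implied change in revenue per GB. The function and variable names are illustrative only.

```python
# Consistency check on the forecast figures quoted in the article.
# All inputs are the article's own numbers; nothing here is independent data.

DEMAND_GB = {2022: 123e6, 2027: 972e6}   # HBM demand in GB, as quoted
MARKET_USD = {2022: 11e9, 2027: 52e9}    # HBM market size in USD, as quoted


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0


def price_change(demand: dict, market: dict, y0: int, y1: int) -> float:
    """Relative change in revenue per GB between year y0 and year y1."""
    p0 = market[y0] / demand[y0]
    p1 = market[y1] / demand[y1]
    return p1 / p0 - 1.0


if __name__ == "__main__":
    growth = cagr(DEMAND_GB[2022], DEMAND_GB[2027], 2027 - 2022)
    change = price_change(DEMAND_GB, MARKET_USD, 2022, 2027)
    print(f"implied demand CAGR: {growth:.1%}")
    print(f"implied change in revenue per GB: {change:.1%}")
```

On these inputs, demand grows by roughly 51% a year and revenue per GB falls by roughly 40%, which agrees with the price decline cited above. The price result depends only on the ratio of the two market-size figures, not on their absolute scale.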

As technology advances and the demand for GenAI applications expands, the density of HBM stacks will also increase, from 16GB in 2022 to 48GB in 2027.

Micron estimates that by 2026 it will be able to launch a 64GB HBM Next (HBM4, sixth generation) stack. The HBM3 (fourth generation) and HBM4 specifications allow for the construction of 16-Hi stacks, so a 64GB HBM module can be built using sixteen 32Gb (gigabit) devices.
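
The stack-capacity arithmetic is easy to get wrong because die density is normally quoted in gigabits while stack capacity is quoted in gigabytes. The short Python sketch below assumes the individual devices are 32 Gbit dies, which is the only reading under which sixteen of them add up to 64 GB; the helper name is hypothetical.

```python
# Stack capacity from die count and die density.
# Assumption: die density is given in gigabits (Gbit), capacity in gigabytes (GB).

BITS_PER_BYTE = 8


def stack_capacity_gb(num_dies: int, die_density_gbit: int) -> float:
    """Capacity in GB of a stack of num_dies dies, each of die_density_gbit Gbit."""
    return num_dies * die_density_gbit / BITS_PER_BYTE


if __name__ == "__main__":
    # Stack heights mentioned in the article: 8, 12 and 16 dies.
    for height in (8, 12, 16):
        print(f"{height}-Hi stack of 32 Gbit dies: {stack_capacity_gb(height, 32):.0f} GB")
```

With 32 Gbit dies, only the 16-Hi configuration reaches the 64 GB figure.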

Samsung establishes dual-track AI semiconductor strategy

HBM is so difficult and expensive to make that even industry giants misjudged demand before ChatGPT arrived.

Samsung Electronics, currently ranked second in market share in the HBM field, lags behind SK Hynix. This may be related to Samsung Electronics misjudging the prospects of HBM technology demand in 2019. That year, Samsung Electronics "unexpectedly" disbanded its HBM business and technology team.

To unseat its "friendly competitor" SK Hynix and dominate the HBM market, Samsung Electronics established two HBM teams, in January and March this year. Some members come from the Device Solutions (DS) division, which is mainly responsible for the development and sales of DRAM and NAND flash memory; the leader is Hwang Sang-joon, Samsung's Executive Vice President and head of DRAM product and technology.

To catch up with and surpass SK Hynix, Samsung's HBM team plans to mass-produce HBM3E in the second half of this year and produce the follow-up model HBM4 in 2025.

It is worth noting that on April 1, Kyung Kye-hyun, head of Samsung Electronics' DS division, announced internally a dual-track AI semiconductor strategy to strengthen the company's competitiveness in AI, focusing on the development of both AI memory chips and AI computing chips. The HBM team led by Hwang Sang-joon will also accelerate development of the AI inference chip Mach-2.

Kyung Kye-hyun pointed out that market demand for the AI inference chip Mach-1 is increasing, and some customers have said they need Mach series chips to run inference on large models with more than 1,000 billion (1 trillion) parameters. This trend has prompted Samsung Electronics to speed up development of the next-generation Mach-2 chip to meet urgent market demand for high-performance AI chips.

Mach-1 is currently under development, and a prototype is expected within this year. The chip takes the form of an SoC (System on Chip), is used for AI inference acceleration, and can reduce the bottleneck between the GPU and HBM.

Mach-1 is a highly efficient AI inference chip. Samsung Electronics plans to deploy it between late 2024 and early 2025. South Korean IT giant Naver is considering a large-scale purchase, with the deal expected to reach 1 trillion Korean won (approximately 741 million US dollars).

HBM3E is an extended version of HBM3, with a memory capacity of 144GB and a bandwidth of 1.15TB per second, equivalent to processing 230 full HD movies of 5GB each in one second. As a faster and larger memory, HBM3E can accelerate generative AI and large language models, while also advancing scientific computing workloads in HPC.

On August 9, 2023, NVIDIA CEO Jensen Huang (Huang Renxun) released the GH200 Grace Hopper superchip, which marked the first appearance of HBM3E and made the GH200 Grace Hopper the world's first HBM3E GPU. HBM3E is currently the best-performing DRAM for AI applications and is the fifth technology generation: the first generation is HBM, the second is HBM2, the third is HBM2E, the fourth is HBM3, and the fifth is HBM3E.

According to Kyung Kye-hyun, head of Samsung Electronics' semiconductor business, several customers are interested in working with Samsung Electronics on customized versions of the next-generation HBM4 (sixth generation) memory and are engaging in joint development, but he did not disclose which companies are involved.

On March 26, at Memcon 2024, a global gathering of chip manufacturers held in San Jose, California, Samsung Electronics said it expects its HBM memory production this year to increase by 2.9 times compared to 2023.