Wallstreetcn
2024.07.12 17:10

HBM 4, about to be completed

HBM4 is the next-generation high-bandwidth memory (DRAM) standard, aimed at further increasing data processing speeds while maintaining key features such as higher bandwidth, lower power consumption, and larger capacity. HBM4 doubles the number of channels per stack and has a larger physical footprint. The standard supports compatibility with HBM3 controllers, specifies 24 Gb and 32 Gb layers, and allows TSV stacks of different heights. JEDEC has indicated that discussions on higher frequencies are ongoing. HBM4 is crucial for applications that require efficient processing of large datasets and complex computations, such as artificial intelligence, high-performance computing, high-end graphics cards, and servers.

Recently, the JEDEC Solid State Technology Association announced that HBM4, the next version of the highly anticipated High Bandwidth Memory (HBM) DRAM standard, is nearing completion.

HBM4 is described as an evolutionary step beyond the current HBM3 standard, aimed at further increasing data processing speeds while maintaining key features such as higher bandwidth, lower power consumption, and larger per-die and/or per-stack capacities. These advances are crucial for applications that require efficient processing of large datasets and complex computations, including generative artificial intelligence (AI), high-performance computing, high-end graphics cards, and servers.

Compared to HBM3, HBM4 is planned to double the number of channels per stack, resulting in a larger physical footprint. To support device compatibility, the standard ensures that a single controller can work with both HBM3 and HBM4 when needed. Different configurations will require different interposers to accommodate the differing footprints. HBM4 will specify 24 Gb and 32 Gb layers, with optional support for 4-high, 8-high, 12-high, and 16-high TSV stacks.

JEDEC noted that the committee has reached a preliminary agreement on speed grades up to 6.4 Gbps and is currently discussing higher frequencies.

What are the updates for HBM4?

High Bandwidth Memory has been around for about a decade, and over its continuous development its data rate has steadily climbed from 1 GT/s (the initial HBM) to the 9 GT/s of today's HBM3E. This remarkable leap in bandwidth in under ten years has made HBM a cornerstone of the HPC accelerators brought to market since.

However, as memory transfer rates continue to rise, especially when the fundamental physical properties of DRAM units remain unchanged, sustaining such speeds becomes increasingly challenging. Therefore, for HBM4, major memory manufacturers behind the specification are planning more substantial changes to high bandwidth memory technology, starting with a wider 2048-bit memory interface.

HBM4 will expand the memory stack interface from 1024 bits to 2048 bits, marking one of the most significant changes to the HBM specification since this memory type was introduced eight years ago. Doubling the number of I/O pins while maintaining a similar physical footprint poses a significant challenge for memory manufacturers, SoC developers, foundries, and outsourced assembly and test (OSAT) companies.

According to the plan, this will enable HBM4 to achieve technological breakthroughs on multiple levels. For DRAM stacking, a 2048-bit memory interface requires a significant increase in the number of through-silicon vias routed through the memory stack. At the same time, the pitch of the bump pads on the external chip interface will need to shrink below 55 microns, while the total number of microbumps grows well beyond the roughly 3,982 bumps used in HBM3.
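As a rough sanity check on these figures, peak per-stack bandwidth is simply interface width times per-pin data rate. The sketch below uses the preliminary 6.4 Gbps speed grade mentioned by JEDEC; final speed grades may differ.

```python
def hbm_peak_bandwidth(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-stack bandwidth in GB/s: bits per transfer * Gb/s per pin / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8

# HBM3-style 1024-bit interface at 6.4 Gb/s per pin
print(hbm_peak_bandwidth(1024, 6.4))  # 819.2 GB/s
# HBM4's 2048-bit interface at the same preliminary 6.4 Gb/s grade
print(hbm_peak_bandwidth(2048, 6.4))  # 1638.4 GB/s, roughly 1.6 TB/s
```

Doubling the interface at the same pin speed doubles bandwidth, which is precisely why HBM4 can raise throughput without pushing the underlying DRAM cells any harder.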

Memory manufacturers have also stated that they will stack up to 16 memory dies in a module, known as 16-Hi stacking, which adds further complexity. (HBM3 technically also supports 16-Hi stacks, but so far no manufacturer has actually used them.) This will allow memory suppliers to significantly increase the capacity of their HBM stacks, but it brings new challenges: connecting more DRAM dies without defects, and then keeping the finished HBM stack appropriately and consistently thin. All of this in turn requires closer collaboration among chipmakers, memory manufacturers, and chip packaging companies to ensure smooth operation.

However, as the number of DRAM stacks increases, some have pointed out limitations in packaging technology.

Existing HBM uses TC (thermo-compression) bonding, which forms TSV channels in the DRAM and connects them electrically through small protrusions called microbumps. Samsung Electronics and SK Hynix differ slightly in their specific methods, but both use bumps.

Initially, customers asked for DRAM stacked up to 16 layers while keeping HBM4's final package thickness at 720 microns, the same as previous generations. The general consensus is that a 16-layer DRAM stack at 720 microns is practically impossible with existing bonding technology. The alternative the industry is therefore considering is hybrid bonding, a technology that directly bonds the copper wiring between chips and wafers. Because no microbumps are used between the DRAM dies, it is easier to reduce the package thickness.

However, according to Korean media reports in March, the companies involved decided to relax the packaging thickness standard to 775 microns, up from the previous 720 microns. The main participants in the Joint Electron Device Engineering Council (JEDEC) also agreed to set the standard for HBM4 products at 775 microns. With the thickness relaxed to 775 microns, a 16-layer DRAM stack becomes fully achievable even with existing bonding technology. Given the enormous investment cost of hybrid bonding, memory companies are likely to focus on upgrading existing bonding technologies instead.

According to the roadmap shared by TrendForce at the end of last year, the first HBM4 samples are expected to offer up to 36 GB per stack, with the complete specification expected from JEDEC around the second half of 2024 to 2025. The first customer samples and supply are expected in 2026, so it will still be some time before the new high-bandwidth memory solution goes into use.
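To see why the relaxation matters, a back-of-the-envelope budget helps. The sketch below simply divides the package envelope evenly across 16 dies; real stacks are not split evenly (the base die and mold cap take extra height), so the numbers are illustrative only.

```python
# Illustrative per-layer height budget for a 16-Hi stack.
# Real designs reserve extra height for the base die and encapsulation.
layers = 16

for package_um in (720, 775):
    per_layer = package_um / layers  # die thickness plus bond-line budget
    print(f"{package_um} um package -> {per_layer:.1f} um per layer")
```

Those few extra microns per layer are what make a bump-based 16-Hi stack plausible, rather than forcing an immediate jump to hybrid bonding.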

Latest Moves of the Three Giants

Currently, there are three major players in the market: SK Hynix, Samsung, and Micron, and all three are competing in HBM4.

First, let's look at SK Hynix. At an industry event in May, SK Hynix stated that they may be the first to launch the next generation HBM4 in 2025. SK Hynix plans to use TSMC's advanced logic process in the base chip of HBM4 to cram additional features into limited space, helping SK Hynix customize HBM to meet a wider range of performance and energy efficiency requirements.

At the same time, SK Hynix stated that it plans to optimize the combination of its HBM with Chip-on-Wafer-on-Substrate (CoWoS, TSMC's packaging technology) to meet customers' HBM needs.

In SK Hynix's view, its HBM products offer the best speed and performance in the industry; the company says its unique MR-MUF technology provides the most stable heat dissipation at high performance. SK Hynix claims that its mass reflow molded underfill (MR-MUF) process yields products 60% more robust than those manufactured with thermo-compression non-conductive film (TC-NCF). In addition, the company says it can rapidly mass-produce high-quality products and respond to customer demands faster than its rivals. These combined advantages, it argues, set its HBM apart and place it at the forefront of the industry.

Specifically in terms of DRAM, it is reported that SK Hynix plans to apply 1b DRAM to HBM4 and transition to 1c DRAM for HBM4E applications. However, it is understood that SK Hynix still has the flexibility to change application technologies according to market conditions.

Moving on to Samsung, as a follower, Samsung is also fully engaged.

Samsung Electronics has established a new "HBM Development Team" within its Device Solutions (DS) division to enhance its competitiveness in High Bandwidth Memory (HBM) technology. This strategic move was taken more than a month after Vice Chairman Kyung-Hyun Kyung took over as head of the DS division, reflecting the company's determination to maintain a leading position in the rapidly evolving semiconductor market.

The newly established HBM development team will focus on advancing HBM3, HBM3E, and next-generation HBM4 technologies. This initiative aims to meet the surging demand for high-performance memory solutions driven by the expansion of the artificial intelligence (AI) market. Earlier this year, Samsung set up a task force (TF) to enhance its competitiveness in HBM, and the new team will integrate and build upon those existing efforts.

Samsung Electronics also emphasized its commitment to strengthening the customized services for the upcoming sixth-generation high-bandwidth memory (HBM4) set to be released next year.

Choi Jang-seok, Vice President of the New Business Planning Group in the Memory Business Division at Samsung, stated, "Compared to HBM3, HBM4 offers significantly improved performance," and added, "We are expanding capacity to 48GB (gigabytes) and developing towards next year's production target."

Samsung Electronics applies planar MOSFET technology in HBM3E and is actively considering a transition to FinFET technology starting with HBM4. Compared with the MOSFET approach, the company says HBM4 will offer a 200% speed increase, a 70% reduction in size, and an over-50% performance improvement. This marks Samsung Electronics' first public disclosure of HBM4 specifications.

Vice President Choi mentioned, "There will be significant changes in the HBM architecture. Many customers want customized optimization rather than existing general-purpose parts." He further explained, "For example, 3D stacking of HBM DRAM with custom logic dies brings significant improvements. In conventional HBM, the interposer and the large number of inputs/outputs (I/O) can degrade performance and stand in the way of performance scaling; customization can remove those barriers."

He continued, "HBM must consider not only performance and capacity but also power consumption and thermal efficiency. The 16-layer HBM4 therefore adopts not only NCF (non-conductive film) assembly but also cutting-edge packaging technologies such as HCB (hybrid copper bonding), along with new processes. Properly implementing these new technologies is crucial, and Samsung is preparing according to plan."

Reports indicate that Samsung Electronics recently drew up an internal plan to switch HBM4 from the originally planned 1b DRAM to 1c DRAM, and to pull the mass-production target in from the end of next year to the second half of next year. The rumor remains unconfirmed, and the plan would depend on achieving adequate 1c yields.

Another HBM participant, Micron, is expected to launch 12-high and 16-high versions of HBM4 between 2025 and 2026, with capacities of 36 GB to 48 GB per stack and speeds exceeding 1.5 TB/s. According to Micron, HBM4E will follow in 2028. This extended version of HBM4 is projected to run at higher clock frequencies, push bandwidth above 2 TB/s, and raise capacity to 48 GB to 64 GB per stack.
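If HBM4E keeps the 2048-bit interface (an assumption here; JEDEC has not confirmed the HBM4E interface width), the per-pin data rate needed for these bandwidth targets falls out directly:

```python
def pin_rate_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin data rate (Gb/s) required to reach a target bandwidth (GB/s)."""
    return bandwidth_gb_s * 8 / bus_width_bits

# Micron's >1.5 TB/s HBM4 target over a 2048-bit bus
print(pin_rate_gbps(1500, 2048))  # ~5.9 Gb/s per pin
# HBM4E's projected 2 TB/s over the same (assumed) bus width
print(pin_rate_gbps(2000, 2048))  # ~7.8 Gb/s per pin
```

The first figure sits below JEDEC's preliminary 6.4 Gbps grade; the second is one reason the committee is still discussing higher frequencies.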

Accelerating High-Bandwidth Memory to the Speed of Light

HBM emerged to provide GPUs and other processors with more memory bandwidth than the standard x86 socket interface can support. As GPUs become more powerful, they need to pull data from memory faster to shorten application run times. Training a large language model (LLM), for example, can involve repeatedly accessing tens of billions or even trillions of parameters, a process that can take hours or days to complete.

Current HBM follows a fairly standard design: the HBM memory stack connects through microbumps to an interposer on the base package, and the microbumps connect to through-silicon vias (TSVs) inside the HBM stack. The processor is mounted on the same interposer, which provides the connection between HBM and processor.

HBM suppliers and standards bodies are investigating technologies such as photonics, or mounting HBM directly on the processor die, to speed up access between HBM and the processor. Suppliers are pushing HBM bandwidth and capacity forward seemingly faster than the JEDEC standards organization can keep up with.

Samsung is researching the use of photonics in the interposer. Photons travel across links faster than bits encoded as electrons, and consume less power. Photonic links can switch on femtosecond timescales; a femtosecond is 10^-15 seconds, or one quadrillionth of a second.

According to South Korean media reports, SK Hynix is still studying the concept of a direct HBM-logic connection, in which GPU dies and HBM dies are manufactured together as a hybrid semiconductor. The memory maker views this as HBM4-era technology and is in talks with Nvidia and other logic semiconductor suppliers. The idea is for the memory and logic makers to design the chip jointly, with a foundry such as TSMC manufacturing it.

This is somewhat similar to the idea of Processing in Memory (PIM), which will be proprietary and subject to vendor lock-in unless protected by industry standards.

Unlike Samsung and SK Hynix, Micron has not discussed integrating HBM and logic into a single chip. The pitch to GPU suppliers (AMD, Intel, and Nvidia) is that a combined HBM-GPU chip delivers faster memory access, but GPU suppliers are well aware of the dangers of proprietary lock-in and single sourcing.

As ML training models become larger and training times longer, the pressure to shorten run times by speeding up memory access and increasing GPU memory capacity will also grow. Giving up the competitive advantage of standardized DRAM in favor of locked-in HBM-GPU combination designs, better speed and capacity notwithstanding, may not be the right way forward.

Source: Semiconductor Industry Watch. Original title: "HBM 4, Coming Soon"