Wallstreetcn
2024.06.07 01:37

Track Hyper | Intel AI chip powerhouse Lunar Lake debuts

The new architecture brings a surge in AI performance

Author: Zhou Yuan / Wall Street News

Lunar Lake, Intel's next-generation mobile processor architecture designed specifically for AI PCs, has finally been revealed in full.

On June 4, Intel CEO Pat Gelsinger unveiled the full technical details of the Lunar Lake architecture at COMPUTEX 2024: improved CPU, GPU, and NPU performance, lower power consumption, and total platform AI compute of 120 TOPS, above the previously leaked figure of "over 100 TOPS".

Compared with the first-generation Core Ultra (Meteor Lake), which had already reworked the CPU's structure, Lunar Lake, designed specifically for AI PCs, adopts a completely new architecture: Lion Cove P-Cores (performance cores), Skymont E-Cores (efficiency cores), Xe2 integrated graphics with performance comparable to discrete graphics, a substantially larger NPU with more neural compute engines, and, for the first time, on-package memory (LPDDR5X integrated in the package alongside the compute tile).

It is also rumored in the industry that Intel has turned to TSMC for manufacturing: TSMC's N3B process for the compute module (Compute Tile) and TSMC N6 for the platform control module (Platform Controller Tile).

Major Changes: Advantages of the New P-Core and E-Core Architectures

The Lunar Lake architecture design covers eight areas: modular structure, packaging technology, the P-Core (performance core), the E-Core (efficiency core), hybrid architecture and thread scheduling, GPU integrated graphics, the NPU AI engine, and platform connectivity.

The new architecture has three main highlights. First, this is the first time Intel has relied fully on TSMC to manufacture the chip (although Intel has not officially confirmed this). Second, LPDDR5X memory is integrated into the processor package, so laptops using this chip need no separately configured memory, similar to the SoC (system-on-chip) designs of smartphones. Third, Lunar Lake uses Intel's Foveros packaging technology.

In terms of core configuration, Lunar Lake has 4 P-Cores (performance cores) and 4 E-Cores (efficiency cores), for a total of 8 threads, i.e. 4P+4E/8T.

The most significant design changes are in the cores themselves: the P-Cores use the new Lion Cove architecture and the E-Cores use the Skymont architecture, which replaces the Crestmont E-Cores of Meteor Lake. The separate LP E-Core (low-power efficiency core) design used in the first-generation Core Ultra has also been dropped.

The new E-Cores match the LP E-Cores in speed while drawing only about 30% of their power, and deliver up to 2x single-thread and 4x multi-thread performance. More importantly, the E-Cores are not attached to the ring bus the way the P-Cores are; instead they are organized like the LP E-Cores, as a separate cluster. Combined with the efficiency of TSMC's N3B process and the new architecture design, this raises instructions per cycle (IPC) and yields significant gains.

Technically, Lion Cove tackles CPU performance by investing more in cache. As CPU system design grows more complex, the cache subsystem must be enlarged to keep overall performance and execution efficiency improving.

Lion Cove also brings a major change in design methodology, one Intel believes will have a profound impact on future chip designs: Intel has moved from its previous many-small-partitions approach to fewer, larger design partitions.

The benefit of this design is to reduce the overall design cost and complexity of the chip, making future design iterations and upgrades easier.

The Lion Cove P-Core also improves efficiency: P-Core IPC increases by 30%, and dynamic power efficiency improves by 20%.

What are the benefits of the E-Core in Lunar Lake using the Skymont architecture?

Intel states that the Skymont E-Core can perform on par with the previous generation's P-Core (normally an E-Core, being an efficiency core, is far less powerful than a P-Core), and in some workloads can even surpass it.

How is this achieved?

Skymont uses a new front-end design that decodes 9 instructions per clock cycle (9-wide decode), 50% more than the previous E-Core architecture, Crestmont. Generally, a wider decode stage improves processor performance, because it lets the core keep more of its execution resources busy and complete instructions faster.
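The 50% figure can be checked with simple arithmetic. The Python sketch below backs out the decode width implied for Crestmont from the two numbers cited for Skymont; the helper names are hypothetical. Note that decode width bounds only the peak rate at which instructions enter the pipeline, not the sustained IPC a real workload achieves.

```python
# Decode width cited for Skymont (instructions decoded per clock cycle).
SKYMONT_DECODE_WIDTH = 9
# Cited increase over Crestmont, the previous E-Core architecture.
CLAIMED_INCREASE = 0.50


def implied_previous_width(new_width: int, increase: float) -> float:
    """Back out the older decode width from the new width and the relative increase."""
    return new_width / (1 + increase)


def peak_decode_gain(new_width: int, old_width: float) -> float:
    """Relative increase in the peak number of instructions decoded per cycle."""
    return new_width / old_width - 1


crestmont_width = implied_previous_width(SKYMONT_DECODE_WIDTH, CLAIMED_INCREASE)
gain = peak_decode_gain(SKYMONT_DECODE_WIDTH, crestmont_width)
print(f"Implied Crestmont decode width: {crestmont_width:.0f}")
print(f"Peak decode gain: {gain:.0%}")
```

The implied Crestmont width is 6, consistent with the 50% claim.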

E-Cores based on this architecture show a significant improvement in power efficiency: single-thread performance rises by up to 1.7x, while matching the performance of Meteor Lake's LP E-Core takes only about 30% of the power. Comparing a Skymont E-Core cluster with Meteor Lake's LP E-Cores at the same power consumption, multi-thread performance is 2.9x higher.
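Because these efficiency figures are quoted at different operating points, they are easy to misread. As a rough sketch, assuming the 30% figure is measured at equal performance and the 2.9x figure at equal power (as the text describes), each can be converted into a performance-per-watt ratio:

```python
def perf_per_watt_ratio(perf_ratio: float, power_ratio: float) -> float:
    """(new perf / new power) divided by (old perf / old power)."""
    return perf_ratio / power_ratio


# Equal performance at about 30% of the power (assumed iso-performance point).
iso_perf = perf_per_watt_ratio(perf_ratio=1.0, power_ratio=0.30)
# 2.9x multi-thread performance at the same power (iso-power point).
iso_power = perf_per_watt_ratio(perf_ratio=2.9, power_ratio=1.0)

print(f"Perf/W gain at equal performance: {iso_perf:.2f}x")
print(f"Perf/W gain at equal power: {iso_power:.2f}x")
```

The two ratios describe different points on the power/performance curve, so they should not be multiplied together; for the same reason, dividing the 1.7x single-thread figure by 0.3 would not yield a meaningful efficiency number.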

First Use of Package-Level Memory

Lunar Lake features a striking technical "innovation": for the first time, Intel integrates memory inside the processor package, which Intel calls "Memory on Package". In other words, laptops with Lunar Lake processors do not support separate SO-DIMM memory modules, so memory cannot be expanded later to upgrade performance.

If memory cannot be expanded, isn't this the same as today's ultra-thin laptops, whose memory is soldered directly onto the motherboard?

Physically, Lunar Lake consists of three main parts. The compute tile and the platform controller tile follow Meteor Lake's modular design and together form Lunar Lake's computing core; to reinforce this structure, Intel also added a filler tile that contains no circuitry and performs no function.

The compute tile (which integrates the new Xe2 GPU, the fourth-generation NPU, and the IPU), the platform controller tile, and the non-functional filler tile are mounted on a base tile using Intel's Foveros packaging technology to form a single package.

Compared with the previous generation, Lunar Lake's Xe2 GPU delivers 1.5x the gaming and graphics performance and more than 3.5x the AI throughput, reaching 67 TOPS of compute.

For internal communication, the compute tile links its main units through a Home Agent, Coherency Agents, and related logic, while the platform controller tile connects through IO Coherency; together these keep data coherent across the chip and enable efficient communication.

The key technical highlight of Lunar Lake, and its most significant design change, is the on-package memory. The upper half of the package floor plan holds two 64-bit LPDDR5X memory packages running at up to 8500 MT/s; each has four 16-bit channels, and total capacity is up to 32GB.
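The channel layout also fixes the theoretical peak memory bandwidth. A minimal sketch, assuming peak bandwidth = transfer rate × total bus width and treating the 8500 figure as mega-transfers per second:

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: int, bus_width_bits: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


# Layout quoted in the article: two packages, four 16-bit channels each.
PACKAGES = 2
CHANNELS_PER_PACKAGE = 4
CHANNEL_WIDTH_BITS = 16

bus_width = PACKAGES * CHANNELS_PER_PACKAGE * CHANNEL_WIDTH_BITS
peak = peak_bandwidth_gb_s(8500, bus_width)
print(f"Total bus width: {bus_width} bits")
print(f"Theoretical peak bandwidth: {peak:.0f} GB/s")
```

This gives a 128-bit bus and about 136 GB/s; sustained bandwidth in real workloads will be lower than this theoretical peak.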

Intel claims this design can cut memory-related power consumption by up to 40% and free up to 250 square millimeters of motherboard area, significantly improving battery life and leaving more room for other parts of the laptop design.

As an AI chip architecture for AI PCs, Lunar Lake draws its AI performance from the new NPU 4 and the Arc Xe2-LPG integrated graphics. NPU 4 delivers 48 TOPS at INT8, exceeding Microsoft's 40 TOPS requirement for Copilot+ AI PCs and meeting the demands of upcoming AI PCs.

Compared with the mere 11.5 TOPS of Meteor Lake's NPU, Lunar Lake triples the number of neural compute engines (from two to six), doubles memory bandwidth, and raises the clock speed from 1.4GHz to 1.95GHz, reaching 48 TOPS and roughly 2-4x the overall performance. With the Arc Xe2-LPG integrated graphics added to NPU 4, total platform AI compute reaches up to 120 TOPS. The downside is that power consumption rises significantly under full load.
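The NPU numbers can be cross-checked under one common convention, which is an assumption here: peak TOPS = 2 × (number of multiply-accumulate units) × clock, since each MAC counts as two operations. The sketch derives the MAC counts implied by 11.5 TOPS at 1.4GHz and 48 TOPS at 1.95GHz, and the part of the 120 TOPS platform total not covered by the NPU and GPU figures. Attributing that remainder to the CPU is also an assumption, since the text does not break it out.

```python
def implied_macs(peak_tops: float, clock_ghz: float) -> float:
    """MAC units implied by a peak TOPS figure, assuming 2 ops per MAC per cycle."""
    ops_per_second = peak_tops * 1e12
    cycles_per_second = clock_ghz * 1e9
    return ops_per_second / (2 * cycles_per_second)


# Figures quoted in the article.
mtl_macs = implied_macs(11.5, 1.4)   # Meteor Lake NPU
lnl_macs = implied_macs(48, 1.95)    # Lunar Lake NPU 4

unit_scaling = lnl_macs / mtl_macs   # growth from more compute units
clock_scaling = 1.95 / 1.4           # growth from the higher clock
tops_scaling = 48 / 11.5             # overall peak TOPS gain

# Platform total minus the NPU and GPU figures (presumably the CPU's share).
PLATFORM_TOPS, NPU_TOPS, GPU_TOPS = 120, 48, 67
remainder = PLATFORM_TOPS - NPU_TOPS - GPU_TOPS

print(f"Meteor Lake NPU: ~{mtl_macs:,.0f} MACs")
print(f"Lunar Lake NPU 4: ~{lnl_macs:,.0f} MACs")
print(f"units x{unit_scaling:.2f} * clock x{clock_scaling:.2f} = TOPS x{tops_scaling:.2f}")
print(f"Remaining platform TOPS: {remainder}")
```

Under this convention, the MAC count grows about 3x and the clock about 1.39x, which together account for the roughly 4.2x jump in peak TOPS; the remaining 5 TOPS of the platform figure is not explained by the NPU and GPU numbers.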

How can this be addressed? Intel has an answer.

Intel has enhanced Intel Thread Director (ITD) in collaboration with Microsoft, optimizing it for Windows Copilot and other AI assistants.

With the new Thread Director, Windows can create containment zones that keep most of the actual workload on the Skymont E-Cores, offsetting the extra power drawn by high compute and preserving battery life.
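To make the containment-zone idea concrete, here is a deliberately simplified toy policy in Python. It is not Intel's Thread Director algorithm, which relies on hardware telemetry and cooperation with the Windows scheduler; the class names, the per-task load values, and the 90% spill threshold are all hypothetical. The sketch only shows the general principle: work stays on the E-Cores until that zone nears saturation, and only then spills over to the P-Cores.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    load: float  # hypothetical demand, in units of one fully busy E-core


@dataclass
class ContainmentZoneScheduler:
    """Toy policy: keep work inside the E-core zone until it nears saturation."""
    e_cores: int = 4
    spill_threshold: float = 0.9  # hypothetical cap on E-core zone utilization
    e_load: float = 0.0
    placement: dict[str, str] = field(default_factory=dict)

    def submit(self, task: Task) -> str:
        capacity = self.e_cores * self.spill_threshold
        if self.e_load + task.load <= capacity:
            # Fits in the containment zone: run on the efficiency cores.
            self.e_load += task.load
            self.placement[task.name] = "E-core"
        else:
            # Zone would be oversubscribed: escalate this task to the P-cores.
            self.placement[task.name] = "P-core"
        return self.placement[task.name]


scheduler = ContainmentZoneScheduler()
for task in [Task("assistant", 0.5), Task("browser", 0.8),
             Task("video-call", 1.0), Task("compile", 2.0)]:
    print(task.name, "->", scheduler.submit(task))
```

Here the first three light tasks fit within the E-Core zone, while the heavy "compile" task would push it past its cap and is placed on the P-Cores.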

Lunar Lake's architects made several bold design choices, such as Intel's decision to drop Hyper-Threading. At typical laptop clock speeds, Lunar Lake's E-Cores even outperform Meteor Lake's P-Cores, with up to 20% higher single-thread performance. In addition, the four Lion Cove P-Cores deliver a 14% increase in IPC.

According to Intel's schedule, Lunar Lake is set to launch in the third quarter of this year.