Apple M4 kicks off the AI PC battle: Will TOPS become the benchmark for future device upgrades?
Bank of America Merrill Lynch pointed out that TOPS stands for trillions of operations per second; a higher value indicates faster AI processing, much as 2nm and 3nm process nodes are used as shorthand for chip performance. TOPS will be used to compare the "AI performance" of Apple's devices with those of its PC competitors.
Author: Zhao Ying
Source: Hard AI
Apple released its most expensive iPad Pro ever this week, debuting the powerful M4 chip on the iPad and kicking off the AI PC battle.
According to a report by Bank of America Merrill Lynch on Wednesday, the Apple M4 is an Arm-based system on chip (SoC) with over 28 billion transistors, built on TSMC's second-generation 3nm process and equipped with a 10-core CPU and a 10-core GPU.
Bank of America Merrill Lynch highlighted that the M4 includes a 16-core Neural Processing Unit (NPU), a part of the SoC dedicated to accelerating AI tasks, with performance measured in trillions of operations per second (TOPS). The M4's NPU delivers 38 TOPS (assuming INT8 precision), roughly twice as fast as the M2 and nearly 60 times faster than the A11 Bionic.
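As a rough sanity check on those multiples (a back-of-the-envelope sketch: the M2 and A11 figures below, about 15.8 TOPS and 0.6 TOPS, are commonly cited numbers rather than figures from the report):

```python
# Back-of-the-envelope check of the reported speed-ups.
# Assumed figures (not from the report): M2 Neural Engine ~15.8 TOPS,
# A11 Bionic Neural Engine ~0.6 TOPS.
m4_tops = 38.0
m2_tops = 15.8   # assumed
a11_tops = 0.6   # assumed

print(f"M4 vs M2:  {m4_tops / m2_tops:.1f}x")   # ~2.4x, in line with the "roughly twice" claim
print(f"M4 vs A11: {m4_tops / a11_tops:.0f}x")  # ~63x, in the same ballpark as the ~60x claim
```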
Bank of America Merrill Lynch stated:
"Although the AI PC market is still in its early stages, we expect the TOPS metric to be an imperfect but simple way (similar to 3nm/2nm) to compare the 'AI performance' of Apple and its PC competitors."
Bank of America Merrill Lynch also predicts that consumer demand for AI features in PCs/tablets will increase:
"By 2027, the AI PC market may grow from the current 50 million units to 167 million units. Apple PCs and its competitors will focus on AI PCs, paying attention to Microsoft's Windows/Surface AI activities (May 20) and the Taipei Computex (June 2 and beyond) events."
TOPS Leading the AI PC Battle
Bank of America Merrill Lynch pointed out:
"The TOPS performance of Apple's M4 is 38. Reports suggest that Microsoft will set the 'minimum specification' for AI PCs at 40 TOPS, a standard that no PC currently meets, although several upcoming options may offer 45 TOPS.
Intel's Meteor Lake (the first chips sold under the 'AI PC'-branded 'Core Ultra' line) has an NPU rated at 11 TOPS, but Intel promises to reach 45 TOPS this year with Lunar Lake.
Qualcomm also plans to launch a 45-TOPS Snapdragon X Elite CPU in the near future.
"However, according to NVIDIA, a PC with around 45 TOPS can only perform very basic AI tasks, while higher-performance AI tasks require a dedicated GPU in the PC, with a deployment base of over 100 million RTX GPUs (100-1300+ TOPS), although it requires 2 to 5 times the power." The memory and NPU markets will benefit
At the same time, Bank of America Merrill Lynch pointed out that memory is an underestimated beneficiary:
Larger AI models call for more processing power and for memory with greater capacity, higher bandwidth, better energy efficiency, and higher average selling prices (ASPs). Memory will be a distinct beneficiary of edge AI upgrades.
According to AnandTech, Apple's new M4 uses faster LPDDR5X memory, which may offer roughly 1.2 times the bandwidth of conventional LPDDR (about 120 GB/s) while being more energy efficient.
In addition, NPUs are necessary for edge AI, as Bank of America Merrill Lynch noted:
As AI (such as Microsoft's Copilot) shifts to the edge/devices (PCs, smartphones, tablets), it requires additional dedicated chips besides CPUs and GPUs, namely NPUs.
The AI inference potential of an NPU is measured in TOPS, and NPUs can be seen as smaller-scale versions of the GPU/ASIC accelerators used in data centers. Although smaller, they work in much the same way, performing multiply-accumulate (MAC) operations at a given clock frequency and numerical precision, which involves a trade-off between speed and accuracy (a rough back-of-the-envelope calculation is sketched below).
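As a minimal sketch of how a peak TOPS rating follows from a MAC array, each MAC is usually counted as two operations (one multiply plus one accumulate) per clock cycle; the unit count and clock frequency below are illustrative assumptions, not the specifications of any particular NPU:

```python
# Minimal sketch: how a peak TOPS rating follows from MAC count and clock.
# The MAC count and clock frequency are illustrative assumptions,
# not specifications of any real NPU.

def peak_tops(mac_units: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS, counting each MAC as two operations
    (one multiply plus one accumulate) per clock cycle."""
    return mac_units * clock_hz * ops_per_mac / 1e12

# e.g. ~19,000 INT8 MAC units running at 1 GHz would be rated at about 38 TOPS
print(f"{peak_tops(mac_units=19_000, clock_hz=1.0e9):.0f} TOPS")
```

Note that this is a peak figure; sustained throughput also depends on memory bandwidth and utilization, which is why the memory upgrades above matter.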
According to Qualcomm, high-precision AI models use 32-bit or 16-bit floating-point numbers to preserve accuracy, while low-precision, high-speed models use 8-bit or 4-bit integers. The industry standard today is typically 8-bit (INT8) precision, which improves processing speed while keeping accuracy acceptable; a simple illustration of the trade-off follows below.
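As an illustration of that speed/accuracy trade-off, the sketch below applies a simple symmetric INT8 quantization to a small FP32 tensor with NumPy and measures the round-trip error; it is a generic example, not tied to any specific NPU toolchain:

```python
import numpy as np

# Simple symmetric INT8 quantization: map FP32 values into [-127, 127]
# with a single scale factor, then dequantize and compare.
def quantize_int8(x: np.ndarray):
    scale = np.max(np.abs(x)) / 127.0      # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=1000).astype(np.float32)

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# INT8 halves the memory footprint versus FP16 (quarters it versus FP32) and
# enables faster integer MACs, at the cost of a small rounding error per value.
print("max abs error:", float(np.max(np.abs(weights - restored))))
```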