TSMC Chairman Mark Liu (Liu Deyin) published an article on the IEEE website interpreting the impact of the artificial intelligence revolution on the semiconductor industry. He argues that within the next ten years, artificial intelligence will require GPUs with 1 trillion transistors, ten times the device count of today's typical GPUs. The development of artificial intelligence will bring many innovative applications, such as generative AI systems like ChatGPT, which are expected to become digital assistants for all human endeavors. These advances are made possible by efficient machine learning algorithms, the availability of large amounts of data, and progress in semiconductor technology.
TSMC has repeatedly mentioned its roadmap to one trillion transistors in previous speeches. Today, the IEEE website published an article titled "How We’ll Reach a 1 Trillion Transistor GPU," detailing how TSMC intends to reach the goal of one trillion transistors on a chip.
It is worth mentioning that the article is authored by Mark Liu and H.-S. Philip Wong. Mark Liu is the Chairman of TSMC, and H.-S. Philip Wong is a professor in Stanford University's School of Engineering and TSMC's Chief Scientist.
Here, we will translate this article for readers.
The following is the text of the article:
In 1997, IBM's Deep Blue supercomputer defeated world chess champion Garry Kasparov. It was a breakthrough demonstration of supercomputing technology and a first glimpse of how high-performance computing might one day surpass human intelligence. Over the next 10 years, we began using artificial intelligence for many practical tasks, such as facial recognition, language translation, and recommending movies and products.
Fifteen years later, artificial intelligence has developed to the point where it can "synthesize knowledge." Generative artificial intelligence, such as ChatGPT and Stable Diffusion, can create poetry, produce art, diagnose diseases, write summary reports and computer code, and even design integrated circuits comparable to those made by humans.
Artificial intelligence has a tremendous opportunity to become a digital assistant for all human endeavors. ChatGPT is a great example of how artificial intelligence has democratized the use of high-performance computing, bringing benefits to everyone in society.
All these wonderful applications of artificial intelligence rest on three factors: innovations in efficient machine learning algorithms, the availability of large amounts of data on which to train neural networks, and progress in energy-efficient computing through advances in semiconductor technology. The last of these contributions, despite its ubiquity, has not been properly recognized in the generative AI revolution.
Over the past thirty years, every major milestone in artificial intelligence was enabled by the leading semiconductor technology of its time and would have been impossible without it. Deep Blue was implemented with a mix of 0.6-micron and 0.35-micron node chip manufacturing technology; the deep neural network that won the ImageNet competition and ushered in the current era of machine learning was built with 40-nanometer technology; AlphaGo conquered the game of Go using 28-nanometer technology; the initial version of ChatGPT was trained on computers built with 5-nanometer technology; and the latest version of ChatGPT is served by machines using even more advanced 4-nanometer technology. Every layer of the computer systems involved, from software and algorithms down to architecture, circuit design, and device technology, acts as a multiplier of AI performance. But it is fair to say that advances in basic transistor device technology have driven progress in all the layers above. If the AI revolution is to continue at its current pace, it will need even more from the semiconductor industry. Within ten years, it will need a GPU with 1 trillion transistors, ten times as many devices as in today's typical GPUs.
The continuous growth in the size of AI models has increased the computation and memory access required for AI training by orders of magnitude over the past five years. Training GPT-3, for example, takes the equivalent of more than 5 billion billion operations per second sustained for an entire day (that is, 5,000 petaflop-days) and a memory capacity of 3 trillion bytes (3 TB).
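As a quick sanity check on those figures, here is a minimal back-of-the-envelope sketch in Python (the 5,000 petaflop-day budget is the article's figure; the unit conversions are standard definitions):

```python
# Back-of-the-envelope check of the GPT-3 training figures cited above.
# A petaflop-day is one petaflop per second (1e15 ops/s) sustained for 24 hours.

PFLOP_PER_S = 1e15          # operations per second in one petaflop/s
SECONDS_PER_DAY = 86_400

budget_pflop_days = 5_000   # the article's figure for GPT-3 training

total_ops = budget_pflop_days * PFLOP_PER_S * SECONDS_PER_DAY
print(f"total training operations: {total_ops:.1e}")  # ~4.3e+23

# Equivalent sustained rate if the whole job ran in exactly one day:
print(f"sustained rate: {budget_pflop_days * PFLOP_PER_S:.0e} ops/s")  # 5e+18
```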
The computational power and memory access required for new generative artificial intelligence applications are rapidly increasing. We now need to address an urgent question: How can semiconductor technology keep up?
From Integrated Devices to Integrated Microchips
Since the invention of the integrated circuit, semiconductor technology has focused on shrinking feature sizes so that we can cram more transistors into a chip the size of a thumbnail. Today, integration has reached a new level: we are moving beyond 2D scaling into 3D system integration, combining many chips into a tightly integrated, massively interconnected system. This is a paradigm shift in semiconductor technology integration.
In the era of AI, the capability of a system is directly proportional to the number of transistors integrated into it. One of the main limitations is that lithographic chipmaking tools are designed to produce ICs no larger than about 800 square millimeters, known as the reticle limit. But we can now extend the size of an integrated system beyond that limit. By attaching multiple chips onto a larger interposer (a piece of silicon with built-in interconnects), we can integrate a system containing far more devices than is possible on a single chip. For example, TSMC's CoWoS (chip-on-wafer-on-substrate) technology can accommodate up to six reticle areas' worth of compute chips along with a dozen high-bandwidth memory (HBM) chips.
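To make the scale concrete, here is a rough, illustrative estimate of what a six-reticle CoWoS system can hold (the reticle limit and chip count come from the article; the per-die transistor count borrows the roughly 100-billion figure the article cites later for today's reticle-limit GPU chips):

```python
# Illustrative capacity of a six-reticle CoWoS system.
# The ~800 mm^2 reticle limit and six-reticle figure come from the article;
# ~100 B transistors per reticle-limit die is the article's figure for
# today's GPU chips, reused here as a round assumption.

RETICLE_LIMIT_MM2 = 800
compute_dies = 6
transistors_per_die = 100e9

print(f"compute silicon area: {compute_dies * RETICLE_LIMIT_MM2} mm^2")  # 4800 mm^2
print(f"compute transistors: {compute_dies * transistors_per_die:.1e}")  # ~6.0e+11 (HBM excluded)
```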
CoWoS is TSMC's technology for advanced packaging on a silicon wafer, and it is in production today. Examples include the Nvidia Ampere and Hopper GPUs, each of which consists of one GPU chip and six high-bandwidth memory cubes, all on a silicon interposer. The compute GPU chip is about as large as chipmaking tools currently allow: Ampere has 54 billion transistors, and Hopper has 80 billion. The transition from 7-nanometer technology to the denser 4-nanometer technology increased the number of transistors packed into the same area by 50 percent. Ampere and Hopper are the workhorses of today's large language model (LLM) training; training ChatGPT requires tens of thousands of such processors.
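Those numbers are easy to cross-check; a short sketch confirms that the jump from Ampere to Hopper is consistent with the stated 50 percent density gain:

```python
# Cross-check: Ampere (7 nm) vs. Hopper (4 nm) at roughly the same die size.
ampere_transistors = 54e9   # article's figure for Ampere
hopper_transistors = 80e9   # article's figure for Hopper
print(f"density gain: {hopper_transistors / ampere_transistors:.2f}x")  # ~1.48x, i.e. ~50%
```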
HBM is an example of another key semiconductor technology that is increasingly important for AI: the ability to integrate systems by stacking chips on top of one another, which at TSMC we call SoIC (system-on-integrated-chips). An HBM consists of a stack of vertically interconnected DRAM chips atop a control logic IC. It uses vertical interconnects called through-silicon vias (TSVs) to pass signals through each chip and solder bumps to form the connections between the memory chips. Today, high-performance GPUs use HBM extensively.
Looking ahead, 3D SoIC technology can provide a "bumpless alternative" to conventional HBM, offering far denser vertical interconnects between the stacked chips. Recent advances have demonstrated memory structures with 12 layers of chips stacked using hybrid bonding, which achieves a higher density than solder bumps can provide. Bonded at low temperature on top of a larger base logic chip, this memory system has a total thickness of only 600 µm.
For high-performance computing systems composed of many chips running large AI models, high-speed wired communication may quickly become a bottleneck on computation speed. Today, optical interconnects are already being used to connect server racks in data centers. We will soon need optical interfaces based on silicon photonics, packaged together with GPUs and CPUs. This will allow bandwidth to scale in an energy- and area-efficient manner and enable direct optical GPU-to-GPU communication, so that hundreds of servers can behave as a single giant GPU with unified memory.
Due to the demands of AI applications, silicon photonics will become one of the most important enabling technologies in the semiconductor industry.
Towards Trillion-Transistor GPUs
As noted above, typical GPU chips used for AI training have already reached the reticle limit, at roughly 100 billion transistors. Continuing the trend of increasing transistor counts will require multiple chips, interconnected through 2.5D or 3D integration, to perform the computation. Integrating multiple chips with CoWoS, SoIC, and related advanced packaging technologies allows the total number of transistors per system to be much larger than can be squeezed into a single chip. The AMD MI300A is an example of a chip made with this technology.
The AMD MI300A accelerated processing unit leverages both CoWoS and TSMC's 3D SoIC technology. The MI300A combines GPU and CPU cores designed to handle the largest AI workloads: the GPU performs the dense matrix multiplication operations of AI, the CPU controls the operation of the whole system, and high-bandwidth memory serves both. Nine compute chiplets built with 5-nanometer technology are stacked on top of four 6-nanometer base chiplets dedicated to cache and I/O traffic. The base chiplets and HBM sit on a silicon interposer. The compute portion of the processor alone contains 150 billion transistors.
We predict that within ten years, multi-chip GPUs will have over 1 trillion transistors.
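A rough sketch of what that prediction implies, using only figures given in the article (roughly 100 billion transistors per reticle-limit die, and the MI300A's 150 billion as today's mark):

```python
# What a 1-trillion-transistor GPU implies, using the article's own numbers.
TARGET_TRANSISTORS = 1e12   # the ten-year prediction
per_reticle_die = 100e9     # today's reticle-limit GPU die (article's figure)
mi300a_compute = 150e9      # MI300A compute transistors (article's figure)

print(f"reticle-limit dies needed: {TARGET_TRANSISTORS / per_reticle_die:.0f}")  # ~10
print(f"growth over the MI300A: {TARGET_TRANSISTORS / mi300a_compute:.1f}x")     # ~6.7x
```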
We will need to link all these chiplets together in a 3D stack. Fortunately, the industry has been able to rapidly shrink the pitch of vertical interconnects, which increases the connection density, and there is plenty of room for more. We see no reason why interconnect density cannot grow by an order of magnitude, or even beyond.
Energy Efficiency Trends of GPUs
So, how do all these innovative hardware technologies improve system performance?
We can see the trend already in server GPUs if we look at the steady improvement in a metric called energy-efficient performance (EEP), a combined measure of a system's energy efficiency and speed. Over the past 15 years, the semiconductor industry has improved EEP by about three times every two years. We believe this trend will continue at its historical pace. It will be driven by innovations on many fronts, including new materials, device and integration technologies, extreme ultraviolet (EUV) lithography, circuit design, system architecture design, and the co-optimization of all these technology elements.
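A short sketch of what that rate compounds to (the threefold-every-two-years figure is the article's; projecting it forward is the authors' stated belief, not a guarantee):

```python
# Compounding the EEP trend: ~3x improvement every two years.
gain_per_period = 3.0
period_years = 2.0

for years in (2, 6, 10):
    gain = gain_per_period ** (years / period_years)
    print(f"EEP gain after {years:>2} years: ~{gain:.0f}x")
# After 10 years: ~243x.
```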
In particular, the EEP increase will be enabled by the advanced packaging technologies discussed here. In addition, concepts such as system-technology co-optimization (STCO), in which the different functional parts of a GPU are separated onto their own chiplets and each part is built with the best-performing and most economical technology, will become increasingly important.
The Mead-Conway Moment of 3D Integrated Circuits
In 1978, Professor Carver Mead of the California Institute of Technology and Lynn Conway of Xerox PARC invented a computer-aided design method for integrated circuits. They used a set of design rules to describe chip scaling so that engineers could easily design very large scale integrated (VLSI) circuits without needing to know too much about process technology.
3D chip design needs the same kind of capability. Today, designers must know chip design, system architecture design, and hardware and software optimization, while manufacturers must know chip technology, 3D IC technology, and advanced packaging technology. As we did in 1978, we again need a common language that electronic design tools can understand to describe these technologies. Such a hardware description language gives designers the freedom to work on a 3D IC system design without worrying about the underlying technology. It is on the way: an open-source standard called 3Dblox has already been embraced by most of today's technology companies and electronic design automation (EDA) companies.
The Future Beyond the Tunnel
In the era of artificial intelligence, semiconductor technology is a key driver of new capabilities and applications in AI. New GPUs are no longer constrained by traditional size and form factors. New semiconductor technologies are no longer limited to shrinking the next generation of transistors on a two-dimensional plane. Integrated AI systems can be composed of as many energy-efficient transistors as possible, efficient system architectures for specialized computing workloads, and optimized relationships between software and hardware.
For the past 50 years, the development of semiconductor technology has been like walking in a tunnel. The path ahead was clear because there was a defined road. Everyone knew what needed to be done: shrink transistors.
Now, we have reached the end of the tunnel. From here on, semiconductor technology will become increasingly challenging to develop. However, beyond the tunnel, there are more possibilities. We are no longer bound by the past constraints.
Source: Semiconductor Industry Watch, Original Title: "Trillion-Transistor GPUs Are Coming, TSMC Chairman Interprets"