Costs plummet by 70%! Google's TPU is catching up aggressively, and its cost-performance ratio now matches NVIDIA's

Wallstreetcn
2026.01.21 04:54

Goldman Sachs stated that Google/Broadcom's TPU is rapidly narrowing the gap in inference costs with NVIDIA's GPU. The unit token inference cost fell by about 70% from TPU v6 to TPU v7, bringing it roughly on par with NVIDIA's GB200 NVL72. This does not mean that NVIDIA's position is shaken, but it clearly indicates that the core evaluation system of AI chip competition is shifting from "who computes faster" to "who computes cheaper and more sustainably."

With AI capital expenditures still high and commercialization pressure mounting, the market's focus is undergoing a subtle but profound shift: can large models keep "running without regard to cost"?

According to the Chasing Wind Trading Desk, Goldman Sachs' latest AI chip research report does not rehearse the market's familiar comparisons of compute, process node, and parameter scale. Instead, it starts from a perspective closer to commercial reality: the unit cost of the inference stage. By constructing an "inference cost curve," Goldman Sachs attempts to answer a question crucial to the AI industry: once models enter a high-frequency calling phase, what is the real cost of processing one million tokens for different chip solutions under constraints such as depreciation, energy consumption, and system utilization?
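To make the shape of such an inference cost curve concrete, here is a minimal back-of-the-envelope sketch in Python. Every input (system price, depreciation period, power draw, electricity price, utilization, and token throughput) is a hypothetical placeholder rather than a figure from the Goldman Sachs report; the point is only to show how depreciation, energy consumption, and utilization combine into a cost per one million tokens.

```python
# Illustrative per-million-token inference cost model.
# All numbers below are hypothetical placeholders, not figures from
# the Goldman Sachs report or from any vendor.

def cost_per_million_tokens(
    system_price_usd: float,        # upfront cost of the rack/system
    depreciation_years: float,      # straight-line depreciation period
    power_kw: float,                # average power draw of the system
    electricity_usd_per_kwh: float, # electricity price
    utilization: float,             # fraction of peak throughput actually served
    peak_tokens_per_second: float,  # peak system throughput
) -> float:
    hours_per_year = 24 * 365
    # Hourly capital cost from straight-line depreciation
    capex_per_hour = system_price_usd / (depreciation_years * hours_per_year)
    # Hourly energy cost
    energy_per_hour = power_kw * electricity_usd_per_kwh
    # Useful tokens actually produced per hour
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return (capex_per_hour + energy_per_hour) / tokens_per_hour * 1_000_000


# Hypothetical example: a $3M system, 4-year depreciation, 120 kW,
# $0.08/kWh, 60% utilization, 400k tokens/s peak throughput.
print(f"${cost_per_million_tokens(3_000_000, 4, 120, 0.08, 0.6, 400_000):.4f} per 1M tokens")
```

In a model of this shape, raising utilization or sustained throughput lowers unit cost just as directly as cutting the hardware price or power bill, which is why system-level efficiency rather than single-card peak performance dominates the comparison that follows.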

The research conclusions point to a change that is accelerating but has not yet been fully digested: Google/Broadcom's TPU is rapidly narrowing the gap with NVIDIA's GPU in inference costs. From TPU v6 to TPU v7, the unit token inference cost has decreased by about 70%, putting it roughly on par with NVIDIA's GB200 NVL72 in absolute cost terms, and in some modeled scenarios even slightly ahead.

This does not mean NVIDIA's position is shaken, but it clearly indicates that the core evaluation system of AI chip competition is shifting from "who computes faster" to "who computes cheaper and more sustainably." As training gradually becomes an upfront investment and inference becomes a long-term cash flow source, the slope of the cost curve is replacing peak computing power as the key variable determining the industry landscape.

I. From Computing Power Leadership to Cost Efficiency: The Evaluation Criteria for AI Chip Competition Are Shifting

In the early stages of AI development, training compute determined almost everything: whoever could train larger models faster commanded the technology narrative. However, as large models move into deployment and commercialization, inference workloads begin to far exceed training itself, and cost pressures are rapidly amplified.

Goldman Sachs points out that at this stage, the cost-performance ratio of chips is no longer determined solely by single-card performance but is shaped by system-level efficiency, including computing power density, interconnect efficiency, memory bandwidth, and energy consumption. The inference cost curve constructed based on this logic shows that Google/Broadcom's TPU has made enough progress in raw computing performance and system efficiency to compete directly with NVIDIA on cost.

In contrast, AMD's GPUs and Amazon's Trainium still show limited generation-over-generation cost reductions. Based on current calculations, the unit inference costs of both remain significantly higher than those of the NVIDIA and Google solutions, so their impact on the mainstream market is still relatively limited.

II. Behind the TPU Cost Leap Is System Engineering Capability, Not a Single-Point Breakthrough

The significant cost reduction of TPU v7 does not come from a single technological breakthrough but from the concentrated release of system-level optimization capabilities. Goldman Sachs believes that as compute chips themselves approach physical limits, future declines in inference costs will increasingly depend on advances in "adjacent computing technologies": higher-bandwidth, lower-latency network interconnects; continued integration of high-bandwidth memory (HBM) and storage solutions; advanced packaging technologies (such as TSMC's CoWoS); and improvements in the density and energy efficiency of rack-level solutions. TPU's coordinated optimization across these areas is what gives it a significant economic advantage in inference scenarios.

This trend is also highly consistent with Google's own compute deployment. TPUs account for a rising share of Google's internal workloads and are already widely used for training and inference of the Gemini models. At the same time, external customers with mature software capabilities are accelerating adoption of TPU solutions, the most notable case being Anthropic's order to Broadcom of approximately $21 billion, with related products expected to begin delivery in mid-2026.

However, Goldman Sachs also emphasizes that NVIDIA still holds the time-to-market advantage. Even as TPU v7 catches up with GB200 NVL72, NVIDIA has already moved on to GB300 NVL72 and plans to deliver VR200 NVL144 in the second half of 2026. This relentless product cadence remains a key lever for maintaining customer stickiness.

III. Investment Implications Rebalanced: The Rise of ASICs, but NVIDIA's Moat Has Yet to Be Breached

From an investment perspective, Goldman Sachs has not downgraded its view on NVIDIA because of TPU's rapid catch-up. The bank maintains buy ratings on both NVIDIA and Broadcom, arguing that both are tied directly to the most sustainable portion of AI capital expenditure and will benefit over the long term from upgrades in networking, packaging, and system-level technologies.

In the ASIC camp, the case for Broadcom is particularly clear. Goldman Sachs has raised its fiscal 2026 earnings-per-share forecast to $10.87, about 6% above market consensus, and argues that the market still underestimates Broadcom's long-term profitability in AI networking and custom compute.

AMD and Amazon's Trainium remain in the catch-up phase, but Goldman Sachs also notes that AMD's rack-level solution may enjoy a latecomer advantage: by the end of 2026, the Helios rack solution based on the MI455X is expected to achieve roughly a 70% reduction in inference costs in certain training and inference scenarios, and is worth tracking.

More importantly, this research report does not conclude with a "winner takes all" perspective, but rather presents a gradually clearer picture of industrial division of labor: GPUs continue to dominate the training and general computing power market, while custom ASICs are increasingly penetrating scalable and predictable inference loads. In this process, NVIDIA's CUDA ecosystem and system-level R&D investment still constitute a solid moat, but its valuation logic will continue to be subjected to the reality check of "declining inference costs."

When AI truly enters the stage where every token must earn its return, the competition for computing power will ultimately come back to economics itself. The roughly 70% drop in TPU inference costs is not just a technological catch-up; it is a stress test of the viability of AI business models. That may be the signal the market should take most seriously behind the GPU-versus-ASIC rivalry.