Wallstreetcn
2023.09.02 08:43

Track Hyper | Google's Attempt to Shake NVIDIA's Empire

A route unlike any other.

NVIDIA is becoming ever more dominant in generative AI, yet competitors keep emerging one after another to challenge its rapidly forming AI chip empire.

Recently, at Google Cloud Next 2023, Google unveiled its latest generation of generative AI chip, "Google Cloud TPU v5e," an AI accelerator designed specifically for large-model workloads.

Compared to its predecessor, Cloud TPU v4, TPU v5e cuts costs by 50%: per dollar, it delivers up to 2x the training performance and up to 2.5x the inference performance.

However, NVIDIA's dominance in AI chips cannot be shaken overnight. Google's goal therefore differs from NVIDIA's: it is attempting to build a complete AI ecosystem, a more ambitious market positioning than NVIDIA's chip dominance.

Can Google succeed?

TPU v5e: Designed for Generative AI

At the Google Cloud Next 2023 conference held on August 30, Google announced a series of product updates. Among them, the Cloud TPU v5e AI accelerator is seen in the industry as the latest assault on NVIDIA's AI chip dominance. (IBM's brain-inspired AI chip, released on August 28, arrived slightly earlier.)

TPU, short for "Tensor Processing Unit," is a custom ASIC (Application-Specific Integrated Circuit) designed by Google for machine learning (ML), specifically for Google's deep learning framework TensorFlow.

An ASIC is a chip customized to the requirements of a specific product; by contrast, non-customized chips are standard parts designed to serve a broad range of applications.

Compared to graphics processing units (GPUs), TPUs use low-precision (8-bit) calculations to reduce the number of transistors used in each operation.

Reducing precision has little impact on the accuracy of deep learning but significantly cuts power consumption and speeds up computation. TPUs also use a systolic-array design that optimizes matrix multiplication and convolution while reducing I/O operations, and larger on-chip memory that cuts accesses to DRAM, thereby maximizing performance.
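To make the low-precision idea concrete, here is a minimal NumPy sketch. It is not Google's implementation; the symmetric per-tensor quantization scheme is a simple choice for illustration. It shows an int8 matrix multiply staying close to the float32 result:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 256)).astype(np.float32)
w = rng.normal(size=(256, 64)).astype(np.float32)

def quantize(v):
    """Symmetric per-tensor quantization: map [-max|v|, max|v|] onto [-127, 127]."""
    scale = np.abs(v).max() / 127.0
    return np.clip(np.round(v / scale), -127, 127).astype(np.int8), scale

xq, xs = quantize(x)
wq, ws = quantize(w)

# Integer matmul accumulated in int32, then rescaled back to float.
yq = xq.astype(np.int32) @ wq.astype(np.int32)
y_approx = yq.astype(np.float32) * (xs * ws)

y_exact = x @ w
rel_err = np.linalg.norm(y_approx - y_exact) / np.linalg.norm(y_exact)
print(f"relative error: {rel_err:.4f}")  # small (roughly percent-level) for this data
```

The integer multiply-accumulate units behind such a computation need far fewer transistors than float32 units, which is the hardware saving the paragraph above describes.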

In 2016, Google first announced TPU at its I/O conference. The first generation of TPUs was launched the same year, the fourth generation was released in 2021, and they were made available to developers in 2022.

Cloud TPU is a Google Cloud service suited to training large, complex deep learning models that demand massive matrix computation, such as large language models, protein-folding modeling, and drug discovery, helping businesses save money and time on AI workloads. Today, no technology company launching an AI chip can afford to overlook LLM training and inference, and Cloud TPU v5e is built for exactly those needs.

However, although this AI accelerator is likewise designed for the cost-efficiency and performance that training and inference demand, it targets medium-to-large models rather than the very largest ones.

The technical roadmap of Cloud TPU v5e recalls the early brand positioning of the Chinese company Xiaomi: an emphasis on cost-effectiveness. Compared with its predecessor, Cloud TPU v4, the new accelerator focuses on efficiency, cutting costs by 50% while delivering 2x the training performance and 2.5x the inference performance per dollar.

Accordingly, Google calls Cloud TPU v5e a "supercomputer" that balances performance, flexibility, and efficiency: up to 256 chips can be interconnected, with an aggregate bandwidth of over 400 Tb/s and 100 petaOps of INT8 performance. It also supports eight different virtual machine (VM) configurations, ranging from one chip to more than 250 chips within a single slice.
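As a rough illustration of how such a multi-chip slice is programmed, here is a minimal JAX sketch, assuming a host with several accelerator devices attached; `jax.pmap` shards a batch so each chip runs the same matrix multiply in parallel:

```python
import jax
import jax.numpy as jnp

print(jax.device_count())  # e.g. a handful per host; up to 256 chips in a full slice

n = jax.local_device_count()
x = jnp.ones((n, 128, 256))  # leading axis: one shard per local device
w = jnp.ones((256, 64))      # closed-over weights are broadcast to every device

y = jax.pmap(lambda xb: xb @ w)(x)  # the same matmul runs on each chip in parallel
print(y.shape)  # (n, 128, 64)
```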

In terms of performance, one set of benchmark figures offers a reference point: with Cloud TPU v5e, training and running AI models was 5x faster, and a single second of wall-clock time was enough to run real-time speech-to-text and emotion-prediction models over 1,000 seconds of audio, a 6x improvement over before.

Google stated, "We are at a rare inflection point in computing. The traditional approach to designing and building computing infrastructure is no longer adequate for the exponentially growing demands of generative AI and LLM workloads. Over the past five years, the number of parameters in LLMs has increased 10x each year. Customers therefore need AI-optimized infrastructure that is both cost-effective and scalable."

Google Cloud is striving to meet developers' needs by providing new AI infrastructure, both TPUs and GPUs. The effort has two parts: beyond Cloud TPU v5e itself (currently available in preview), it includes integration with Google Kubernetes Engine (GKE) and Vertex AI, as well as with frameworks such as PyTorch, JAX, and TensorFlow, to make developers more efficient.
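For TensorFlow users, the setup looks roughly like the sketch below. It is hedged: it assumes the code runs on a Cloud TPU VM (where the "local" resolver address resolves to the attached chips), and the model is a throwaway example:

```python
import tensorflow as tf

# Discover and initialize the attached TPU, then build a distribution strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():  # variables created here are replicated across TPU cores
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```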

Because Cloud TPU v5e targets medium-to-large models, Google has also prepared a product for the very largest ones: the "Google A3 VM" supercomputer, built on NVIDIA H100 GPUs and generally available in September, a platform purpose-built to support massive AI models.

Roadmap: Building a Development Ecosystem

In addition to its powerful performance and appealing cost-effectiveness, the ease of use of Google Cloud TPU v5e is also exceptionally prominent.

Developers (which may also include businesses or research institutions) can use Google Kubernetes Engine (GKE) to manage medium to large-scale AI workloads based on Cloud TPU v5e, thereby improving AI development efficiency. For businesses or research institutions that prefer simple hosting services, Vertex AI now supports the use of Cloud TPU virtual machines for training different frameworks and libraries.

GKE is a managed container orchestration service on the Google Cloud platform, while Kubernetes is an open-source container orchestration platform that helps technical personnel in organizations manage and schedule containerized applications. GKE simplifies the process for technical users to deploy, manage, and scale containerized applications on Google Cloud.

With the powerful tools and services provided by GKE, developers can easily create and manage Kubernetes clusters. Through GKE, technical developers or organizations can quickly start and stop Kubernetes clusters, automatically manage and scale nodes, and monitor and debug applications. GKE also provides highly reliable infrastructure and automated operations, allowing technical users to focus on application development and deployment without worrying about underlying infrastructure details.
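As a small illustrative sketch of inspecting such a cluster from Python (not an official Google example), the snippet below uses the open-source `kubernetes` client to list nodes and surface TPU node pools; the `cloud.google.com/gke-tpu-accelerator` label key is our assumption and should be checked against current GKE documentation:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl is already pointed at a GKE cluster
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    labels = node.metadata.labels or {}
    # Assumed label key for GKE TPU node pools; verify against current GKE docs.
    accel = labels.get("cloud.google.com/gke-tpu-accelerator")
    if accel:
        print(node.metadata.name, accel)
```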

The usability of Cloud TPU v5e reflects Google's different approach in the generative AI field compared to NVIDIA.

The ultimate goal of this approach is to establish a systematic generative AI developer ecosystem.

Cloud TPU v5e ships with built-in support for frameworks such as JAX, PyTorch, and TensorFlow, and integrates with Google's AI developer platform, Vertex AI.

Vertex AI is a machine learning (ML) platform released by Google Cloud in May 2021, mainly used for training and deploying ML models and AI applications, and can also be used for custom LLM.

Vertex AI combines data engineering, data science, and ML workflows, allowing technical development teams to use a common toolset for collaboration. It leverages the advantages of Google Cloud to scale applications and provides options such as AutoML, custom training, model discovery, and generative AI, enabling automated deployment and scaling with end-to-end MLOps tools.

This AI development platform supports multiple interfaces, including SDKs, the console, the command line, and Terraform. Vertex AI Extensions is a set of fully managed developer tools that connect models to real-time data and let them take real-world actions through APIs.
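To show what the SDK route looks like, here is a hedged sketch using the `google-cloud-aiplatform` Python package; the project ID, region, script path, container image, and machine type are illustrative placeholders, not verified values:

```python
from google.cloud import aiplatform

# Point the SDK at a project and region (placeholders).
aiplatform.init(project="my-project", location="us-central1")

# Launch a custom training job from a user-supplied script and a prebuilt
# training container (both placeholders for illustration).
job = aiplatform.CustomTrainingJob(
    display_name="tpu-demo-train",
    script_path="train.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-12:latest",
)
job.run(replica_count=1, machine_type="n1-standard-8")
```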

According to Google, Vertex AI Search and Conversation helps developers stand up common generative AI use cases quickly, such as chatbots and custom search engines, even without prior AI experience. In many cases, developers using the Vertex AI platform don't need to write any code at all.

In fact, the Vertex AI developer platform is Google's weapon in the generative AI race, and Google intends to build it into a large AI development ecosystem: one in which it pushes both hardware and software to ever-higher performance while layering a one-stop AI development service on top.

This is a different path from NVIDIA, the AI chip leader, whose route is to supply the tools for AI. Only a platform-based ecosystem can bind itself tightly to the industry, and only that may stand a chance against the NVIDIA empire.