AMD reaches a milestone moment! Oracle builds a supercomputing cluster around the MI300X

Zhitong
2024.09.27 08:03

Oracle has chosen AMD's MI300X AI accelerator to provide the core artificial intelligence computing power for its latest OCI supercomputing cluster. The collaboration marks a milestone moment for AMD in the data center AI GPU market, potentially helping it gradually expand its market share and challenge NVIDIA's dominant position. Although NVIDIA still holds an overwhelming advantage in AI infrastructure, the partnership with Oracle shows that AMD is actively competing in this field.

According to the Zhitong Finance APP, Oracle (ORCL.US), the global technology giant known for its cloud computing services and database software, recently chose the AMD Instinct MI300X AI accelerator, paired with the ROCm open software ecosystem, to provide the core artificial intelligence computing hardware for its latest OCI supercomputing cluster instances. The accelerator is considered the most powerful competitor to NVIDIA's H100 and H200 AI GPUs. The collaboration is a "milestone moment" for AMD (AMD.US), which currently holds less than 10% of the data center AI GPU market: it signals that AMD is gradually joining the circle of global cloud computing giants and is positioned to keep capturing AI GPU market share from NVIDIA.

Oracle's selection of the AMD Instinct MI300X AI accelerator for its latest OCI (Oracle Cloud Infrastructure) supercomputing cluster shows that AMD's influence in the AI GPU market continues to grow. Recognition and support from a cloud giant of Oracle's scale is crucial for AMD, which now has the opportunity to leverage Oracle's strong position in the global cloud computing services market to expand the Instinct MI300X's share of the data center AI GPU field.

Despite NVIDIA's dominant position in the global data center AI GPU market, particularly its central role in the hardware systems used for training and inference of large AI models, this move shows that AMD is actively competing in AI infrastructure. With the Instinct MI300X AI accelerator and the ROCm software acceleration ecosystem, AMD is challenging NVIDIA's dominance in the AI GPU field.

The Oracle Cloud Infrastructure (OCI) Supercluster is a cloud supercomputing cluster built by Oracle to provide high-performance AI infrastructure. Its computing resources support one-stop training, tuning, and deployment of generative AI large models, as well as the efficient deployment and operation of generative AI applications similar to ChatGPT.

According to the latest news, Oracle's newest OCI supercomputing cluster uses the AMD MI300X AI accelerator as its core AI computing hardware. By combining Oracle's ultra-fast network fabric technology with other accelerator devices on OCI, a single cluster can scale up to 16,384 high-performance GPUs.

These OCI bare metal instances from Oracle are designed to run extremely demanding artificial intelligence workloads, including compute-heavy parallel workloads such as large language model inference and training that require high throughput, industry-leading memory capacity, and high memory bandwidth. Well-known technology companies such as Fireworks AI have reportedly already adopted these OCI bare metal instances.
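For readers who want a concrete picture, the sketch below shows roughly how such a bare metal instance could be provisioned with the OCI Python SDK. The shape name BM.GPU.MI300X.8 and the placeholder OCIDs are assumptions used for illustration only; they are not confirmed by this article.

```python
# Minimal sketch: launching an MI300X bare metal instance with the OCI Python SDK.
# The shape name "BM.GPU.MI300X.8" and all placeholder OCIDs are assumptions;
# consult Oracle's documentation for the exact shape and image to use.
import oci

config = oci.config.from_file()           # reads credentials from ~/.oci/config
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    availability_domain="<availability-domain>",
    compartment_id="ocid1.compartment.oc1..example",
    shape="BM.GPU.MI300X.8",              # assumed 8x MI300X bare metal shape
    display_name="mi300x-ai-node",
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..example",
    ),
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1..example",   # a GPU-enabled OS image
    ),
)

instance = compute.launch_instance(details).data
print(instance.id, instance.lifecycle_state)
```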

Through cooperation with Oracle, AMD is expected to rapidly increase its share in AI data centers

As large cloud computing service providers begin to seek alternatives to NVIDIA's expensive and scarce H100/H200, and as AMD makes progress on AI GPUs by offering tighter hardware and software co-design, the AMD MI300X has become a popular hardware foundation in the AI field.

The MI300X AI accelerator developed by AMD offers significant advantages in memory bandwidth and capacity over NVIDIA's Hopper-architecture AI GPUs, making it particularly well suited to training and inference of generative AI models with heavy demands for parallel computing power. Oracle's choice indicates that AMD is strongly competitive in hardware design and in AI software ecosystem support, especially in the hardware-software synergy required for high-performance computing and AI workloads.
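A rough back-of-the-envelope calculation illustrates why the memory capacity matters. Assuming publicly listed figures of 192 GB of HBM3 per MI300X versus 80 GB per H100 (figures not stated in this article) and 2 bytes per parameter in FP16, the weights of a 70-billion-parameter model alone need roughly 140 GB, which fits on a single MI300X but not on a single H100:

```python
# Back-of-the-envelope check of how model size relates to per-GPU memory.
# Assumed public specs: MI300X 192 GB HBM3, H100 80 GB; FP16 = 2 bytes/parameter.
params_billion = 70                          # e.g. a 70B-parameter LLM
weight_gb = params_billion * 1e9 * 2 / 1e9   # FP16 weights in GB
print(f"FP16 weights: {weight_gb:.0f} GB")           # -> 140 GB
print(f"Fits on one MI300X (192 GB)? {weight_gb < 192}")
print(f"Fits on one H100 (80 GB)?    {weight_gb < 80}")
```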

Undoubtedly, the cooperation with Oracle will help expand the MI300X AI accelerator's share of the data center GPU market, while significantly improving customers' efficiency on parallel-compute-intensive enterprise workloads.

NVIDIA's deep accumulation in hardware architecture, parallel computing, and the software acceleration ecosystem required for AI training and inference has secured its dominant position in the data center AI GPU market, at least in recent years. When building large-scale AI infrastructure, enterprises still rely heavily on NVIDIA's high-performance AI GPUs and the CUDA acceleration software ecosystem that has taken root over many years of global AI development. However, many analysts believe that if AMD can keep improving its ROCm software ecosystem and accelerate support for mainstream AI developer environments, it may further erode NVIDIA's share of the data center AI GPU market.
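One concrete example of that developer-environment support: ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API that CUDA code uses, so much existing training code can run without changes. The short check below is a sketch of how a user might confirm an MI300X is visible to PyTorch; it assumes a ROCm build of PyTorch is installed.

```python
# Sketch: verifying that AMD GPUs are visible to a ROCm build of PyTorch.
# ROCm builds reuse the torch.cuda API, so CUDA-style code generally runs unchanged.
import torch

print("HIP/ROCm build:", torch.version.hip is not None)
print("Devices found: ", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"  [{i}] {props.name}, {props.total_memory / 1e9:.0f} GB")

# A tensor placed on "cuda" lands on the AMD GPU under ROCm.
x = torch.randn(1024, 1024, device="cuda")
print("Matmul OK:", (x @ x).shape)
```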

Judging from the grand AI GPU blueprint AMD released some time ago, the company is clearly confident about capturing more of the data center AI GPU market in the future. According to the roadmap CEO Lisa Su presented at Computex in Taiwan, the MI325X, an upgraded version of the MI300X AI chip for AI data center servers, will go on sale in the fourth quarter; the more advanced MI350 series will launch in 2025; and the MI400 series will follow a year later. AMD's roughly annual release cadence matches NVIDIA CEO Jensen Huang's plan for annual AI GPU product launches.

Lisa Su said the MI325X delivers the largest generational performance improvement in AMD's history, with more than a 1.3x gain over the competing NVIDIA H200. Specifically, AMD claims the MI325X offers about 1.3 times the peak theoretical FP16 throughput of the H200, 1.3 times its memory bandwidth, and twice the model size supported per server.
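Those ratios roughly track the publicly announced specs. Assuming an H200 with 141 GB of HBM3e and 4.8 TB/s of memory bandwidth, and an MI325X with 288 GB and about 6 TB/s (figures taken from vendor announcements, not from this article), the bandwidth ratio works out to about 1.25x and the capacity ratio to about 2x:

```python
# Rough sanity check of the claimed ratios, using publicly announced specs
# (assumed here, not stated in the article): H200 141 GB / 4.8 TB/s,
# MI325X 288 GB / ~6 TB/s.
h200_mem_gb, h200_bw_tbs = 141, 4.8
mi325x_mem_gb, mi325x_bw_tbs = 288, 6.0

print(f"Memory bandwidth ratio: {mi325x_bw_tbs / h200_bw_tbs:.2f}x")  # ~1.25x
print(f"Memory capacity ratio:  {mi325x_mem_gb / h200_mem_gb:.2f}x")  # ~2.04x
```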

The globally renowned strategy consulting firm Bain predicts that as artificial intelligence (AI) rapidly disrupts enterprises and economies, all AI-related markets are expanding and will reach $990 billion by 2027. In its fifth annual "Global Technology Report," released on Wednesday, the firm said the overall market, including AI-related services and the underlying hardware, will grow by 40% to 55% annually from last year's base of $185 billion, implying revenues of $780 billion to $990 billion by 2027.
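As a quick consistency check of those figures, assuming a 2023 base of $185 billion and four years of compound growth to 2027 (an assumption, since the article does not spell out the compounding period), the implied annual growth rates for the $780 billion and $990 billion endpoints fall inside the 40% to 55% range Bain cites:

```python
# Implied compound annual growth rates for Bain's 2027 forecast range,
# assuming a 2023 base of $185B and four years of compounding.
base, years = 185, 4
for target in (780, 990):
    cagr = (target / base) ** (1 / years) - 1
    print(f"${target}B by 2027 implies ~{cagr:.0%} annual growth")  # ~43% and ~52%
```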