NVIDIA GB300 details exposed: the next-generation GPU monster
Nvidia is about to launch its second-generation Blackwell B300 series processors, which are expected to deliver a 50% performance improvement along with 288GB of memory and 8TB/s of bandwidth. The new processors use TSMC's 4NP process and have a TDP of 1,400W, only 200W higher than the previous generation. The B300 will also be paired with the 800G ConnectX-8 NIC, which has double the bandwidth of the 400G ConnectX-7. The larger memory supports bigger batch sizes and extended sequence lengths, significantly reducing inference costs. Nvidia also plans to redesign its supply chain to improve production efficiency.
Nvidia encountered obstacles when launching its first-generation Blackwell B200 series processors due to yield issues and several unverified reports of server overheating. However, according to SemiAnalysis, Nvidia's second-generation Blackwell B300 series processors seem to be on the verge of release. They not only have larger memory capacity but also a 50% performance increase, with a TDP increase of only 200W.
The B300 series processors from Nvidia feature a significantly revised design and will still utilize TSMC's 4NP manufacturing process (a 4nm node optimized for Nvidia, enhancing performance), but reports indicate that their computational performance will be 50% higher than that of the B200 series processors. The performance improvement comes at the cost of a TDP of up to 1,400W, which is only 200W higher than the GB200. SemiAnalysis states that the B300 will be released approximately six months after the B200.
Another significant improvement in the Nvidia B300 series is the use of 12-Hi HBM3E memory stacks, providing 288GB of memory and 8TB/s of bandwidth. The larger memory capacity and higher compute throughput enable faster training and inference, with inference costs potentially cut by as much as 3x, since the B300 can run larger batch sizes and support extended sequence lengths while keeping user-interaction latency in check.
In addition to higher computational performance and larger memory, Nvidia's second-generation Blackwell machines may also adopt the company's 800G ConnectX-8 NIC. This NIC has double the bandwidth of the current 400G ConnectX-7 and features 48 PCIe lanes, compared to only 32 in its predecessor. This delivers a significant improvement in scale-out bandwidth for the new servers, a win for large clusters.
Another major improvement in the B300 and GB300 is that Nvidia is reportedly redesigning the entire supply chain compared to the B200 and GB200. The company will no longer attempt to sell entire reference motherboards or complete server chassis. Instead, Nvidia will sell only the B300 on SXM Puck modules, the Grace CPU, and Axiado host management controllers (HMCs). This will allow more companies to participate in the Blackwell supply chain, making Blackwell-based machines more accessible.
With the B300 and GB300, Nvidia will provide its hyperscale and OEM partners with greater freedom to design Blackwell machines, which will impact their pricing and even performance.
Nvidia's Christmas Gift: GB300 and B300
Just six months after the release of the GB200 and B200, Nvidia is bringing brand-new GPUs to market: the GB300 and B300. While they may sound like incremental upgrades on the surface, the actual impact far exceeds expectations.
These changes are particularly significant because they include substantial improvements in inference and training performance for reasoning models. Nvidia has prepared a special Christmas gift for all the hyperscalers, especially Amazon, certain players in the supply chain, memory suppliers, and their investors. With the transition to the B300, the entire supply chain is being restructured and transformed; many winners receive gifts, while some losers are left with coal.
The B300 GPU is a new tape-out of the compute die on TSMC's 4NP process node. It delivers 50% more FLOPS at the product level than the B200. Part of the performance boost comes from an additional 200W of power, with the TDP of the GB300 and B300 HGX reaching 1.4kW and 1.2kW respectively (compared to 1.2kW and 1kW for the GB200 and B200).
The remaining performance improvements come from architectural refinements and system-level enhancements such as power sloshing between the CPU and GPU, that is, dynamically reallocating the power budget between the two.
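A rough back-of-the-envelope split of where the 50% comes from, using only the TDP figures above (the split itself is our own arithmetic, not a figure from the article):

```python
# Rough split of the B300's 50% FLOPS gain between extra power and efficiency.
# TDPs and the 50% figure are from the article; the split is simple arithmetic.
gb200_tdp_w, gb300_tdp_w = 1200, 1400
flops_gain = 1.50

power_gain = gb300_tdp_w / gb200_tdp_w        # ~1.17x more power
perf_per_watt_gain = flops_gain / power_gain  # ~1.29x more FLOPS per watt

print(f"power:     +{(power_gain - 1) * 100:.0f}%")          # ~ +17%
print(f"perf/watt: +{(perf_per_watt_gain - 1) * 100:.0f}%")  # ~ +29%
```

In other words, only about a third of the headline gain can be explained by the higher power budget; the rest has to come from the tweaked silicon and system-level changes.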
In addition to the increase in FLOPS, the memory is upgraded from 8-Hi to 12-Hi HBM3E stacks, raising HBM capacity to 288GB per GPU. However, the pin speed remains unchanged, so memory bandwidth stays at 8TB/s per GPU. Note that Samsung is getting coal from Santa: it will not get into the GB200 or GB300 for at least another nine months.
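For reference, this is where the 288GB and the unchanged 8TB/s come from; the per-die capacity (24Gb) and pin speed used below are typical HBM3E figures assumed for illustration, not numbers stated in the article:

```python
# HBM3E capacity and bandwidth arithmetic for an 8-stack Blackwell package.
# 24Gb (3GB) dies and ~8Gbps pin speed are typical HBM3E figures (assumption).
stacks = 8
die_gb = 3                                # 24Gb HBM3E die = 3GB

capacity_8hi  = stacks * 8  * die_gb      # B200:  8 * 8  * 3 = 192 GB
capacity_12hi = stacks * 12 * die_gb      # B300:  8 * 12 * 3 = 288 GB

# Bandwidth = stacks * pin_speed * bus_width; none of these change with stack
# height, which is why going from 8-Hi to 12-Hi leaves bandwidth at ~8 TB/s.
pin_speed_gbps, bus_width_bits = 8, 1024
bandwidth_tb_s = stacks * pin_speed_gbps * bus_width_bits / 8 / 1000  # ~8.2 TB/s

print(capacity_8hi, capacity_12hi, bandwidth_tb_s)  # 192 288 8.192
```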
There is more Christmas cheer in Nvidia's pricing as well. This changes Blackwell's profit margins, but we will discuss pricing and margins later; first, the performance changes.
Built for Reasoning Model Inference
The memory improvements are crucial for OpenAI o3-style LLM training and inference, because long sequence lengths inflate the KVCache, which limits the achievable batch size and hurts latency.
The figure below shows the improvement in token economics across Nvidia's recent GPU generations when serving 1k input tokens and 19k output tokens, similar to the chains of thought in OpenAI's o1 and o3 models. This illustrative roofline simulation is run on FP8 LLAMA 405B, as it is the best public model we can simulate on the H100 and H200 GPUs (the GPUs we have access to).
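To make the KVCache pressure concrete, here is a rough sizing sketch assuming a Llama-3.1-405B-style configuration (126 layers, 8 KV heads via grouped-query attention, head dimension 128) with an FP8 cache; these parameters are our own illustrative assumption, not figures from the article:

```python
# Rough KVCache sizing for a Llama-3.1-405B-style model served in FP8.
# Model shape (126 layers, 8 KV heads, head_dim 128) is assumed for illustration.
layers, kv_heads, head_dim, bytes_per_elem = 126, 8, 128, 1  # FP8 = 1 byte

kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
seq_len = 1_000 + 19_000                    # 1k input + 19k output tokens
kv_gb_per_request = kv_bytes_per_token * seq_len / 1e9

print(f"KVCache per token:   {kv_bytes_per_token / 1e6:.2f} MB")  # ~0.26 MB
print(f"KVCache per request: {kv_gb_per_request:.1f} GB")         # ~5.2 GB
```

At roughly 5GB of cache per 20k-token request, it only takes a handful of concurrent requests to fill the HBM left over after the model weights, which is exactly why batch size becomes memory-bound.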
Upgrading from the H100 to the H200 is purely a matter of larger and faster memory, and it has two effects (roughly sanity-checked in the sketch after this list):
- Thanks to the higher memory bandwidth, interactivity generally improves by 43% at comparable batch sizes (H200 @ 4.8TB/s vs H100 @ 3.35TB/s).
- Because the H200 can run larger batch sizes than the H100, it generates 3x the tokens per second, cutting cost by roughly 3x. The difference arises mainly because the KVCache limits the total batch size.
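Under the same illustrative assumptions as above (about 5.2GB of KVCache per 20k-token request and roughly 405GB of FP8 weights), here is a quick check of both effects on an 8-GPU serving node, ignoring real-world overheads such as activations and fragmentation:

```python
# Sanity check of the two effects on an 8-GPU node, using ~5.2GB of KVCache per
# 20k-token request (from the sketch above) and ~405GB of FP8 weights (assumed).
kv_gb_per_request, weights_gb, gpus = 5.2, 405, 8

# Effect 1: interactivity scales roughly with memory bandwidth.
print(f"bandwidth gain: {4.8 / 3.35 - 1:.0%}")  # ~43% (H200 vs H100)

# Effect 2: batch size is capped by the HBM left over after the weights.
for name, hbm_gb in [("H100", 80), ("H200", 141)]:
    kv_budget_gb = gpus * hbm_gb - weights_gb
    print(f"{name}: ~{int(kv_budget_gb / kv_gb_per_request)} concurrent requests")
# H100: ~45 requests, H200: ~139 requests -> roughly 3x the batch size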
The dynamic changes brought by larger memory capacity seem to provide disproportionately large benefits. For operators, the performance and economic differences between these two GPUs are much greater than what the specifications suggest on paper:
The user experience of reasoning models can be poor because of long waits between request and response. If you can deliver meaningfully faster inference, users become more willing to use and pay for the model.
The 3x cost difference is significant. Frankly, hardware that delivers a 3x improvement through a mid-generation memory upgrade is insane, far outpacing Moore's Law, Huang's Law, or any other rate of hardware improvement we have seen.
We observe that the most capable and differentiated models can charge far more than slightly less capable ones. Gross margins for frontier models exceed 70%, while margins for lagging models are below 20%. Reasoning models also do not have to be a single chain of thought: search exists and can be scaled up to improve performance, as in o1 Pro and o3. This lets smarter models solve more problems and generate more revenue per GPU.
Of course, Nvidia is not the only company capable of increasing memory capacity. ASICs can do this, and in fact, AMD may be in a favorable position as their memory capacity is higher than Nvidia's. Generally, the memory capacity of MI300X is 192GB, MI325X is 256GB, and MI350X is 288GB... However, Santa Huang has a red-nosed reindeer called NVLink.
As we turn to the GB200 NVL72 and GB300 NVL72, the performance and cost of Nvidia-based systems improve significantly. The key point of using NVL72 for inference is that it allows 72 GPUs to work on the same problem with extremely low latency while sharing memory. No other accelerator in the world has all-to-all switched connectivity. No other accelerator in the world can do all-reduce through a switch.
Nvidia's GB200 NVL72 and GB300 NVL72 are crucial for achieving several key capabilities:
- Higher interactivity, resulting in lower latency for each chain of thought.
- 72 GPUs over which to spread the KVCache, enabling longer chains of thought (greater intelligence).
- Much better batch-size scaling than a typical 8-GPU server, which lowers cost.
- The ability to search over more samples for the same problem, improving accuracy and ultimately model performance.
As a result, the token economics of NVL72 are more than 10x better, especially for long reasoning chains. The KVCache eating into memory capacity is fatal for the economics, and NVL72 is the only way to push reasoning lengths beyond 100k tokens at high batch sizes.
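A similar back-of-the-envelope view shows why the NVL72 domain matters for 100k+ token chains of thought. It reuses the assumed ~0.26MB/token KVCache and ~405GB of FP8 weights from the earlier sketches, and it simplifies by treating the NVLink domain as one memory pool holding a single copy of the weights:

```python
# Why a 72-GPU NVLink domain matters for 100k-token reasoning traces.
# KV per token (~0.26 MB) and 405GB of FP8 weights are illustrative assumptions.
kv_mb_per_token, weights_gb, hbm_per_gpu_gb = 0.26, 405, 288  # GB300-class GPU

kv_gb_per_request = kv_mb_per_token * 100_000 / 1000          # ~26 GB per request

for name, gpus in [("HGX 8-GPU node", 8), ("GB300 NVL72", 72)]:
    kv_budget_gb = gpus * hbm_per_gpu_gb - weights_gb          # pooled HBM minus weights
    print(f"{name}: ~{int(kv_budget_gb / kv_gb_per_request)} concurrent 100k-token requests")
# 8-GPU node: ~73 requests; NVL72: ~781 requests -> an order-of-magnitude better batching
```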
Blackwell Supply Chain Redesigned for GB300
With the launch of GB300, the supply chain and content provided by Nvidia have undergone significant changes. For GB200, Nvidia provided the entire Bianca motherboard (including Blackwell GPU, Grace CPU, 512GB LPDDR5X, VRM content, all integrated on one PCB), as well as switch trays and copper backplanes.
For GB300, Nvidia will not provide the entire Bianca motherboard; it will supply only the B300 on an "SXM Puck" module, the Grace CPU in a BGA package, and the HMC, which comes from the American startup Axiado instead of Aspeed as on the GB200.
End customers will now directly procure the remaining components on the compute board, with the second layer of memory being LPCAMM modules instead of soldered LPDDR5X. Micron will be the main supplier of these modules.
The switch trays and copper backplanes remain unchanged, with all these components provided by Nvidia.
Switching to the SXM Puck opens the compute tray up to more OEMs and ODMs. Previously, only Wistron and FII could build the Bianca compute board; now more OEMs and ODMs can. Wistron is the biggest loser among the ODMs, as it loses share of the Bianca board. For FII, the share lost at the Bianca board level is offset by being the exclusive manufacturer of the SXM Puck and the socket the Puck sits on. Nvidia is trying to qualify other suppliers for the Puck and the socket, but no other orders have been placed yet.
Another significant shift is in the VRM content. While there is some VRM content on the SXM Puck, most of the onboard VRM content will be procured directly from VRM suppliers by the hyperscalers/OEMs. On October 25th, we sent a note to Core Research subscribers explaining how the B300 reshapes the supply chain, particularly around voltage regulator modules ("VRMs"). We specifically pointed out how Monolithic Power Systems would lose market share due to the shift in business model and which new entrants are gaining share. Within a month of that note going to clients, MPWR dropped over 37% as the market caught up to the facts in our research.
Nvidia also offers the 800G ConnectX-8 NIC on the GB300 platform, providing double the scale-out bandwidth on InfiniBand and Ethernet. Nvidia had recently canceled the ConnectX-8 for the GB200 due to time-to-market complexities and abandoned enabling PCIe Gen 6 on the Bianca board.
The ConnectX-8 is a major improvement over the ConnectX-7. It not only has double the bandwidth but also 48 PCIe lanes instead of 32, enabling unique architectures such as the air-cooled MGX B300A. In addition, the ConnectX-8 supports SpectrumX, whereas the prior 400G generation required the far less efficient Bluefield-3 DPU for SpectrumX.
Impact of GB300 on Hyperscale
The GB200 delays and the GB300's arrival matter for the hyperscalers: starting in the third quarter, many orders will shift to Nvidia's new, more expensive GPUs. As of last week, all of the hyperscalers have decided to go ahead with the GB300. Part of the reason is the GB300's higher FLOPS and larger memory, but another part is that it lets them take control of their own destiny.
Because of time-to-market pressure and the significant changes in rack, cooling, and power delivery/density, hyperscalers could not make meaningful changes to the GB200 at the server level. This led Meta to abandon hopes of multi-sourcing NICs from both Broadcom and Nvidia, and to rely entirely on Nvidia instead. In other cases, such as Google, they abandoned their in-house NIC and went solely with Nvidia.
For hyperscale organizations with thousands of people, this is like nails on a chalkboard; they are used to cost-optimizing everything from CPUs to networks, down to the screws and sheet metal.
The most striking example is Amazon, which chose a very suboptimal configuration with worse TCO than the reference design. Because it uses PCIe switches and its less efficient, air-cooled 200G Elastic Fabric Adapter NIC, Amazon could not deploy the NVL72 rack the way Meta, Google, Microsoft, Oracle, X.AI, and Coreweave did. Because of its in-house NIC, Amazon had to use the NVL36, which also carries a higher cost per GPU due to the extra backplane and switch content. In short, Amazon's configuration is suboptimal because of its customization constraints.
Now, with the GB300, hyperscale data center operators can customize the motherboard, cooling, and more. This lets Amazon build its own custom motherboard that is liquid-cooled and integrates components that were previously air-cooled, such as the Astera Labs PCIe switches. With more components liquid-cooled and the K2V6 400G NIC finally reaching HVM in the third quarter of 2025, Amazon can return to the NVL72 architecture and significantly improve its TCO.
However, there is a significant downside: hyperscalers must take on a large amount of design, validation, and verification work. This is easily the most complex platform hyperscalers have ever had to design (with the exception of Google's TPU systems). Some hyperscalers will design quickly, while others with slower teams will fall behind. Despite reports of cancellations in the market, we believe Microsoft is among the slowest to deploy the GB300 because of its design pace, and it is still buying some GB200 in the fourth quarter.
Because components are being pulled out of Nvidia's margin stack and shifted to ODMs, the total price customers pay changes significantly. ODM revenues will be affected, and most importantly, Nvidia's gross margins will shift over the course of the year.
Source: Semiconductor Industry Observation. Original title: "NVIDIA GB300 Details Exposed, Next-Generation GPU Monster"
Risk Warning and Disclaimer
The market has risks, and investment requires caution. This article does not constitute personal investment advice and does not take into account the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investing on this basis is at one's own risk.