On the AI boom and Nvidia’s current hardware dominance
It is very hard these days to miss the frenzy surrounding all things AI, and Nvidia is by far the company benefiting the most from the current boom. The firm’s gross margin in the second quarter of this year (Q2 FY2024 in Nvidia’s fiscal calendar) was 70 per cent, a huge number even in the historical context of the semiconductor industry. In fact, it is a bigger margin than what famed semiconductor juggernaut Intel managed in the heyday of its near monopoly on x86 datacenter CPUs (general-purpose processors).
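For readers less used to financial jargon, gross margin is simply the share of revenue left once the cost of the goods sold is subtracted; with Nvidia’s quarterly figures rounded for illustration:

\[
\text{gross margin} = \frac{\text{revenue} - \text{cost of goods sold}}{\text{revenue}}
\approx \frac{\$13.5\,\text{B} - \$4.0\,\text{B}}{\$13.5\,\text{B}} \approx 70\%
\]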
Nvidia’s very high margin this last quarter is a sign of a clearly overheated market for datacenter GPUs (AI accelerators, to grossly simplify), where demand far outstrips supply and where prices are consequently severely inflated. What is more, the delivery time for Nvidia’s latest and greatest, the H100 GPU, is hovering around 40 weeks.
After the overblown hype we have witnessed these past years following the advent of 5G and autonomous vehicles, and putting aside the seemingly utter madness of cryptocurrencies, asking whether the current mania surrounding AI amounts to a bubble seems a rather legitimate question. This is of course not to say that 5G, autonomous vehicles and AI are useless or irrelevant technologies. They are not. And they will certainly change the world profoundly. But that does not necessarily prevent their respective hype cycles from creating unrealistic expectations, potentially resulting in market disruptions (like an unforeseen decrease in demand) and maybe even financial losses for some over-ambitious start-ups.
Disruptions may be inevitable
That demand for AI compute may subside once the initial enthusiasm recedes is not outside the realm of possibility. First of all, artificial intelligence is hard. Reaching satisfactory results requires far more time, effort, and scarce financial and human capital than was required to run a successful online business during the dot-com bubble at the start of the century. Some companies may underestimate these costs and difficulties, while others still seem to be fooling around with dubious results.
Then there is the matter of the dataset (think of it as the raw material – made of data – from which an AI model is built). No matter how much compute you throw at the problem, your end result will only be as good as your starting dataset. Depending on the application, some companies may underestimate the time and effort required to build one that is good enough.
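As a purely illustrative sketch of what “good enough” entails in practice (the steps and thresholds below are hypothetical, and real pipelines go much further, with near-duplicate detection, language filtering and so on), even the most basic dataset hygiene takes deliberate engineering:

```python
# Illustrative only: a minimal text-dataset cleaning pass (exact deduplication
# and length filtering). All thresholds here are hypothetical placeholders.
import hashlib

def clean_corpus(documents, min_chars=200, max_chars=100_000):
    seen_hashes = set()
    kept = []
    for doc in documents:
        text = doc.strip()
        # Drop documents too short or too long to be useful for training.
        if not (min_chars <= len(text) <= max_chars):
            continue
        # Drop exact duplicates via a content hash.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append(text)
    return kept

if __name__ == "__main__":
    raw = ["short", "a" * 500, "a" * 500, "b" * 300]
    print(len(clean_corpus(raw)))  # 2: one duplicate and one too-short doc removed
```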
Compounding these factors are the huge costs currently associated with AI compute. Many businesses are currently confined to the cloud, either for the flexibility it provides, or because they balk at the huge capital expenditures necessary to build their own capacity. But the cloud can be famously expensive, especially when using already scarce and overpriced AI hardware. Once they discover how hard it can be to achieve genuinely good results, and how much it costs, some companies may simply scale back their AI ventures in the coming quarters.
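To make the cloud-versus-ownership trade-off concrete, here is a back-of-the-envelope sketch; every figure below is a hypothetical placeholder, not a quoted price:

```python
# Illustrative only: break-even point between renting a cloud GPU and buying
# one outright. All prices are hypothetical placeholders, not market quotes.

CLOUD_RATE = 2.50       # $/GPU-hour, hypothetical on-demand rental price
CAPEX_PER_GPU = 30_000  # $, hypothetical purchase price per accelerator
OPEX_RATE = 0.50        # $/GPU-hour, hypothetical power/hosting/ops cost

def break_even_hours(capex, cloud_rate, opex_rate):
    """Hours of sustained utilization after which buying beats renting."""
    return capex / (cloud_rate - opex_rate)

hours = break_even_hours(CAPEX_PER_GPU, CLOUD_RATE, OPEX_RATE)
print(f"Break-even after {hours:,.0f} GPU-hours (~{hours / 720:.0f} months at 24/7 use)")
# -> Break-even after 15,000 GPU-hours (~21 months at 24/7 use)
```

Under these made-up numbers, ownership only pays off after roughly two years of near-constant utilization, which helps explain why so many businesses stay in the cloud despite its sticker shock.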
Finally, China is yet another wild card in this broader market. Ever-tightening US sanctions on the country push many Chinese companies to buy as much Nvidia hardware as they can, as fast as they can, and sometimes no matter the price. This further inflates prices, but the situation is obviously a temporary one.
Three broad categories of users
To better understand the matter at hand, we will simplify things by focusing only on businesses that buy datacenter GPUs by the tens of thousands, and by segmenting this market into three distinct and somewhat overlapping categories: first the Cloud Service Providers (Amazon’s AWS, Microsoft, Google, Oracle and so on), who provide compute capacity to their paying customers; then the Internet Giants (Google, Meta, Microsoft, etc.), who need this capacity for their own internal needs; and finally a few ambitious and extremely well-funded start-ups, namely OpenAI, Inflection and Anthropic. For more information on these “GPU-rich” actors, Dylan Patel from SemiAnalysis has an excellent write-up on the subject, as usual.
OpenAI, the creator of ChatGPT, needs no introduction at this point. Originally a non-profit, it has since created a for-profit subsidiary. Inflection, for its part, is trying to build a kind of AI-powered digital companion that individuals would pay for. And finally, Anthropic specializes in research aimed at making the use of AI safer.
For the sake of exhaustiveness, one could argue that a fourth category exists, comprising all kinds of other actors (for instance in the petrochemical and pharmaceutical sectors) that buy tens of thousands of datacenter GPUs to help build their end products. But in the interest of simplification, these businesses can be considered outliers. One prominent example is the car manufacturer Tesla, clearly an exception in that it is building its homegrown Dojo AI supercomputer to supplement the GPUs it buys from Nvidia.
Ultimately, this division into distinct categories is a glaring oversimplification, but the idea at this point is to provide the reader with an accessible framework for understanding the current state of affairs.
The crux of the matter here is that the Cloud Service Providers (CSPs) have a huge number of different paying customers, each with their own distinct workloads and circumstances, and it would seem rather unlikely that the demand from all these disparate customers would suddenly decline at the same time. The idea is the same for the Internet Giants: they have an immense internal need for datacenter GPUs to process the vast trove of data they operate on, and it seems difficult to imagine a future where this need abruptly disappears.
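A stylized way to see this diversification effect: if aggregate demand comes from N independent customers, each with mean demand μ and standard deviation σ, then the relative volatility of the total shrinks as N grows (independence is of course a simplifying assumption; real customer demands are correlated):

\[
\frac{\operatorname{sd}\!\left(\sum_{i=1}^{N} D_i\right)}{\operatorname{E}\!\left[\sum_{i=1}^{N} D_i\right]}
= \frac{\sigma\sqrt{N}}{\mu N}
= \frac{\sigma}{\mu\sqrt{N}}
\]

A provider serving thousands of loosely correlated workloads thus sees a much smoother aggregate than a single-product company, which is essentially one big bet.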
Let’s now examine our third category of “GPU-rich” companies: the start-ups. These businesses mostly have a single purpose, and they are responsible for a remarkable share of the current demand for datacenter GPUs, especially relative to their size. They also present the most serious potential for a sudden contraction in demand for Nvidia’s GPUs.
Predicting whether or not these start-ups will be successful is frankly beyond the scope of this article, but one should note that OpenAI is already making money from many different customers, whereas Inflection and Anthropic seem rather more like long shots. And taking into account the overinflated expectations the market has systematically generated over the past two decades regarding technological disruptions, it seems reasonable to harbor a healthy dose of skepticism. In any case, the idea here is to better understand where the risk of a future slackening of demand may reside, the better to quantify it.
The cushioning role of the CSPs and the Internet Giants
Back to the first two categories of buyers of datacenter GPUs: they have a structural capacity to absorb a hypothetical future demand-side shock, either thanks to the huge number and diversity of their customers (in the case of the Cloud Service Providers), or thanks to their sheer size (in the case of the Internet Giants). For some of these companies (Google, Microsoft), their dual role would allow them to cope even more efficiently: any unused cloud GPU could be repurposed for internal use, or vice versa.
In case of a serious market correction in the coming quarters, however, we may witness significant internal readjustments among the Internet Giants, for instance a reshuffling of resource allocation among the different teams inside these very large companies, as the environment evolves and priorities shift.
As for the start-ups, even in the worst case, a bankruptcy would simply result in a sudden influx of second-hand datacenter GPUs flooding the market. At that point, the CSPs and Internet Giants, again fulfilling their cushioning role, would certainly be more than happy to snatch up such precious hardware at bargain prices.
However, a slackening in the demand for GPU compute would force these giant companies into a prolonged period of “digestion”. That would in turn severely impede Nvidia’s capacity to keep selling such huge quantities of datacenter GPUs in the coming years, especially as a new generation of hardware is on the horizon: a successor to the current H100 GPU is expected in the 2024-2025 time frame. In other words, the party may not last forever for Nvidia, and demand for its next-generation accelerators may end up lower than what we are seeing right now, especially if broader market sentiment shifts significantly.
A broader look at the market for AI accelerators
What is more, even though Nvidia is by far the company that benefits the most from the current AI boom, it is not alone in the market for AI accelerators. AMD is hot on its heels with its brand-new and innovative MI300 family of products. More importantly, AMD’s accompanying software ecosystem (called ROCm) is slowly but surely catching up with Nvidia’s famed CUDA software tools. As a matter of fact, Nvidia is famous for having more software engineers than hardware engineers, and this constitutes one of the very keys to its current success.
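One reason a challenger ecosystem like ROCm can close the gap is that much AI software today targets framework-level abstractions rather than raw CUDA. As a minimal sketch (assuming a ROCm build of PyTorch, which deliberately reuses the torch.cuda namespace), the exact same code can run on either vendor’s GPUs:

```python
# Illustrative only: framework-level code that is vendor-agnostic in practice.
# On a ROCm build of PyTorch, torch.cuda.is_available() returns True on AMD
# GPUs and the "cuda" device string targets them, so no code changes are needed.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(32, 1024, device=device)
y = model(x)  # runs on an Nvidia GPU, an AMD GPU (via ROCm), or the CPU
print(y.shape, device)
```

The lower the share of code written directly against CUDA, the lower the switching cost away from Nvidia, which is precisely why its software moat matters so much.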
Then there is Intel, hard at work trying to correct its past errors. Even though these past mistakes have been mostly – but not exclusively – related to its semiconductor manufacturing activities, the company has recently reset and delayed its GPU roadmap for what can arguably be described as the third time in two decades. However, its Falcon Shores family of products is now expected before the year 2025 is over, and just like AMD, the company has spent the past years patiently building its own obligatory software ecosystem, called oneAPI.
But that’s not all: beyond these three incumbent players, there is a variety of hardware start-ups hungry for success, the most prominent of which is Cerebras with its very innovative Wafer Scale Engine. In a remarkable achievement, Cerebras recently won a $100M contract from Abu Dhabi-based Group 42 for its second-generation product. In the same vein, SambaNova and Tenstorrent are two other competitors worth mentioning.
Competitors and customers enter the fray
However, beyond Nvidia’s competitors, it may very well be Nvidia’s own customers that push the hardest for the company to reduce its prices, and thus bring its gross margin back into “reasonable” territory. They will do so either by adopting competing products, or, more remarkably, by deciding to build their own.
Beyond the newfound structural importance of the CSPs and Internet Giants in the broader semiconductor market, the other paradigm shift in the industry these past years has been the ability of many of these companies to simply build their own chips, instead of buying what the hardware vendors have to sell: Apple with its A series of processors for iPhones and M series for laptops, AWS with its Graviton CPUs, Trainium AI accelerators and in-house SSD controllers, Google with its TPU AI accelerators, and Tesla with its homegrown Dojo AI supercomputer.
This has been made possible by the emergence, over the past decade, of a broader ecosystem of companies offering the distinct services needed for such an endeavor: businesses like Cadence and Synopsys offer the necessary intellectual property, Marvell, Broadcom and Global Unichip Corp offer design services, and TSMC, Samsung (and very soon Intel) offer third-party manufacturing.
To summarize, a significant decrease in the demand for AI hardware in the coming quarters seems rather unlikely at this point. What could happen, however, is that the current manic phase is followed by a prolonged period of hardware digestion, leading to subdued demand for next-generation accelerators as the Cloud Service Providers and the Internet Giants scale back their capital expenditures.
More importantly, the very structure of the semiconductor industry has an important role to play in this matter. The CSPs and the Internet Giants, by their sheer size and diversity, may be able to cushion any future sudden decrease in demand. But they also weigh on the industry in one other fascinating way: instead of buying their future hardware, they may simply build it themselves, thanks to a new ecosystem of companies dedicated to making such a complex task achievable.
That would allow them to capture a bigger share of the value created by the AI revolution, instead of letting Nvidia walk away with such a large slice of the pie. For now, Nvidia’s party is in full swing. But nothing lasts forever, and a 70 per cent gross margin seems simply unsustainable in the long run.