Nvidia’s Blockbuster Quarter and the Value of ‘Compute’
From Techne on The Dispatch
Welcome back to Techne! Apparently, the Nile is broken up by six cataracts, regions where the river is filled with small boulders or treacherous whitewater. The First Cataract is especially important because it served as a natural southern border that protected Egypt from invasions, separating it from the Kingdom of Kush.
Notes and Quotes
Techne reader Patrick sent me this story about Ethan Thornton, a 19-year-old entrepreneur who landed $85 million from investors to build hydrogen-powered weapons through his startup, Mach. But early tests have raised major safety concerns. In one incident, Thornton had to physically hold down a hydrogen-powered drone at full throttle for launch. Another time, an unexpected ignition of a hydrogen-powered gun sent metal shards flying into an employee.
University of Cambridge researchers have developed a scalable method for producing low-emission concrete. Their approach utilizes massive electric arc furnaces typically used for steel recycling to simultaneously recycle concrete and cement. With concrete currently accounting for 7.5 percent of global human-caused carbon emissions, this dual recycling technology could significantly reduce the construction industry’s environmental impact.
Russia launched a new satellite, Cosmos 2576. Based on initial reports, it looks like an inspector satellite similar to previous Cosmos iterations.
Rumors have swirled that the encrypted messaging app Signal is compromised due to its links to U.S. government funding. But as Luke Hogg and Zach Graves of the Foundation for American Innovation explain, these criticisms largely neglect the fact that the federal government has valid policy reasons for funding tech tools beyond surveillance purposes.
NASA has postponed the launch of Boeing’s Starliner spacecraft again, this time to June 1. The agency had already slipped the original May 6 launch date amid a series of delays for the crewed mission.
A U.S. cybersecurity official has revealed that attackers have repeatedly tracked people’s physical locations within the U.S. by exploiting vulnerabilities in the global cellular network infrastructure. These attacks have persisted recently, despite major telecoms like AT&T, Verizon, and T-Mobile claiming to have bolstered network security.
What are the “good old days” that people are always talking about? Rather than nostalgia for a specific time in the nation’s history, a Washington Post survey suggests Americans may be nostalgic for the age they were during their formative years. The chart below is telling …
Is This the ‘Compute Era’?
The chip manufacturer Nvidia reported its first-quarter earnings last week—and it was another blockbuster. For the first three months of 2024, Nvidia booked $26 billion in revenue, up 18 percent from the last quarter of 2023 and up 262 percent year over year. Nvidia’s stock price has more than tripled in the past 12 months, sending its valuation just below $3 trillion, making it the third most valuable company by market capitalization.
Nearly all of that revenue (87 percent) came from the data center part of the business, which brought in $22.6 billion. This segment is largely being driven by investments that the big tech companies are making to expand their own services, including Amazon Web Services, Microsoft Azure, Google Cloud, and Oracle Cloud. As finance chief Colette Kress noted alongside the earnings results, Nvidia estimates that about 40 percent of its data center revenue over the past year came from AI inference—that is, from serving up predictions from already-trained models.
Nvidia has been on a tear because it has managed to capture a significant share of the investment in data centers. Microsoft, Amazon, and Meta have set aside more than $75 billion combined in this kind of AI infrastructure. And these projects are happening worldwide: in France, Germany, Japan, Malaysia, and Mexico, among other countries.
All of the big companies have lots of cash on hand and not many places to park that money. By investing in infrastructure, they can scale their operations to meet the ever-growing demands of their customers while maintaining a competitive edge over their peers.
But perhaps the most interesting part about Nvidia’s ramp-up in stock price is how it underscores the increasing value of compute.
The cost of getting AI technology off the ground.
Compute, as Amazon’s official documentation explains it, is “a generic term used to reference processing power, memory, networking, storage, and other resources required for the computational success of any program.” But more and more, compute has come to describe the processing power needed to train and then deploy advanced AI applications: chatbots like ChatGPT, machine learning systems, image recognition, and the like.
To simplify things a bit, we can think of the AI workflow as having two parts. First, the model must be trained; then, it can be queried for a prediction. Inputting a prompt drives the AI to make a prediction, or inference, about the best next term. Both parts of the process are costly because training and inference alike require specialized, expensive hardware to run efficiently.
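To make that two-part split concrete, here is a minimal sketch of my own, with a toy bigram counter standing in for a real neural network: a “training” pass tallies statistics from text, and “inference” answers a prompt with the best next term.

```python
# Toy illustration of the train-then-infer workflow. A bigram counter
# stands in for a neural network; real models differ in every detail
# except the two-phase shape of the work.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# Training: tally how often each word follows another. For a frontier
# model, this phase consumes the overwhelming share of the compute.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

# Inference: given a prompt, return the most likely next term.
def predict_next(word: str) -> str:
    candidates = follow_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # -> "cat"
```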
The numbers are staggering. OpenAI’s GPT-3 model was trained in 2020 and required a computational infrastructure equivalent to one of the five largest supercomputers in the world. By way of comparison, a standard laptop would need several millennia of training to build a model with GPT-3-level performance. That model has since been eclipsed by much larger, more costly ones. While OpenAI has never been firm about numbers, it’s believed that GPT-3 cost between $4.6 million and $12 million to train. The current model, GPT-4, cost more than $100 million to train.
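Here is a rough back-of-envelope check on where “several millennia” comes from. The training figure is the commonly cited estimate of roughly 3.14 × 10²³ floating-point operations for GPT-3; the laptop throughput below is my own generous assumption.

```python
# Back-of-envelope: how long would GPT-3's training run take on a laptop?
# The GPT-3 figure is the widely cited ~3,640 petaflop/s-day estimate;
# the laptop rate is an assumption chosen for illustration.
GPT3_TRAINING_FLOPS = 3.14e23
LAPTOP_FLOPS_PER_SEC = 1e12          # ~1 teraflop/s sustained, generous
SECONDS_PER_YEAR = 365 * 24 * 3600

years = GPT3_TRAINING_FLOPS / LAPTOP_FLOPS_PER_SEC / SECONDS_PER_YEAR
print(f"{years:,.0f} years")         # -> 9,957 years, roughly ten millennia
```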
Not surprisingly, compute takes a lot of energy. As Joseph Polidoro wrote in a great recent piece on the site:
Researchers estimated the now somewhat stale GPT-3, a 175 billion-parameter AI model, to have required almost 1,300 MWh (one MWh equals 1,000 kWh) of electricity, roughly equal to the annual power consumption of 130 homes in the U.S. This is probably much less power than was used to develop GPT-4, which reputedly was trained on 1.8 trillion parameters.
But training is just the first part; inference isn’t cheap either. According to OpenAI’s Sam Altman, ChatGPT’s operating expenses are “eye-watering,” amounting to a few cents per chat in compute costs. The inference cost of each query, multiplied over millions of users, translates into real costs for OpenAI.
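A quick sketch shows how pennies per chat compound at scale. Both inputs here are assumptions I’ve chosen for illustration, not disclosed OpenAI figures.

```python
# Hypothetical inference bill: cents per chat times chats per day.
COST_PER_CHAT_USD = 0.03      # Altman's "a few cents," taken at face value
CHATS_PER_DAY = 10_000_000    # assumed daily query volume

daily_cost = COST_PER_CHAT_USD * CHATS_PER_DAY
print(f"${daily_cost:,.0f}/day, ${daily_cost * 365:,.0f}/year")
# -> $300,000/day, $109,500,000/year
```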
If you cobble all of this together, how large is the total footprint? Here are some of the topline numbers:
In the United States, electricity consumption by data centers is projected to rise from approximately 200 terawatt-hours (TWh) in 2022 (around 4 percent of U.S. electricity usage) to nearly 260 TWh by 2026, or 6 percent of total U.S. electricity demand.
The largest data center hubs are located in California, Texas, and Virginia. The buildout has been especially pronounced in Virginia, where data centers attracted 62 percent of the state’s new investment in 2021.
In Northern Virginia, developers of data centers have asked local utility Dominion Energy for the generating capacity of multiple nuclear reactors. CEO Bob Blue stated that he frequently handles requests from developers for their planned campuses, which demand “several gigawatts” of electricity.
According to estimates by the International Energy Agency (IEA), there are currently more than 8,000 data centers globally. About a third are located in the United States, 16 percent in Europe, and 10 percent in China.
Data centers consumed 240 to 340 TWh in 2022, or 1.0 percent to 1.3 percent of total global electricity consumption, excluding cryptocurrency mining, which consumed an additional 0.4 percent globally.
The economics and politics of compute.
I’ll have more to say about compute down the road, but for now I just want to mention two things: one about the economics of compute and another about its politics.
Nvidia’s dramatic success has pushed all of the major players to get into the chip-making business. OpenAI is looking at creating chips, Apple is designing its own, Meta will deploy its chips later this year, and Google just announced its Axion chip. Even the government has plans to back new AI chip development. It’s a throwback to the old days when Silicon Valley actually meant building chips.
In the coming years, compute capabilities are likely to be extended into your computer, your phone, and the rest of your devices. Apple already has what it calls the “Neural Engine” in the iPhone, a separate collection of circuits for running AI. Expect further development of localized, on-device compute, especially as small language models like Microsoft’s Phi-3 proliferate.
One of the chip architectures that intrigues me is the Tensor Stream Processor (TSP) design that drives the AI infrastructure company Groq. According to one commenter, running the open-source Llama 3 model on Groq was 30 times cheaper than running the same queries through GPT-4 Turbo. This report in TechRadar explains why Groq’s tech is different:
The company’s Tensor Stream Processor (TSP) is likened to an assembly line, processing data tasks in a sequential, organized manner. In contrast, a GPU is akin to a static workstation, where workers come and go to apply processing steps. The TSP’s efficiency became evident with the rise of Generative AI, leading Ross to rebrand the TSP as the Language Processing Unit (LPU) to increase its recognizability.
Unlike GPUs, LPUs utilize a streamlined approach, eliminating the need for complex scheduling hardware, ensuring consistent latency and throughput. LPUs are also energy efficient, reducing the overhead of managing multiple threads and avoiding underutilization of cores. Groq’s scalable chip design allows multiple TSPs to be linked without traditional bottlenecks, simplifying hardware requirements for large-scale AI models.
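For developers, the practical upshot is that Groq exposes its LPUs through an OpenAI-style API. Here is a minimal sketch using Groq’s Python SDK; the model name is illustrative, so check Groq’s documentation for what is currently hosted.

```python
# Minimal sketch: querying an open-source Llama 3 model served on Groq's
# LPUs via the official SDK (pip install groq). Model name may change.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="llama3-8b-8192",  # illustrative; see Groq's current model list
    messages=[{"role": "user", "content": "In one sentence, what is an LPU?"}],
)
print(completion.choices[0].message.content)
```

Because the API mirrors OpenAI’s, swapping backends to compare cost per token is nearly a one-line change, which is how comparisons like the 30-times figure above get made.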
Importantly, one of the investors noted on his podcast that these chips aren’t produced at the most advanced chip factories; they are made through a process that is almost 15 years old at this point. So I expect dedicated AI chips to migrate into devices and alleviate some of the demand for centralized compute. The demand for compute might thus serve as a decentralizing force in tech. Ben Thompson’s recent Stratechery post on “AI Integration and Modularization” offers a deeper look at the forces of centralization and decentralization.
The demand for compute is also bringing the industry headlong into the energy debate. Energy demand is on the rise after plateauing for years, and data centers are one of the three reasons why. As Cy McGeady at the Center for Strategic & International Studies explained, “There is very strong evidence that a confluence of three trends—reshoring of industry, AI-driven data center expansion, and broad-based electrification—will drive a sustained period of electric demand growth.” The growing appetite for computational resources will become a much bigger player in the politics of energy. And since load growth is going to be localized, there are likely to be odd local political fights. More on that in the future, as well as on efforts by companies to economize on compute.
Next week, I am going to take on the challenge posed by Tim Lee, author of the must-read Understanding AI Substack, who writes: “I really wish there were more economists involved in discussions of the implications of superintelligence.”
Until then,
Will
AI Roundup
Researchers Nirit Weiss-Blatt (whom I cited last week), Adam Thierer, and Taylor Barkley just published a paper “The AI Technopanic and Its Effects.” The report “documents the most recent extreme rhetoric around AI, identifies the incentives that motivate it, and explores the effects it has on public policy.”
OpenAI has struck a content deal, reportedly worth $250 million over five years, with News Corp, the Wall Street Journal’s parent company. OpenAI also secured a key partnership with Reddit.
Elon Musk has raised another $6 billion in capital to support his new xAI artificial intelligence venture.
I find this strategy of developing autonomous vehicles by Gatik interesting: “Gatik’s progress stems in part from what its vehicles don’t do. Its trucks provide middle-mile delivery services from distribution centers to stores on easy and predictable routes.”
Research and Reports
I’m just now diving into “The Economics of Social Media” from a cadre of researchers, but I imagine I will be using it a lot in coming work since it’s a “guide to the burgeoning literature on the economics of social media” that synthesizes “the main lessons from the empirical economics literature and organize[s] them around the three stages of the life cycle of content: (1) production, (2) distribution, and (3) consumption.”
Finally, useful philosophy: Philosophers are studying Reddit’s “Am I the Asshole?” to understand moral dilemmas experienced in daily life. The full paper is titled “A Large-Scale Investigation of Everyday Moral Dilemmas.” The moral pragmatist in me really enjoys this work.