November 27, 2024
Executive Summary
The emergence of generative artificial intelligence (GenAI) has catalyzed a surge of capital expenditures (capex) directed at building out AI infrastructure. Big tech companies are on pace to spend over $200 billion on capex this year and even more in 2025. A significant portion of this will be directed to AI-related equipment, including semiconductors, data centers and the tools that power them. As investors attempt to forecast the durability of AI spending, some critical debates have emerged. In our view, the two most important factors that will determine the durability of AI spending trends are (1) the ability of scaling laws to continue to advance large language models and (2) the return on investment (ROI) from AI-related spending. This white paper will focus on AI scaling laws, including the constraints and opportunities to drive scaling. Part 2 of this series will focus on the ROI of AI investments.
Scaling Laws
Scaling laws describe how the performance of a deep learning model is influenced by certain variables, such as the size of the dataset and computational resources. Like Moore’s Law, which describes the pace of semiconductor advancement, scaling laws are not physical laws of the universe, but rather empirical regularities based on observed data and patterns. Model scaling is critical for investors to appreciate because scaling is highly correlated with demand for high-performance chips, specifically graphics processing units (GPUs). If AI models fail to demonstrate improvement with additional data and compute, then the incentive to continue building out AI infrastructure may falter. And, as with Moore’s Law and the development of leading-edge semiconductors, novel techniques beyond existing strategies will be necessary to further advance model scaling.
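As an illustrative reference point (not drawn from this paper’s analysis), the widely cited “Chinchilla” scaling law from Hoffmann et al. (2022) models pre-training loss as a power law in model size and dataset size:

```latex
% Chinchilla-style pre-training scaling law (Hoffmann et al., 2022)
% L = predicted loss, N = model parameters, D = training tokens
% E = irreducible loss; A, B, alpha, beta are empirically fitted constants
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% Reported fits: E ~ 1.69, A ~ 406.4, B ~ 410.7, alpha ~ 0.34, beta ~ 0.28
```

Because both terms decay as power laws, each doubling of parameters or data yields a smaller absolute reduction in loss, which is the empirical pattern the term “scaling law” captures.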
For much of the last few years, the basic assumption that models get smarter with more data and compute has proven true. As a result, technology firms have been in an arms race to acquire GPUs and build bigger models. One timely example is Elon Musk’s AI startup, xAI, whose new supercomputer, named Colossus, contains a cluster of 100,000 Nvidia Hopper GPUs. The belief that scaling laws could be maintained led some to extrapolate recent trends and anticipate that we would soon achieve artificial general intelligence (AGI), a level of AI that would match or surpass human cognitive capabilities across a wide range of domains.
More recently, debate has surfaced as to whether LLM capabilities are plateauing. Several outlets have reported that AI companies such as OpenAI, Google and Anthropic have hit roadblocks with their latest generation of LLMs. This has renewed interest in potential scaling constraints, as well as in innovative techniques that could unlock the next leg of model advancement.
“It’s become increasingly difficult to find new, untapped sources of high-quality, human-made training data that can be used to build more advanced AI systems… At the same time, even modest improvements may not be enough to justify the tremendous costs associated with building and operating new models, or to live up to the expectations that come with branding a product as a major upgrade.”
– Bloomberg, November 13, 2024
Constraints
Much of the research on scaling laws today is focused on pre-training, the process of training a model on a massive dataset of general information before it is fine-tuned for a specific task. Some have drawn the analogy of teaching a child to identify letters and words before they are able to read a book. To date, pre-training has been performed predominantly on text data, much of it sourced from the internet. The quantity and quality of data have been identified as potential constraints on pre-training scaling. There is evidence that, beyond a certain point, feeding more data into a model is subject to the law of diminishing returns: the incremental improvement in performance shrinks once a dataset reaches a certain size. There are a few reasons for this. For one, as datasets grow larger, they often contain redundant information. Additionally, larger models are prone to “overfitting”, in which a model becomes too specialized to its training data and struggles when presented with new queries. Furthermore, LLMs today are largely trained on human-generated text, which, while vast, is not infinite.
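To make the diminishing-returns point concrete, here is a minimal numeric sketch using the illustrative Chinchilla-style power law above, with model size held fixed (the constants are the published fits from Hoffmann et al., not figures from this paper):

```python
# Illustrative only: marginal loss improvement per doubling of training data
# under a Chinchilla-style power law, loss(D) = E + B / D**beta, model size fixed.
E, B, beta = 1.69, 410.7, 0.28  # fitted constants reported by Hoffmann et al. (2022)

def loss(tokens_billions: float) -> float:
    """Predicted pre-training loss for a fixed-size model."""
    return E + B / (tokens_billions * 1e9) ** beta

prev = loss(100)  # baseline: 100B training tokens
for d in (200, 400, 800, 1600, 3200):
    cur = loss(d)
    print(f"{d:>5}B tokens: loss={cur:.4f}, gain from doubling={prev - cur:.4f}")
    prev = cur
```

Each doubling removes a constant fraction (about 18% under these constants) of the remaining reducible loss, so the absolute gain shrinks with every doubling even as the data and compute bill doubles.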
“If you just put in more compute, you put in more data, you make the model bigger — there are diminishing returns. In order to keep the scaling laws going, in order to keep the rate of progress increasing, we also need new ideas… But that’s pretraining. The methodology around post-training, I would say, is quite immature and has a lot of room left to improve.”
– Robert Nishihara, co-founder, Anyscale, in an interview with TechCrunch
Opportunities
Despite these constraints, there are several dynamics unfolding that provide fuel for scaling laws to take another leap forward.
Pre-training. At the pre-training phase, the AI industry has been leaning on synthetic datasets, created by artificial intelligence, to augment real-world data. In addition to expanding datasets, synthetic data can better simulate “edge cases”, which are rare scenarios that are more difficult to capture in real-world data. There is evidence that synthetic data can greatly advance the training of LLMs.
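The sketch below illustrates, in highly simplified form, how synthetic examples might be mixed into a training corpus; the generate_synthetic function is a hypothetical stand-in for a real generator model and is stubbed out here:

```python
# Hypothetical sketch of synthetic-data augmentation for pre-training.
import random

def generate_synthetic(topic: str, n: int) -> list[str]:
    # Stub: a real pipeline would prompt a strong generator model for examples
    # of the target edge case, then filter and deduplicate before training.
    return [f"[synthetic example {i}: {topic}]" for i in range(n)]

real_corpus = ["...human-written documents sourced from the web..."]
edge_cases = ["rare legal phrasing", "low-resource languages", "unusual unit conversions"]

synthetic_corpus = []
for topic in edge_cases:
    synthetic_corpus += generate_synthetic(topic, n=2)

# The real/synthetic mixing ratio is a key design choice: over-weighting
# synthetic data risks compounding the generator model's own errors.
training_corpus = real_corpus + synthetic_corpus
random.shuffle(training_corpus)
print(f"{len(training_corpus)} documents ({len(synthetic_corpus)} synthetic)")
```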
There is growing interest in pre-training models on alternative modalities, such as image, audio and video data. Multimodal large language models (MLLMs) require significantly larger datasets than text-trained LLMs.
Post-training. There are also new techniques being adopted at the post-training phase that are driving model improvements. Post-training, which refers to the optimization of a model after its initial training phase, originally relied on human feedback through a process known as reinforcement learning from human feedback (RLHF). A relatively new technique, reinforcement learning from AI feedback (RLAIF), incorporates AI feedback and synthetic data to assist in scaling. RLAIF has demonstrated several advantages over RLHF, including speed, consistency and cost-effectiveness. While more cost-effective relative to human labor, RLAIF still requires significantly more compute resources than RLHF. The future of post-training is likely to involve a combination of RLAIF and RLHF, the latter of which will remain valuable for aligning AI models with human values and preferences.
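A highly simplified sketch of the RLAIF preference-labeling step appears below; the judge is stubbed with a toy heuristic, whereas a real pipeline would use a strong LLM scoring responses against a written rubric (which is why RLAIF trades human labor cost for additional inference compute):

```python
# Simplified sketch of RLAIF preference labeling: an AI judge, rather than a
# human annotator, ranks pairs of candidate responses. The resulting
# preference dataset then trains a reward model, just as in RLHF.

def ai_judge_score(prompt: str, response: str) -> float:
    # Stub: in practice, an LLM call scoring helpfulness/harmlessness.
    return float(len(response))  # placeholder heuristic only

def label_preference(prompt: str, resp_a: str, resp_b: str) -> tuple[str, str]:
    """Return (chosen, rejected) according to the AI judge."""
    if ai_judge_score(prompt, resp_a) >= ai_judge_score(prompt, resp_b):
        return resp_a, resp_b
    return resp_b, resp_a

prompt = "Explain scaling laws to an investor."
chosen, rejected = label_preference(prompt, "Scaling laws are empirical...", "A thing.")
preference_dataset = [{"prompt": prompt, "chosen": chosen, "rejected": rejected}]
print(preference_dataset)
```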
Test-time scaling. Microsoft CEO Satya Nadella stated onstage at the recent Microsoft Ignite event that “we are seeing the emergence of a new scaling law.” Nadella was referring to the latest form of AI model scaling, known as test-time scaling (or inference-time scaling). Test-time scaling is best exemplified by OpenAI’s latest “o1” series of models, which have been designed to spend more time thinking during the inference phase before responding. Specifically, o1 uses chain-of-thought reasoning to break down complex problems into simpler steps, then learns to refine its problem-solving strategies through reinforcement learning. A distinct feature of the o1 models is that they shift more computational resources toward the post-training and inference phases, which has been shown to improve model accuracy.
[Chart: o1 performance improves with both train-time compute and test-time compute. Source: OpenAI]
"Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.”
– OpenAI, Learning to Reason with LLMs
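OpenAI has not published o1’s internals, but a well-known stand-in illustrates how extra inference compute can buy accuracy: sample several independent reasoning chains and majority-vote the final answers (“self-consistency”, Wang et al., 2022). The sketch below uses a toy stochastic solver, not a real model:

```python
# Illustration of test-time scaling via self-consistency: sample k independent
# chains of thought and take the majority answer. This is NOT OpenAI's
# (unpublished) o1 method, only a demonstration of the general principle.
import random
from collections import Counter

def sample_answer(correct: str = "42", p_correct: float = 0.6) -> str:
    # Stub for one sampled reasoning chain: right 60% of the time,
    # otherwise one of three plausible wrong answers.
    return correct if random.random() < p_correct else random.choice(["41", "43", "7"])

def solve(k: int) -> str:
    """Spend k samples of inference compute, return the majority-vote answer."""
    return Counter(sample_answer() for _ in range(k)).most_common(1)[0][0]

random.seed(0)
for k in (1, 5, 25):
    accuracy = sum(solve(k) == "42" for _ in range(1000)) / 1000
    print(f"k={k:>2} samples per question -> accuracy ~{accuracy:.2f}")
```

Accuracy rises with k even though the underlying model never changes, which is the economic crux of test-time scaling: better answers can be purchased with more inference compute rather than more training.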
NVIDIA projects that inference compute will exhibit its own scaling laws, driven by multimodality and test-time scaling.
[Chart: NVIDIA’s projection of inference compute scaling, driven by multimodal models and test-time scaling. Source: NVIDIA]
“Now we have multimodality foundation models and the amount of petabytes of video that these foundation models are going to be trained on is incredible. And so my expectation is that for the foreseeable future, we're going to be scaling pre-training, post-training as well as inference-time scaling which is the reason why I think we're going to need more and more compute...”
– Jensen Huang, CEO, NVIDIA, Q3 2025 Earnings Call
Conclusion
While there is some evidence that the quality and quantity of available data constrain scaling for traditional text-based LLMs, there are multiple vectors for continued model scaling across pre-training (synthetic data and multimodality), post-training (RLAIF) and test-time scaling (as exemplified by OpenAI’s o1), which suggests AI companies will continue to invest in R&D and push AI innovation forward. While scaling laws are a critical factor in the durability of AI spending, they do not guarantee commercial success. Ultimately, use cases with profound economic appeal must be widely adopted in order to justify the investment. Part 2 of this white paper series will review the current and potential return on investment from AI spending.