AI Infrastructure’s Second Act: Environments, Evals, Experience
The “real” work of AI Infrastructure is just beginning
AI Infrastructure is in the midst of one of the most dynamic transformations we’ve seen in decades, marked by both creative destruction and fierce debate as to where the future is heading. About a year ago, my team and I at Bessemer published our AI Infrastructure roadmap to help make sense of this rapid evolution. Since then, we’ve witnessed extraordinary innovation across the ecosystem, and categories within AI Infrastructure have attracted billions in venture capital investment. The landscape has already seen IPOs and blockbuster M&A, signaling how quickly some startups have scaled within parts of the stack. And yet, despite this breathtaking progress, I believe we’re still only in Act One for AI Infrastructure, and that a more consequential Act Two lies ahead.
OpenAI’s Shunyu Yao offers a thoughtful perspective on how we arrived at this moment. In his essay “The Second Half”, he explains how the first phase of AI progress — characterized by breakthroughs such as backpropagation, convolutional networks, and transformers — has largely focused on advances in algorithms and methods. Today’s AI infrastructure landscape reflects this zeitgeist, with the rise of infra giants in areas such as foundation models, compute, and data labeling.
But what lies ahead could be more profound. As Shunyu articulates:
So what comes next? The second half of AI — starting now — will shift focus from solving problems to defining problems. In this new era, evaluation becomes more important than training. Instead of just asking, “Can we train a model to solve X?”, we’re asking, “What should we be training AI to do, and how do we measure real progress?”

I’m already observing that the agenda for state-of-the-art AI research is tilting away from improving abstract algorithms toward enabling AI to interface effectively and purposefully with reality. This is bringing reinforcement learning back into the spotlight. At the same time, as demonstrated by Walmart’s recent announcement, enterprises are maturing their use of AI from proofs of concept into customer-facing production deployments. As these tailwinds accelerate, an Act Two for AI Infrastructure is unfolding as we enter an “Era of Experience” (chart above), where infrastructure innovations are purpose-built to ground AI in an operational context for real-world utility. Here are some key frontiers that I’ve seen taking shape:
Environments: If Act One was about human-generated datasets and annotation, Act Two goes a step further, into interactive techniques that support real-world learning. Examples include high-fidelity task curation and the scalable generation of reinforcement learning environments/gyms (a minimal sketch of such an environment follows this list).
Evals: Act Two requires a fundamental rethinking of evals, since evaluating real-world performance is very different from climbing leaderboard benchmarks. I’m already seeing new innovations emerge around LLM-as-a-judge, continuous evaluation methods, and novel setups/proprietary frameworks (a judge sketch also follows this list).
Systems: Act Two centers on system design, rather than the model, as the primitive. Infrastructure is advancing to support production-quality compound AI systems and agentic workflows across functional “experience” components such as memory (especially for long-horizon tasks), knowledge retrieval, reasoning, and inference optimization (a third sketch below illustrates such a loop).
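To make the environments idea concrete, here’s a minimal sketch of what a gym-style task environment with a verifiable reward looks like. It’s an illustration only: the support-ticket triage task, the reset/step interface, and every name in it are assumptions rather than any particular platform’s API.

```python
import random

class TicketTriageEnv:
    """Toy RL environment: route a support ticket to the right queue.

    The reward is verifiable (+1 for the correct queue, 0 otherwise), which is
    the property real-world RL environments are built around. The task, queues,
    and interface here are illustrative assumptions.
    """

    QUEUES = ("billing", "bug", "account")
    TEMPLATES = {
        "billing": "I was charged twice for my subscription this month.",
        "bug": "The export button crashes the app on large files.",
        "account": "I never receive the password-reset email.",
    }

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self._label = None
        self._steps = 0

    def reset(self, seed=None):
        """Sample a new ticket and return the initial observation."""
        rng = random.Random(seed)
        self._label = rng.choice(self.QUEUES)
        self._steps = 0
        return {"ticket": self.TEMPLATES[self._label], "queues": list(self.QUEUES)}

    def step(self, action: str):
        """Action is the agent's chosen queue; returns (obs, reward, done, info)."""
        self._steps += 1
        correct = action.strip().lower() == self._label
        reward = 1.0 if correct else 0.0
        done = correct or self._steps >= self.max_steps
        return {"feedback": "accepted" if correct else "try again"}, reward, done, {}

# Rollout with a placeholder policy; in practice the action comes from an agent/LLM.
env = TicketTriageEnv()
obs = env.reset(seed=0)
done, episode_return = False, 0.0
while not done:
    obs, reward, done, info = env.step("billing")  # placeholder policy
    episode_return += reward
print("episode return:", episode_return)
```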
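For evals, the LLM-as-a-judge pattern can be sketched just as briefly. Again, this is a shape rather than a standard: the rubric, the 1-5 scale, and the `llm` callable are placeholders for your own judge model and criteria.

```python
import json

RUBRIC = """You are an impartial judge. Score the assistant's answer to the
user's question on a 1-5 scale for accuracy and helpfulness. Respond only
with JSON: {{"score": <1-5>, "reason": "<one sentence>"}}.

Question: {question}
Answer: {answer}"""

def judge(question: str, answer: str, llm) -> dict:
    """Score one (question, answer) pair; `llm` is any prompt -> text callable."""
    raw = llm(RUBRIC.format(question=question, answer=answer))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"score": None, "reason": "unparseable judge output"}

def run_eval(pairs, llm) -> dict:
    """Score a batch of (question, answer) pairs and aggregate the verdicts."""
    verdicts = [judge(q, a, llm) for q, a in pairs]
    scores = [v.get("score") for v in verdicts if isinstance(v.get("score"), (int, float))]
    return {
        "mean_score": sum(scores) / len(scores) if scores else None,
        "judged_fraction": len(scores) / len(verdicts) if verdicts else 0.0,
    }

# Stubbed judge model for illustration; swap in a real inference call.
fake_llm = lambda prompt: '{"score": 4, "reason": "Correct but slightly terse."}'
print(run_eval([("What is 2 + 2?", "4")], fake_llm))
```

Pointing a harness like `run_eval` at sampled production traffic, rather than a fixed benchmark, is the essence of the continuous methods mentioned above.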
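Finally, for the systems point, here’s a compact sketch of a compound, agentic loop that wires together the “experience” components above: retrieval over a knowledge store, a reasoning call, and a running memory that persists across turns of a long-horizon task. The keyword retriever, prompt format, and stubbed model are stand-ins for real infrastructure (a vector index, an inference endpoint, a durable memory store).

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Working state for one long-horizon task (names are illustrative)."""
    goal: str
    memory: list = field(default_factory=list)  # persists across turns
    done: bool = False

def retrieve(query: str, knowledge: dict, k: int = 2) -> list:
    """Toy keyword retrieval; a production system would use a vector index."""
    scored = [(sum(w in text.lower() for w in query.lower().split()), text)
              for text in knowledge.values()]
    return [text for score, text in sorted(scored, reverse=True)[:k] if score > 0]

def run_turn(state: AgentState, llm, knowledge: dict) -> AgentState:
    """One turn of the compound system: retrieve -> reason -> act -> remember."""
    context = retrieve(state.goal, knowledge)
    prompt = (
        f"Goal: {state.goal}\n"
        f"Retrieved context: {context}\n"
        f"Prior steps: {state.memory}\n"
        "Reply with ACTION: <next step> or FINISH: <answer>."
    )
    reply = llm(prompt)
    state.memory.append(reply)              # long-horizon memory
    state.done = reply.startswith("FINISH")
    return state

# Stubbed model and knowledge store for illustration.
knowledge = {"doc1": "Refunds are processed within 5 business days.",
             "doc2": "Enterprise plans include priority support."}
fake_llm = lambda prompt: "FINISH: Refunds take up to 5 business days."
state = AgentState(goal="How long do refunds take?")
while not state.done:
    state = run_turn(state, fake_llm, knowledge)
print(state.memory[-1])
```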
The first arc of AI progress (and corresponding dominant AI infrastructure paradigms) largely echoes Sutton’s “Bitter Lesson”, which puts forth that the most enduring advances in AI have come not from human-crafted knowledge or domain-specific heuristics, but from leveraging compute and general learning algorithms. While this lesson serves as a reminder that there are still many “unknown unknowns” as to which ideas will prove most effective in our quest to scalably embed context, understanding, and expertise into AI systems, I’m excited to see what breakthroughs emerge in Act Two when the “real” work of AI Infrastructure begins.

