Claude 4.1, Genie 3, and GPT-5 are compound AI systems, not simply models

We had quite a run of product announcements from the big GenAI players last week: Genie 3 and Claude 4.1 on August 5 and GPT-5 on August 7.

These are all extremely innovative and impressive efforts. However, I feel compelled to say that it is deeply confusing that we continue to talk about these products as model releases. They do have new models at their heart, but we experience those models only as part of complex, highly engineered systems. Referring to these products as models is like referring to your car as an engine or your computer as a CPU.

Back in February 2024, a bunch of us (led by Matei Zaharia) wrote a blog post called “The Shift from Models to Compound AI Systems”. Our central observation was that "state-of-the-art AI results are increasingly obtained by compound systems with multiple components, not just monolithic models", and we provided a long list of arguments for why compound systems would lead to faster and more assured progress. On October 1, 2024, Sam Altman similarly predicted “a shift from talking about models to talking about systems”. (I can only assume that he studied our blog carefully.)

In December 2024, I went a step further. In a talk called “Large Language Models Get the Hype, but Compound Systems Are the Future of AI”, I argued that we only ever interact with systems, never models on their own.

On their own, even the most advanced AI models are inert. They just sit there on disk, as extraordinarily expensive configurations of bits. To get them to do anything at all, you need at least two more pieces: a prompt and a sampling procedure. The prompt will put the model into an interesting state, and the sampling procedure will control what it generates. Both choices will be extremely significant for the performance of the overall system.
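To make that concrete, here is a toy sketch in Python. The names (`toy_model`, `sample`) and the "model" itself are stand-ins of my own invention, not any real API; the point is only to show where the prompt and the sampling procedure sit relative to the inert model.

```python
import math
import random

def toy_model(prompt):
    """A stand-in 'model': maps a prompt to unnormalized scores (logits)
    over a tiny vocabulary. A real model has billions of parameters, but
    the shape of the interaction is the same: no prompt, no output."""
    vocab = ["yes", "no", "maybe"]
    # The prompt puts the model into a state: different prompts yield
    # different logits. (Here, trivially, via the prompt's length.)
    logits = [float(len(prompt) % 3 == i) + 0.5 for i in range(len(vocab))]
    return vocab, logits

def sample(vocab, logits, temperature=1.0, seed=None):
    """The sampling procedure: turns logits into an actual token.
    Temperature rescales the logits before the softmax, controlling
    how adventurous the generation is."""
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab, logits = toy_model("Is the sky blue?")
token = sample(vocab, logits, temperature=0.7, seed=0)
```

Even in this trivial setting, the behavior of the "system" depends on all three pieces: change the prompt and the logits change; change the temperature and the distribution over outputs changes; the model alone produces nothing.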

I think we all intuitively know that prompts matter. For a fixed model, a stellar prompt could lead to state-of-the-art results and a subpar prompt could lead to mediocre results or worse. Our “Compound AI Systems” blog documents a number of high-profile instances of exactly this. Recently, I have noticed that companies are reporting that complex late-stage reinforcement learning techniques are proving transformative. This is surely true, but those results too depend on having outstanding prompts. If you are ever forced to choose between having great prompts and having lots of fancy RL algorithms, you should certainly choose the great prompts.

Sampling procedures are more often overlooked, but they are just as critical. The sampling procedure is separate from the model. In effect, it is how we make the model express itself. The sampling techniques themselves have grown incredibly sophisticated. Next time a model generates a large block of well-formed computer code for you, you might quietly acknowledge the sampling procedure – the unsung hero of that event. It’s almost certain that the underlying model was not going to get there on its own.
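One family of techniques behind that well-formed code is constrained decoding: at each step, the sampling procedure masks out tokens that would violate a structural rule before choosing among what remains. Here is a minimal sketch under assumptions of my own (the scores and the bracket-balance rule are invented for illustration):

```python
def constrained_sample(scores, is_allowed):
    """Mask out tokens that would violate a constraint, then pick
    the highest-scoring token that remains. The model's raw preferences
    are overridden wherever they would break the structure."""
    allowed = {tok: s for tok, s in scores.items() if is_allowed(tok)}
    return max(allowed, key=allowed.get)

# Suppose the model's raw scores favor closing a bracket that was
# never opened:
scores = {")": 2.0, "(": 1.0, "x": 0.5}
open_brackets = 0  # nothing is currently open

choice = constrained_sample(
    scores,
    lambda tok: not (tok == ")" and open_brackets == 0),
)
```

Left to its raw scores, the "model" would emit the stray `)`. The sampling procedure, not the model, is what keeps the output well-formed.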

So the minimal system is prompt + model + sampling procedure. Getting them to work together is non-trivial and very significant. And of course present-day systems often also include databases, Web services, calculators, and other tools, and they may even have the capacity to create and run their own tools on the fly. These are transparently systems in the fullest sense. Here again, having a great model is not sufficient for a well-engineered system – the model has to be configured (by prompts and sampling procedures) to make good use of all these tools. By the same token, a relatively weak model might shine if it is perfectly situated inside the larger system.
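The tool-use pattern can be sketched as an orchestration loop. Everything here is a toy of my own devising – the "model" is a hard-coded stand-in and the tool is a two-operator calculator – but the loop structure is the part that matters: the system, not the model, decides when to call a tool and feeds the result back into the prompt.

```python
import operator

def calculator(expr):
    """A tool the system can call: a tiny calculator for 'a op b'."""
    a, op, b = expr.split()
    ops = {"+": operator.add, "*": operator.mul}
    return str(ops[op](float(a), float(b)))

def toy_model_step(prompt):
    """Stand-in for a prompted model deciding its next move.
    A real model would generate this decision as text; here it is
    hard-coded so the sketch is self-contained."""
    if "12 * 34" in prompt and "RESULT" not in prompt:
        return ("tool", "12 * 34")
    return ("answer", prompt.split("RESULT: ")[-1])

def run_system(user_query):
    """The orchestration loop that turns prompt + model + tools
    into a compound system."""
    prompt = user_query
    for _ in range(5):  # cap the number of tool calls
        kind, payload = toy_model_step(prompt)
        if kind == "tool":
            # Tool output is appended to the prompt for the next step.
            prompt += " RESULT: " + calculator(payload)
        else:
            return payload
    return "gave up"

print(run_system("What is 12 * 34?"))
```

Notice that the loop, the tool registry, and the convention for feeding results back are all engineering choices made outside the model – exactly the kind of system design that these product releases embody.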


All of the products released last week are extraordinarily well-engineered compound systems. I should say that last week was also an incredible week for model releases. You can use the GPT-5 model and the Claude 4.1 model via their APIs as building blocks. In addition, Qwen-Image appeared on Hugging Face on August 4, and GPT-OSS appeared there on August 5. More generally, this summer has seen more open-weights models appearing on Hugging Face than ever before. These are incredible components. At Bigspin, we are eager to help you use them to build your own well-engineered compound AI systems.

Chris Potts
Co-Founder & Chief Scientist