Microsoft AI CEO Mustafa Suleyman says the next chapter of artificial intelligence will be defined by compute costs, not model intelligence. Taking to X, Suleyman argued that inference compute scarcity, rather than building the smartest AI, will determine winners over the next two to three years.
For several years, the emphasis was on training ever-larger foundation models. For 2026, the challenge is serving those models to millions of users in real time. Deloitte’s TMT Predictions 2026 report states that inference workloads currently account for about two-thirds of total AI compute spending.
GPU lead times now stretch to nearly a year, and high-bandwidth memory is sold out through 2026. Of the 16 GW of data centre capacity planned globally for 2026, only 5 GW is currently under construction.
Suleyman’s flywheel reasoning prioritises high-margin products such as Microsoft 365 Copilot, enterprise legal software, and healthcare SaaS. Because these products can absorb premium inference costs, they can deliver lower latency, which in turn increases user retention, generates proprietary data, and improves model tuning.
The loop compounds: better products drive adoption and revenue, which fund still more inference capacity. According to Microsoft, paid Copilot seats reached 15 million in Q2 FY2026, up 160% year on year.
Cash-constrained AI companies and consumer apps sit on the other side of this divide. Without the budget to pay for premium inference tokens, they risk slower or lower-quality responses and weaker user retention; their flywheel may never start turning. One could argue that falling cost of intelligence per dollar, or open-source models, will help such companies cope, but Suleyman’s bet is on scale and financial muscle: Microsoft is investing more than $80 billion annually in AI infrastructure.