Cost-cutting aside, having such machines on-premise is better for data security, firms realise
[SINGAPORE] While companies encourage the use of artificial intelligence, they are also grappling with the rising costs of using tokens, the basic building blocks of the language understood by AI models.
In China alone, daily token consumption exceeded 140 trillion in March 2026, up from just 100 billion at the start of 2024.
Globally, token consumption is only going to get higher. Consumers and enterprises are expected to use 120 quadrillion tokens each month between 2026 and 2030, Goldman Sachs said in a report.
Tokens are what AI models use to understand the inputs – the questions asked or tasks they have been assigned to carry out – and to generate the outputs, that is, the content or response.
As it is, AI platforms like OpenAI and Anthropic have moved away from unlimited-use price plans to billing systems that calculate costs based on the number of tokens used.
OpenAI charges US$5 per million tokens for input and US$30 per million tokens for output for its latest model, GPT 5.5. Anthropic charges a similar amount for Claude Opus 4.8, at US$5 per million tokens for input and US$25 per million tokens for output.
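To make the pricing concrete, the per-request arithmetic can be sketched as below. The prices are the ones quoted above; the function name and the token counts in the usage note are hypothetical and chosen only for illustration.

```python
# Prices in US$ per million tokens, as quoted in the article.
PRICES_PER_MILLION = {
    "GPT 5.5": {"input": 5.0, "output": 30.0},
    "Claude Opus 4.8": {"input": 5.0, "output": 25.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the US$ cost of one request under per-token billing.

    Hypothetical helper: input and output tokens are billed at their
    separate per-million rates and the two amounts are added.
    """
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

Under these prices, a request with an assumed 2,000 input tokens and 500 output tokens would cost a little over two US cents on either model. An agent that makes thousands of such calls a day is where the bill starts to grow.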
And with the arrival of agentic AI, through which AI agents perform tasks autonomously, token usage will only go up.
This adds up to a ballooning bill for companies, despite token costs having fallen from US$10 to US$2.50 in a year.
Per-token cost down, but bills are ballooning
Varun Chhabra, senior vice-president of infrastructure and telecom marketing at Dell Technologies, said: “This leads to a paradox: even though the per-token cost is going down substantially, the amount of tokens being generated in an organisation is so high that the overall costs of tokens are going up substantially.”
Within companies that have been using AI, only about 5 per cent of users are sophisticated enough to drive value through its usage, KPMG said in a report.
These users consume the most tokens and have yet to really max out their usage, said Chhabra.
Currently, most companies generate tokens in the cloud, paying providers like Amazon Web Services and Google Cloud for the computing power.
But with media reports on companies “tokenmaxxing” or making employees use as much AI for their work as possible, costs have gone up and triggered a rethink on how tokens can be more effectively used, said Danny Elmarji, vice-president of pre-sales in Dell Technologies Asia-Pacific, Japan and Greater China.
Companies using AI for more than basic search functionality are becoming increasingly aware of the costs. Elmarji added: “Costs were high when it was just basic search functionality… Now, with agentic systems, it’s a 10-fold increase.”
As token costs go up, businesses will have to consider bringing in their own token factories, which could take the form of a server in a data centre, or a desktop machine placed next to the user.
This effectively moves the costs of tokens off the cloud and onto companies’ premises.
Such “AI token factories” are pieces of infrastructure designed to generate, process and manage AI-driven outputs on an industrial scale, and are optimised for output rather than speed.
Some of these machines run the AI model themselves; others call the model via an application programming interface (API) to keep the data ring-fenced in the machine.
Owning the infrastructure – machines start at US$6,332 – will remove the variable token costs.
Elmarji said: “One of the reasons they are going on-premise is due to the cost and how they can contain and control costs.”
He added that some customers are already looking for data centres to house their token factory servers. This changes the conversation from cost per token to that of the operating expenditure of running the infrastructure, said Chhabra.
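The shift from cost per token to operating expenditure can be illustrated with a rough break-even calculation. This is a minimal sketch, not a model from Dell or the cited study: the US$6,332 entry price comes from the article, while the function name, the monthly cloud bill and the monthly running cost are hypothetical inputs. It also assumes the on-premise machine can handle the same workload that is currently billed by the cloud provider.

```python
import math
from typing import Optional

HARDWARE_COST_USD = 6_332.0  # entry-level machine price quoted in the article


def breakeven_months(
    monthly_cloud_bill: float,
    monthly_opex: float,
    hardware_cost: float = HARDWARE_COST_USD,
) -> Optional[int]:
    """Months until the hardware outlay is recovered, or None if it never is.

    Hypothetical helper: each month the firm avoids the cloud bill but
    pays running costs (power, space, maintenance) on its own machine.
    """
    monthly_saving = monthly_cloud_bill - monthly_opex
    if monthly_saving <= 0:
        # Running the machine costs at least as much as the cloud bill.
        return None
    return math.ceil(hardware_cost / monthly_saving)
```

With an assumed US$1,000 monthly cloud bill and US$200 in monthly running costs, the entry-level machine would pay for itself in eight months. If running costs match or exceed the cloud bill, it never does.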
This means companies can maximise the usage of their own infrastructure to drive down token costs, rather than rely on cloud operators, which are outside their control.
Over a two-year period, an on-premise token factory could cut costs by up to 87 per cent, noted Chhabra, citing the findings of a study by Signal 65 and Futurum Group.
Token factories for better data security
Data security is another factor behind the move to bring AI to the data rather than the data to AI.
Chhabra said: “Bringing the AI agent closer to the user, or actually bringing it to one device, actually limits the surface area of the agent.” This means that the AI agents are given less exposure to sensitive systems than when data is sent out to the agent, he added.
Even as costs tied to agentic AI continue to balloon, customers in the Asia-Pacific are still investing heavily in it. Elmarji said that it is only when companies start examining the data being fed into AI systems that concerns about data sovereignty pop up.
Companies are still figuring out how to let AI access sensitive data (such as personally identifiable information) without potentially exposing it to the world. It is a pertinent issue, given that data residency regulations bar data from leaving the physical borders of a country.
Elmarji said that most of the customers he works with – ranging from enterprises to governments – are either building or exploring these token factories after weighing the trade-offs between speed and security.
“What’s the cost of securing your intellectual property? Is that worth the cost of trading off speed? Most enterprises have said ‘Absolutely’.”