“You are no longer sending one query and getting one answer. You are running chains of reasoning, tool calls, memory retrievals, and context windows that keep expanding. So even if the rate per token drops, the volume goes up sharply and the net bill stays high or climbs further,” he said.
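The arithmetic behind this claim is easy to sketch: when each agent turn re-sends an ever-growing context, total billed tokens grow roughly quadratically with the number of turns, so a lower per-token rate can still produce a much larger bill. The figures below are invented purely for illustration, not drawn from the article.

```python
# Hypothetical illustration: cumulative tokens across an agent run whose
# context window grows each turn (all numbers are invented for this sketch).

def agent_run_tokens(turns, base_context, growth_per_turn):
    """Total input tokens billed when each turn re-sends the expanded context."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # context billed again on this turn
        context += growth_per_turn  # tool results, memories, outputs accumulate
    return total

single_query = agent_run_tokens(1, 2_000, 0)       # one prompt, one answer
agent_chain = agent_run_tokens(20, 2_000, 1_500)   # a 20-step agent loop

old_price = 30 / 1_000_000  # $/token in the single-query era (hypothetical)
new_price = 10 / 1_000_000  # per-token rate drops to a third (hypothetical)

print(f"single query: {single_query:>7,} tokens -> ${single_query * old_price:.2f}")
print(f"agent chain:  {agent_chain:>7,} tokens -> ${agent_chain * new_price:.2f}")
```

Under these made-up numbers the per-token price falls threefold, yet the agent run consumes over 160x the tokens of a single query, so the net bill rises anyway.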
Beyond that, a few specific pressures make this worse. First, the token-to-accuracy ratio in many real-world tasks remains unfavourable, meaning developers end up spending more tokens just to get a reliable output. “That adds a hidden cost most people do not account for upfront. Second, agentic systems often require a human to review or correct the AI output, which means you are paying for both the AI tokens and the human time. That combined cost can quickly surpass what a straightforward human-led process would have cost. Third, dependence on LLMs makes every workflow subject to the uptime of these systems. Any outage or rate-limiting episode directly disrupts operations, which adds a reliability cost that does not show up in token pricing at all,” said Padmanabhuni.
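The second point, that AI tokens plus human review can exceed a human-only process, can be shown with a back-of-envelope comparison. Every rate and duration below is hypothetical, chosen only to illustrate the shape of the argument.

```python
# Hypothetical cost comparison (all rates invented): an agentic workflow
# that still needs human review vs. the same task done by a person alone.

def agentic_cost(tokens, price_per_mtok, review_minutes, hourly_rate):
    """AI token spend plus the human time spent reviewing/correcting output."""
    return tokens / 1_000_000 * price_per_mtok + review_minutes / 60 * hourly_rate

def human_only_cost(minutes, hourly_rate):
    """Cost of a person simply doing the task themselves."""
    return minutes / 60 * hourly_rate

# Assume a task a person finishes in 30 minutes at $60/hour; the agent
# burns 500k tokens at $10/Mtok and its output takes 28 minutes to review.
ai_plus_review = agentic_cost(500_000, 10, 28, 60)  # token cost + review time
human_alone = human_only_cost(30, 60)

print(f"AI + review: ${ai_plus_review:.2f}")  # $5 tokens + $28 review
print(f"Human only:  ${human_alone:.2f}")
```

With these assumed numbers the token spend itself is small; it is the unavoidable review time that pushes the combined cost past the human-only baseline.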