Google is working on a new initiative to make its artificial intelligence chips better at running PyTorch, the world’s most widely used AI software framework, in a move aimed at weakening Nvidia's longstanding dominance of the AI computing market, according to people familiar with the matter.

The effort is part of Google's aggressive plan to make its Tensor Processing Units a viable alternative to Nvidia's market-leading GPUs. TPU sales have become a crucial growth engine of Google's cloud revenue as it seeks to prove to investors that its AI investments are generating returns.

But hardware alone is not enough to spur adoption. The new initiative, known internally as “TorchTPU,” aims to remove a key barrier that has slowed adoption of TPU chips by making them fully compatible and developer-friendly for customers who have already built their tech infrastructure using PyTorch software, the sources said. Google is also considering open-sourcing parts of the software to speed uptake among customers, some of the people said.

Compared with earlier attempts to support PyTorch on TPUs, Google has devoted more organizational focus, resources and strategic importance to TorchTPU, as demand grows from companies that want to adopt the chips but view the software stack as a bottleneck, the sources said.

PyTorch, an open-source project heavily supported by Meta Platforms, is one of the most widely used tools for developers who make AI models. In Silicon Valley, very few developers write every line of code that chips from Nvidia, Advanced Micro Devices or Google will actually execute. Instead, those developers rely on tools like PyTorch, which is a collection of pre-written code libraries and frameworks that automate many common tasks in developing AI software.

Originally released in 2016, PyTorch has a history closely tied to Nvidia’s development of CUDA, the software that some Wall Street analysts regard as the company’s strongest shield against competitors.
Nvidia’s engineers have spent years ensuring that software developed with PyTorch runs as fast and efficiently as possible on its chips.

Google, by contrast, has long had its internal armies of software developers use a different code framework called Jax, and its TPU chips use a tool called XLA to make that code run efficiently. Much of Google’s own AI software stack and performance optimization has been built around Jax, widening the gap between how Google uses its chips and how customers want to use them.

A Google Cloud spokesperson did not comment on the specifics of the project, but confirmed to Reuters that the move would provide customers with choice.

"We are seeing massive, accelerating demand for both our TPU and GPU infrastructure," the spokesperson said. "Our focus is providing the flexibility and scale developers need, regardless of the hardware they choose to build on."

TPU FOR CUSTOMERS

Alphabet had long reserved the lion’s share of its own chips, or TPUs, for in-house use only. That changed in 2022, when Google’s cloud computing unit successfully lobbied to oversee the group that sells TPUs. The move drastically increased Google Cloud's allocation of TPUs. As customers' interest in AI has grown, Google has sought to capitalize by ramping up production and sales of TPUs to external customers.

But the mismatch between the PyTorch framework used by most of the world’s AI developers and the Jax framework that Google’s chips are currently most finely tuned to run means that most developers cannot easily adopt Google’s chips and get them to perform as well as Nvidia’s without significant extra engineering work. Such work takes time and money in the fast-paced AI race.