
Hybrid agentic inference is coming soon to Perplexity Computer: What is it

According to Perplexity, its upcoming hybrid AI system can automatically route tasks between on-device and cloud models, aiming to improve privacy, efficiency, and performance

Perplexity has announced a hybrid AI system that can split workloads between on-device and cloud-based models. (Image: Perplexity)

Sweta Kumari, New Delhi

Last Updated: Jun 03 2026 | 2:53 PM IST


As AI models become more capable, companies are looking for ways to balance performance, privacy, and the rising cost of compute. Cloud-based models offer greater processing power but require data to be sent to remote servers. On-device AI can keep information local but is often constrained by hardware limits. Deciding which workloads should run locally and which should be handled in the cloud is what the industry increasingly describes as an "orchestration problem."

To address this, Perplexity has announced a new feature called "hybrid agentic inference" for its Personal Computer platform. The system is designed to automatically split workloads between models running on a user's device and more powerful models in the cloud. According to the company, the approach can keep sensitive data local while reserving cloud computing resources for tasks that require greater processing power.

What is an orchestration problem?

An orchestration problem is the challenge of deciding which AI model should handle which part of a task, where it should run, and when. To take an example relevant to Perplexity's system, imagine asking an AI to analyse your bank statement and produce a financial summary.

Some parts of the task involve sensitive personal data that should ideally stay on your laptop, while other parts may require the reasoning power of a larger cloud-based AI model. The orchestration problem is figuring out how to split the work between the local and cloud models efficiently.
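
To make the idea concrete, the bank-statement example can be sketched as a routing plan. This is a minimal Python illustration under stated assumptions: the `Subtask` fields, the subtask names, and the privacy-first rule in `route` are hypothetical and are not taken from Perplexity, which has not published how its orchestration layer makes these decisions.

```python
from dataclasses import dataclass

# Hypothetical subtask description; these fields are illustrative
# assumptions, not Perplexity's actual schema.
@dataclass
class Subtask:
    name: str
    touches_personal_data: bool
    needs_large_model: bool

def route(task: Subtask) -> str:
    """Assign one subtask to 'local' or 'cloud' with a simple privacy-first rule."""
    if task.touches_personal_data:
        return "local"   # sensitive data stays on the device
    if task.needs_large_model:
        return "cloud"   # heavy reasoning goes to the larger cloud model
    return "local"       # light work stays local to save cloud compute

def plan(subtasks: list[Subtask]) -> dict[str, str]:
    """Map each subtask name to where it would run."""
    return {t.name: route(t) for t in subtasks}

if __name__ == "__main__":
    bank_summary = [
        Subtask("parse statement PDF", touches_personal_data=True, needs_large_model=False),
        Subtask("categorise transactions", touches_personal_data=True, needs_large_model=False),
        Subtask("draft savings advice from totals", touches_personal_data=False, needs_large_model=True),
        Subtask("format the report", touches_personal_data=False, needs_large_model=False),
    ]
    for name, where in plan(bank_summary).items():
        print(f"{where:>5}: {name}")
```

A real system would presumably weigh more signals (device capability, latency, cost), but the core decision has this shape: each piece of the request gets a location.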

What is hybrid agentic inference?

Perplexity's hybrid agentic inference system is designed to automatically determine where AI tasks should be processed. A compact model running on a user's device handles sensitive information and decides whether certain data should remain local, while more demanding tasks can be routed to powerful AI models in the cloud.

The company said the approach is particularly useful for tasks involving personal information, such as financial records, health data, and private documents. Rather than requiring users to choose manually between local and cloud processing, the system makes that decision automatically for each request.
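
One way an on-device step could keep sensitive details local is to reduce raw records to non-identifying figures before anything is sent to the cloud. The Python sketch below assumes a hypothetical transaction format and uses plain aggregation code as a stand-in for the compact on-device model; `summarise_locally` and `build_cloud_prompt` are illustrative names, not part of any Perplexity API.

```python
# Hypothetical transaction record: (merchant, account_number, category, amount).
Transaction = tuple[str, str, str, float]

def summarise_locally(transactions: list[Transaction]) -> dict[str, float]:
    """On-device step (stand-in for a local model): per-category totals only.

    Merchant names and account numbers are dropped here, so they never
    reach the cloud request built below.
    """
    totals: dict[str, float] = {}
    for _merchant, _account, category, amount in transactions:
        totals[category] = round(totals.get(category, 0.0) + amount, 2)
    return totals

def build_cloud_prompt(totals: dict[str, float]) -> str:
    """Text that would be sent to a cloud model: aggregated figures only."""
    lines = [f"- {cat}: {amt:.2f}" for cat, amt in sorted(totals.items())]
    return "Write a short monthly spending summary from these totals:\n" + "\n".join(lines)

# Usage with made-up sample data.
statement = [
    ("Corner Grocery", "XX-1234", "groceries", 42.50),
    ("City Power", "XX-1234", "utilities", 60.00),
    ("Farmers Market", "XX-1234", "groceries", 17.25),
]
totals = summarise_locally(statement)
prompt = build_cloud_prompt(totals)
print(prompt)
```

The design choice illustrated here is that the cloud model receives only what it needs to write the summary, while the identifying details never leave the device.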

Why Perplexity is pushing local AI

The announcement comes as AI companies increasingly explore running models directly on consumer devices. Improvements in processors, graphics chips, and dedicated AI hardware have made it possible to perform a growing number of AI tasks locally rather than relying entirely on cloud infrastructure.

Perplexity argues that keeping more workloads on-device can improve privacy and reduce the amount of computing power required from remote servers. The company stated that its hybrid approach allows local and cloud models to work together, with each handling the tasks best suited to its capabilities.

The company said, "People would rather own a data centre in their laptop than build one they don't control." In other words, Perplexity argues that modern PCs are becoming powerful enough to handle a growing share of AI workloads locally. According to the company, this gives users greater control over their data, reduces the need to send sensitive information to remote servers, and lessens dependence on large centralised data centres operated by technology companies.

Partnership and support for other hardware

Perplexity unveiled the technology alongside Intel and said the system is designed to work across multiple hardware platforms. The company also highlighted support for NVIDIA's RTX Spark platform, adding that its orchestration layer is model-agnostic and can operate across different AI chips and local computing environments.
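
In broad terms, a model-agnostic orchestration layer means the routing logic talks to a common interface rather than to a specific chip or model. The sketch below is hypothetical: the `InferenceBackend` interface, the toy `EchoBackend`, and the labels `intel-npu`, `rtx-gpu` and `cloud` are assumptions for illustration only and do not reflect Intel, NVIDIA or Perplexity software.

```python
from typing import Protocol

class InferenceBackend(Protocol):
    """Hypothetical common interface every local or cloud runtime would expose."""
    name: str
    is_local: bool
    def available(self) -> bool: ...
    def generate(self, prompt: str) -> str: ...

class EchoBackend:
    """Toy backend standing in for a real runtime (NPU, GPU, or cloud API)."""
    def __init__(self, name: str, is_local: bool, present: bool) -> None:
        self.name = name
        self.is_local = is_local
        self._present = present

    def available(self) -> bool:
        return self._present

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

def pick_backend(backends: list[InferenceBackend], want_local: bool) -> InferenceBackend:
    """Return the first available backend that matches the requested location."""
    for backend in backends:
        if backend.is_local == want_local and backend.available():
            return backend
    raise RuntimeError("no suitable backend available")

# Usage: the hypothetical NPU is absent on this machine, so the GPU is chosen.
backends = [
    EchoBackend("intel-npu", is_local=True, present=False),
    EchoBackend("rtx-gpu", is_local=True, present=True),
    EchoBackend("cloud", is_local=False, present=True),
]
local = pick_backend(backends, want_local=True)
print(local.generate("categorise these transactions"))
```

Because the router depends only on `available()` and `generate()`, supporting another chip would mean adding another backend rather than changing the routing code.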

The move reflects a broader industry trend toward AI-capable PCs and devices. As more hardware gains the ability to run AI models locally, companies are looking for ways to combine on-device processing with cloud-based AI services seamlessly.

How it compares with rival approaches

Perplexity's announcement follows a wider industry push toward hybrid AI systems. Apple uses a combination of on-device processing and its Private Cloud Compute infrastructure through Apple Intelligence, while Google offers Gemini Nano for local AI tasks alongside larger cloud-based Gemini models. Microsoft is also expanding on-device AI capabilities with its new Aion model family.

Perplexity said that its approach differs by automatically coordinating local and cloud models within a single workflow. Instead of requiring users or developers to decide where tasks should run, the system determines the most appropriate location for each part of a request.

When will it be available?

According to Perplexity, Personal Computer with local inference support will begin rolling out in July. The company has not yet shared details about hardware requirements, supported devices, or whether the feature will be available to all users at launch.