Its ultra-low latency means the system processes requests as fast as it receives them.
"Real-time AI is becoming increasingly important as cloud infrastructures process live data streams, whether they be search queries, videos, sensor streams, or interactions with users," said Doug Burger, an engineer at Microsoft, in a blog post late on Tuesday.
'Project Brainwave' uses the massive field-programmable gate array (FPGA) infrastructure that Microsoft has been deploying over the past few years.
"By attaching high-performance FPGAs directly to our datacentre network, we can serve DNNs as hardware microservices, where a DNN can be mapped to a pool of remote FPGAs and called by a server with no software in the loop," Burger said.
He added that the system architecture reduces latency, since the CPU does not need to process incoming requests, and allows very high throughput, with the FPGA processing requests as fast as the network can stream them.
The system has been architected to sustain high real-world performance across a wide range of complex models, with batch-free execution.
Microsoft claimed that the system, designed for real-time AI, can handle complex, memory-intensive models such as long short-term memory (LSTM) networks without resorting to batching to boost throughput.
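The latency cost of batching can be sketched with some simple arithmetic. The numbers and function below are purely illustrative (not Microsoft's figures): if a server waits to fill a batch before running one accelerator pass, the earliest request in the batch pays for all the later arrivals, whereas batch-free execution serves each request as it lands.

```python
# Illustrative sketch (hypothetical numbers, not Project Brainwave's
# implementation): why batching trades per-request latency for throughput.

ARRIVAL_GAP_MS = 2.0    # assume a new request arrives every 2 ms
BATCH_COMPUTE_MS = 8.0  # assume one accelerator pass takes 8 ms, any batch size

def worst_case_latency(batch_size: int) -> float:
    """Worst-case latency for a request when the server fills a batch first.

    The first request in a batch waits for (batch_size - 1) later
    arrivals before compute even starts, then waits for the compute pass.
    """
    wait_for_batch = (batch_size - 1) * ARRIVAL_GAP_MS
    return wait_for_batch + BATCH_COMPUTE_MS

print(worst_case_latency(1))   # batch-free: 8.0 ms
print(worst_case_latency(32))  # batch of 32: 70.0 ms
```

Under these assumed numbers, a batch of 32 multiplies worst-case latency by nearly nine even though the accelerator itself is no slower, which is why batch-free execution matters for real-time serving.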
"Project Brainwave achieves unprecedented levels of demonstrated real-time AI performance on extremely challenging models. As we tune the system over the next few quarters, we expect significant further performance improvements," Burger noted.
Microsoft is also planning to bring the real-time AI system to users in Azure.
"With the 'Project Brainwave' system incorporated at scale and available to our customers, Microsoft Azure will have industry-leading capabilities for real-time AI," Burger noted.