IRCNF

AI PCs with dedicated NPUs are finally in consumers' hands — here's what the chips actually do


The term "AI PC" was first whispered at CES 2024 and promptly shouted from every laptop announcement that followed. By the end of that year, it had joined "4K display" and "all-day battery" as marketing language so ubiquitous it had lost most of its meaning. Every laptop with a Copilot button became an AI PC. Chips with neural processing units — dedicated silicon for accelerating machine learning inference — became the checkbox that justified the label.

Two years on, it's worth stepping back from the marketing and asking what these NPUs do, whether the dedicated hardware matters, and whether the AI PC inflection point has arrived or merely been declared.

Apple set the template

Before there was an "AI PC" category, there was Apple Silicon. The M1 chip, launched in November 2020, included a 16-core Neural Engine alongside its CPU and GPU. Apple has been shipping Neural Engines in iPhones since the A11 Bionic in 2017 — the iPhone X generation — making on-device machine learning inference a native iOS capability years before it became a Windows talking point.

The Neural Engine in Apple Silicon handles Face ID, computational photography (Night mode, Portrait mode, Photonic Engine), real-time transcription in Notes, and, more recently, Apple Intelligence features such as Writing Tools and image generation in Image Playground. Most of this runs locally, with low latency and without sending data to a server; Apple routes some heavier Apple Intelligence requests to its Private Cloud Compute service. The M4 Neural Engine's rated 38 TOPS (trillion operations per second) is part of what makes these features feel instant rather than sluggish.

This is the benchmark against which Windows PC NPUs are measured, and it's a useful one: Apple didn't ship Neural Engine hardware and then figure out what to do with it. The features and the silicon shipped together.

Qualcomm's Snapdragon X moment

The most significant Windows-side development of 2024 was the Qualcomm Snapdragon X Elite — the first Windows on Arm processor to compete seriously with x86 on performance while matching Apple Silicon on battery life. Critically, it includes a 45 TOPS NPU, exceeding Microsoft's 40 TOPS requirement for "Copilot+ PC" certification.
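
To make the TOPS figures concrete, here is a rough Python sketch of how a peak rating is usually derived and compared against the 40 TOPS bar. It assumes the common convention that one multiply-accumulate (MAC) counts as two operations. The function names are illustrative, and the MAC count and clock speed in the last line are made-up numbers, not published specifications for any chip.

```python
COPILOT_PLUS_MIN_TOPS = 40.0  # Microsoft's Copilot+ PC threshold, as cited above


def peak_tops(mac_units: int, clock_hz: float) -> float:
    """Theoretical peak in TOPS, counting each multiply-accumulate as 2 ops."""
    return 2 * mac_units * clock_hz / 1e12


def meets_copilot_plus(npu_tops: float) -> bool:
    """True if a quoted NPU rating reaches the 40 TOPS bar."""
    return npu_tops >= COPILOT_PLUS_MIN_TOPS


# Ratings quoted in this article. The M4 is listed only for comparison;
# Copilot+ certification applies to Windows PCs.
quoted_tops = {"Snapdragon X Elite": 45.0, "Apple M4 Neural Engine": 38.0}
for chip, tops in quoted_tops.items():
    verdict = "at or above" if meets_copilot_plus(tops) else "below"
    print(f"{chip}: {tops:.0f} TOPS, {verdict} the 40 TOPS bar")

# Hypothetical NPU: 16,384 MAC units at 1.4 GHz (assumed values).
print(f"{peak_tops(16_384, 1.4e9):.1f} TOPS theoretical peak")
```

Peak ratings like these are upper bounds. Vendors quote them at different numeric precisions, and sustained throughput on a real model is usually lower, so the comparison is only a first-pass filter.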

The Snapdragon X Elite's NPU runs Windows Studio Effects — the suite of background blur, eye contact correction, and noise suppression features built into Windows 11. It handles real-time transcription in Windows' Live Captions feature, with offline speech-to-text that works on any audio, any app, without sending audio to the cloud. Cocreator in Microsoft Paint generates images locally using a compressed SDXL model. These are real features, running in real time, on the dedicated neural silicon.

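The x86 side caught up quickly. Intel's Core Ultra "Meteor Lake" chips (late 2023) were Intel's first client processors with an integrated NPU, rated at roughly 11 TOPS; the 34 TOPS figure often quoted for them is the combined CPU, GPU, and NPU total. Lunar Lake (late 2024) raised the NPU alone to 48 TOPS, clearing the Copilot+ bar, though the Arrow Lake parts launched around the same time kept a much smaller NPU. AMD's Ryzen AI series brought NPUs to AMD's mobile lineup, with the Ryzen AI 300 parts rated at 50 TOPS. The Copilot+ PC certification requirement effectively mandated NPU hardware across the industry.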

What works today

The honest accounting of what NPU-accelerated features work in practice is shorter than the marketing suggests, but genuinely useful. Windows Studio Effects — background blur, auto-framing, eye contact correction during video calls — run smoothly on NPU hardware without taxing the CPU or GPU. For remote workers on video calls all day, this matters.

Live Captions provides real-time transcription across system audio (any video, any meeting, any application) with reasonable accuracy for English and growing support for other languages. It's arguably the most broadly useful AI PC feature, and it benefits from NPU offload because continuous transcription no longer competes with foreground apps for CPU and GPU time.

Local LLM inference via tools like Ollama and llama.cpp can run on NPU hardware only when the framework has a backend for that NPU; where it does not, the same tools fall back to the CPU or GPU. Small models such as Phi-3 Mini, Llama 3.2 3B, and Gemma 2 2B run usably fast on modern NPUs: not as fast as on a discrete GPU, but at a fraction of the power and without needing the cloud. For developers who want local AI inference for privacy or offline reasons, NPU-class chips are a meaningful improvement over CPU-only inference.
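
For a sense of what local inference looks like from the developer's side, the sketch below sends a prompt to an Ollama server on its default local port using only the Python standard library. It assumes Ollama is installed and running and that the `llama3.2:3b` model has already been pulled. The helper names `build_request` and `generate` are illustrative. Nothing in the request selects the NPU: whether the model runs on the NPU, GPU, or CPU is decided by the runtime and its backends.

```python
import json
import urllib.request

# Ollama's default local endpoint; assumes the server is running on this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.2:3b") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2:3b", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server):
#   print(generate("Explain what an NPU does in one sentence."))
```

The request never leaves the machine, which is the privacy and offline argument in practice.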

The fragmentation problem

The biggest practical obstacle to NPU adoption is API fragmentation. Qualcomm's NPU uses its QNN (Qualcomm Neural Network) SDK. Intel's NPU uses OpenVINO and DirectML. AMD's NPU uses the Ryzen AI software stack and DirectML (ROCm, AMD's compute platform, targets its GPUs rather than the NPU). Apple's Neural Engine uses Core ML. None of these is interoperable with the others.

Microsoft's DirectML is the closest thing to a unified Windows API for neural acceleration, but hardware vendors have been slow to expose their full NPU capabilities through it. Application developers have to decide whether to write NPU-specific code for each hardware vendor, rely on DirectML (which may not use the NPU at all on some platforms), or just run on the GPU and ignore the NPU entirely. Most third-party applications choose the last option.
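
One way the decision can look in code is sketched below, assuming an application ships its model through ONNX Runtime, which this article does not otherwise cover. The execution provider names are the ones ONNX Runtime uses; the preference order and the `choose_providers` helper are illustrative assumptions, not a vendor recommendation.

```python
from typing import Iterable, List

# Vendor-specific providers, tried first. Order is an illustrative assumption.
NPU_PROVIDERS = [
    "QNNExecutionProvider",       # Qualcomm
    "OpenVINOExecutionProvider",  # Intel
    "VitisAIExecutionProvider",   # AMD
    "CoreMLExecutionProvider",    # Apple
]
# Generic fallbacks: DirectML, then plain CPU.
FALLBACK_PROVIDERS = ["DmlExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available: Iterable[str]) -> List[str]:
    """Return the providers to request, most specific first."""
    present = set(available)
    chosen = [p for p in NPU_PROVIDERS + FALLBACK_PROVIDERS if p in present]
    # Always leave a usable fallback, even if the runtime reported nothing.
    return chosen or ["CPUExecutionProvider"]


# Usage (requires the onnxruntime package and a model file):
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Even with a vendor provider selected, there is no guarantee the work lands on the NPU. Several of these providers can also target the CPU or GPU depending on their configuration and on which operators the model uses, which is part of why many developers skip the NPU path altogether.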

The result is that the NPU usage you see in Windows' Task Manager comes almost entirely from Microsoft's own features. Use a third-party video conferencing app's own background blur instead of the effects in Teams or Windows itself, and the NPU typically sits idle while the GPU or CPU does the work.

Microsoft Recall and the privacy reckoning

The most controversial proposed AI PC feature — Microsoft Recall, which takes periodic screenshots of everything you do on your PC and makes it searchable via natural language — required NPU-class hardware and was initially a Copilot+ exclusive. After significant privacy criticism, Microsoft delayed and redesigned it, adding opt-in requirements, local encryption, and Windows Hello authentication before access.

Recall's troubled launch illustrated a fundamental tension in AI PC marketing: the most ambitious "AI features" involve processing sensitive data continuously. The promise of on-device processing for privacy is real, but only if users trust that the locally processed data stays local, which requires verifiable design choices, not marketing assertions.

Is it actually a new era?

IDC projected that 60% of PCs shipped in 2025 would meet the AI PC specification. That's real hardware saturation. Whether the software ecosystem catches up is the open question. The Microsoft-controlled features work. The ecosystem beyond Microsoft is still figuring out how to use the silicon.

The comparison to Apple Silicon is instructive here too: Apple's Neural Engine features are tightly integrated because Apple controls the chip design, the OS, and the primary applications. The Windows ecosystem's fragmentation across Microsoft, OEM hardware variation, and third-party developers makes equivalent integration structurally harder. NPU hardware is necessary but not sufficient for an AI PC that feels as coherent as an M4 MacBook. The software layer is the remaining work.
