On-Device Custom Wake Word Detection

Capabilities

What you get

Custom phrase tuning

Single words, brand names, or multi-word activations — tuned for your phrase, language, and target devices.

Designed for real-world audio

Tuned against hard negatives and noisy mobile conditions.

Tunable threshold

Trade precision for recall with one confidence setting tuned to your product.

Battery-aware

Gated by on-device voice activity detection, so it sips power while idle.

Fully on-device

No microphone audio ever leaves the device. No network required.

Live demo · runs in your browser

Start demo and say "Hey Assistant"

Your audio never leaves this page. The model highlights "Hey Assistant" while ignoring all other speech — including similar-sounding words and phrases like "Hey sister" or "Assist me".

Overview

A wake word that never leaves the device

A wake-word detector is the always-listening trigger that brings your app to attention. VoxRT runs that detector on-device, so it stays responsive without a network connection and never streams microphone audio to the cloud — the privacy posture enterprise buyers expect. The published SDK and reference model are free for commercial use with no per-user fees; custom wake phrases are tuned per customer as a paid engagement.

Models are trained on synthetic data with heavy augmentation, so they hold up against background noise, distance from the mic, and a wide range of accents. A lightweight voice-activity gate keeps power draw low between activations.
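
The gating idea can be sketched as follows. This is a minimal illustration that assumes a simple energy-based gate; the page does not describe how VoxRT's voice activity detection works. The names `rms`, `gated_scores` and `score_frame`, and the 0.01 energy floor, are hypothetical and chosen only for the example.

```python
import math
from typing import Callable, Iterable, List, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gated_scores(
    frames: Iterable[Sequence[float]],
    score_frame: Callable[[Sequence[float]], float],
    energy_floor: float = 0.01,  # hypothetical value, not a VoxRT default
) -> List[Optional[float]]:
    """Run the (more expensive) detector only on frames the gate lets through.

    Returns one entry per frame: a score when the detector ran, None when
    the frame was skipped as silence.
    """
    out: List[Optional[float]] = []
    for frame in frames:
        if rms(frame) < energy_floor:
            out.append(None)  # gate closed: detector is not invoked
        else:
            out.append(score_frame(frame))
    return out
```

The power saving comes from the skipped calls: a cheap per-frame check decides whether the detector runs at all.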

How it works

Reference model free, your phrase commercial

The SDK ships a reference wake-phrase model trained 100% on synthetic data. When you need your own phrase, we train a custom model on it and tune the operating point for your acoustic environment.

# your wake word
wake_word:
  phrase: "Hey Vox"
  threshold: 0.9

# at runtime
"...hey vox, what's next" → {
  detected: true,
  confidence: 0.97
}
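
How a configured threshold turns a confidence score into a detection result can be sketched like this. It assumes the rule is a plain "confidence at or above threshold" comparison, which the snippet above suggests but does not state. `WakeWordConfig`, `Detection` and `decide` are hypothetical names for illustration, not part of the VoxRT SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakeWordConfig:
    phrase: str
    threshold: float  # confidence required to report a detection


@dataclass(frozen=True)
class Detection:
    detected: bool
    confidence: float


def decide(config: WakeWordConfig, confidence: float) -> Detection:
    """Assumed rule: report a detection when confidence meets the threshold."""
    if not 0.0 <= config.threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return Detection(detected=confidence >= config.threshold, confidence=confidence)


config = WakeWordConfig(phrase="Hey Vox", threshold=0.9)
print(decide(config, 0.97))  # confidence above the threshold
print(decide(config, 0.42))  # confidence below the threshold
```

Raising the threshold trades recall for precision; lowering it does the reverse.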

The runtime

The runtime is the product

VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on constrained, low-power hardware.

Custom Wake Word is one product on that runtime, alongside VAD and streaming ASR. All three share the same Rust runtime crate and NEON kernel set — the runtime is the product; the models are what it runs.

Performance

Fast enough to leave always-on

0.015

real-time factor on iPhone 13 Pro Max (~150 µs per 10 ms frame)

~1.5%

of one core during continuous listening

0.021

real-time factor on a Snapdragon 662 (2020 budget Android)

~48K

parameters — tiny enough for microcontrollers
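
These figures are linked by simple arithmetic: real-time factor is processing time divided by the duration of audio processed. A short sketch, assuming the Snapdragon figure uses the same 10 ms frame as the iPhone one (the page does not say so explicitly):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent computing / duration of audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


FRAME_SECONDS = 0.010  # 10 ms frame, from the iPhone figure
iphone_rtf = real_time_factor(150e-6, FRAME_SECONDS)

print(f"iPhone RTF: {iphone_rtf:.3f}")                 # 0.015
print(f"Share of one core: {iphone_rtf * 100:.1f}%")   # 1.5%

# Assumption: same 10 ms frame on the Snapdragon 662.
snapdragon_us = 0.021 * FRAME_SECONDS * 1e6
print(f"Snapdragon per-frame time: {snapdragon_us:.0f} microseconds")  # 210
```

An RTF of 0.015 means the detector is busy for about 1.5% of wall-clock time, which is where the "~1.5% of one core" figure comes from.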

Model quality

Measured on a hard test set

The test set contains 5,240 positive utterances and 6,416 hard negatives (isolated "Hey", isolated "Assistant", competitor wake words like "Hey Siri", phonetic neighbours, arbitrary speech, and non-speech audio), with all speakers disjoint from the training and validation sets. ROC AUC is 0.9966 and average precision is 0.9899.

Threshold	Precision	Recall	F1	FPR	False positives
0.5	0.864	0.995	0.925	12.8%	822 / 6,416
0.85	0.957	0.987	0.972	3.7%	234 / 6,416
0.9 (default)	0.993	0.982	0.987	0.5%	34 / 6,416
0.95	0.997	0.769	0.868	0.2%	12 / 6,416
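
The columns in this table follow from raw confusion counts. The sketch below recomputes them. False-positive counts come from the table; true-positive counts are not published, so they are estimated from the rounded recall column, and the results match the table only approximately.

```python
from dataclasses import dataclass

POSITIVES = 5240  # positive utterances in the test set
NEGATIVES = 6416  # hard negatives in the test set


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float
    fpr: float


def metrics(true_pos: int, false_pos: int) -> Metrics:
    """Derive the table's columns from raw confusion counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / POSITIVES
    f1 = 2 * precision * recall / (precision + recall)
    fpr = false_pos / NEGATIVES
    return Metrics(precision, recall, f1, fpr)


# threshold -> (recall as published, false positives as published)
rows = {0.5: (0.995, 822), 0.85: (0.987, 234), 0.9: (0.982, 34), 0.95: (0.769, 12)}
for threshold, (recall, false_pos) in rows.items():
    # Estimated true positives: rounded recall times the positive count.
    m = metrics(round(recall * POSITIVES), false_pos)
    print(f"{threshold:<5} P={m.precision:.3f} R={m.recall:.3f} "
          f"F1={m.f1:.3f} FPR={m.fpr:.1%}")
```

Reading the rows this way makes the trade-off concrete: moving from 0.9 to 0.95 removes 22 false positives but loses more than a fifth of true activations.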

Footprint

From ~600 KB in your app

Item	Size
Android net APK impact (arm64, incl. model)	~600 KB
iOS app-binary delta after extraction + dead-code elimination	2–3 MB
Swift wrapper source (one file)	~7 KB
Native xcframework, compressed (device + simulator)	~19 MB
Wake-phrase model (fp16)	~100 KB

Supported devices

Where it runs

The same ~100 KB wake-phrase model runs behind native mobile SDKs and a Linux C-ABI library. Audio is 16 kHz mono in, and nothing ever leaves the device.
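
At 16 kHz, the 10 ms frame cited in the performance figures is 160 samples. The sketch below splits raw audio into such frames. It assumes 16-bit little-endian PCM input and non-overlapping frames; the page does not state the SDK's sample format or hop size, and `pcm16_frames` is a hypothetical helper, not an SDK function.

```python
import struct
from typing import Iterator, List

SAMPLE_RATE_HZ = 16_000
FRAME_MS = 10
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples


def pcm16_frames(pcm: bytes) -> Iterator[List[float]]:
    """Yield full 10 ms frames of floats in [-1.0, 1.0) from 16-bit LE mono PCM.

    A trailing partial frame is dropped.
    """
    frame_bytes = SAMPLES_PER_FRAME * 2  # 2 bytes per int16 sample
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        samples = struct.unpack_from(f"<{SAMPLES_PER_FRAME}h", pcm, start)
        yield [s / 32768.0 for s in samples]
```

One second of audio yields 100 frames, so a detector fed this way is invoked at most 100 times per second.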

Platform	Minimum OS	CPU architectures	Distribution
iOS	iOS 16.0	arm64 (device); arm64 + x86_64 (simulator)	Swift Package Manager
Android	Android 8.0 (API 26)	arm64-v8a (NEON); x86_64 (emulator)	JitPack / Gradle
Linux	glibc 2.17+ · Ubuntu 18.04+, Debian 10+, RHEL 7+, Raspberry Pi OS	aarch64	Shared library (.so) + Python, Node.js, Go, C/C++, Rust

On Linux, the wake word is validated across common Arm single-board and edge computers — Raspberry Pi 3 / 4 / 5 and Pi Zero 2 W, NVIDIA Jetson Nano and Orin Nano, AWS Graviton, and boards such as Rock Pi, Orange Pi and Khadas (Cortex-A53 / A55). It ships as a single ~480 KB shared library with wrappers for Python, Node.js, Go, C/C++ and Rust. A WebAssembly build (@voxrt/wake-word-browser on npm) is also available for in-browser demos and prototypes.

FAQ

Custom wake words, answered

Does a wake word send audio to the cloud?

No. VoxRT's wake-word detector runs entirely on the device: microphone audio is processed locally and never leaves it, and detection works with no network connection at all.

Can I use my own wake phrase?

Yes — brand names, single words, or multi-word activations, in any language. The SDK ships a free reference model ("Hey Assistant"); tell us your phrase and target devices and we train and tune a custom model for your acoustic environment.

How much battery does always-on listening use?

About 1.5% of one CPU core on an iPhone 13 Pro Max and ~2% on a 2020 budget Android (Snapdragon 662) — and the detector is gated by voice activity detection, so it idles when no one is speaking.

What platforms does it run on?

iOS 16+ and Android 8.0+ (API 26) as native Swift Package and Gradle/JitPack modules, plus Linux aarch64 (glibc 2.17+ — Raspberry Pi, NVIDIA Jetson, AWS Graviton) with Python, Node.js, Go, C/C++ and Rust wrappers. A WebAssembly build is available for in-browser demos.

Is the VoxRT wake word free for commercial use?

Yes — the published SDK and reference wake-word model are free to use in commercial production apps: wrappers are open source under Apache-2.0, with no per-user or per-device fees. When you need your own phrase, custom wake-word models are a paid engagement — trained, tuned, and supported for your phrase, language, and target devices. See licensing for the full picture.
