On-device custom wake word detection
Wake your app with a custom phrase or word, running entirely on the device. No cloud, no streamed microphone audio, and single-digit-millisecond inference even on budget Android phones.
What you get
Custom phrase tuning
Single words, brand names, or multi-word activations — tuned for your phrase, language, and target devices.
Designed for real-world audio
Tuned against hard negatives and noisy mobile conditions, with thresholds adjustable for your product.
Tunable threshold
Trade precision for recall with one confidence setting tuned to your product.
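One way to use that setting is to choose the operating point from measured data. The Python sketch below takes the four rows of the hard-test-set benchmark table on this page and returns the threshold with the highest recall that still meets a precision target. `OperatingPoint` and `pick_threshold` are illustrative names, not part of the VoxRT SDK, and a real product would re-measure on its own audio.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    threshold: float
    precision: float
    recall: float


# Rows copied from the hard-test-set benchmark table on this page.
BENCHMARK = [
    OperatingPoint(0.50, 0.864, 0.995),
    OperatingPoint(0.85, 0.957, 0.987),
    OperatingPoint(0.90, 0.993, 0.982),
    OperatingPoint(0.95, 0.997, 0.769),
]


def pick_threshold(points: Sequence[OperatingPoint], min_precision: float) -> Optional[OperatingPoint]:
    """Return the highest-recall point whose precision meets the target, or None."""
    eligible = [p for p in points if p.precision >= min_precision]
    if not eligible:
        return None
    return max(eligible, key=lambda p: p.recall)
```

With a 0.99 precision target this picks 0.9 (the default); relaxing the target to 0.95 picks 0.85 and gains a little recall at the cost of more false activations.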
Battery-aware
Gated by on-device voice activity detection, so it sips power while idle.
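The gating idea can be sketched as a loop that runs a cheap speech check on every frame and calls the more expensive wake-word scorer only when speech is present. This is an illustration under assumptions: VoxRT's own VAD is not described here, so `energy_vad` is a simple RMS-energy stand-in, and `wake_score` is any hypothetical callable returning a confidence in [0, 1].

```python
import math
from typing import Callable, Iterable, List

Frame = List[float]  # mono samples in [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(frame: Frame, floor: float = 0.01) -> bool:
    """Stand-in VAD (not VoxRT's): treat any frame louder than `floor` RMS as speech."""
    return rms(frame) > floor


def gated_detect(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    wake_score: Callable[[Frame], float],
    threshold: float = 0.9,
) -> List[int]:
    """Return indices of frames whose wake-word score reaches `threshold`.

    The scorer only runs on frames the VAD marks as speech, so silent
    stretches cost one cheap RMS computation per frame.
    """
    hits = []
    for i, frame in enumerate(frames):
        if not is_speech(frame):
            continue  # idle: skip the wake-word model entirely
        if wake_score(frame) >= threshold:
            hits.append(i)
    return hits


# Usage: a fake scorer that counts how often it is called.
calls = 0


def fake_score(frame: Frame) -> float:
    global calls
    calls += 1
    return 0.95


silence, speech = [0.0] * 160, [0.5] * 160
hits = gated_detect([silence, speech, silence], energy_vad, fake_score)
print(hits, calls)  # [1] 1
```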
Fully on-device
No microphone audio ever leaves the device. No network required.
Start demo and say "Hey Assistant"
Your audio never leaves this page. The model highlights "Hey Assistant" while ignoring all other speech — including similar-sounding words and phrases like "Hey sister" or "Assist me".
"Hey Assistant"
A wake word that never leaves the device
A wake-word detector is the always-listening trigger that brings your app to attention. VoxRT runs that detector on-device, so it stays responsive without a network connection and never streams microphone audio to the cloud — the privacy posture enterprise buyers expect. The published SDK and reference model are free for commercial use with no per-user fees; custom wake phrases are tuned per customer as a paid engagement.
Models are trained on synthetic data with heavy augmentation, so they hold up against background noise, distance from the microphone, and a wide range of accents. A lightweight voice-activity gate keeps power draw low between activations.
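A common way to build this kind of noise robustness is to mix background noise into clean or synthetic speech at a controlled signal-to-noise ratio. The page does not describe VoxRT's augmentation pipeline, so treat this as a generic sketch of SNR mixing; `mix_at_snr` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Return speech + g * noise, with g chosen so the mix has the requested SNR.

    From SNR_dB = 10 * log10(P_speech / (g**2 * P_noise)):
        g = sqrt(P_speech / (P_noise * 10 ** (snr_db / 10)))
    """
    if not speech:
        raise ValueError("speech clip is empty")
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[: len(speech)]
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise clip is silent")
    gain = math.sqrt(power(speech) / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Sweeping `snr_db` over a range (for example from clean down to a few dB) is what exposes a model to both quiet rooms and loud streets during training.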
Free reference model, paid custom phrases
The SDK ships a reference wake-phrase model trained 100% on synthetic data. When you need your own phrase, we train a custom model on it and tune the operating point for your acoustic environment.
```yaml
# your wake word
wake_word:
  phrase: "Hey Vox"
  threshold: 0.9

# at runtime
# "...hey vox, what's next" → { detected: true, confidence: 0.97 }
```
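Read as a decision rule, the config example says: the model emits a confidence, and a detection fires when that confidence reaches the configured threshold. This sketch assumes a `>=` comparison and mirrors the `{ detected, confidence }` result shape; `WakeWordConfig` and `to_result` are hypothetical names, not the SDK's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakeWordConfig:
    phrase: str
    threshold: float = 0.9  # default operating point from the benchmark table

    def __post_init__(self) -> None:
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")


def to_result(config: WakeWordConfig, confidence: float) -> dict:
    """Map a raw model confidence to the { detected, confidence } result shape."""
    return {"detected": confidence >= config.threshold, "confidence": confidence}


cfg = WakeWordConfig(phrase="Hey Vox", threshold=0.9)
print(to_result(cfg, 0.97))  # {'detected': True, 'confidence': 0.97}
```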
The runtime is the product
VoxRT is a from-scratch inference runtime for on-device speech models — no ONNX Runtime, no PyTorch Mobile, no LiteRT. It's a custom Rust core sized and tuned for streaming voice workloads on constrained, low-power hardware.
Custom Wake Word is one product on that runtime, alongside VAD and streaming ASR. All three share the same Rust runtime crate and NEON kernel set — the runtime is the product; the models are what it runs.
Fast enough to leave always-on
Measured on a hard test set
5,240 positive utterances plus 6,416 hard negatives (isolated "Hey", isolated "Assistant", competitor wake words such as "Hey Siri", phonetic neighbours, arbitrary speech, and non-speech audio), with all speakers disjoint from the training and validation sets. ROC AUC 0.9966, average precision 0.9899.
| Threshold | Precision | Recall | F1 | False-positive rate | False positives |
|---|---|---|---|---|---|
| 0.5 | 0.864 | 0.995 | 0.925 | 12.8% | 822 / 6,416 |
| 0.85 | 0.957 | 0.987 | 0.972 | 3.6% | 234 / 6,416 |
| 0.9 (default) | 0.993 | 0.982 | 0.987 | 0.5% | 34 / 6,416 |
| 0.95 | 0.997 | 0.769 | 0.868 | 0.2% | 12 / 6,416 |
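Each row of the table follows from four counts. False positives are listed directly; true positives are not, so the sketch back-computes roughly 5,146 from the default row's 0.982 recall (0.982 × 5,240) and should be treated as approximate. `metrics` is an illustrative helper.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float
    fpr: float  # false-positive rate


def metrics(tp: int, fp: int, positives: int, negatives: int) -> Metrics:
    """Compute detection metrics from confusion counts on a labelled test set."""
    fn = positives - tp  # wake phrases the detector missed
    tn = negatives - fp  # negatives correctly ignored
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return Metrics(precision, recall, f1, fpr)


m = metrics(tp=5146, fp=34, positives=5240, negatives=6416)
```

For the 0.9 row this reproduces precision 0.993, recall 0.982 and a 0.5% false-positive rate after rounding.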
From ~600 KB in your app
- Android net APK impact (arm64, incl. model): ~600 KB
- iOS app-binary delta after extraction and dead-code elimination: 2–3 MB
- Swift wrapper source (one file): ~7 KB
- Native xcframework, compressed (device + simulator): ~19 MB
- Wake-phrase model (fp16): ~100 KB
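Because fp16 stores each weight in 2 bytes, the model size puts a rough bound on its weight count. Assuming KB means 1,000 bytes and the file is almost entirely weights (headers and metadata ignored):

$$
N_{\text{weights}} \approx \frac{100{,}000\ \text{bytes}}{2\ \text{bytes/weight}} = 50{,}000
$$

The same weights stored as fp32 (4 bytes each) would take roughly 200 KB.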
Where it runs
The same ~100 KB wake-phrase model runs behind native mobile SDKs and a Linux C-ABI library. Input audio is 16 kHz mono, and nothing ever leaves the device.
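Apps that capture stereo int16 PCM need to turn it into 16 kHz mono before detection. A minimal preparation step, assuming the capture is already at 16 kHz (resampling is out of scope here), downmixes interleaved stereo to mono floats and slices fixed-size frames. `FRAME_MS = 80` is a hypothetical chunk size, not a documented VoxRT parameter.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE_HZ = 16_000  # the SDKs take 16 kHz mono input
FRAME_MS = 80            # hypothetical chunk size for illustration only


def downmix_int16_stereo(interleaved: Sequence[int]) -> List[float]:
    """Average interleaved L/R int16 samples into mono floats in [-1.0, 1.0)."""
    if len(interleaved) % 2:
        raise ValueError("interleaved stereo must have an even number of samples")
    return [
        (interleaved[i] + interleaved[i + 1]) / 2 / 32768.0
        for i in range(0, len(interleaved), 2)
    ]


def frames(mono: Sequence[float], frame_ms: int = FRAME_MS) -> Iterator[List[float]]:
    """Yield full frames of `frame_ms` audio; a trailing partial frame is dropped."""
    size = SAMPLE_RATE_HZ * frame_ms // 1000
    for start in range(0, len(mono) - size + 1, size):
        yield list(mono[start:start + size])
```

In a live stream the trailing partial frame would normally be buffered and prepended to the next capture callback rather than dropped.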
| Platform | Minimum OS | CPU architectures | Distribution |
|---|---|---|---|
| iOS | iOS 16.0 | arm64 (device); arm64 + x86_64 (simulator) | Swift Package Manager |
| Android | Android 8.0 (API 26) | arm64-v8a (NEON); x86_64 (emulator) | JitPack / Gradle |
| Linux | glibc 2.17+ · Ubuntu 18.04+, Debian 10+, RHEL 7+, Raspberry Pi OS | aarch64 | Shared library (.so) + Python, Node.js, Go, C/C++, Rust |
On Linux, the wake-word detector is validated across common Arm single-board and edge computers: Raspberry Pi 3 / 4 / 5 and Pi Zero 2 W, NVIDIA Jetson Nano and Orin Nano, AWS Graviton, and boards such as Rock Pi, Orange Pi and Khadas (Cortex-A53 / A55). It ships as a single ~480 KB shared library with wrappers for Python, Node.js, Go, C/C++ and Rust. A WebAssembly build (@voxrt/wake-word-browser on npm) is also available for in-browser demos and prototypes.
Custom wake words, answered
Does a wake word send audio to the cloud?
No. VoxRT's wake-word detector runs entirely on the device — microphone audio is processed locally and never leaves the phone, and detection works with no network connection at all.
Can I use my own wake phrase?
Yes — brand names, single words, or multi-word activations, in any language. The SDK ships a free reference model ("Hey Assistant"); tell us your phrase and target devices and we train and tune a custom model for your acoustic environment.
How much battery does always-on listening use?
About 1.5% of one CPU core on an iPhone 13 Pro Max and about 2% on a 2020 budget Android phone (Snapdragon 662). The detector is also gated by voice activity detection, so it idles when no one is speaking.
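A sustained CPU share converts directly into compute time per unit of audio: at 1.5% of one core, each second of audio costs about 15 ms of single-core time, and about 20 ms at 2%. The helper below is illustrative arithmetic only; it assumes the share is sustained while listening and says nothing about VoxRT's internal frame size.

```python
def compute_ms(cpu_core_fraction: float, audio_ms: float) -> float:
    """Single-core compute time (ms) spent on `audio_ms` of audio at a sustained CPU share."""
    if not 0.0 <= cpu_core_fraction <= 1.0:
        raise ValueError("cpu_core_fraction must be between 0 and 1")
    if audio_ms < 0:
        raise ValueError("audio_ms must be non-negative")
    return cpu_core_fraction * audio_ms


for device, share in [("iPhone 13 Pro Max", 0.015), ("Snapdragon 662", 0.02)]:
    print(f"{device}: ~{compute_ms(share, 1000):.0f} ms of compute per second of audio")
```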
What platforms does it run on?
iOS 16+ and Android 8.0+ (API 26) as native Swift Package and Gradle/JitPack modules, plus Linux aarch64 (glibc 2.17+ — Raspberry Pi, NVIDIA Jetson, AWS Graviton) with Python, Node.js, Go, C/C++ and Rust wrappers. A WebAssembly build is available for in-browser demos.
Is the VoxRT wake word free for commercial use?
Yes — the published SDK and reference wake-word model are free to use in commercial production apps: wrappers are open source under Apache-2.0, with no per-user or per-device fees. When you need your own phrase, custom wake-word models are a paid engagement — trained, tuned, and supported for your phrase, language, and target devices. See licensing for the full picture.