Document Type


Subject Area(s)

Artificial Intelligence, Internet of Things, Edge AI


Every modern household owns at least a dozen IoT devices, such as smart speakers, video doorbells, and smartwatches, most of which are equipped with a digital voice assistant, such as Alexa, built on a keyword spotting (KWS) system. State-of-the-art KWS systems require a large number of operations and substantial computation and memory resources to reach top performance. In this paper, in contrast to existing resource-demanding KWS systems, we propose a lightweight temporal-convolution-based KWS system named OWSNet that can run comfortably on a variety of IoT devices and accurately spot multiple keywords in real time without disturbing the device's routine functions.

When OWSNet is deployed on consumer IoT devices placed in workplaces, homes, and similar settings, it can accurately spot offensive words in real time in addition to wake/trigger words such as 'Hey Siri' and 'Alexa'. If a regular wake word is spotted, it activates the voice assistant; if an offensive word is spotted, it starts to capture and stream audio data to speech analytics APIs for autonomous detection of threats and insecurities in the scene. The evaluation results show that OWSNet is faster than state-of-the-art models, producing approximately 1-74 times faster inference on Raspberry Pi 4 and approximately 1-12 times faster inference on NVIDIA Jetson Nano. To optimize IoT use-case models like OWSNet, we also present a generic multi-component ML model optimization sequence that can reduce the memory and computation demands of a wide range of ML models, thus enabling their execution on low-resource, low-cost, low-power IoT devices.
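
The wake-word versus offensive-word branching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Action` enum, the `dispatch` function, the keyword sets, and the 0.8 confidence threshold are all hypothetical names and values chosen for illustration, and the keyword spotter itself is assumed to exist elsewhere and to return a label with a confidence score.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical device actions; names are illustrative only."""
    IGNORE = "ignore"
    ACTIVATE_ASSISTANT = "activate_assistant"
    STREAM_AUDIO = "stream_audio"


# Assumed keyword sets. The wake words come from the abstract's examples;
# the offensive-word entries are placeholders, not the paper's vocabulary.
WAKE_WORDS = {"hey siri", "alexa"}
OFFENSIVE_WORDS = {"<offensive-word-1>", "<offensive-word-2>"}


def dispatch(keyword, confidence, threshold=0.8):
    """Map one spotted keyword to a device action.

    `keyword` is the label produced by a keyword spotter (or None when
    nothing was spotted); `confidence` is its score in [0, 1]. The
    threshold value is an assumption, not a figure from the paper.
    """
    if keyword is None or confidence < threshold:
        return Action.IGNORE
    label = keyword.strip().lower()
    if label in WAKE_WORDS:
        # Regular wake word: hand control to the voice assistant.
        return Action.ACTIVATE_ASSISTANT
    if label in OFFENSIVE_WORDS:
        # Offensive word: start capturing and streaming audio for analysis.
        return Action.STREAM_AUDIO
    return Action.IGNORE
```

In such a design, the caller would run `dispatch` on every spotter output and only open the audio stream when `Action.STREAM_AUDIO` is returned, so the device's routine functions stay untouched the rest of the time.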

APA Citation

Sudharsan, B., Malik, S., Corcoran, P., Patel, P., Breslin, J. G., & Ali, M. I. (2021, April 23). OWSNet: Towards a real-time offensive words spotting network for consumer IoT devices. IEEE 7th World Forum on Internet of Things.