SRAM Optimized Porting and Execution of Machine Learning Classifiers on MCU-based IoT Devices: Demo Abstract

Subject Area(s)

Artificial Intelligence, Internet of Things


With the introduction of edge analytics, IoT devices are becoming smarter and ready for AI applications. However, the space complexity of trained Machine Learning (ML) models grows linearly with the amount of training data, so large models cannot be deployed on IoT devices with limited memory. To alleviate such memory issues, we recently proposed an SRAM-optimized method for porting, stitching, and efficiently deploying classifiers. This is currently the most resource-friendly approach: it enables large classifiers to run comfortably on microcontroller unit (MCU) based IoT devices and to perform ultrafast classifications (1-4x faster than state-of-the-art libraries) while consuming 0 bytes of SRAM.

In this demo, we apply our recent SRAM-optimized approach to port and execute 7 dataset-trained classifiers on 7 popular MCUs and report their inference performance. The demo results show that, with our approach, even the slowest MCU tested, the ATmega328P, performs unit inference faster than an NVIDIA Jetson Nano GPU and a Raspberry Pi 4 CPU.
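
To make the idea of keeping a ported classifier out of SRAM more concrete, the following is a minimal sketch in C. It is not the authors' porting or stitching method. The tree, its thresholds, and the names `tree_node_t`, `TREE`, and `tree_predict` are hypothetical and chosen only for illustration. The sketch assumes a small decision tree whose nodes are stored in a read-only table, so that the model parameters can be placed in flash rather than copied into SRAM.

```c
#include <stdint.h>

/* Hypothetical toy decision tree, not taken from the paper.
 * A node with feature < 0 is a leaf; its class label is stored in `left`. */
typedef struct {
    int8_t  feature;    /* index into the input vector, or -1 for a leaf */
    float   threshold;  /* go left when x[feature] <= threshold */
    uint8_t left;       /* left child index, or class label for a leaf */
    uint8_t right;      /* right child index (unused for a leaf) */
} tree_node_t;

/* `static const` lets the toolchain place the table in read-only memory.
 * Assumption: on AVR parts this alone is not enough (see the note below). */
static const tree_node_t TREE[] = {
    {  0, 2.5f, 1, 2 },  /* node 0: x[0] <= 2.5 ? */
    { -1, 0.0f, 0, 0 },  /* node 1: leaf, class 0 */
    {  1, 1.7f, 3, 4 },  /* node 2: x[1] <= 1.7 ? */
    { -1, 0.0f, 1, 0 },  /* node 3: leaf, class 1 */
    { -1, 0.0f, 2, 0 },  /* node 4: leaf, class 2 */
};

/* Walk the tree from the root until a leaf is reached.
 * Only the node index lives in working memory; the model itself is read-only. */
uint8_t tree_predict(const float *x) {
    uint8_t i = 0;
    while (TREE[i].feature >= 0) {
        i = (x[TREE[i].feature] <= TREE[i].threshold) ? TREE[i].left
                                                      : TREE[i].right;
    }
    return TREE[i].left;
}
```

On ARM Cortex-M toolchains, `const` data of this kind is typically linked into flash. On AVR devices such as the ATmega328P, the table would additionally need the avr-libc `PROGMEM` attribute and reads through the `pgm_read_*` helpers from `<avr/pgmspace.h>`, which this portable sketch leaves out.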

APA Citation

Sudharsan, B., Patel, P., Breslin, J. G., & Ali, M. I. (2021). SRAM optimized porting and execution of machine learning classifiers on MCU-based IoT devices: Demo abstract. Proceedings of the ACM/IEEE 12th International Conference on Cyber-Physical Systems, 223–224.