NXP Semiconductors N.V. recently released its eIQ Machine Learning (ML) software support for the Glow neural network (NN) compiler, delivering the industry’s first NN compiler implementation for higher performance with a low memory footprint on NXP’s i.MX RT crossover MCUs.
Glow, developed by Facebook, can integrate target-specific optimizations. NXP leveraged this ability by using NN operator libraries for Arm Cortex-M cores and the Cadence Tensilica HiFi 4 DSP, maximizing inferencing performance on its i.MX RT685, i.MX RT1050 and i.MX RT1060 MCUs.
MCU architectural features
In May 2018, Facebook, the creator of PyTorch, introduced Glow (the Graph Lowering NN compiler) as an open-source community project, with the goal of providing optimizations to accelerate neural network performance on a range of hardware platforms.
As an NN compiler, Glow takes an unoptimized neural network and generates highly optimized code for it ahead of time. This differs from typical neural network model processing, which relies on just-in-time compilation that demands more processing performance and adds memory overhead. Running pre-optimized code directly, as Glow makes possible, greatly reduces processing and memory requirements.
NXP has also taken an active role within the Glow open source community to help drive broad acceptance of new Glow features.
“The standard, out-of-the-box version of Glow from GitHub is device agnostic to give users the flexibility to compile neural network models for basic architectures of interest, including the Arm Cortex-A and Cortex-M cores, as well as RISC-V architectures,” said Dwarak Rajagopal, software engineering manager at Facebook.
He added: “By using purpose-built software libraries that exploit the compute elements of their MCUs and delivering a 2-3x performance increase, NXP has demonstrated the wide-ranging benefits of using the Glow NN compiler for machine learning applications, from high-end cloud-based machines to low-cost embedded platforms.”
Optimized machine learning
The demand for ML applications is expected to increase significantly in the years ahead. TIRIAS Research forecasts that 98% of all edge devices will use some form of machine learning/artificial intelligence by 2025.
Market projections suggest that 18 to 25 billion devices will include ML capabilities in that time frame, even without dedicated ML accelerators. Consumer device manufacturers and embedded IoT developers will need optimized ML frameworks for low-power edge embedded applications using MCUs.
“NXP is driving the enablement of machine learning capabilities on edge devices, leveraging the robust capabilities of our highly integrated i.MX application processors and high performance i.MX RT crossover MCUs with our eIQ ML software framework,” said Ron Martino, senior VP and GM, NXP Semiconductors. “The addition of Glow support for our i.MX RT series of crossover MCUs allows our customers to compile deep neural network models and give their applications a competitive advantage.”
NXP’s eIQ edge intelligence environment for ML is a comprehensive toolkit that provides the building blocks developers need to implement ML efficiently in edge devices. With Glow merged into the eIQ software, ML developers now have a high-performance framework that scales across NXP’s edge processing solutions, including the i.MX RT crossover MCUs and i.MX 8 application processors.
Customers will be better equipped to develop ML voice applications, object recognition, and facial recognition, among other applications, on i.MX RT MCUs and i.MX application processors.
eIQ now includes inferencing support for both Glow and TensorFlow Lite, and NXP routinely benchmarks both to measure performance. MCU benchmarks include standard NN models, such as CIFAR-10. Using a CIFAR-10 model as an example, NXP’s benchmark data shows the performance advantage of the i.MX RT1060 (600 MHz Arm Cortex-M7), i.MX RT1170 (1 GHz Arm Cortex-M7), and i.MX RT685 (600 MHz Cadence Tensilica HiFi 4 DSP).
NXP’s enablement for Glow is tightly coupled with the Neural Network Library (NNLib) that Cadence provides for its Tensilica HiFi 4 DSP, which delivers 4.8 GMAC/s of performance. In the same CIFAR-10 example, NXP’s implementation of Glow achieves a 25x performance advantage by using this DSP to accelerate the NN operations.
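As a quick consistency check on these figures, and assuming the 4.8 GMAC/s rating is a peak figure at the 600 MHz DSP clock quoted for the i.MX RT685, the implied number of multiply-accumulate operations per clock cycle is:

```latex
\[
\frac{4.8 \times 10^{9}\ \text{MAC/s}}{600 \times 10^{6}\ \text{cycles/s}} = 8\ \text{MAC/cycle}
\]
```

This is a peak rate derived only from the two numbers in the text. Sustained throughput on a real network depends on how well each layer keeps the MAC units busy, so it should not be used on its own to predict the 25x CIFAR-10 result.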
“NXP’s inclusion of the Arm CMSIS-NN software library in eIQ is designed to maximize the performance and minimize the memory footprint of neural networks on Arm Cortex-M cores,” said Dennis Laudick, VP marketing, Machine Learning at Arm. “Using a CIFAR-10 neural network model as an example, NXP is able to achieve a 1.8x performance advantage with CMSIS-NN. Other NN models should yield similar results, clearly demonstrating the benefits of this advanced compiler and our optimized NN operator library.”