VOXReality review of Once for All: Training One Network for Efficient Deployment in AI and Machine Learning

July 13, 2023

This article summarises the original research conducted by Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han of the Massachusetts Institute of Technology (MIT) and the MIT-IBM Watson AI Lab.

In recent years, advancements in technology, machine learning, and artificial intelligence have revolutionised various fields. One notable development in this area is the Once for All (OFA) approach, introduced in the paper “Once for All: Train One Network and Specialize it for Efficient Deployment.” This approach aims to train a single neural network and then specialise it for efficient deployment across different platforms and tasks.

In this article, we will explore the concept of OFA, its applications, and its potential impact on the field of machine learning and artificial intelligence.

What is Once for All (OFA)?

The Once for All (OFA) approach is a novel technique that involves training a single neural network and then specialising it for efficient deployment. Traditional approaches require training multiple networks for different tasks and platforms, which can be time-consuming and resource-intensive.

OFA addresses this challenge by training a large “super-network” that contains many sub-networks, each suited to a different task or platform. Combining ideas from network pruning and architecture search, OFA adapts a single trained network to a variety of deployment scenarios.

Figure 1: Left: a single once-for-all network is trained to support versatile architectural configurations including depth, width, kernel size, and resolution. Given a deployment scenario, a specialized subnetwork is directly selected from the once-for-all network without training. Middle: this approach reduces the cost of specialized deep learning deployment from O(N) to O(1). Right: once-for-all network followed by model selection can derive many accuracy-latency trade-offs by training only once, compared to conventional methods that require repeated training.
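The O(N) versus O(1) claim in the caption above can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names and the GPU-hour figures are hypothetical and are not taken from the paper. It assumes that every conventional deployment needs its own training run, whereas OFA pays one larger training cost up front plus a small, training-free selection cost per scenario.

```python
def conventional_cost(num_scenarios: int, cost_per_model: float) -> float:
    """Each deployment scenario triggers its own design-and-train cycle: O(N)."""
    return num_scenarios * cost_per_model


def once_for_all_cost(num_scenarios: int, supernet_cost: float,
                      selection_cost: float) -> float:
    """One shared training run, then a cheap training-free selection per scenario."""
    return supernet_cost + num_scenarios * selection_cost


# Hypothetical GPU-hour figures, chosen only to show the trend.
for n in (1, 10, 100):
    print(n, conventional_cost(n, 200.0), once_for_all_cost(n, 1200.0, 0.5))
```

With these made-up numbers the shared network is more expensive for a single scenario but quickly becomes cheaper as the number of scenarios grows, because the training term no longer scales with N.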

Technical background

The Once for All approach makes several key dimensions of the network elastic, including kernel size, depth, number of channels, and input resolution, to strike a balance between performance and efficiency. Rather than committing to one architecture, the authors define an architecture search space that covers many network configurations and train a single network that supports all of them.

This search space spans a range of kernel sizes, depths, and channel configurations, enabling the network to adapt to different deployment scenarios. The large “super-network” is trained with progressive shrinking, a generalised form of network pruning, so that its smaller sub-networks remain accurate. An architecture search then determines the combination of kernel sizes, depths, and channels for each specialised sub-network.
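To give a feel for how large such a search space becomes, here is a minimal counting sketch. The choice sets and the number of units below are assumptions made for illustration (three options per dimension, five units), and can be swapped for whatever values a given OFA variant uses. The counting rule assumes that every layer independently picks a kernel size and a width ratio, and that every unit independently picks how many layers it keeps.

```python
# Assumed choice sets, for illustration only.
KERNEL_SIZES = (3, 5, 7)
WIDTH_RATIOS = (3, 4, 6)
DEPTHS = (2, 3, 4)
NUM_UNITS = 5


def count_subnetworks(kernel_sizes, width_ratios, depths, num_units):
    """Count the architectures reachable in the assumed search space."""
    per_layer = len(kernel_sizes) * len(width_ratios)  # choices for one layer
    per_unit = sum(per_layer ** d for d in depths)     # one term per allowed depth
    return per_unit ** num_units                       # units choose independently


total = count_subnetworks(KERNEL_SIZES, WIDTH_RATIOS, DEPTHS, NUM_UNITS)
print(f"{total:.2e} candidate sub-networks")
```

Under these assumptions the count is on the order of 10^19, which is why training each candidate separately is not an option and weight sharing inside one super-network becomes attractive.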

In detail, OFA decouples model training from architecture search. The computational cost and latency constraints of each deployment platform are taken into account when a sub-network is selected, not while the shared network is being trained. The resulting specialised sub-networks are therefore not only efficient but also tailored to the specific requirements of each platform.

In terms of kernel size, the OFA approach supports a range of kernel sizes for convolutional layers (3×3, 5×5, and 7×7 in the original paper). This allows the network to adapt to different receptive field sizes and capture both local and global features effectively.

In terms of depth, OFA supports a range of architectures from shallow to deep. This flexibility enables the network to strike a balance between model complexity and computational efficiency, depending on the deployment scenario and task requirements.
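One way to picture how kernel size and depth can vary inside a single set of shared weights is sketched below. This is a simplified illustration under our own assumptions, not the authors' implementation: a smaller kernel is taken as the centre of a larger one, and a shallower unit reuses the first layers of a deeper one. The helper names `center_crop` and `keep_first_layers` are hypothetical, and any further transformation the original work applies to the shared weights is not modelled here.

```python
def center_crop(kernel, size):
    """Return the central size x size window of a square 2-D kernel."""
    full = len(kernel)
    if size > full or (full - size) % 2 != 0:
        raise ValueError("size must not exceed the kernel and must share its parity")
    start = (full - size) // 2
    return [row[start:start + size] for row in kernel[start:start + size]]


def keep_first_layers(layers, depth):
    """Elastic depth: a shallower sub-network reuses the first `depth` layers."""
    if not 1 <= depth <= len(layers):
        raise ValueError("depth out of range")
    return layers[:depth]


# A toy 5x5 kernel whose entries encode their own position (row * 5 + column).
kernel_5x5 = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel_3x3 = center_crop(kernel_5x5, 3)
print(kernel_3x3)  # [[6, 7, 8], [11, 12, 13], [16, 17, 18]]

unit = ["layer0", "layer1", "layer2", "layer3"]
print(keep_first_layers(unit, 2))  # ['layer0', 'layer1']
```

Because the small kernel and the shallow unit are views into the larger ones, no extra parameters need to be stored for them.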

In terms of channels, the OFA approach optimises the number of channels in each layer to achieve an optimal trade-off between model capacity and computational efficiency. By dynamically adjusting the number of channels, the network can adapt to different levels of feature representation and information flow.
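A common way to shrink the number of channels while keeping the most useful ones is to rank them by importance and keep the top few. The sketch below uses the L1 norm of each channel's weights as the importance score. Treat this as an assumption for illustration: the function names are hypothetical, the weights are made up, and the original work may rank or reorganise channels differently.

```python
def l1_norm(weights):
    """Sum of absolute values, used here as a simple importance score."""
    return sum(abs(w) for w in weights)


def select_channels(filters, keep):
    """Return the indices of the `keep` most important channels, best first."""
    if not 0 < keep <= len(filters):
        raise ValueError("keep must be between 1 and the number of channels")
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    return ranked[:keep]


# Four toy output channels with three weights each (made-up values).
filters = [
    [0.1, -0.2, 0.05],
    [1.0, -1.5, 0.5],
    [0.0, 0.01, -0.02],
    [0.7, 0.6, -0.9],
]
kept = select_channels(filters, 2)
narrow_layer = [filters[i] for i in kept]
print(kept)  # [1, 3]
```

The narrower layer reuses the weights of the wider one, so moving between widths is a matter of indexing and does not require retraining from scratch.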

Figure 2: Comparison with SOTA hardware-aware NAS methods on Pixel1 phone. OFA decouples model training from neural architecture search. The search cost and training cost both stay constant as the number of deployment scenarios grows. “#25” denotes the specialized sub-networks are fine-tuned for 25 epochs after grabbing weights from the once-for-all network. “CO2e” denotes CO2 emission which is calculated based on Strubell et al. (2019). AWS cost is calculated based on the price of on-demand P3.16xlarge instances.
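The decoupling of training and search mentioned in the caption can be sketched as a selection step that runs on an already-trained network. Everything in the example below is hypothetical: the choice sets, the latency model, and the accuracy proxy are stand-ins that we made up, and we use plain exhaustive enumeration over a tiny space in place of whatever search strategy a real system would use. The point is only that picking a sub-network for a latency budget requires no further training.

```python
from itertools import product
from math import log

# Assumed choice sets, for illustration only.
DEPTHS = (2, 3, 4)
WIDTHS = (3, 4, 6)
KERNELS = (3, 5, 7)


def predicted_latency_ms(depth, width, kernel):
    """Hypothetical latency model: grows with depth, width, and kernel area."""
    return 0.05 * depth * width * kernel * kernel


def predicted_accuracy(depth, width, kernel):
    """Hypothetical accuracy proxy with diminishing returns in each dimension."""
    return 60.0 + 4.0 * log(depth) + 3.0 * log(width) + 2.0 * log(kernel)


def select_subnetwork(latency_budget_ms):
    """Return the most accurate (depth, width, kernel) within the budget, or None."""
    feasible = [cfg for cfg in product(DEPTHS, WIDTHS, KERNELS)
                if predicted_latency_ms(*cfg) <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda cfg: predicted_accuracy(*cfg))


for budget in (5.0, 10.0, 100.0):
    print(budget, select_subnetwork(budget))
```

Supporting a new device then only means supplying a new latency model or budget and rerunning the selection, while the trained weights stay untouched.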

Key benefits of OFA

The Once for All (OFA) approach offers several benefits that make it an attractive option for efficient deployment in machine learning and artificial intelligence:

Simplicity and Efficiency: OFA simplifies the training process by eliminating the need to train multiple networks for different tasks and platforms. This leads to significant time and resource savings.
Flexibility: OFA enables the adaptation of a single network to different deployment scenarios, such as mobile devices, data centres, or edge devices. This flexibility allows for more efficient resource utilisation and improved performance.
State-of-the-art Performance: Despite its efficiency, OFA achieves state-of-the-art performance across various tasks and platforms. This makes it a promising approach for real-world applications.
Adaptability: The OFA approach can be applied to a wide range of machine learning tasks, including image classification, object detection, and natural language processing. This adaptability makes it a versatile tool for researchers and practitioners.

Applications of OFA

The Once for All (OFA) approach has gained significant attention and adoption in the machine learning and artificial intelligence community. Some notable applications and achievements include:

SONY Neural Architecture Search Library: SONY has adopted the OFA Network in its Neural Architecture Search Library, highlighting its potential for efficient deployment in real-world applications.
ADI MAX78000/MAX78002 Model Training and Synthesis Tool: ADI has also adopted the OFA Network in its Model Training and Synthesis Tool, further demonstrating its effectiveness in specialised domains.
Alibaba’s MLPerf Inference Benchmark submission: Alibaba adopted the OFA Network and ranked first in the open division of the MLPerf Inference Benchmark. This achievement showcases OFA’s performance and efficiency in data centre and edge computing scenarios.
CVPR Low-Power Computer Vision Challenge: OFA secured first place in the CVPR Low-Power Computer Vision Challenge, in both the CPU detection and FPGA tracks. This success highlights its potential for low-power and resource-constrained environments.

Conclusion

The Once for All (OFA) approach presents a groundbreaking solution for training one network and specialising it for efficient deployment in machine learning and artificial intelligence. By eliminating the need for training multiple networks, OFA simplifies the process, improves resource utilisation, and achieves state-of-the-art performance. Its wide range of applications and notable achievements in various domains further validate its potential impact. As the field of machine learning and artificial intelligence continues to advance, the Once for All approach holds great promise for driving innovation and efficiency in the deployment of neural networks.

Stefanos Biliousis

Hello! My name is Stefanos Biliousis and I'm a computer vision researcher with a passion for exploring the latest advances in artificial intelligence. With a background in machine learning and computer science, I spend my days developing innovative algorithms and techniques for image and video analysis. I'm fascinated by the many ways that computer vision and AI are revolutionising the world around us.
