Improving Inference Performance of Machine Learning with the Divide-and-Conquer Principle

AI-generated keywords: Divide-and-Conquer Principle, OnnxRuntime, Scalability, Inference, Batching

AI-generated Key Points

  • The paper addresses the issue of poor scalability of machine learning models when deployed on CPUs.
  • The authors propose a novel approach based on the Divide-and-Conquer Principle to tackle this problem.
  • Instead of allocating all available computing resources to the entire problem, they suggest breaking it into smaller chunks and letting the framework decide how computing resources should be allocated among those chunks.
  • The proposed allocation mechanism is implemented in OnnxRuntime, a popular framework for training ML models and running inference on them.
  • The effectiveness of this approach is demonstrated with several use cases, including highly popular models for image processing (PaddleOCR) and NLP tasks (BERT).
  • Section 2 elaborates on various reasons why inference commonly does not scale well on CPUs.
  • In Section 3, the authors describe in detail the concept and implementation details of their proposed Divide-and-Conquer Principle as it applies to inference.
  • Section 4 presents several use cases where this principle can be applied along with performance evaluation results demonstrating its benefits.
  • Their approach allows efficient batching of inference requests of various sizes, eliminating the need for padding and letting the framework allocate computing resources in proportion to the length of each sequence.
  • Related work is discussed in Section 5 before concluding in Section 6.

Authors: Alex Kogan

License: CC BY-SA 4.0

Abstract: Many popular machine learning models scale poorly when deployed on CPUs. In this paper we explore the reasons why and propose a simple, yet effective approach based on the well-known Divide-and-Conquer Principle to tackle this problem of great practical importance. Given an inference job, instead of using all available computing resources (i.e., CPU cores) for running it, the idea is to break the job into independent parts that can be executed in parallel, each with the number of cores according to its expected computational cost. We implement this idea in the popular OnnxRuntime framework and evaluate its effectiveness with several use cases, including the well-known models for optical character recognition (PaddleOCR) and natural language processing (BERT).
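The core idea in the abstract, giving each independent part a number of cores in line with its expected computational cost, can be illustrated with a small allocation routine. This is a minimal sketch, not the paper's OnnxRuntime implementation: the function name `allocate_cores`, the rule that every part receives at least one core, and the largest-remainder rounding are all assumptions made for illustration.

```python
from typing import List


def allocate_cores(costs: List[float], total_cores: int) -> List[int]:
    """Split total_cores among parts in proportion to their expected cost.

    Hypothetical helper: every part gets one core up front, and the
    remaining cores are shared out proportionally (largest-remainder rounding).
    """
    if not costs:
        return []
    if any(c <= 0 for c in costs):
        raise ValueError("each cost must be positive")
    if total_cores < len(costs):
        raise ValueError("need at least one core per part")

    spare = total_cores - len(costs)      # cores left after the one-per-part floor
    total_cost = sum(costs)
    shares = [spare * c / total_cost for c in costs]

    # Start from the floor of each proportional share.
    alloc = [1 + int(s) for s in shares]

    # Hand any cores lost to rounding to the parts with the largest remainders.
    leftover = max(0, total_cores - sum(alloc))
    by_remainder = sorted(
        range(len(costs)), key=lambda i: shares[i] - int(shares[i]), reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Three parts, the last one twice as expensive as the others, on 16 cores.
    print(allocate_cores([1, 1, 2], 16))
```

In a real deployment the cost estimate would come from something measurable about each part, such as sequence length or image size; the proportional rule itself stays the same.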

Submitted to arXiv on 12 Jan. 2023


Results of the summarizing process for the arXiv paper: 2301.05099v1

In this paper, the authors address the poor scalability of machine learning models deployed on CPUs. They propose an approach based on the Divide-and-Conquer Principle: instead of allocating all available computing resources to the entire problem, they break it into smaller chunks and let the framework decide how computing resources should be allocated among those chunks. The authors argue that in many use cases such a division is natural and requires only trivial changes in user code.

The proposed allocation mechanism is implemented in OnnxRuntime, a popular framework for training ML models and running inference on them. The inference API is extended to allow user code to invoke parallel inference on multiple inputs. The effectiveness of this approach is demonstrated with several use cases, including highly popular models for image processing (PaddleOCR) and NLP tasks (BERT).

In Section 2, the authors elaborate on the reasons why inference commonly does not scale well on CPUs. One reason is that the amount of computation a model requires during inference may be too small to parallelize efficiently. In Section 3, they describe the concept and implementation of the Divide-and-Conquer Principle as it applies to inference. Section 4 presents several use cases where this principle can be applied, along with performance evaluation results demonstrating its benefits. For instance, the approach allows efficient batching of inference requests of various sizes, eliminating the need for padding and letting the framework allocate computing resources in proportion to the length of each sequence. Related work is discussed in Section 5, and Section 6 concludes.

Overall, the paper offers a simple yet effective remedy, based on the Divide-and-Conquer Principle, for the poor scalability of machine learning inference on CPUs.
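To make the batching point concrete, the sketch below contrasts the amount of work in a padded batch with the work actually needed, and runs each variable-length sequence as its own part in parallel. It is an illustration under stated assumptions only: `padded_token_count`, `actual_token_count`, `run_parts_in_parallel`, and `toy_model` are hypothetical names, the thread pool stands in for the extended OnnxRuntime API (whose signature is not given in this summary), and `toy_model` is a placeholder for a real inference call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def padded_token_count(lengths: Sequence[int]) -> int:
    """Tokens processed when every sequence is padded to the longest one."""
    return len(lengths) * max(lengths) if lengths else 0


def actual_token_count(lengths: Sequence[int]) -> int:
    """Tokens processed when each sequence is run at its own length."""
    return sum(lengths)


def run_parts_in_parallel(
    model: Callable[[List[int]], int],
    inputs: Sequence[List[int]],
    max_workers: int,
) -> List[int]:
    """Run the model on each input independently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(model, inputs))


def toy_model(sequence: List[int]) -> int:
    """Placeholder for an inference call (assumption): sums the token ids."""
    return sum(sequence)


if __name__ == "__main__":
    batch = [[1, 2, 3], [4], [5, 6]]
    lengths = [len(seq) for seq in batch]
    print("padded tokens:", padded_token_count(lengths))
    print("actual tokens:", actual_token_count(lengths))
    print("outputs:", run_parts_in_parallel(toy_model, batch, max_workers=3))
```

The gap between the padded and actual token counts is the work that padding wastes; running the sequences as separate parts avoids it, and a per-part core budget proportional to sequence length would then balance their finishing times.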
Created on 15 Apr. 2023


