Learning Compiler Pass Orders using Coreset and Normalized Value Prediction

AI-generated keywords: Compilation Pass Sequence, Code-Size Reduction, Graph Neural Network (GNN), Submodular Function

AI-generated Key Points

- Proposes a pipeline for finding program-dependent pass sequences for code-size reduction tasks
- The optimal pass sequence can significantly reduce program size and/or improve program efficiency
- Prior work on compilation pass ordering has two major drawbacks: it either requires an excessive budget of compilation steps at compile time or fails to generalize to unseen programs
- The pipeline identifies a coreset of 50 pass sequences via greedy optimization of a submodular function
- It learns a policy with a Graph Neural Network (GNN) that picks the best sequence by predicting normalized values of the pass sequences in the coreset
- It outperforms the default -Oz flag by an average of 4.7% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets
- The technique transforms the raw action space into a small one with denser rewards and improves on existing human-designed compiler flags
- Hyperparameters such as the temperature T, number of layers, embedding dimension, and output dimension are searched over for each method
- The best model is selected based on validation results using mean squared loss
- Overall, the approach is effective for code-size reduction and outperforms existing techniques while being simple and efficient


Authors: Youwei Liang, Kevin Stone, Ali Shameli, Chris Cummins, Mostafa Elhoushi, Jiadong Guo, Benoit Steiner, Xiaomeng Yang, Pengtao Xie, Hugh Leather, Yuandong Tian

arXiv: 2301.05104v2 - DOI (cs.PL)

License: CC BY 4.0

Abstract: Finding the optimal pass sequence of compilation can lead to a significant reduction in program size and/or improvement in program efficiency. Prior works on compilation pass ordering have two major drawbacks. They either require an excessive budget (in terms of compilation steps) at compile time or fail to generalize to unseen programs. In this paper, for code-size reduction tasks, we propose a novel pipeline to find program-dependent pass sequences within 45 compilation calls. It first identifies a coreset of 50 pass sequences via greedy optimization of a submodular function, and then learns a policy with Graph Neural Network (GNN) to pick the optimal sequence by predicting the normalized values of the pass sequences in the coreset. Despite its simplicity, our pipeline outperforms the default -Oz flag by an average of 4.7% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets. In comparison, previous approaches like reinforcement learning on the raw pass sequence space may take days to train due to sparse reward, and may not generalize well in held-out ones from different domains. Our results demonstrate that existing human-designed compiler flags can be improved with a simple yet effective technique that transforms the raw action space into a small one with denser rewards.

Submitted to arXiv on 09 Jan. 2023


Results of the summarizing process for the arXiv paper: 2301.05104v2

Comprehensive Summary

This paper proposes a novel pipeline for finding program-dependent pass sequences for code-size reduction tasks. The optimal compilation pass sequence can lead to a significant reduction in program size and/or an improvement in program efficiency. However, prior work on compilation pass ordering has two major drawbacks: it either requires an excessive budget (in terms of compilation steps) at compile time or fails to generalize to unseen programs.

The proposed pipeline first identifies a coreset of 50 pass sequences via greedy optimization of a submodular function. It then learns a policy with a Graph Neural Network (GNN) that picks the best sequence by predicting the normalized values of the pass sequences in the coreset. Despite its simplicity, the pipeline outperforms the default -Oz flag by an average of 4.7% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets. Previous approaches such as reinforcement learning on the raw pass sequence space may take days to train due to sparse reward and may not generalize well to held-out programs from different domains. In contrast, the proposed technique transforms the raw action space into a small one with denser rewards and improves on existing human-designed compiler flags.

In the experiments, hyperparameters such as the temperature T in Eq. 3, the number of layers, the embedding dimension of node/edge features, and the output dimension of hidden layers in the MLPs are searched over for each method. The best model is selected based on validation results using mean squared loss. Overall, the paper presents an effective approach to code-size reduction that outperforms existing techniques while being simple and efficient.
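For readers unfamiliar with the term, the two standard facts behind "greedy optimization of a submodular function" can be written out as follows. This is textbook background, not the paper's own objective, which this summary does not reproduce; the symbols f, V, A, B, x, S and k are introduced here only for illustration.

```latex
% Submodularity (diminishing returns): for a set function f on ground set V,
% for all A \subseteq B \subseteq V and every x \in V \setminus B,
\[
  f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B).
\]

% Classical greedy guarantee: if f is monotone and submodular with
% f(\emptyset) = 0, then picking k elements one at a time by largest
% marginal gain yields a set S_greedy with
\[
  f(S_{\mathrm{greedy}}) \;\ge\; \Bigl(1 - \frac{1}{e}\Bigr) \max_{|S| \le k} f(S).
\]
```

In this setting V would be a pool of candidate pass sequences and k = 50 the coreset size, assuming the paper's objective meets the conditions above.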


Layman's Summary

Computer programs can be made smaller and faster by choosing a good order for the compiler's optimization steps. This order is called a pass sequence. Earlier attempts to find the best order either took too much time or only worked for certain programs. The new method first finds 50 good pass sequences and then uses a machine learning model to pick the best one for each program. On average, its results are 4.7% better than those of the compiler's default -Oz setting. The researchers also tested different settings for the model until they found the best one.

Definitions
- Proposed: suggested or put forward for consideration
- Pipeline: a series of connected steps or processes
- Pass sequence: an order of steps used in optimizing code-size reduction tasks
- Optimization: making something as good as possible
- Generalize: apply to a wider range of situations or cases
- Coreset: a small group chosen from a larger group based on certain criteria
- Greedy optimization: choosing the option that seems best at each step without considering long-term effects
- Submodular function: a mathematical function on sets with diminishing returns, meaning that adding an item to a larger set increases the value no more than adding it to a smaller one
- Policy: a set of rules guiding decisions
- Graph Neural Network (GNN): a type of machine learning model used for analyzing graph data
- Outperforms: does better than
- Unseen programs/repositories/domains/datasets: ones that haven

Optimizing Code-Size Reduction Tasks with a Novel Pipeline

The optimization of program-dependent pass sequences for code-size reduction tasks is an important task in software engineering. Finding the optimal pass sequence can lead to significant reductions in program size and/or improvements in program efficiency. However, prior works on compilation pass ordering have two major drawbacks: they either require an excessive budget (in terms of compilation steps) at compile time or fail to generalize to unseen programs. In this paper, researchers propose a novel pipeline that addresses these issues and outperforms existing techniques while being simple and efficient. The proposed pipeline first identifies a coreset of 50 pass sequences via greedy optimization of a submodular function. Then, it learns a policy with Graph Neural Network (GNN) to pick the optimal sequence by predicting the normalized values of the pass sequences in the coreset. Despite its simplicity, this pipeline outperforms the default -Oz flag by an average of 4.7% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets.
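To make the headline figure concrete, here is a minimal sketch of how an "average improvement over -Oz" could be computed. The summary does not state the exact metric, so the definition below (relative size reduction against the -Oz result, averaged over programs) is an assumption, and the function names and sample sizes are hypothetical.

```python
def relative_improvement(size_oz: int, size_new: int) -> float:
    """Fraction by which size_new is smaller than the -Oz result.

    Assumed metric, not taken from the paper. Negative values mean
    the new pass sequence produced a larger program than -Oz.
    """
    if size_oz <= 0:
        raise ValueError("size_oz must be positive")
    return (size_oz - size_new) / size_oz


def mean_improvement(pairs: list[tuple[int, int]]) -> float:
    """Average relative improvement over (size_oz, size_new) pairs."""
    if not pairs:
        raise ValueError("need at least one program")
    return sum(relative_improvement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical code sizes: (after -Oz, after the chosen pass sequence).
example_sizes = [(1000, 950), (200, 190), (400, 420)]
average = mean_improvement(example_sizes)
```

Under this definition, a reported 4.7% would correspond to `mean_improvement` returning roughly 0.047 over the evaluated programs.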

Coreset Selection

Coreset selection starts from a pool of candidate pass sequences that is much larger than 50. A submodular function scores how well a set of sequences serves the programs, and greedy optimization, a standard approach for coverage-style objectives of this kind, adds one sequence at a time until the coreset holds 50 sequences. Only the sequences that contribute most are kept. This shrinks the search space and reduces the computational cost of both training and inference.
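The greedy step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the summary does not give the paper's submodular function, so the facility-location style objective below (sum over programs of the best value among chosen sequences) is a swapped-in assumption, as are the `values` matrix and the function name.

```python
import numpy as np


def greedy_coreset(values: np.ndarray, k: int) -> list[int]:
    """Greedily pick up to k columns (pass sequences) of `values`.

    values[i, j] is a hypothetical non-negative score of sequence j on
    program i. The assumed objective
        f(S) = sum_i max_{j in S} values[i, j]
    is monotone and submodular for non-negative scores.
    """
    n_programs, n_sequences = values.shape
    best_so_far = np.zeros(n_programs)          # per-program best over chosen set
    available = np.ones(n_sequences, dtype=bool)
    chosen: list[int] = []
    for _ in range(min(k, n_sequences)):
        # Marginal gain of adding each candidate to the current set.
        new_totals = np.maximum(values, best_so_far[:, None]).sum(axis=0)
        gains = new_totals - best_so_far.sum()
        gains[~available] = -np.inf             # never pick a sequence twice
        j = int(np.argmax(gains))
        chosen.append(j)
        available[j] = False
        best_so_far = np.maximum(best_so_far, values[:, j])
    return chosen


# Toy example: 3 programs, 3 candidate sequences.
toy_values = np.array([
    [0.9, 0.0, 0.5],
    [0.0, 0.8, 0.5],
    [0.0, 0.0, 0.5],
])
toy_coreset = greedy_coreset(toy_values, k=2)
```

In the toy example the third sequence is picked first because it helps every program a little, and the first sequence is added next because it gives the largest remaining gain.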

Graph Neural Network Policy Learning

Once the coreset has been identified, a GNN is trained as a policy that predicts a normalized value for each pass sequence in the coreset from a graph representation of the input program. At test time, these predictions determine which sequences to try for a given program, including held-out programs from domains outside the training distribution. Hyperparameters such as the temperature T in Eq. 3, the number of layers, the embedding dimension of node/edge features, and the output dimension of hidden layers in the MLPs were searched over for each method, and the best model was selected based on validation results using mean squared loss.
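The prediction targets and the test-time choice can be sketched in a few lines. The summary does not define the normalization or reproduce Eq. 3, so the per-program min-max scaling below is an assumption, the temperature T is left out, and all function names are hypothetical. The GNN itself is replaced by a plain array of predicted values.

```python
import numpy as np


def normalize_values(sizes: np.ndarray) -> np.ndarray:
    """Per-program min-max normalization (assumed, not from the paper).

    sizes[i, j] is the code size of program i after coreset sequence j.
    The smallest size in each row maps to 1.0 and the largest to 0.0.
    """
    lo = sizes.min(axis=1, keepdims=True)
    hi = sizes.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (hi - sizes) / span


def mean_squared_error(predicted: np.ndarray, target: np.ndarray) -> float:
    """Mean squared loss between predicted and target normalized values."""
    return float(np.mean((predicted - target) ** 2))


def top_k_sequences(predicted: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k coreset sequences with the highest predicted value."""
    order = np.argsort(-predicted, kind="stable")
    return order[:k]


# Hypothetical sizes of one program under three coreset sequences.
targets = normalize_values(np.array([[100.0, 80.0, 90.0]]))
to_try = top_k_sequences(targets[0], k=2)
```

With a coreset of 50 sequences and the 45 compilation calls mentioned in the abstract, one plausible use is to compile with the top-ranked sequences within that budget and keep the smallest result.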

Conclusion

This paper presents an effective approach to code-size reduction that outperforms existing techniques while being simple and efficient. It transforms the raw action space into a small one with denser rewards and improves on existing human-designed compiler flags, without requiring an excessive budget at compile time or failing to generalize to unseen programs. By contrast, reinforcement learning on the raw pass sequence space may take days to train due to sparse reward and may not generalize well to held-out programs from different domains. This makes the proposed technique a practical choice for developers who want to reduce the size of their code and improve its efficiency.

Created on 24 Apr. 2023

