Compiler Optimization for Irregular Memory Access Patterns in PGAS Programs

AI-generated keywords: Chapel, PGAS, Distributed-Memory, Inspector-Executor, Performance

AI-generated Key Points

  • Irregular memory access patterns are challenging on distributed-memory systems: they can lead to fine-grained remote communication, and the data access patterns are often not known until runtime.
  • The Partitioned Global Address Space (PGAS) programming model addresses these challenges by giving users a view of a distributed-memory system that resembles a single shared address space. However, this view often leads programmers to write code that causes fine-grained remote communication, which can result in poor performance.
  • Previous research has shown that manually applying optimizations can improve the performance of irregular applications written in Chapel, a high-level PGAS language. However, such manual optimization reduces the productivity advantages provided by Chapel and the PGAS model.
  • This paper presents an inspector-executor based compiler optimization for Chapel programs that automatically performs remote data replication for irregular memory accesses to distributed arrays. It is the first such optimization implemented within the Chapel compiler; similar optimizations exist for other PGAS languages, but Chapel features such as implicit processor affinity raise new challenges.
  • The paper evaluates the optimization on two irregular applications and two distributed-memory systems. Total runtime improves by up to 52x on a Cray XC system with a low-latency interconnect and up to 364x on a standard Linux cluster with an InfiniBand interconnect, without sacrificing user productivity.
  • Chapel is a high-level language designed for productive parallel computing at scale, with constructs for distributed arrays, remote communication, and both data and task parallelism. Chapel tasks are executed in parallel by threads managed by a tasking layer. For distributed-memory programming, Chapel introduces locales, units of machine resources on which tasks execute.
  • In conclusion, this paper presents an automatic compiler optimization for irregular memory accesses in Chapel programs that significantly improves their performance without reducing the productivity advantages provided by Chapel and the PGAS model. Future work could explore applying this optimization to other PGAS languages and investigating its performance on larger-scale systems.
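
The inspector-executor idea in the key points can be illustrated with a small single-process Python simulation of a Chapel loop of the form `forall i in D do C[i] = A[B[i]];`. This is a sketch under stated assumptions, not the paper's compiler transformation: the helper names (`owner`, `inspector`, `replicate`, `executor`) are hypothetical, locales are modeled as integers, and `owner` assumes balanced contiguous blocks, which may differ slightly from how Chapel's Block distribution places boundaries. The inspector records which remote elements of `A` each locale will read, replication copies them once into a per-locale replica, and the executor then runs the original loop without any remote reads.

```python
from collections import defaultdict


def owner(i, n, num_locales):
    """Locale owning index i of a block-distributed range 0..n-1.

    Assumption: balanced contiguous blocks (hypothetical helper).
    """
    return i * num_locales // n


def inspector(B, n_a, num_locales):
    """Inspector: per locale, the remote indices of A the loop will read.

    Iteration i runs on the locale that owns i (implicit affinity).
    """
    remote = defaultdict(set)
    for i, j in enumerate(B):
        here = owner(i, len(B), num_locales)
        if owner(j, n_a, num_locales) != here:
            remote[here].add(j)  # the set drops repeated accesses
    return remote


def replicate(A, remote):
    """Copy each locale's needed remote elements of A into a local replica.

    On a real system this would be a bulk transfer per (locale, owner) pair.
    """
    return {loc: {j: A[j] for j in idxs} for loc, idxs in remote.items()}


def executor(A, B, replicas, num_locales):
    """Executor: the original loop, reading remote elements from the replica."""
    C = [None] * len(B)
    for i, j in enumerate(B):
        here = owner(i, len(B), num_locales)
        if owner(j, len(A), num_locales) == here:
            C[i] = A[j]               # local read
        else:
            C[i] = replicas[here][j]  # replicated copy, no remote access
    return C


if __name__ == "__main__":
    A = [10 * k for k in range(8)]
    B = [7, 7, 3, 4, 1, 6, 1, 5]
    remote = inspector(B, len(A), num_locales=2)
    C = executor(A, B, replicate(A, remote), num_locales=2)
    print(dict(remote))  # {0: {4, 7}, 1: {1}}
    print(C)             # [70, 70, 30, 40, 10, 60, 10, 50]
```

Using a set per locale means an element read many times is fetched only once, which is where the savings over fine-grained access come from.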

Authors: Thomas B. Rolinger, Christopher D. Krieger, Alan Sussman

Accepted to the 35th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2022)
License: CC BY 4.0

Abstract: Irregular memory access patterns pose performance and user productivity challenges on distributed-memory systems. They can lead to fine-grained remote communication and the data access patterns are often not known until runtime. The Partitioned Global Address Space (PGAS) programming model addresses these challenges by providing users with a view of a distributed-memory system that resembles a single shared address space. However, this view often leads programmers to write code that causes fine-grained remote communication, which can result in poor performance. Prior work has shown that the performance of irregular applications written in Chapel, a high-level PGAS language, can be improved by manually applying optimizations. However, applying such optimizations by hand reduces the productivity advantages provided by Chapel and the PGAS model. We present an inspector-executor based compiler optimization for Chapel programs that automatically performs remote data replication. While there have been similar compiler optimizations implemented for other PGAS languages, high-level features in Chapel such as implicit processor affinity lead to new challenges for compiler optimization. We evaluate the performance of our optimization across two irregular applications. Our results show that the total runtime can be improved by as much as 52x on a Cray XC system with a low-latency interconnect and 364x on a standard Linux cluster with an InfiniBand interconnect, demonstrating that significant performance gains can be achieved without sacrificing user productivity.
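
To make "fine-grained remote communication" and "implicit processor affinity" concrete, the following Python sketch counts, for a loop of the form `forall i in D do C[i] = A[B[i]];`, how many individual remote reads the naive loop would issue versus how many distinct remote elements replication would need to move. It is an illustrative model only: `owner` and `communication_profile` are hypothetical names, the balanced block split is an assumption, and real costs also depend on message sizes and the interconnect. The point it shows is that the programmer never names a locale; which locale executes iteration `i` is implied by the distribution of `D`, so whether `A[B[i]]` is remote depends on both the distribution and the runtime contents of `B`.

```python
from collections import Counter


def owner(i, n, num_locales):
    """Locale owning index i of a block-distributed range 0..n-1 (balanced blocks assumed)."""
    return i * num_locales // n


def communication_profile(B, n_a, num_locales):
    """Per (executing locale, owning locale) pair, count:
    - fine-grained remote reads the naive loop issues, and
    - distinct remote elements, i.e. what replication would transfer.
    """
    reads = Counter()
    distinct = {}
    for i, j in enumerate(B):
        src = owner(i, len(B), num_locales)  # implicit affinity: runs where i lives
        dst = owner(j, n_a, num_locales)     # where A[B[i]] lives
        if src != dst:
            reads[(src, dst)] += 1
            distinct.setdefault((src, dst), set()).add(j)
    return reads, {pair: len(s) for pair, s in distinct.items()}


if __name__ == "__main__":
    # Every iteration reads a "hot" element that lives on the other locale.
    B = [5, 5, 5, 5, 0, 0, 0, 0]
    reads, distinct = communication_profile(B, n_a=8, num_locales=2)
    print(sum(reads.values()), sum(distinct.values()))  # 8 2
```

In this skewed case the naive loop issues eight separate remote reads, while replication needs to move only two elements.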

Submitted to arXiv on 24 Mar. 2023

Summary of arXiv paper 2303.13954v1

Irregular memory access patterns can pose significant challenges for distributed-memory systems: they can lead to fine-grained remote communication, and the data access patterns are often not known until runtime. The Partitioned Global Address Space (PGAS) programming model addresses these challenges by providing users with a view of a distributed-memory system that resembles a single shared address space. However, this view often leads programmers to write code that causes fine-grained remote communication, which can result in poor performance. Previous research has shown that manually applying optimizations can improve the performance of irregular applications written in Chapel, a high-level PGAS language, but such manual optimization reduces the productivity advantages provided by Chapel and the PGAS model.

To address this issue, the paper presents an inspector-executor based compiler optimization for Chapel programs that automatically performs remote data replication for irregular memory accesses to distributed arrays. This is the first such optimization implemented within the Chapel compiler. The paper also discusses how Chapel-specific features, such as implicit processor affinity, require a different approach to implementing the inspector-executor technique than in other PGAS languages. The optimization is evaluated on two irregular applications and two distributed-memory systems: total runtime improves by up to 52x on a Cray XC system with a low-latency interconnect and up to 364x on a standard Linux cluster with an InfiniBand interconnect, without sacrificing user productivity.

Chapel is a high-level language designed for productive parallel computing at scale, with constructs for distributed arrays, remote communication, and both data and task parallelism. Chapel tasks are executed in parallel by threads managed by a tasking layer. For distributed-memory programming, Chapel introduces locales, units of machine resources on which tasks execute.

In conclusion, the paper presents an automatic compiler optimization for irregular memory accesses in Chapel programs that significantly improves their performance without reducing the productivity advantages provided by Chapel and the PGAS model. The evaluation demonstrates its effectiveness across two irregular applications and two different distributed-memory systems. Future work could explore applying this optimization to other PGAS languages and investigating its performance on larger-scale systems.
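
An inspector-executor scheme generally pays off only when the inspector's cost is amortized over many executions of the loop. The Python sketch below illustrates one common way to do that; it is an assumption for illustration and not necessarily how the paper's optimization decides when to re-inspect. The class `CachedInspectorLoop` and its `b_version` argument are hypothetical: the caller is assumed to bump the version whenever the index array `B` is modified. The inspector reruns only when that version changes, while the replicated values of `A` are re-copied on every run so that updates to `A` are still observed.

```python
def owner(i, n, num_locales):
    """Locale owning index i of a block-distributed range 0..n-1 (balanced blocks assumed)."""
    return i * num_locales // n


def inspect(B, n_a, num_locales):
    """Per locale, the set of remote indices of A read by C[i] = A[B[i]]."""
    remote = {}
    for i, j in enumerate(B):
        here = owner(i, len(B), num_locales)
        if owner(j, n_a, num_locales) != here:
            remote.setdefault(here, set()).add(j)
    return remote


class CachedInspectorLoop:
    """Hypothetical wrapper that reuses inspector results while B is unchanged."""

    def __init__(self, num_locales):
        self.num_locales = num_locales
        self.remote = None
        self.seen_version = None
        self.inspector_runs = 0

    def run(self, A, B, b_version):
        # Re-inspect only if the index array B has changed since the last run.
        if b_version != self.seen_version:
            self.remote = inspect(B, len(A), self.num_locales)
            self.seen_version = b_version
            self.inspector_runs += 1
        # Refresh replica values every run, because A itself may have changed.
        replicas = {loc: {j: A[j] for j in idxs} for loc, idxs in self.remote.items()}
        C = []
        for i, j in enumerate(B):
            here = owner(i, len(B), self.num_locales)
            if owner(j, len(A), self.num_locales) == here:
                C.append(A[j])
            else:
                C.append(replicas[here][j])
        return C


if __name__ == "__main__":
    loop = CachedInspectorLoop(num_locales=2)
    A, B = [1, 2, 3, 4], [3, 2, 1, 0]
    print(loop.run(A, B, b_version=0))  # [4, 3, 2, 1]
    A[3] = 40                           # values change, indices do not
    print(loop.run(A, B, b_version=0))  # [40, 3, 2, 1]
    print(loop.inspector_runs)          # 1
```

Separating "which indices" (depends only on `B`) from "which values" (depends on `A`) is what lets the inspector's result be cached safely in this sketch.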
Created on 27 Mar. 2023