AXS: A framework for fast astronomical data processing based on Apache Spark

AI-generated keywords: AXS, Apache Spark, astronomical data processing, distributed systems, scalability

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Introduction of AXS (Astronomy eXtensions for Spark) framework
  • Leveraging Apache Spark for efficient data processing
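  • Features include on-line positional cross-matching in Spark and a Python library of commonly used astronomical operations
  • Development of a distributed variant of the ZONES algorithm and a data partitioning scheme that handles data skew
  • On-the-fly cross-match of Gaia DR2 (1.8 billion rows) and AllWise (900 million rows) in about 30 seconds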
  • Facilitation of end-user analyses of large datasets like LSST
  • Emphasis on efficiency, scalability, and performance across diverse datasets

Authors: Petar Zečević, Colin T. Slater, Mario Jurić, Andrew J. Connolly, Sven Lončarić, Eric C. Bellm, V. Zach Golkhou, Krzysztof Suberlak

arXiv: 1905.09034v1 (astro-ph.IM)

Abstract: We introduce AXS (Astronomy eXtensions for Spark), a scalable open-source astronomical data analysis framework built on Apache Spark, a widely used industry-standard engine for big data processing. Building on capabilities present in Spark, AXS aims to enable querying and analyzing almost arbitrarily large astronomical catalogs using familiar Python/AstroPy concepts, DataFrame APIs, and SQL statements. We achieve this by i) adding support to Spark for efficient on-line positional cross-matching and ii) supplying a Python library supporting commonly-used operations for astronomical data analysis. To support scalable cross-matching, we developed a variant of the ZONES algorithm (Gray et al. 2004) capable of operating in distributed, shared-nothing architecture. We couple this to a data partitioning scheme that enables fast catalog cross-matching and handles the data skew often present in deep all-sky data sets. The cross-match and other often-used functionalities are exposed to the end users through an easy-to-use Python API. We demonstrate AXS' technical and scientific performance on SDSS, ZTF, Gaia DR2, and AllWise catalogs. Using AXS we were able to perform on-the-fly cross-match of Gaia DR2 (1.8 billion rows) and AllWise (900 million rows) data sets in ~30 seconds. We discuss how cloud-ready distributed systems like AXS provide a natural way to enable comprehensive end-user analyses of large datasets such as LSST.
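
To make the zone-based cross-matching idea in the abstract concrete, here is a minimal single-machine sketch in plain Python. It is not the AXS API and not the authors' distributed implementation: the function names (zone_of, angular_sep_deg, crossmatch), the tuple layout (id, ra, dec) in degrees, and the default zone height of one arcminute are all assumptions made for illustration. The sketch buckets one catalog into declination strips ("zones") and compares each source in the other catalog only against the zones its search radius can reach. It omits the right-ascension pruning within a zone and the data partitioning that a distributed variant would need.

```python
import math
from collections import defaultdict


def zone_of(dec_deg, zone_height_deg):
    """Index of the declination strip (zone) that contains dec_deg."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def crossmatch(left, right, radius_deg, zone_height_deg=1 / 60):
    """Return (left_id, right_id, separation_deg) for pairs within radius_deg.

    left and right are iterables of (id, ra_deg, dec_deg) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Bucket the right-hand catalog by zone so that each left-hand source
    # is compared only against sources in nearby declination strips.
    buckets = defaultdict(list)
    for rid, ra, dec in right:
        buckets[zone_of(dec, zone_height_deg)].append((rid, ra, dec))

    matches = []
    for lid, ra, dec in left:
        # Every zone that the search circle can overlap in declination.
        zmin = zone_of(dec - radius_deg, zone_height_deg)
        zmax = zone_of(dec + radius_deg, zone_height_deg)
        for z in range(zmin, zmax + 1):
            for rid, ra2, dec2 in buckets.get(z, ()):
                sep = angular_sep_deg(ra, dec, ra2, dec2)
                if sep <= radius_deg:
                    matches.append((lid, rid, sep))
    return matches


# Usage: match two tiny made-up catalogs with a 2-arcsecond radius.
catalog_a = [("a", 10.0, 20.0), ("b", 200.0, -45.0)]
catalog_b = [("x", 10.0001, 20.0001), ("y", 10.0, 20.5), ("z", 200.0, -45.0002)]
for left_id, right_id, sep in crossmatch(catalog_a, catalog_b, 2 / 3600):
    print(left_id, right_id, f"{sep * 3600:.2f} arcsec")
```

Bucketing by zone turns an all-pairs comparison into a set of local comparisons. In a shared-nothing setting, sources near a zone boundary must also be visible to the neighbouring zone, which is where a partitioning scheme like the one mentioned in the abstract comes in.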

Submitted to arXiv on 22 May 2019

Results of the summarizing process for the arXiv paper: 1905.09034v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

The paper "AXS: A framework for fast astronomical data processing based on Apache Spark" introduces AXS (Astronomy eXtensions for Spark), an open-source framework for querying and analyzing very large astronomical catalogs using familiar Python/AstroPy concepts, DataFrame APIs, and SQL statements. Built on Apache Spark, AXS adds efficient on-line positional cross-matching and a Python library of commonly used astronomical operations. To make cross-matching scale, the authors developed a variant of the ZONES algorithm that runs in a distributed, shared-nothing architecture, together with a data partitioning scheme that speeds up cross-matching and handles the data skew common in deep all-sky data sets. They demonstrate AXS on SDSS, ZTF, Gaia DR2, and AllWise catalogs, including an on-the-fly cross-match of Gaia DR2 (1.8 billion rows) and AllWise (900 million rows) in about 30 seconds. The paper also discusses how cloud-ready distributed systems like AXS can enable end-user analyses of large datasets such as LSST.

Created on 30 Dec. 2024
