ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments

AI-generated keywords: ALOJA, Hadoop, Machine Learning, Predictive Analytics, Big Data

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • The ALOJA project is a collaboration between BSC and Microsoft
  • It has developed analytics tools to interpret Big Data benchmark performance data and tuning
  • The project focuses on Hadoop, which presents a complex run-time environment where costs and performance depend on numerous configuration choices
  • ALOJA has created an open vendor-neutral repository featuring over 40,000 Hadoop job executions and their performance details
  • The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters, and cloud services
  • The predictive analytics extension ALOJA-ML provides an automated system that models environments from observed executions, allowing knowledge discovery
  • The resulting models can forecast execution behaviors, such as execution times for new configurations or hardware choices, which enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions
  • This work was partially funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No 639595) - HiEST Project

Authors: Josep Ll. Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green

Submitted to IEEE Transactions on Emerging Topics in Computing (TETC). Part of the Aloja Project. Partially funded by European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595) - HiEST Project. arXiv admin note: substantial text overlap with arXiv:1511.02030

Abstract: This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data-sets and framework to improve the design and deployment of Big Data applications.
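
As a rough illustration of the idea in the abstract (learning from observed executions to predict the time of an unseen configuration, then using that prediction to flag anomalous runs), the following minimal sketch uses a k-nearest-neighbor average. Everything in it is an assumption made for illustration: the feature names (mappers, I/O buffer size, SSD flag), the sample timings, the functions `predict_time` and `is_anomalous`, and the choice of k-nearest neighbors itself. The abstract does not state which learning methods ALOJA-ML uses, and none of the numbers are ALOJA data.

```python
from math import sqrt
from statistics import mean

# Hypothetical observed executions: (mappers, io_buffer_kb, uses_ssd) -> seconds.
# These values are invented for illustration; they are not ALOJA data.
OBSERVED = [
    ((4, 64, 0), 820.0),
    ((4, 128, 0), 790.0),
    ((8, 64, 0), 610.0),
    ((8, 128, 0), 590.0),
    ((4, 64, 1), 540.0),
    ((8, 64, 1), 400.0),
    ((8, 128, 1), 380.0),
]


def _scales(configs):
    """Return a (min, range) pair per feature so each feature maps to [0, 1]."""
    columns = list(zip(*configs))
    return [(min(col), (max(col) - min(col)) or 1.0) for col in columns]


def _normalize(config, scales):
    """Rescale one configuration using the per-feature (min, range) pairs."""
    return [(value - low) / span for value, (low, span) in zip(config, scales)]


def predict_time(config, observed=OBSERVED, k=3):
    """Estimate execution time as the mean of the k nearest observed runs."""
    scales = _scales([cfg for cfg, _ in observed])
    target = _normalize(config, scales)

    def distance(cfg):
        point = _normalize(cfg, scales)
        return sqrt(sum((a - b) ** 2 for a, b in zip(point, target)))

    nearest = sorted(observed, key=lambda row: distance(row[0]))[:k]
    return mean(seconds for _, seconds in nearest)


def is_anomalous(config, measured_seconds, observed=OBSERVED, tolerance=0.5):
    """Flag a run whose time differs from the prediction by more than
    `tolerance`, expressed as a fraction of the predicted time."""
    predicted = predict_time(config, observed)
    return abs(measured_seconds - predicted) / predicted > tolerance
```

Calling `predict_time((4, 128, 1))` estimates a configuration that does not appear in the sample data, and `is_anomalous((4, 128, 1), 1500.0)` checks a measured run against that estimate. The 50% tolerance is an arbitrary placeholder; a real deployment would need to calibrate it against the model's observed error.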

Submitted to arXiv on 06 Nov. 2015

Results of the summarizing process for the arXiv paper: 1511.02037v1

The ALOJA project is a collaboration between BSC and Microsoft that has developed analytics tools to interpret Big Data benchmark performance data and tuning. It focuses on Hadoop, which presents a complex run-time environment where costs and performance depend on numerous configuration choices. To address this challenge, the ALOJA project has created an open, vendor-neutral repository featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters, and cloud services. To support the study of large, resource-constrained search spaces, the predictive analytics extension ALOJA-ML provides an automated system that models environments from observed executions, allowing knowledge discovery. The resulting models can forecast execution behaviors, such as execution times for new configurations or hardware choices, which enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. The community can benefit from the ALOJA datasets and framework to improve the design and deployment of Big Data applications. This work was partially funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No 639595) - HiEST Project.
Created on 29 May. 2023
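
The "benchmark guidance by prioritizing executions" mentioned in the summary can be pictured with a small sketch: given configurations that have already been benchmarked, rank the untested candidates so that the ones farthest from anything already observed run first. This is only one plausible heuristic, chosen here for illustration; the summary does not say how ALOJA-ML ranks executions. The feature tuple (mappers, SSD flag), the sample configurations, and the functions `novelty` and `prioritize` are all hypothetical, and the features are left unscaled to keep the example short.

```python
from math import sqrt

# Hypothetical configurations already benchmarked: (mappers, uses_ssd).
EXECUTED = [(4, 0), (8, 0), (4, 1)]

# Hypothetical candidate configurations that have not been run yet.
CANDIDATES = [(8, 1), (6, 0), (16, 1)]


def novelty(candidate, executed):
    """Distance from a candidate to its closest already-executed configuration."""
    return min(
        sqrt(sum((a - b) ** 2 for a, b in zip(candidate, done)))
        for done in executed
    )


def prioritize(candidates, executed):
    """Order candidates so the least-covered regions of the search space run first."""
    return sorted(candidates, key=lambda c: novelty(c, executed), reverse=True)
```

With the sample data, `prioritize(CANDIDATES, EXECUTED)` places `(16, 1)` first because no executed configuration is close to it. A model-driven variant could instead rank candidates by how uncertain the predictor is about them.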
