Empirical Study on the Software Engineering Practices in Open Source ML Package Repositories

AI-generated keywords: ML package repositories, TFHub, PyTorch Hub, npm, PyPI, CRAN

AI-generated Key Points

  • Recent advances in AI, particularly in ML, have led to practical applications like virtual personal assistants and autonomous cars.
  • Development, training, and deployment of modern ML technologies require technical expertise and resources.
  • Public ML package repositories have emerged as a way for practitioners and researchers to discover and reuse pre-trained ML models.
  • This study analyzes two popular ML package repositories - TFHub and PyTorch Hub - comparing their features, policies, package organization, package manager functionalities, and usage contexts against established software package repositories.
  • The study identifies unique software engineering practices and challenges associated with sharing ML packages.
  • Because these repositories are a recent phenomenon, no empirical data was previously available on their current state and challenges.
  • These repositories play a crucial role in facilitating effective reuse of ML models.
  • The findings provide valuable insights for data scientists, researchers, and software developers looking to utilize shared ML packages.

Authors: Minke Xiu, Ellis E. Eghan, Zhen Ming (Jack) Jiang, Bram Adams

License: CC BY-NC-SA 4.0

Abstract: Recent advances in Artificial Intelligence (AI), especially in Machine Learning (ML), have introduced various practical applications (e.g., virtual personal assistants and autonomous cars) that enhance the experience of everyday users. However, modern ML technologies like Deep Learning require considerable technical expertise and resources to develop, train and deploy such models, making effective reuse of the ML models a necessity. Such discovery and reuse by practitioners and researchers are being addressed by public ML package repositories, which bundle up pre-trained models into packages for publication. Since such repositories are a recent phenomenon, there is no empirical data on their current state and challenges. Hence, this paper conducts an exploratory study that analyzes the structure and contents of two popular ML package repositories, TFHub and PyTorch Hub, comparing their information elements (features and policies), package organization, package manager functionalities and usage contexts against popular software package repositories (npm, PyPI, and CRAN). Through these studies, we have identified unique SE practices and challenges for sharing ML packages. These findings and implications would be useful for data scientists, researchers and software developers who intend to use these shared ML packages.

Submitted to arXiv on 02 Dec. 2020

Results of the summarizing process for the arXiv paper: 2012.01403v2

Recent advances in Artificial Intelligence (AI), particularly in Machine Learning (ML), have led to practical applications such as virtual personal assistants and autonomous cars. However, developing, training and deploying modern ML technologies like Deep Learning requires significant technical expertise and resources. To address this challenge, public ML package repositories have emerged as a means for practitioners and researchers to discover and reuse pre-trained ML models. This paper presents an exploratory study that analyzes two popular ML package repositories, TFHub and PyTorch Hub, comparing their features, policies, package organization, package manager functionalities and usage contexts against well-established software package repositories (npm, PyPI and CRAN). By examining these repositories, the authors identify unique software engineering practices and challenges associated with sharing ML packages. Because such repositories emerged only recently, there has been no empirical data on their current state and challenges, even though they play a crucial role in enabling effective reuse of ML models. The findings offer insights for data scientists, researchers and software developers who intend to use shared ML packages.
Created on 19 Nov. 2023
