Empirical Study on the Software Engineering Practices in Open Source ML Package Repositories
AI-generated Key Points
- Recent advances in AI, particularly in ML, have led to practical applications like virtual personal assistants and autonomous cars.
- Modern ML technologies such as Deep Learning require considerable technical expertise and resources to develop, train, and deploy, which makes effective reuse of ML models a necessity.
- Public ML package repositories have emerged as a way for practitioners and researchers to discover and reuse pre-trained ML models.
- This study analyzes two popular ML package repositories, TFHub and PyTorch Hub, comparing their features, policies, package organization, package manager functionalities, and usage contexts against established software package repositories (npm, PyPI, and CRAN).
- The study identifies unique software engineering practices and challenges associated with sharing ML packages.
- Because these repositories emerged only recently, there has been no empirical data on their current state and challenges.
- These repositories play a crucial role in facilitating effective reuse of ML models.
- The findings provide valuable insights for data scientists, researchers, and software developers looking to utilize shared ML packages.
Authors: Minke Xiu, Ellis E. Eghan, Zhen Ming (Jack) Jiang, Bram Adams
Abstract: Recent advances in Artificial Intelligence (AI), especially in Machine Learning (ML), have introduced various practical applications (e.g., virtual personal assistants and autonomous cars) that enhance the experience of everyday users. However, modern ML technologies like Deep Learning require considerable technical expertise and resources to develop, train, and deploy models, making effective reuse of ML models a necessity. Public ML package repositories address this need for discovery and reuse by practitioners and researchers by bundling pre-trained models into packages for publication. Since such repositories are a recent phenomenon, there is no empirical data on their current state and challenges. Hence, this paper conducts an exploratory study that analyzes the structure and contents of two popular ML package repositories, TFHub and PyTorch Hub, comparing their information elements (features and policies), package organization, package manager functionalities, and usage contexts against popular software package repositories (npm, PyPI, and CRAN). Through these studies, we have identified unique SE practices and challenges for sharing ML packages. These findings and implications should be useful for data scientists, researchers, and software developers who intend to use these shared ML packages.
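To make "package organization" and "package manager functionalities" a little more concrete for PyTorch Hub: a PyTorch Hub package is a GitHub repository whose root contains a hubconf.py file that declares a `dependencies` list and exposes each pre-trained model as a public function (an entrypoint), which clients such as `torch.hub.list` and `torch.hub.load` enumerate and call. The plain-Python sketch below mimics that flow without torch or network access. It is an illustration under stated assumptions, not the real torch.hub implementation and not the method used in the paper; `load_hubconf`, `missing_dependencies`, `list_entrypoints`, `load`, and the `toy_*` models are hypothetical names.

```python
import importlib.util
import types

# Hypothetical stand-in for a repository's hubconf.py. In a real PyTorch Hub
# package this is a file at the repo root; here it is an in-memory string so
# the sketch runs without torch or network access.
HUBCONF_SOURCE = '''
dependencies = ["json", "math"]

def _build(width):
    # Private helper: the leading underscore hides it from the entrypoint list.
    return {"arch": "toy", "width": width}

def toy_small(pretrained=False):
    """Toy model entrypoint (width 8)."""
    model = _build(8)
    model["pretrained"] = pretrained
    return model

def toy_large(pretrained=False):
    """Toy model entrypoint (width 64)."""
    model = _build(64)
    model["pretrained"] = pretrained
    return model
'''


def load_hubconf(source: str) -> types.ModuleType:
    """Execute hubconf source into a fresh module object (this runs publisher code)."""
    module = types.ModuleType("hubconf")
    exec(source, module.__dict__)
    return module


def missing_dependencies(module: types.ModuleType) -> list[str]:
    """Return declared dependencies that cannot be found in the current environment."""
    deps = getattr(module, "dependencies", [])
    return [name for name in deps if importlib.util.find_spec(name) is None]


def list_entrypoints(module: types.ModuleType) -> list[str]:
    """Public callables are treated as entrypoints; underscore-prefixed names are hidden."""
    return sorted(
        name for name in dir(module)
        if not name.startswith("_") and callable(getattr(module, name))
    )


def load(module: types.ModuleType, name: str, **kwargs):
    """Check dependencies, then call the requested entrypoint with keyword arguments."""
    missing = missing_dependencies(module)
    if missing:
        raise RuntimeError(f"Missing dependencies: {', '.join(missing)}")
    if name not in list_entrypoints(module):
        raise ValueError(f"Unknown entrypoint: {name}")
    return getattr(module, name)(**kwargs)


if __name__ == "__main__":
    hub = load_hubconf(HUBCONF_SOURCE)
    print(list_entrypoints(hub))
    print(load(hub, "toy_small", pretrained=True))
```

Run as a script, it lists the two toy entrypoints and builds `toy_small`. The design choice worth noting is that, as in PyTorch Hub, loading a package means executing the publisher's hubconf code, so the entrypoint convention doubles as the package's public interface and its trust boundary.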