AutoOpt: A Dataset and a Unified Framework for Automating Optimization Problem Solving

AI-generated keywords: AutoOpt-11k, image dataset, mathematical optimization models, AutoOpt framework, machine learning

AI-generated Key Points

  • AutoOpt-11k dataset:
      • Comprises over 11,000 handwritten and printed mathematical optimization models
      • Includes single-objective, multi-objective, multi-level, and stochastic optimization problems with diverse complexities
      • LaTeX labels are provided for all images
      • Modeling-language labels are provided for a subset of images
  • AutoOpt framework:
      • Machine-learning-based automated approach for solving optimization problems
      • Users provide an image of the formulation, and AutoOpt solves it without further human intervention
      • Comprises three modules: M1 (Image_to_Text), M2 (Text_to_Text), and M3 (Optimization)
  • Performance results:
      • The MER model (M1) outperforms ChatGPT, Gemini, and Nougat on the BLEU score metric
      • The hybrid BOBD method (M3) yields better results on complex test problems than the interior-point algorithm and the genetic algorithm
  • Conclusion:
      • Introduces a large dataset and a framework that automates the solving of optimization problems using machine learning techniques
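
For orientation, the four problem classes named in the key points can be written in generic textbook form. The notation below (f, g, F, G, X, ξ) is an assumption chosen for illustration and is not taken from the paper; the bilevel problem is shown as the simplest multi-level case.

```latex
% Single-objective: one objective f over a feasible set
\min_{x \in X} \; f(x) \quad \text{s.t.} \quad g(x) \le 0

% Multi-objective: k objectives minimized simultaneously (Pareto sense)
\min_{x \in X} \; \bigl(f_1(x), \dots, f_k(x)\bigr) \quad \text{s.t.} \quad g(x) \le 0

% Bilevel (two-level case of multi-level): y must be optimal for a lower-level problem
\min_{x,\, y} \; F(x, y) \quad \text{s.t.} \quad G(x, y) \le 0, \quad
y \in \arg\min_{y'} \{\, f(x, y') : g(x, y') \le 0 \,\}

% Stochastic: expected value over a random parameter \xi
\min_{x \in X} \; \mathbb{E}_{\xi}\bigl[ f(x, \xi) \bigr]
```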

Authors: Ankur Sinha, Shobhit Arora, Dhaval Pujara

NeurIPS 2025, 28 pages, 11 figures, 11 tables
License: CC BY-NC-SA 4.0

Abstract: This study presents AutoOpt-11k, a unique image dataset of over 11,000 handwritten and printed mathematical optimization models corresponding to single-objective, multi-objective, multi-level, and stochastic optimization problems exhibiting various types of complexities such as non-linearity, non-convexity, non-differentiability, discontinuity, and high-dimensionality. The labels consist of the LaTeX representation for all the images and the modeling language representation for a subset of images. The dataset was created by 25 experts following ethical data creation guidelines and verified in two phases to avoid errors. Further, we develop the AutoOpt framework, a machine-learning-based automated approach for solving optimization problems, where the user just needs to provide an image of the formulation and AutoOpt solves it efficiently without any further human intervention. The AutoOpt framework consists of three modules: (i) M1 (Image_to_Text): a deep learning model that performs the Mathematical Expression Recognition (MER) task to generate the LaTeX code corresponding to the optimization formulation in the image; (ii) M2 (Text_to_Text): a small-scale fine-tuned LLM that generates the PYOMO script (optimization modeling language) from the LaTeX code; (iii) M3 (Optimization): a Bilevel Optimization based Decomposition (BOBD) method that solves the optimization formulation described in the PYOMO script. We use the AutoOpt-11k dataset for training and testing the deep learning models employed in AutoOpt. The deep learning model for the MER task (M1) outperforms ChatGPT, Gemini, and Nougat on the BLEU score metric. The BOBD method (M3), which is a hybrid approach, yields better results on complex test problems than common approaches such as the interior-point algorithm and the genetic algorithm.
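
The three-module flow described in the abstract (image to LaTeX, LaTeX to PYOMO script, script to solution) can be pictured as a simple function composition. The sketch below is only an illustration of that data flow: the class name `ThreeStagePipeline`, its field names, and the three `fake_*` stand-ins are hypothetical and do not reflect the authors' code, models, or solver.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ThreeStagePipeline:
    """Hypothetical wiring of the three stages; not the authors' implementation."""
    image_to_latex: Callable[[bytes], str]   # role of M1 (Image_to_Text)
    latex_to_script: Callable[[str], str]    # role of M2 (Text_to_Text)
    solve_script: Callable[[str], Dict]      # role of M3 (Optimization)

    def run(self, image: bytes) -> Dict:
        latex = self.image_to_latex(image)     # stage 1: recognize the formulation
        script = self.latex_to_script(latex)   # stage 2: translate to a modeling script
        return self.solve_script(script)       # stage 3: solve the described problem


# Stand-in stages that return canned values (assumptions for illustration only).
def fake_m1(image: bytes) -> str:
    return r"\min_{x} (x - 2)^2"


def fake_m2(latex: str) -> str:
    return "# modeling script generated from: " + latex


def fake_m3(script: str) -> Dict:
    return {"status": "stub", "script": script}


pipeline = ThreeStagePipeline(fake_m1, fake_m2, fake_m3)
result = pipeline.run(b"raw image bytes")
print(result["script"])
```

Keeping each stage behind a plain callable means any one of them can be swapped (for example, a different recognizer in the first slot) without touching the other two.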

Submitted to arXiv on 24 Oct. 2025


Results of the summarizing process for the arXiv paper: 2510.21436v1

This study introduces AutoOpt-11k, an image dataset comprising over 11,000 handwritten and printed mathematical optimization models. The dataset includes single-objective, multi-objective, multi-level, and stochastic optimization problems with diverse complexities such as non-linearity, non-convexity, non-differentiability, discontinuity, and high-dimensionality. LaTeX labels are provided for all images, and modeling-language labels for a subset of images. The dataset was created by 25 experts adhering to ethical data creation guidelines and verified through a two-phase process to minimize errors.
The study also presents the AutoOpt framework, a machine-learning-based automated approach for solving optimization problems. Users provide an image of the formulation, and AutoOpt solves it without further human intervention. The framework comprises three modules: M1 (Image_to_Text), M2 (Text_to_Text), and M3 (Optimization). M1 uses a deep learning model for Mathematical Expression Recognition (MER) to generate LaTeX code from the optimization formulation in the image. M2 uses a small-scale fine-tuned LLM to produce a PYOMO script (optimization modeling language) from the LaTeX code. M3 uses a Bilevel Optimization based Decomposition (BOBD) method to solve the formulation described in the PYOMO script. The AutoOpt-11k dataset is used to train and test the deep learning models within the framework. The MER model (M1) outperforms ChatGPT, Gemini, and Nougat on the BLEU score metric, and the hybrid BOBD method (M3) yields better results on complex test problems than common approaches such as the interior-point algorithm and the genetic algorithm.
In conclusion, the study introduces both a large dataset and a framework that automates the solving of optimization problems using machine learning techniques. This could improve efficiency in fields where mathematical programming formulations are prevalent.
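
Since the M1 comparison is reported in terms of BLEU, a compact reminder of what that metric measures may help. The sketch below is a simplified sentence-level BLEU (one reference, whitespace tokenization, uniform weights up to 4-grams, no smoothing). These choices are assumptions for illustration; the summary does not state which tokenization or BLEU configuration the paper uses.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """Simplified BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any empty precision gives a score of zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical tokenized LaTeX strings; the prediction has one wrong token (8 instead of 3).
reference = r"\min _ { x } x ^ { 2 } + 3 x"
prediction = r"\min _ { x } x ^ { 2 } + 8 x"
exact_score = bleu(reference, reference)
near_score = bleu(reference, prediction)
print(exact_score, near_score)
```

A single wrong token lowers the score at every n-gram order that overlaps it, which is why BLEU is sensitive to local recognition errors in LaTeX output.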
Created on 20 Jan. 2026

