AutoOpt: A Dataset and a Unified Framework for Automating Optimization Problem Solving

AI-generated keywords: AutoOpt-11k, image dataset, mathematical optimization models, AutoOpt framework, machine learning

AI-generated Key Points

- AutoOpt-11k dataset:
  - Comprises over 11,000 handwritten and printed mathematical optimization models
  - Includes single-objective, multi-objective, multi-level, and stochastic optimization problems with diverse complexities
  - Labels in LaTeX representation provided for all images
  - Modeling language representation for a subset of images
- AutoOpt framework:
  - Machine learning-based automated approach for solving optimization problems
  - Users provide an image of the formulation to AutoOpt for efficient solving without human intervention
  - Comprises three modules: M1 (Image_to_Text), M2 (Text_to_Text), and M3 (Optimization)
- Performance results:
  - MER task model (M1) outperforms ChatGPT, Gemini, and Nougat based on BLEU score metric
  - Hybrid BOBD method (M3) demonstrates superior performance on complex test problems compared to traditional approaches
- Conclusion:
  - Introduces an extensive dataset and innovative framework that streamlines automation of solving optimization problems using machine learning techniques


Authors: Ankur Sinha, Shobhit Arora, Dhaval Pujara

arXiv: 2510.21436v1 (cs.AI)

NeurIPS 2025, 28 pages, 11 figures, 11 tables

License: CC BY-NC-SA 4.0

Abstract: This study presents AutoOpt-11k, a unique image dataset of over 11,000 handwritten and printed mathematical optimization models corresponding to single-objective, multi-objective, multi-level, and stochastic optimization problems exhibiting various types of complexities such as non-linearity, non-convexity, non-differentiability, discontinuity, and high-dimensionality. The labels consist of the LaTeX representation for all the images and modeling language representation for a subset of images. The dataset is created by 25 experts following ethical data creation guidelines and verified in two-phases to avoid errors. Further, we develop AutoOpt framework, a machine learning based automated approach for solving optimization problems, where the user just needs to provide an image of the formulation and AutoOpt solves it efficiently without any further human intervention. AutoOpt framework consists of three Modules: (i) M1 (Image_to_Text)- a deep learning model performs the Mathematical Expression Recognition (MER) task to generate the LaTeX code corresponding to the optimization formulation in image; (ii) M2 (Text_to_Text)- a small-scale fine-tuned LLM generates the PYOMO script (optimization modeling language) from LaTeX code; (iii) M3 (Optimization)- a Bilevel Optimization based Decomposition (BOBD) method solves the optimization formulation described in the PYOMO script. We use AutoOpt-11k dataset for training and testing of deep learning models employed in AutoOpt. The deep learning model for MER task (M1) outperforms ChatGPT, Gemini and Nougat on BLEU score metric. BOBD method (M3), which is a hybrid approach, yields better results on complex test problems compared to common approaches, like interior-point algorithm and genetic algorithm.
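The three-module flow described in the abstract (image to LaTeX, LaTeX to modeling script, script to solution) can be pictured as a simple chain of functions. The sketch below is only an illustration under stated assumptions: the class `AutoOptPipeline` and the stub functions `fake_m1`, `fake_m2`, and `fake_m3` are hypothetical names, and the stubs return hard-coded values in place of the trained models and the BOBD solver, whose interfaces are not given here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AutoOptPipeline:
    """Chains three stages: image -> LaTeX -> modeling script -> solution."""

    image_to_text: Callable[[bytes], str]       # stand-in for M1 (Image_to_Text)
    text_to_text: Callable[[str], str]          # stand-in for M2 (Text_to_Text)
    optimize: Callable[[str], dict]             # stand-in for M3 (Optimization)

    def solve(self, image: bytes) -> dict:
        latex = self.image_to_text(image)       # recognize the formulation
        script = self.text_to_text(latex)       # translate it to a modeling script
        return self.optimize(script)            # solve the scripted model


# Hypothetical stubs so the sketch runs end to end. A real system would
# plug in a recognition model, a fine-tuned LLM, and an optimizer here.
def fake_m1(image: bytes) -> str:
    return r"\min_{x} \; (x - 3)^2"


def fake_m2(latex: str) -> str:
    return "minimize (x - 3)**2"


def fake_m3(script: str) -> dict:
    return {"x": 3.0, "objective": 0.0}


pipeline = AutoOptPipeline(fake_m1, fake_m2, fake_m3)
result = pipeline.solve(b"placeholder image bytes")
```

Keeping each stage behind a plain callable means any one module can be swapped or evaluated in isolation, which matches how the abstract reports results for M1 and M3 separately.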

Submitted to arXiv on 24 Oct. 2025


Results of the summarizing process for the arXiv paper: 2510.21436v1

Comprehensive Summary

This study introduces AutoOpt-11k, an image dataset of over 11,000 handwritten and printed mathematical optimization models. The dataset covers single-objective, multi-objective, multi-level, and stochastic optimization problems with complexities such as non-linearity, non-convexity, non-differentiability, discontinuity, and high-dimensionality. LaTeX labels are provided for all images, and modeling language labels for a subset. The dataset was created by 25 experts following ethical data creation guidelines and verified in a two-phase process to minimize errors.

The study also presents the AutoOpt framework, a machine learning-based automated approach for solving optimization problems. The user provides an image of the formulation, and AutoOpt solves it without further human intervention. The framework comprises three modules: M1 (Image_to_Text), M2 (Text_to_Text), and M3 (Optimization). M1 uses a deep learning model for Mathematical Expression Recognition (MER) to generate LaTeX code from the formulation in the image. M2 uses a small-scale fine-tuned LLM to produce a PYOMO script (optimization modeling language) from the LaTeX code. M3 uses a Bilevel Optimization based Decomposition (BOBD) method to solve the formulation described in the PYOMO script.

The AutoOpt-11k dataset is used to train and test the deep learning models in AutoOpt. The MER model (M1) outperforms ChatGPT, Gemini, and Nougat on the BLEU score metric, and the hybrid BOBD method (M3) yields better results on complex test problems than common approaches such as the interior-point algorithm and the genetic algorithm.

In conclusion, this study introduces both an extensive dataset and a framework that automates the solving of optimization problems using machine learning. This could improve efficiency in fields where mathematical programming formulations are prevalent.


Layman's Summary

Summary
- The AutoOpt-11k dataset has over 11,000 handwritten and printed images of mathematical optimization problems.
- It includes different types of optimization problems with varying kinds of complexity.
- Every image is labeled with its LaTeX representation.
- The AutoOpt framework is a machine learning tool that solves these problems automatically from an image.
- It consists of three modules: one converts the image to LaTeX text, one converts the LaTeX into a modeling script, and one solves the problem.

Definitions
- Dataset: A collection of data or information on a specific topic.
- Optimization: Finding the best solution among many possible options.
- Machine learning: A type of artificial intelligence where machines learn from data to improve their performance.
- Framework: A structure or system designed to achieve specific goals efficiently.

Blog article

Introduction

Optimization problems are ubiquitous in fields such as engineering, economics, and computer science. They involve finding the best solution from a set of feasible options, maximizing or minimizing an objective function while satisfying constraints. Traditional approaches require significant human intervention and can be time-consuming and error-prone, and recent advances in machine learning show potential for automating the process.

In this context, a team of researchers has introduced AutoOpt-11k, an image dataset of over 11,000 handwritten and printed mathematical optimization models. The dataset provides a large and diverse collection of optimization problems, with LaTeX labels for all images and modeling language labels for a subset. It was created by 25 experts following ethical data creation guidelines and verified in a two-phase process to minimize errors.

AutoOpt Framework

The study also presents the AutoOpt framework, which uses machine learning to solve optimization problems automatically. It comprises three modules: M1 (Image_to_Text), M2 (Text_to_Text), and M3 (Optimization), each handling one stage of the path from image to solution.

M1 uses a deep learning model for Mathematical Expression Recognition (MER) to generate LaTeX code from the optimization formulation in the user's image. This converts the visual representation into text that the later modules can process.

M2 uses a small-scale fine-tuned LLM to produce a PYOMO script from the LaTeX code. Pyomo is a Python-based optimization modeling language that can pass models to solvers such as Gurobi and CPLEX.
This step translates the LaTeX code generated by M1 into an executable model that an optimization method can solve.

Finally, M3 uses the Bilevel Optimization based Decomposition (BOBD) method, described as a hybrid approach, to solve the formulation given in the PYOMO script and return a solution to the user.

Performance Evaluation

The AutoOpt-11k dataset is used to train and test the deep learning models in the AutoOpt framework. M1 outperforms ChatGPT, Gemini, and Nougat on the BLEU score, a metric widely used to evaluate machine translation. M3 yields better results on complex test problems than common approaches such as the interior-point algorithm and the genetic algorithm.

Conclusion

This study introduces both an extensive dataset and a framework that automates the solving of optimization problems using machine learning. The AutoOpt-11k dataset is a resource for researchers in this area and can support the development of more efficient algorithms. The AutoOpt framework could improve efficiency in fields where mathematical programming formulations are prevalent, such as supply chain management, logistics planning, and financial portfolio management. Future research could expand the dataset to more types of optimization problems and improve the performance of each module in the framework.
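Since the M1 comparison rests on the BLEU score, a small worked example may help show what that number measures when the "sentences" are LaTeX token sequences. The sketch below is a plain sentence-level BLEU with a single reference, uniform weights over 1- to 4-grams, and no smoothing. These choices, the whitespace tokenization, and the example strings are assumptions for illustration; the BLEU variant and tokenizer used in the paper are not stated in this summary.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU: one reference, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical LaTeX token sequences, split on whitespace.
reference = r"\min _ { x } x ^ { 2 } + y".split()
near_miss = r"\min _ { x } x ^ { 2 } + z".split()

score_exact = bleu(reference, reference)   # identical sequences
score_near = bleu(reference, near_miss)    # one wrong symbol at the end
```

In this example a single wrong symbol lowers every n-gram precision at once, so BLEU rewards long correct runs of tokens rather than isolated matches.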

Created on 20 Jan. 2026


