RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment

AI-generated keywords: Machine Translation

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Large Language Models (LLMs) in Machine Translation have shown impressive performance but face deployment challenges due to high costs.
  • The hybrid system paradigm is a common remedy: a small model serves most translation requests, and a fraction is selectively routed to a larger model to balance cost and quality.
  • Existing routing strategies rely on heuristics, external predictors, or absolute quality estimation, which do not capture whether the large model actually offers a worthwhile improvement over the small one.
  • RouteLMT (routing for LLM-based MT) is introduced as an in-model router that predicts marginal gain, i.e., the large model's expected improvement over the small model, and uses it for budgeted routing decisions without external models or hypothesis decoding.
  • RouteLMT outperforms heuristic approaches and quality/difficulty estimation baselines, achieving a superior quality-budget Pareto frontier.

Authors: Yingfeng Luo, Hongyu Liu, Dingyang Lin, Kaiyan Chang, Chenglong Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

Accepted to ACL 2026 Industry Track

Abstract: Large Language Models (LLMs) have achieved remarkable performance in Machine Translation (MT), but deploying them at scale remains prohibitively expensive. A widely adopted remedy is the hybrid system paradigm, which balances cost and quality by serving most requests with a small model and selectively routing a fraction to a large model. However, existing routing strategies often rely on heuristics, external predictors, or absolute quality estimation, which fail to capture whether the large model actually provides a worthwhile improvement over the small one. In this paper, we formulate routing as a budget allocation problem and identify marginal gain, i.e., the large model's improvement over the small model, as the optimal signal for budgeted decisions. Building on this, we propose RouteLMT (routing for LLM-based MT), an efficient in-model router that predicts this expected gain by probing the small translator's prompt-token representations, without requiring external models or hypothesis decoding. Extensive experiments demonstrate that RouteLMT outperforms heuristics and quality/difficulty estimation baselines, achieving a superior quality-budget Pareto frontier. Furthermore, we analyze regression risks and show that a simple guarded variant can mitigate severe quality losses.
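The budgeted routing decision described in the abstract can be illustrated with a minimal sketch. It assumes a per-request predicted gain is already available (how RouteLMT computes it is not reproduced here); the function name `route_by_predicted_gain`, the `budget_fraction` parameter, and the toy gain values are all hypothetical and not taken from the paper.

```python
def route_by_predicted_gain(predicted_gains, budget_fraction):
    """Pick which requests go to the large model under a fixed budget.

    predicted_gains: one predicted marginal gain per request (assumed input).
    budget_fraction: share of requests the large model may serve, in [0, 1].
    Returns the set of request indices routed to the large model.
    """
    if not 0.0 <= budget_fraction <= 1.0:
        raise ValueError("budget_fraction must be in [0, 1]")
    n = len(predicted_gains)
    # Number of large-model slots; the epsilon guards against float round-off.
    k = int(budget_fraction * n + 1e-9)
    # Rank requests from highest to lowest predicted gain.
    ranked = sorted(range(n), key=lambda i: predicted_gains[i], reverse=True)
    return set(ranked[:k])


# Toy example with made-up gains: 5 requests, large model serves 40% of them.
gains = [0.02, 0.30, -0.05, 0.11, 0.00]
routed = route_by_predicted_gain(gains, budget_fraction=0.4)
print(sorted(routed))
```

Ranking by predicted gain rather than by predicted quality means a request is escalated only when the large model is expected to help, which is the distinction the abstract draws against absolute quality estimation.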

Submitted to arXiv on 24 Apr. 2026


Results of the summarizing process for the arXiv paper: 2604.22520v1


In the field of Machine Translation (MT), Large Language Models (LLMs) have demonstrated impressive performance, but their widespread deployment is hindered by high costs. To address this challenge, a common approach is the hybrid system paradigm, which involves using a small model for most translation requests and selectively directing some to a larger model to balance cost and quality. However, existing routing strategies often fail to determine whether the large model truly offers a worthwhile improvement over the small one. In response to this issue, this paper introduces RouteLMT (routing for LLM-based MT), a novel in-model router that uses marginal gain as the key signal for making budgeted routing decisions. By predicting the expected improvement of the large model over the small one through probing the small translator's prompt-token representations, RouteLMT eliminates the need for external models or hypothesis decoding. Extensive experiments show that RouteLMT outperforms heuristic approaches and quality/difficulty estimation baselines, achieving a superior quality-budget Pareto frontier. Additionally, regression risks are examined, showing that a simple guarded variant can mitigate severe quality losses. Authored by Yingfeng Luo, Hongyu Liu, Dingyang Lin, Kaiyan Chang, Chenglong Wang, Bei Li, Quan Du, Tong Xiao, and Jingbo Zhu, this research has been accepted to the ACL 2026 Industry Track.
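To make the quality-budget trade-off concrete, the sketch below sweeps a few budgets and compares two routing signals on toy data: marginal gain and a difficulty proxy (low small-model quality). The quality numbers are invented for illustration, the gain signal here is an oracle rather than a learned prediction, and `quality_budget_curve` is a hypothetical helper, not the paper's evaluation code.

```python
def quality_budget_curve(scores, small_quality, large_quality, budgets):
    """Mean quality at each budget when top-scored requests use the large model."""
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    curve = []
    for b in budgets:
        k = int(b * n + 1e-9)  # large-model slots at this budget
        routed = set(ranked[:k])
        total = sum(
            large_quality[i] if i in routed else small_quality[i]
            for i in range(n)
        )
        curve.append((b, total / n))
    return curve


# Made-up per-request quality scores for the small and large translators.
small = [0.50, 0.80, 0.40, 0.70]
large = [0.52, 0.95, 0.41, 0.90]

# Signal 1: oracle marginal gain (large minus small).
gain_scores = [l - s for s, l in zip(small, large)]
# Signal 2: difficulty proxy, treating low small-model quality as "hard".
difficulty_scores = [1.0 - s for s in small]

budgets = [0.0, 0.5, 1.0]
gain_curve = quality_budget_curve(gain_scores, small, large, budgets)
difficulty_curve = quality_budget_curve(difficulty_scores, small, large, budgets)

for (b, q_gain), (_, q_diff) in zip(gain_curve, difficulty_curve):
    print(f"budget={b:.1f}  gain-routed={q_gain:.4f}  difficulty-routed={q_diff:.4f}")
```

In this toy data the hardest requests are ones the large model barely improves, so the difficulty proxy spends its budget where little is gained; both signals coincide at budgets 0 and 1, where routing makes no difference.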
Created on 23 Jun. 2026

