RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment

AI-generated keywords: Machine Translation

AI-generated Key Points

⚠ The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

Large Language Models (LLMs) in Machine Translation have shown impressive performance but face deployment challenges due to high costs.
The hybrid system paradigm is a common remedy: a small model serves most translation requests, and a fraction is selectively routed to a larger model to balance cost and quality.
Existing routing strategies rely on heuristics, external predictors, or absolute quality estimation, which do not capture whether the large model actually provides a worthwhile improvement over the small one.
RouteLMT (routing for LLM-based MT) is introduced as an in-model router that predicts marginal gain, i.e., the large model's improvement over the small model, to make budgeted routing decisions without external models or hypothesis decoding.
RouteLMT outperforms heuristic approaches and quality/difficulty estimation baselines, achieving a superior quality-budget Pareto frontier.

Authors: Yingfeng Luo, Hongyu Liu, Dingyang Lin, Kaiyan Chang, Chenglong Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

arXiv: 2604.22520v1 (cs.CL)

Accepted to ACL 2026 Industry Track

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: Large Language Models (LLMs) have achieved remarkable performance in Machine Translation (MT), but deploying them at scale remains prohibitively expensive. A widely adopted remedy is the hybrid system paradigm, which balances cost and quality by serving most requests with a small model and selectively routing a fraction to a large model. However, existing routing strategies often rely on heuristics, external predictors, or absolute quality estimation, which fail to capture whether the large model actually provides a worthwhile improvement over the small one. In this paper, we formulate routing as a budget allocation problem and identify marginal gain, i.e., the large model's improvement over the small model, as the optimal signal for budgeted decisions. Building on this, we propose RouteLMT (routing for LLM-based MT), an efficient in-model router that predicts this expected gain by probing the small translator's prompt-token representation, without requiring external models or hypothesis decoding. Extensive experiments demonstrate that our RouteLMT outperforms heuristics and quality/difficulty estimation baselines, achieving a superior quality-budget Pareto frontier. Furthermore, we analyze regression risks and show that a simple guarded variant can mitigate severe quality losses.

Submitted to arXiv on 24 Apr. 2026

Comprehensive Summary

In Machine Translation (MT), Large Language Models (LLMs) have demonstrated impressive performance, but their deployment at scale is hindered by high costs. A common remedy is the hybrid system paradigm, in which a small model serves most translation requests and a fraction is selectively routed to a larger model to balance cost and quality. However, existing routing strategies do not capture whether the large model actually offers a worthwhile improvement over the small one. In response, the paper introduces RouteLMT (routing for LLM-based MT), an in-model router that uses marginal gain, i.e., the large model's improvement over the small model, as the signal for budgeted routing decisions. RouteLMT predicts this expected gain by probing the small translator's prompt-token representation, so it needs neither external models nor hypothesis decoding. Experiments show that RouteLMT outperforms heuristic approaches and quality/difficulty estimation baselines, achieving a superior quality-budget Pareto frontier. The paper also analyzes regression risks and shows that a simple guarded variant can mitigate severe quality losses. The paper is authored by Yingfeng Luo, Hongyu Liu, Dingyang Lin, Kaiyan Chang, Chenglong Wang, Bei Li, Quan Du, Tong Xiao, and Jingbo Zhu, and has been accepted to the ACL 2026 Industry Track.

Layman's Summary

Large Language Models (LLMs) are very good at translating languages, but it is expensive to use them all the time. A hybrid system uses a small model for most requests and a large model for only some of them, to save money while keeping translations good. The hard part is knowing when the big model is actually worth using. RouteLMT is a new way to decide which model to use, based on how much the big model is expected to help, without needing extra tools. In the reported experiments, RouteLMT works better than other methods and achieves a better balance between quality and cost.

Definitions

- Large Language Models (LLMs): AI models trained on large amounts of text; here they are used to translate between languages.
- Hybrid system paradigm: A setup that combines a small, cheap model with a large, expensive one.
- Routing strategies: Rules for deciding which model handles each request.
- RouteLMT: A router that decides, per request, whether the large model is worth using, based on the improvement it is expected to bring.
- Pareto frontier: The set of best achievable trade-offs between two factors, such as quality and cost.

Blog article

Introduction

Machine Translation (MT) has made significant strides in recent years with the development of Large Language Models (LLMs). These models translate text well, but their deployment at scale is hindered by high costs. A common remedy is the hybrid system paradigm, in which a small model serves most translation requests and a fraction is selectively routed to a larger model to balance cost and quality. However, existing routing strategies do not capture whether the large model actually offers a worthwhile improvement over the small one, which can waste the large-model budget on requests that do not benefit from it. In response, the authors propose RouteLMT (routing for LLM-based MT), an in-model router that uses marginal gain, i.e., the large model's improvement over the small model, as the signal for budgeted routing decisions.

The Problem

A central challenge in deploying LLM-based MT is balancing cost and quality. Large models can improve translation quality, but their size and computational requirements make it impractical to use them for every translation request. Hybrid systems address this by serving most requests with a small model and reserving the large model for a fraction of them. The open question is which fraction. Existing routing strategies rely on heuristics, external predictors, or absolute quality estimation, and none of these directly measures whether the large model would translate a given request meaningfully better than the small one.
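
The abstract frames routing as a budget allocation problem. One way to write that down is sketched below; the notation is assumed here for illustration and is not taken from the paper, and it assumes every routed request costs the same.

```latex
% Assumed notation: q_S(x_i) and q_L(x_i) are the quality scores of the small and
% large model on request x_i, r_i = 1 means "route x_i to the large model",
% and B in [0, 1] is the fraction of requests the budget allows.
\max_{r \in \{0,1\}^n} \; \sum_{i=1}^{n} \bigl[ q_S(x_i) + r_i \, g_i \bigr]
\quad \text{s.t.} \quad \sum_{i=1}^{n} r_i \le \lfloor B n \rfloor,
\qquad g_i = q_L(x_i) - q_S(x_i).
```

Because the term \sum_i q_S(x_i) does not depend on r, the objective reduces to maximizing \sum_i r_i g_i, which is achieved by routing the requests with the largest positive g_i until the budget is used up. Under this formulation, ranking by the small model's absolute quality q_S(x_i) alone is not enough: a request can be poorly translated by both models, in which case routing it spends budget for no gain.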

The Solution: RouteLMT

RouteLMT addresses these issues by using marginal gain as the signal for budgeted routing decisions. Instead of relying on external models or heuristics, RouteLMT predicts the expected improvement of the large model over the small one by probing the small translator's prompt-token representation. Because the prediction is made from the prompt representation, it requires neither an additional model nor hypothesis decoding, which keeps the routing overhead low.
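
The idea can be illustrated with a minimal sketch. It assumes a linear probe over a mean-pooled prompt representation and a top-k selection under a budget; the pooling choice, the probe form, and the names `mean_pool`, `predict_gain`, and `route_by_budget` are all hypothetical and are not taken from the paper, whose architecture is not described in the metadata.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into one prompt-level vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]


def predict_gain(prompt_vector: Sequence[float],
                 weights: Sequence[float],
                 bias: float) -> float:
    """Hypothetical linear probe: predicted gain = w . h + b."""
    return sum(w * h for w, h in zip(weights, prompt_vector)) + bias


def route_by_budget(predicted_gains: Sequence[float], budget: float) -> List[int]:
    """Return the indices sent to the large model.

    The k = floor(budget * n) requests with the highest predicted gain
    are routed; all other requests stay on the small model.
    """
    k = int(budget * len(predicted_gains))
    ranked = sorted(range(len(predicted_gains)),
                    key=lambda i: predicted_gains[i],
                    reverse=True)
    return sorted(ranked[:k])
```

For example, `route_by_budget([0.1, 2.0, -0.5, 1.0], 0.5)` selects requests 1 and 3. In a real system the probe weights would be fitted on held-out requests for which both models' quality scores are known; that training step is omitted here.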

Experimental Results

The researchers compared RouteLMT with heuristic routing strategies and quality/difficulty estimation baselines. RouteLMT outperformed these approaches, achieving a superior quality-budget Pareto frontier: for a given share of requests sent to the large model, the hybrid system reaches higher translation quality than it does with the baseline routers. The researchers also analyzed regression risks, i.e., the risk that routing makes some translations worse, and report that a simple guarded variant of RouteLMT can mitigate severe quality losses.
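
A quality-budget curve of this kind can be computed as in the sketch below. The quality scores are toy numbers chosen for illustration, not results from the paper, and `guarded_route` is only one hypothetical form a guard could take (skip routing when the predicted gain does not exceed a threshold); the abstract does not specify how the guarded variant works.

```python
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], budget: float) -> List[int]:
    """Indices of the floor(budget * n) highest-scoring requests."""
    k = int(budget * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def hybrid_quality(small: Sequence[float], large: Sequence[float],
                   scores: Sequence[float], budget: float) -> float:
    """Average quality when the top-scoring requests use the large model."""
    routed = set(top_k_indices(scores, budget))
    total = sum(large[i] if i in routed else small[i] for i in range(len(small)))
    return total / len(small)


def guarded_route(predicted_gains: Sequence[float], budget: float,
                  min_gain: float) -> List[int]:
    """Hypothetical guard: route only if predicted gain exceeds min_gain."""
    chosen = top_k_indices(predicted_gains, budget)
    return sorted(i for i in chosen if predicted_gains[i] > min_gain)


if __name__ == "__main__":
    # Toy per-request quality scores (illustrative only).
    small = [70.0, 80.0, 60.0, 90.0]
    large = [85.0, 78.0, 75.0, 91.0]
    gains = [l - s for l, s in zip(large, small)]  # oracle marginal gain
    for budget in (0.0, 0.5, 1.0):
        print(budget, hybrid_quality(small, large, gains, budget))
```

With these toy numbers, routing the best half by marginal gain yields an average of 82.5, while sending everything to the large model yields 82.25, because the large model is worse on one request. That request is an instance of a regression, and `guarded_route(gains, 1.0, 0.0)` leaves it on the small model.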

Conclusion

In conclusion, the paper proposes a way to improve the cost-quality trade-off of hybrid LLM translation systems. By using marginal gain as the signal for routing decisions, RouteLMT avoids external models and hypothesis decoding and, according to the reported experiments, achieves a better quality-budget trade-off than heuristic and quality/difficulty estimation baselines. A guarded variant further limits severe quality losses, which matters for the industry deployments the paper targets.

Created on 23 Jun. 2026
