
Exploring the Limitations of Large Language Models in Compositional Relation Reasoning

Jinman Zhao, Xueyan Zhang

arXiv: 2403.02615v1 (cs.CL)

20 pages, 7 figures, 7 tables, submitted to ICML 2024

License: CC BY 4.0

Abstract: We present a comprehensive evaluation of the ability of large language models (LLMs) to reason about composition relations through a benchmark encompassing 1,500 test cases in English, designed to cover six distinct types of composition relations: Positional, Comparative, Personal, Mathematical, Identity, and Other. Acknowledging the significance of multilingual capabilities, we expanded our assessment to include translations of these cases into Chinese, Japanese, French, and Korean. Our Multilingual Composition Relation (MCR) benchmark aims to investigate the robustness and adaptability of LLMs in handling composition relation reasoning across diverse linguistic contexts.

Submitted to arXiv on 05 Mar. 2024