How to Steer LLM Latents for Hallucination Detection?

Résumés déjà disponibles dans d'autres langues : en

Auteurs : Seongheon Park, Xuefeng Du, Min-Hsuan Yeh, Haobo Wang, Yixuan Li

arXiv: 2503.01917v1 - DOI (cs.LG)

ICLR Workshop on Quantify Uncertainty and Hallucination in Foundation Models (QUESTION), 2025

Résumé : Hallucinations in LLMs pose a significant concern to their safe deployment in real-world applications. Recent approaches have leveraged the latent space of LLMs for hallucination detection, but their embeddings, optimized for linguistic coherence rather than factual accuracy, often fail to clearly separate truthful and hallucinated content. To this end, we propose the Truthfulness Separator Vector (TSV), a lightweight and flexible steering vector that reshapes the LLM's representation space during inference to enhance the separation between truthful and hallucinated outputs, without altering model parameters. Our two-stage framework first trains TSV on a small set of labeled exemplars to form compact and well-separated clusters. It then augments the exemplar set with unlabeled LLM generations, employing an optimal transport-based algorithm for pseudo-labeling combined with a confidence-based filtering process. Extensive experiments demonstrate that TSV achieves state-of-the-art performance with minimal labeled data, exhibiting strong generalization across datasets and providing a practical solution for real-world LLM applications.

Soumis à arXiv le 01 Mar. 2025

Posez des questions sur cet article à notre assistant IA

Vous pouvez aussi discutez avec plusieurs papiers à la fois ici.

Instructions pour utiliser l'assistant IA ?

Résultats du processus de synthèse de l'article arXiv : 2503.01917v1

Résumé Complet
Points clés
Résumé vulgarisé
Article de blog

Le résumé n'est pas encore prêt

Les points clés ne sont pas encore prêts

Le résumé vulgarisé n'est pas encore prêt

L'article de blog n'est pas encore prêt

Créé le 25 Mar. 2025

Disponible dans d'autres langues : en

Évaluez la qualité du contenu généré par l'IA en votant

Note : 0

Le résumé précédent a été créé il y a plus d'un an et peut être réexécuté (si nécessaire) en cliquant sur le bouton Exécuter ci-dessous.

How to Steer LLM Latents for Hallucination Detection?

Posez des questions sur cet article à notre assistant IA

Résultats du processus de synthèse de l'article arXiv : 2503.01917v1

Articles similaires résumés avec nos outils d'IA