Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Sign Language Translation (SLT) faces challenges in translating visual-gestural language into text due to its complex nature
- Existing methods use gloss sequences as an intermediate representation, dividing the process into sign language recognition (SLR) and sign language translation (SLT)
- Limited availability of gloss-annotated data and constraints of mid-level gloss representation hinder further advancements in SLT development
- Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-VLP) is a groundbreaking approach that enhances SLT without relying on gloss annotations
- GFSLT-VLP framework includes integrating Contrastive Language-Image Pre-training (CLIP) with masked self-supervised learning and constructing an end-to-end architecture with an encoder-decoder-like structure
- GFSLT-VLP has achieved significant improvements in BLEU-4 score on PHOENIX14T dataset (>+5) and CSL-Daily dataset (>+3) compared to state-of-the-art gloss-free SLT methods
- Competitive results were demonstrated on the PHOENIX14T dataset when compared against most gloss-based methods
- Developed by Benjia Zhou, Zhigang Chen, Albert Clapés, Jun Wan, Yanyan Liang, Sergio Escalera, Zhen Lei, and Du Zhang; accepted for presentation at ICCV'23
- Code for implementing GFSLT-VLP is accessible at https://github.com/zhoubenjia/GFSLT-VLP
Authors: Benjia Zhou, Zhigang Chen, Albert Clapés, Jun Wan, Yanyan Liang, Sergio Escalera, Zhen Lei, Du Zhang
Abstract: Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, involving the translation of visual-gestural language to text. Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign language translation (SLT). However, the scarcity of gloss-annotated sign language data, combined with the information bottleneck in the mid-level gloss representation, has hindered the further development of the SLT task. To address this challenge, we propose a novel Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-VLP), which improves SLT by inheriting language-oriented prior knowledge from pre-trained models, without any gloss annotation assistance. Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training (CLIP) with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage. The seamless combination of these novel designs forms a robust sign language representation and significantly improves gloss-free sign language translation. In particular, we have achieved unprecedented improvements in terms of BLEU-4 score on the PHOENIX14T dataset (>+5) and the CSL-Daily dataset (>+3) compared to state-of-the-art gloss-free SLT methods. Furthermore, our approach also achieves competitive results on the PHOENIX14T dataset when compared with most of the gloss-based methods. Our code is available at https://github.com/zhoubenjia/GFSLT-VLP.
Ask questions about this paper to our AI assistant
You can also chat with multiple papers at once here.
⚠The license of the paper does not allow us to build upon its content and the AI assistant only knows about the paper metadata rather than the full article.
Assess the quality of the AI-generated content by voting
Score: 0
Why do we need votes?
Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.
The previous summary was created more than a year ago and can be re-run (if necessary) by clicking on the Run button below.
⚠The license of this specific paper does not allow us to build upon its content and the summarizing tools will be run using the paper metadata rather than the full article. However, it still does a good job, and you can also try our tools on papers with more open licenses.
Look for similar papers (in beta version)
By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.
Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.