Improved Text Classification via Test-Time Augmentation
AI-generated Key Points
- Test-time augmentation (TTA) is a technique used in image classification to improve model performance without additional training.
- TTA has seen limited adoption in natural language processing (NLP) due to the difficulty of identifying label-preserving transformations.
- The authors present augmentation policies that, when used for TTA, yield significant accuracy improvements for language models.
- Augmentation policy design, such as the number of samples generated from a single non-deterministic augmentation, has a considerable impact on the benefit of TTA.
- The authors apply an augmentation policy containing M transforms to generate M transformed inputs from a text input t.
- A single prediction is produced by taking a simple average of the M + 1 logit predictions: one for the original input and one for each of the M transformed inputs.
- The study evaluates the method on the WILDS CivilComments dataset, which consists of 448,000 comments made on Wikipedia talk pages and labeled for toxicity and identity-based hate speech detection tasks.
- Experiments on a binary classification task and dataset show that test-time augmentation can deliver consistent improvements over current state-of-the-art approaches.
- Certain combinations of augmentations yield better results than others.
- This study demonstrates how test-time augmentation can be applied effectively to improve text classification models' performance without additional training.
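The aggregation step described in the key points (M transforms, M transformed inputs, a simple average over M + 1 logit vectors) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `tta_predict`, `toy_model`, `lowercase`, and `strip_exclamations` are hypothetical names, and the toy classifier only stands in for a real language model.

```python
from typing import Callable, List, Sequence

Logits = List[float]
Transform = Callable[[str], str]


def tta_predict(
    text: str,
    transforms: Sequence[Transform],
    model: Callable[[str], Logits],
) -> Logits:
    """Average the logits of the original input and its M transformed copies."""
    # M + 1 inputs: the original text plus one output per transform.
    inputs = [text] + [transform(text) for transform in transforms]
    all_logits = [model(x) for x in inputs]
    num_classes = len(all_logits[0])
    # Simple (unweighted) mean per class across the M + 1 predictions.
    return [
        sum(logits[c] for logits in all_logits) / len(all_logits)
        for c in range(num_classes)
    ]


# --- Hypothetical stand-ins, for illustration only ---

def toy_model(text: str) -> Logits:
    """Toy binary classifier: the class-1 logit is the number of '!' characters."""
    score = float(text.count("!"))
    return [-score, score]


def lowercase(text: str) -> str:
    return text.lower()


def strip_exclamations(text: str) -> str:
    return text.replace("!", "")


averaged = tta_predict("You are WRONG!!", [lowercase, strip_exclamations], toy_model)
predicted_class = max(range(len(averaged)), key=lambda c: averaged[c])
```

Here M = 2, so three logit vectors are averaged. Because the model is only called at inference time, the same wrapper applies to an already-trained classifier without any retraining.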
Authors: Helen Lu, Divya Shanmugam, Harini Suresh, John Guttag
Abstract: Test-time augmentation -- the aggregation of predictions across transformed examples of test inputs -- is an established technique to improve the performance of image classification models. Importantly, TTA can be used to improve model performance post-hoc, without additional training. Although test-time augmentation (TTA) can be applied to any data modality, it has seen limited adoption in NLP due in part to the difficulty of identifying label-preserving transformations. In this paper, we present augmentation policies that yield significant accuracy improvements with language models. A key finding is that augmentation policy design -- for instance, the number of samples generated from a single, non-deterministic augmentation -- has a considerable impact on the benefit of TTA. Experiments across a binary classification task and dataset show that test-time augmentation can deliver consistent improvements over current state-of-the-art approaches.
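The abstract singles out the number of samples drawn from a single non-deterministic augmentation as a policy design choice. The sketch below shows what that knob might look like in code. It assumes random word dropout as the stochastic augmentation, which is an illustrative choice and is not stated in the text above; `random_word_dropout`, `sample_augmentations`, `drop_prob`, and `num_samples` are hypothetical names.

```python
import random
from typing import Callable, List


def random_word_dropout(text: str, rng: random.Random, drop_prob: float = 0.2) -> str:
    """Drop each word independently with probability drop_prob (assumed augmentation)."""
    words = text.split()
    kept = [w for w in words if rng.random() >= drop_prob]
    # Fall back to the original text if every word was dropped.
    return " ".join(kept) if kept else text


def sample_augmentations(
    text: str,
    augment: Callable[[str, random.Random], str],
    num_samples: int,
    seed: int = 0,
) -> List[str]:
    """Draw num_samples outputs from one non-deterministic augmentation."""
    rng = random.Random(seed)  # seeded so the policy is reproducible
    return [augment(text, rng) for _ in range(num_samples)]


text = "this comment is not toxic at all"
samples = sample_augmentations(text, random_word_dropout, num_samples=4)
```

Each sampled string would then be classified alongside the original input and the predictions aggregated. Varying `num_samples` changes how many transformed inputs a single augmentation contributes to that aggregate.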