Reviving Threshold-Moving: a Simple Plug-in Bagging Ensemble for Binary and Multiclass Imbalanced Data
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Dealing with class imbalance in data mining is a significant challenge that can hinder method effectiveness.
- Creating ensembles of classifiers trained on resampled balanced data is a common approach to address class imbalance.
- Resampling methods like bagged decision trees with random undersampling or synthetic minority oversampling are typically used, but they may introduce biases and require predefined performance measures.
- An alternative strategy involves using a threshold-moving method to adjust the decision threshold after training to counteract imbalance and adapt to specific performance metrics.
- Probability thresholding bagging (PT-bagging) is introduced as a versatile plug-in method that preserves natural class distribution, resulting in well-calibrated posterior probabilities.
- PT-bagging has been extended to handle multiclass data and rigorously validated on binary and multiclass benchmark datasets.
- The combination of ensemble techniques with threshold-moving approaches offers a promising solution for addressing class imbalance issues in data mining tasks.
Authors: Guillem Collell, Drazen Prelec, Kaustubh Patil
Abstract: Class imbalance presents a major hurdle in the application of data mining methods. A common practice to deal with it is to create ensembles of classifiers that learn from resampled balanced data. For example, bagged decision trees combined with random undersampling (RUS) or the synthetic minority oversampling technique (SMOTE). However, most of the resampling methods entail asymmetric changes to the examples of different classes, which in turn can introduce its own biases in the model. Furthermore, those methods require a performance measure to be specified a priori before learning. An alternative is to use a so-called threshold-moving method that a posteriori changes the decision threshold of a model to counteract the imbalance, thus has a potential to adapt to the performance measure of interest. Surprisingly, little attention has been paid to the potential of combining bagging ensemble with threshold-moving. In this paper, we present probability thresholding bagging (PT-bagging), a versatile plug-in method that fills this gap. Contrary to usual rebalancing practice, our method preserves the natural class distribution of the data resulting in well calibrated posterior probabilities. We also extend the proposed method to handle multiclass data. The method is validated on binary and multiclass benchmark data sets. We perform analyses that provide insights into the proposed method.
Ask questions about this paper to our AI assistant
You can also chat with multiple papers at once here.
⚠The license of the paper does not allow us to build upon its content and the AI assistant only knows about the paper metadata rather than the full article.
Assess the quality of the AI-generated content by voting
Score: 0
Why do we need votes?
Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.
Similar papers summarized with our AI tools
Navigate through even more similar papers through a
tree representationLook for similar papers (in beta version)
By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.
Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.