A Robust AUC Maximization Framework with Simultaneous Outlier Detection and Feature Selection for Positive-Unlabeled Classification

AI-generated keywords: PU Classification, AUC Maximization, Outlier Detection, Feature Selection, Robust Learning

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Positive-Unlabeled (PU) classification is common in real-world applications like healthcare, text classification, and bioinformatics
  • In PU classification, there are few labeled positive samples and a large volume of unlabeled samples that may contain both positive and negative samples
  • The authors propose a robust learning framework for the PU problem that combines AUC maximization, outlier detection, and feature selection
  • AUC maximization helps handle imbalanced data effectively
  • Outlier detection improves the accuracy of the model by excluding wrong labels from training
  • Feature selection aims to identify and exclude corrupted features that negatively impact classification performance
  • The authors provide generalization error bounds for the proposed model, which give theoretical insight and practical guidance for training
  • Empirical comparisons and two real-world applications, surgical site infection (SSI) and EEG seizure detection, are used to show the effectiveness of the proposed model
  • This research presents a comprehensive framework for addressing the challenges of PU classification in healthcare and bioinformatics.
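
To make the AUC point concrete, here is a minimal sketch of the empirical AUC between positive and unlabeled samples: the fraction of (positive, unlabeled) pairs in which the positive sample receives the higher score. Treating every unlabeled sample as a negative is a simplifying assumption for illustration, and the function name `pu_auc` and the toy scores are hypothetical; none of this is taken from the paper itself.

```python
import numpy as np

def pu_auc(scores_pos, scores_unl):
    """Empirical AUC over all (positive, unlabeled) pairs; ties count as 0.5."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]  # shape (n_pos, 1)
    su = np.asarray(scores_unl, dtype=float)[None, :]  # shape (1, n_unl)
    # Broadcasting builds the full n_pos x n_unl comparison matrix.
    return float(np.mean((sp > su) + 0.5 * (sp == su)))

# Toy scores: 3 labeled positives, 4 unlabeled samples (assumed values).
auc = pu_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(auc)  # 11 of the 12 pairs are ranked correctly
```

Because the value is normalized by the number of pairs, it does not change merely because one group is much larger than the other, which is one common reason AUC is preferred over accuracy on imbalanced data.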

Authors: Ke Ren, Haichuan Yang, Yu Zhao, Mingshan Xue, Hongyu Miao, Shuai Huang, Ji Liu

Abstract: Positive-unlabeled (PU) classification is a common scenario in real-world applications such as healthcare, text classification, and bioinformatics, in which we only observe a few samples labeled as "positive" together with a large volume of "unlabeled" samples that may contain both positive and negative samples. Building a robust classifier for the PU problem is very challenging, especially for complex data where negative samples dominate and mislabeled samples or corrupted features are present. To address these three issues, we propose a robust learning framework that unifies AUC maximization (a robust metric for biased labels), outlier detection (for excluding wrong labels), and feature selection (for excluding corrupted features). Generalization error bounds are provided for the proposed model; they give valuable insight into the theoretical performance of the method and lead to useful practical guidance. For example, when training a model, we find that the included unlabeled samples are sufficient as long as their number is comparable to the number of positive samples. Empirical comparisons and two real-world applications on surgical site infection (SSI) and EEG seizure detection are also conducted to show the effectiveness of the proposed model.

Submitted to arXiv on 18 Mar. 2018

Results of the summarizing process for the arXiv paper: 1803.06604v1

Positive-unlabeled (PU) classification is a common scenario in real-world applications such as healthcare, text classification, and bioinformatics. In this scenario, only a few samples are labeled as "positive", alongside a large volume of "unlabeled" samples that may contain both positive and negative samples. Building a robust classifier for the PU problem is challenging, particularly for complex data where negative samples dominate and mislabeled samples or corrupted features are present.

To address these challenges, the authors propose a robust learning framework that combines three components: AUC maximization, outlier detection, and feature selection. AUC maximization serves as a metric that is robust to biased labels, which helps the classifier handle imbalanced data. Outlier detection excludes wrongly labeled samples from training. Feature selection identifies and excludes corrupted features that may degrade classification performance.

The authors provide generalization error bounds for the proposed model, which offer insight into its theoretical performance and practical guidance for training: the unlabeled samples included in training are sufficient as long as their number is comparable to the number of positive samples.

To demonstrate the effectiveness of the approach, the authors report empirical comparisons and two real-world applications: surgical site infection (SSI) and EEG seizure detection. Overall, the work presents a unified framework for PU classification whose theoretical analysis is complemented by empirical evaluation on real-world datasets.
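
The three components can be sketched together in a few lines. The paper's actual objective and optimizer are not available from this page, so everything below is an assumption chosen for illustration: a pairwise hinge loss stands in for AUC maximization, an L1 penalty (applied by soft-thresholding) stands in for feature selection, and trimming the highest-loss samples stands in for outlier detection. The unlabeled set is subsampled to the size of the positive set, following the sample-size guidance above. The names `fit_pu_sketch` and `soft_threshold`, the hyperparameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal step for an L1 penalty: shrink each weight toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_pu_sketch(X_pos, X_unl, l1=0.05, trim_frac=0.1, lr=0.05, epochs=300, seed=0):
    rng = np.random.default_rng(seed)
    n_pos, d = X_pos.shape
    # Use an unlabeled subsample comparable in size to the positive set.
    m = min(len(X_unl), n_pos)
    X_u = X_unl[rng.choice(len(X_unl), size=m, replace=False)]
    w = np.zeros(d)
    keep_p = np.ones(n_pos, dtype=bool)
    keep_u = np.ones(m, dtype=bool)
    drop_p = int(round(trim_frac * n_pos))
    drop_u = int(round(trim_frac * m))
    for _ in range(epochs):
        # Pairwise hinge loss: penalize pairs where a positive does not
        # outscore an unlabeled sample by a margin of 1.
        hinge = np.maximum(0.0, 1.0 - (X_pos @ w)[:, None] + (X_u @ w)[None, :])
        active = (hinge > 0) & keep_p[:, None] & keep_u[None, :]
        n_pairs = keep_p.sum() * keep_u.sum()
        # Subgradient of the mean hinge loss over the kept pairs.
        grad = (active.sum(axis=0) @ X_u - active.sum(axis=1) @ X_pos) / n_pairs
        w = soft_threshold(w - lr * grad, lr * l1)
        # Trimming: flag the samples with the largest mean loss as suspected
        # outliers (e.g. hidden positives among the unlabeled samples).
        keep_p[:] = True
        keep_u[:] = True
        if drop_p:
            keep_p[np.argsort(hinge.mean(axis=1))[-drop_p:]] = False
        if drop_u:
            keep_u[np.argsort(hinge.mean(axis=0))[-drop_u:]] = False
    return w, keep_p, keep_u

# Synthetic data (assumed): only feature 0 separates the classes, and 40 of
# the 200 unlabeled rows are hidden positives.
rng = np.random.default_rng(1)
X_pos = rng.normal(size=(50, 5))
X_pos[:, 0] += 3.0
X_unl = rng.normal(size=(200, 5))
X_unl[:40, 0] += 3.0
w, keep_p, keep_u = fit_pu_sketch(X_pos, X_unl)
print(np.round(w, 3), int(keep_p.sum()), int(keep_u.sum()))
```

In this sketch the trimming step simply re-selects the highest-loss samples at every iteration. It is a heuristic stand-in, not the joint formulation the abstract describes, and it assumes `trim_frac` is small enough that some pairs always remain.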
Created on 03 Jul. 2023
