Machine Learning Approaches for Mental Illness Detection on Social Media: A Systematic Review of Biases and Methodological Challenges

AI-generated keywords: Machine learning, Mental illness, Social media data, Bias assessment, Depression detection

AI-generated Key Points

Systematic review on the use of machine learning (ML) models for detecting depression through social media data
Identified 47 relevant studies published after 2010
Utilized Prediction model Risk Of Bias ASsessment Tool (PROBAST) to assess methodological quality and bias
Significant biases found in studies, including heavy reliance on Twitter and English-language content, limiting diversity
Non-probability sampling methods used in 80% of studies, affecting representativeness
Only 23% of studies addressed linguistic nuances crucial for accurate sentiment analysis
Risks identified: inconsistent hyperparameter tuning, inadequate data partitioning, class imbalance issues
Future research recommendations: diversifying data sources, standardizing preprocessing methods, addressing class imbalance, enhancing reporting transparency


Authors: Yuchen Cao, Jianglai Dai, Zhongyan Wang, Yeyubei Zhang, Xiaorui Shen, Yunchong Liu, Yexin Tian

Journal of Behavioral Data Science, 5(1)

arXiv: 2410.16204v3 (cs.LG)

License: CC BY 4.0

Abstract: The global increase in mental illness requires innovative detection methods for early intervention. Social media provides a valuable platform to identify mental illness through user-generated content. This systematic review examines machine learning (ML) models for detecting mental illness, with a particular focus on depression, using social media data. It highlights biases and methodological challenges encountered throughout the ML lifecycle. A search of PubMed, IEEE Xplore, and Google Scholar identified 47 relevant studies published after 2010. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was utilized to assess methodological quality and risk of bias. The review reveals significant biases affecting model reliability and generalizability. A predominant reliance on Twitter (63.8%) and English-language content (over 90%) limits diversity, with most studies focused on users from the United States and Europe. Non-probability sampling (80%) limits representativeness. Only 23% explicitly addressed linguistic nuances like negations, crucial for accurate sentiment analysis. Inconsistent hyperparameter tuning (27.7%) and inadequate data partitioning (17%) risk overfitting. While 74.5% used appropriate evaluation metrics for imbalanced data, others relied on accuracy without addressing class imbalance, potentially skewing results. Reporting transparency varied, often lacking critical methodological details. These findings highlight the need to diversify data sources, standardize preprocessing, ensure consistent model development, address class imbalance, and enhance reporting transparency. By overcoming these challenges, future research can develop more robust and generalizable ML models for depression detection on social media, contributing to improved mental health outcomes globally.

Submitted to arXiv on 21 Oct. 2024


Comprehensive Summary

This systematic review examines the use of machine learning (ML) models for detecting mental illness, specifically depression, through analysis of social media data. A search of PubMed, IEEE Xplore, and Google Scholar identified 47 relevant studies published after 2010. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used to assess methodological quality and risk of bias in these studies. The review found significant biases that may affect the reliability and generalizability of the ML models. These include a heavy reliance on Twitter (63.8%) and English-language content (over 90%), with most studies focused on users from the United States and Europe, which limits diversity. Non-probability sampling was used in 80% of the studies, which may limit representativeness. Only 23% of the studies explicitly addressed linguistic nuances such as negations, which are crucial for accurate sentiment analysis. Inconsistent hyperparameter tuning (27.7%) and inadequate data partitioning (17%) were identified as risks for overfitting. While most studies (74.5%) used appropriate evaluation metrics for imbalanced data, others relied on accuracy without addressing class imbalance, which could skew results. Reporting transparency varied across studies, with many lacking critical methodological details. To address these challenges, future research should focus on diversifying data sources, standardizing preprocessing methods, ensuring consistent model development practices, addressing class imbalance, and enhancing reporting transparency.
Study selection involved title and abstract screening by two authors with expertise in machine learning and mental health research, followed by full-text screening, to reduce bias in study selection. For each study, the reviewers extracted author names, publication details, study design, the machine learning models used, the social media platforms analyzed, the performance metrics reported, potential biases, and limitations. Findings were synthesized systematically across the stages of the machine learning lifecycle: sampling, data preprocessing, model construction, tuning, and evaluation based on quantitative metrics such as accuracy, precision, recall, F1 score, and AUROC. The review emphasizes transparent and complete reporting as a condition for the integrity, reproducibility, and reliability of each study's findings. By addressing these challenges and improving reporting standards, more robust and generalizable ML models can be developed for depression detection on social media platforms, contributing to improved mental health outcomes globally.

Layman's Summary

Researchers looked at many studies about using computer programs to find out if people are feeling sad by looking at what they say online. They found that some of the studies were not very fair because they mostly used Twitter and English posts, which made them miss out on different kinds of people. Also, most of the studies did not choose people randomly, so the results might not be true for everyone. Some important things were missing in many studies, like understanding how words can have different meanings and problems with how the computer program was set up.

Definitions

- Systematic review: A detailed study that looks at all the available information on a specific topic in a structured way.
- Machine learning (ML): A type of technology where computers learn from data and improve their performance without being explicitly programmed.
- Depression: A medical condition where a person feels very sad and hopeless for a long time.
- Social media: Websites or apps where people can share information, pictures, and messages with others online.
- Bias: Unfairness or prejudice that affects the results of a study or experiment.
- Representativeness: How well a sample group represents the larger population it is supposed to represent.
- Sentiment analysis: The process of analyzing text to determine the emotions or opinions expressed within it.
- Hyperparameter tuning: Adjusting settings in machine learning models to improve their performance.
- Class imbalance: When one class (group) of data is much more common than another in a dataset.

Introduction

Mental illness, particularly depression, is a growing concern globally. According to the World Health Organization (WHO), over 264 million people of all ages suffer from depression worldwide. The prevalence of this mental health disorder has increased by nearly 20% in the last decade alone, making it one of the leading causes of disability and disease burden globally. However, due to stigma and lack of access to mental health services, many individuals with depression go undiagnosed and untreated. In recent years, there has been a growing interest in using machine learning (ML) models for detecting mental illness through analysis of social media data. Social media platforms such as Twitter have become popular sources for gathering large amounts of user-generated content that can provide insights into an individual's mental state. This systematic review aims to examine the use of ML models for detecting depression through social media data analysis.

Methodology

A comprehensive search was conducted on three databases (PubMed, IEEE Xplore, and Google Scholar) to identify relevant studies published after 2010. A total of 47 studies were included in this review based on specific inclusion criteria. To assess methodological quality and risk of bias in these studies, the Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used. PROBAST is a widely used tool for evaluating prediction model studies and consists of four domains: participants, predictors, outcome, and analysis.

Bias Identification

The review found significant biases that may affect the reliability and generalizability of ML models developed for depression detection through social media data analysis. These include:

- Heavy reliance on Twitter: 63.8% of the 47 selected studies relied on Twitter data, which limits diversity.
- English-language content: Over 90% of the selected studies analyzed English-language content, further limiting diversity and generalizability.
- Limited geographic representation: Most studies focused on users from the United States and Europe, neglecting other regions of the world.
- Non-probability sampling methods: 80% of the selected studies used non-probability sampling methods, which may limit representativeness.
- Lack of consideration for linguistic nuances: Only 23% of the studies explicitly addressed linguistic nuances such as negations, which are crucial for accurate sentiment analysis.
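To make the point about negations concrete, here is a minimal sketch of one common preprocessing trick, negation marking, in which tokens that follow a negation word are tagged until the next punctuation mark. It is not taken from the reviewed studies; the word list, the `_NEG` suffix, and the function name `mark_negation` are illustrative assumptions.

```python
import re

# Illustrative, non-exhaustive list of negation cues (an assumption).
NEGATION_WORDS = {"not", "no", "never", "cannot", "nothing"}
CLAUSE_END = {".", ",", ";", "!", "?"}


def mark_negation(text):
    """Append "_NEG" to every token between a negation cue and the
    next clause-ending punctuation mark."""
    tokens = re.findall(r"[a-z']+|[.,;!?]", text.lower())
    marked, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False  # negation scope ends at punctuation
            marked.append(tok)
        elif tok in NEGATION_WORDS or tok.endswith("n't"):
            negated = True  # start of a negation scope
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if negated else tok)
    return marked


print(mark_negation("I am not happy, but I cope."))
```

With this marking, a bag-of-words model sees "happy" and "happy_NEG" as different features, so "not happy" is no longer counted as a positive signal. Real pipelines would likely need a richer cue list and scope rules.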

Risks for Overfitting

Inconsistent hyperparameter tuning (27.7%) and inadequate data partitioning (17%) were also identified as risks for overfitting in the selected studies. Overfitting occurs when a model is too closely fitted to a specific dataset, resulting in poor performance on new data.
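As a rough illustration of what adequate data partitioning can look like, the sketch below splits a dataset into disjoint training, validation, and test sets and tunes a single hyperparameter on the validation set only, leaving the test set for a final check. The toy threshold classifier, the split fractions, and all function names are assumptions made for the example, not a procedure described in the review.

```python
import random


def three_way_split(n_items, val_frac=0.15, test_frac=0.15, seed=0):
    """Return disjoint (train, validation, test) index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    n_test = round(n_items * test_frac)
    n_val = round(n_items * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def accuracy(threshold, scores, labels, indices):
    """Accuracy of the rule 'predict positive if score >= threshold'."""
    hits = sum((scores[i] >= threshold) == bool(labels[i]) for i in indices)
    return hits / len(indices)


def tune_threshold(scores, labels, val_idx, candidates):
    """Pick the candidate threshold that does best on the validation set.
    The test set is never consulted here."""
    return max(candidates, key=lambda t: accuracy(t, scores, labels, val_idx))


train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The design point is that the hyperparameter is chosen without ever looking at the test indices, so the test score remains an honest estimate of performance on new data.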

Evaluation Metrics

While most studies used appropriate evaluation metrics for imbalanced data (74.5%), some relied solely on accuracy without addressing class imbalance issues. Class imbalance refers to an unequal distribution of classes in a dataset, which can lead to biased results if not properly addressed.
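The following sketch shows why accuracy alone can mislead on imbalanced data. The labels are made up for illustration (10 positive and 90 negative cases, not data from any reviewed study), and the "model" simply predicts the majority class for everyone.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical imbalanced dataset: 10 positive cases, 90 negative cases.
y_true = [1] * 10 + [0] * 90
# A trivial classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)  # accuracy is 0.9 while recall and F1 are 0.0
```

The trivial classifier reaches 90% accuracy without identifying a single positive case, which is why precision, recall, and F1 are more informative when classes are imbalanced.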

Findings

The review employed analytical methods to systematically synthesize findings across different stages of the ML lifecycle in the selected studies, including:

- Sampling techniques
- Data preprocessing methods
- Model construction
- Tuning
- Evaluation based on quantitative metrics such as accuracy, precision, recall, F1 scores, and AUROCs

The importance of reporting transparency and completeness in scientific research was emphasized throughout the review process to ensure integrity, reproducibility, and reliability of findings reported in each study.

Conclusion

This systematic review highlights several challenges that need to be addressed in future research using ML models for depression detection through social media data analysis. These include diversifying data sources beyond Twitter and English-language content, standardizing preprocessing methods, ensuring consistent model development practices, addressing class imbalance issues, and enhancing reporting transparency. By addressing these challenges and improving reporting standards, more robust and generalizable ML models can be developed for depression detection on social media platforms. This can contribute to improved mental health outcomes globally by providing timely and accurate identification of individuals in need of support and treatment. It is crucial for researchers to consider these recommendations in future studies to ensure the reliability and validity of their findings.

Created on 01 Jul. 2025
