Submissions, XIX Reunión Nacional y VIII Encuentro Internacional de la AACC

Machine Learning to Predict Depression in College Students during the Argentinean Quarantine for the COVID-19 Pandemic
Lorena Cecilia López Steinmetz, Margarita Sison, Rustam Zhumagambetov, Stefan Haufe, Juan Carlos Godoy

Last modified: 2023-07-07

Abstract


Objectives: 1) To develop and compare machine learning (ML) classifiers for predicting depression in Argentinean college students during the COVID-19 quarantine. 2) To assess the performance of these classifiers. 3) To identify the key features driving depression prediction.

Methods: Data from 1492 college students, previously analyzed using mixed effects modeling, were re-examined with ML algorithms (logistic regression, random forest, and support vector machine [SVM]). Dummy models (uniform random, most frequent, and stratified random) served as baselines. Input features included psychological inventory scores (depression and anxiety at T1), clinical information, quarantine sub-periods, and demographics. The target variable was depression at T2 (depressed vs. non-depressed). Data were randomly split into training (75%) and test (25%) sets. Quantile transformation and principal component analysis were applied. Classifier performance was assessed using area under the precision-recall curve (AUPRC), area under the receiver operating characteristic curve (AUROC), balanced accuracy, precision, recall, F1 score, Brier score, and average Hamming loss. Feature importance was estimated by permutation, scored with average precision. Bootstrapping (100 resamples with replacement) was employed.

Results: Across all metrics, the trained models consistently outperformed the baselines. The SVM and logistic regression classifiers performed similarly, with SVM slightly ahead on most metrics. Random forest also outperformed the baselines, though it did not match SVM or logistic regression. Permutation feature importance analysis revealed depression at T1 as the most influential predictor, followed by anxiety.

Discussion: Surprisingly, features commonly reported as related to depression (e.g., mental disorder history, suicidal behavior) had little impact on model performance. This may be due to differences between the traditional statistical methods used in psychology (e.g., mixed effects modeling, which is designed primarily for inference, not prediction) and ML algorithms (which are designed for prediction). ML has the potential to complement and enrich traditional methods in mental health, but further research is needed to fully utilize it in identifying individuals at risk for depression.
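
Three of the metrics named in the Methods (balanced accuracy, Brier score, and Hamming loss) can be illustrated with a minimal sketch that compares a classifier against a most-frequent baseline and averages a metric over 100 bootstrap resamples. This is not the authors' code: the abstract does not describe their implementation, the labels and probabilities below are made-up toy values rather than study data, and the function names and the 0.5 decision threshold are assumptions chosen for illustration.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and the recall on the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def hamming_loss(y_true, y_pred):
    """Fraction of labels predicted incorrectly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)

def bootstrap_mean(metric, y_true, y_other, n_boot=100, seed=0):
    """Average a metric over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    while len(scores) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_true = [y_true[i] for i in idx]
        if len(set(sample_true)) < 2:
            continue  # skip resamples that lack one class (assumption of this sketch)
        scores.append(metric(sample_true, [y_other[i] for i in idx]))
    return sum(scores) / len(scores)

# Toy data (hypothetical, NOT from the study): 1 = depressed, 0 = non-depressed.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

# Most-frequent baseline: always predict the majority class.
majority = max(set(y_true), key=y_true.count)
y_base = [majority] * len(y_true)

print("balanced accuracy (model):   ", balanced_accuracy(y_true, y_pred))
print("balanced accuracy (baseline):", balanced_accuracy(y_true, y_base))
print("Hamming loss (model):        ", hamming_loss(y_true, y_pred))
print("Hamming loss (baseline):     ", hamming_loss(y_true, y_base))
print("Brier score (model):         ", brier_score(y_true, y_prob))
print("bootstrapped balanced accuracy (model):",
      bootstrap_mean(balanced_accuracy, y_true, y_pred))
```

On imbalanced data a most-frequent baseline always scores a balanced accuracy of 0.5, because it recovers every majority case and no minority case. That property is one reason balanced accuracy is more informative than plain accuracy when comparing against such a baseline.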

Keywords


machine learning; depression; college students; COVID-19 pandemic
