
Identifying Associated Factors of Substance Use in Adolescents Using Machine Learning

Jocelyn Gao
02/01/2026

Adolescent substance use remains a pressing public health concern with long-term implications for individuals’ physical and mental health. Research in adolescent development suggests that family structure, school environment, and socioeconomic status influence substance use (1), yet how these factors relate to substance use is not fully understood. This study applies machine learning techniques to identify environmental, economic, and demographic factors associated with adolescent substance use, using data from the 2023 National Survey on Drug Use and Health. It compares three machine learning models, L1-penalized (LASSO) logistic regression, Random Forest, and Light Gradient Boosting Machine (LightGBM), on their predictive accuracy in identifying environmental and demographic correlates of substance use among adolescents. Random Forest was the most accurate model based on Area Under the Curve (AUC) values, Area Under the Precision-Recall Curve (AUPRC) values, and Kolmogorov-Smirnov (K-S) statistics. The leading associated factors identified by Random Forest included the respondent’s school attendance and the number of times the respondent moved in the past year.
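For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how each can be computed from a model's predicted scores. It is not the study's code. It assumes that AUC refers to the area under the ROC curve, that AUPRC is estimated as average precision, and that the K-S statistic is the largest gap between the score distributions of the two classes. The function names and the toy labels and scores are hypothetical and are not drawn from the survey data.

```python
from bisect import bisect_right


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPRC estimate: mean of the precision at each positive, ranked by descending score.

    Assumes no tied scores; tied scores would need to be grouped per threshold.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def ks_statistic(y_true, scores):
    """K-S statistic: largest gap between the empirical score CDFs of the two classes."""
    pos = sorted(s for y, s in zip(y_true, scores) if y == 1)
    neg = sorted(s for y, s in zip(y_true, scores) if y == 0)
    gap = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = bisect_right(pos, threshold) / len(pos)
        cdf_neg = bisect_right(neg, threshold) / len(neg)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap


# Hypothetical toy data: 1 = reported substance use, 0 = did not.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.30, 0.90]

if __name__ == "__main__":
    print("AUC:  ", roc_auc(toy_labels, toy_scores))
    print("AUPRC:", average_precision(toy_labels, toy_scores))
    print("K-S:  ", ks_statistic(toy_labels, toy_scores))
```

All three metrics depend only on how a model ranks respondents, so they can be compared across LASSO, Random Forest, and LightGBM without fixing a classification threshold. AUPRC is the most sensitive of the three to class imbalance, which is one reason it is often reported alongside AUC.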
