Predicting Next-Year Homelessness Levels Using HUD Point-in-Time Data

Jaden Cheung

30/06/2026

https://doi.org/10.65161/rec2XJTJ2appjfVvj

This study investigates whether machine learning models can outperform a simple persistence baseline when predicting next-year homelessness from the U.S. Department of Housing and Urban Development (HUD) Point-in-Time (PIT) dataset. The target variable is the next-year overall homelessness count for each Continuum of Care (CoC) region. All models were evaluated using a chronological split, in which earlier years were used for training and later years for testing. Two experiments were conducted. Experiment 1 included the current-year overall homelessness count and other count variables as features; Experiment 2 removed those features, leaving only a small set of remaining features. In Experiment 1, the linear regression model achieved a slightly lower RMSE than the persistence baseline, but the difference was very small, and the persistence baseline had a lower MAE and MAPE: 112.58 and 0.145, compared with 138.76 and 0.232 for the linear regression. In Experiment 2, the persistence baseline performed much better than all machine learning models. These results suggest that the current-year count is a strong predictive signal in the PIT dataset, and that the models may need additional external variables to outperform the persistence baseline.
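
To make the evaluation setup concrete, the following is a minimal sketch of a persistence baseline scored with RMSE, MAE, and MAPE under a chronological split. It is an illustration under stated assumptions, not the study's actual code: the toy records, the cutoff year, and the helper names (`make_pairs`, `chronological_split`, `error_metrics`) are all hypothetical, and MAPE is expressed as a fraction to match the values reported above.

```python
import math

# Hypothetical toy records: (coc_id, year, overall_count). Not real PIT values.
records = [
    ("CoC-A", 2019, 100), ("CoC-A", 2020, 110), ("CoC-A", 2021, 120), ("CoC-A", 2022, 150),
    ("CoC-B", 2019, 200), ("CoC-B", 2020, 190), ("CoC-B", 2021, 210), ("CoC-B", 2022, 200),
]

def make_pairs(rows):
    """Pair each (CoC, year) count with the same CoC's count one year later."""
    by_key = {(coc, year): count for coc, year, count in rows}
    pairs = []
    for (coc, year), count in sorted(by_key.items()):
        next_count = by_key.get((coc, year + 1))
        if next_count is not None:  # drop rows with no following year
            pairs.append({"coc": coc, "year": year,
                          "current": count, "next": next_count})
    return pairs

def chronological_split(pairs, first_test_year):
    """Earlier years go to training, later years to testing (no shuffling)."""
    train = [p for p in pairs if p["year"] < first_test_year]
    test = [p for p in pairs if p["year"] >= first_test_year]
    return train, test

def error_metrics(y_true, y_pred):
    """Return RMSE, MAE, and MAPE (as a fraction). Assumes nonzero targets."""
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }

pairs = make_pairs(records)
train, test = chronological_split(pairs, first_test_year=2021)  # assumed cutoff

# Persistence baseline: predict next year's count as the current year's count.
y_true = [p["next"] for p in test]
y_pred = [p["current"] for p in test]
baseline = error_metrics(y_true, y_pred)
print(baseline)
```

Because the persistence baseline has no parameters to fit, the training rows matter only to the learned models; scoring every model on the same held-out later years is what keeps the comparison fair.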