Machine learning to identify persons at high-risk of HIV acquisition in rural Kenya and Uganda.


Machine learning improved classification of individuals at risk of HIV acquisition compared to a model-based approach or reliance on known risk groups, and could inform targeting of prevention strategies in generalized epidemic settings.

Between 2013 and 2017, >75% of residents in 16 communities in the SEARCH Study tested annually for HIV. In this population, we evaluated three strategies for using demographic factors to predict the one-year risk of HIV seroconversion: (1) membership in ≥1 known "Risk Group" (e.g., being a young woman or having an HIV-infected spouse); (2) a "Model-based" risk score constructed with logistic regression; (3) a "Machine Learning" risk score constructed with the Super Learner algorithm. We hypothesized that Machine Learning would identify high-risk individuals more efficiently (fewer persons targeted for a fixed sensitivity) and with higher sensitivity (for a fixed number of persons targeted) than either of the other approaches.
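The two evaluation criteria named above (persons targeted for a fixed sensitivity, and sensitivity for a fixed number targeted) can be illustrated with a minimal sketch. It assumes each person has a numeric risk score and a binary seroconversion outcome, and that persons are targeted from the highest score downward. The function names and the toy data are hypothetical and are not taken from the study; this is not the authors' evaluation code.

```python
import math

def rank_by_score(scores, outcomes):
    """Return outcomes ordered from highest to lowest risk score.
    Ties keep their input order (Python's sort is stable)."""
    pairs = sorted(zip(scores, outcomes), key=lambda p: -p[0])
    return [y for _, y in pairs]

def fraction_targeted_for_sensitivity(scores, outcomes, target_sensitivity):
    """Smallest fraction of the population that must be targeted,
    highest scores first, to capture at least `target_sensitivity`
    of all observed seroconversions (the efficiency criterion)."""
    ranked = rank_by_score(scores, outcomes)
    total_events = sum(ranked)
    if total_events == 0:
        raise ValueError("no events: sensitivity is undefined")
    needed = math.ceil(target_sensitivity * total_events)
    captured = 0
    for n_targeted, y in enumerate(ranked, start=1):
        captured += y
        if captured >= needed:
            return n_targeted / len(ranked)
    return 1.0

def sensitivity_for_fraction_targeted(scores, outcomes, max_fraction):
    """Share of all seroconversions captured when at most
    `max_fraction` of the population is targeted, highest scores first."""
    ranked = rank_by_score(scores, outcomes)
    total_events = sum(ranked)
    if total_events == 0:
        raise ValueError("no events: sensitivity is undefined")
    k = math.floor(max_fraction * len(ranked))
    return sum(ranked[:k]) / total_events

# Hypothetical toy data: 10 persons, 4 seroconversions (1 = seroconverted).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]

efficiency = fraction_targeted_for_sensitivity(toy_scores, toy_outcomes, 0.5)
sensitivity = sensitivity_for_fraction_targeted(toy_scores, toy_outcomes, 0.5)
```

Under this framing, a better risk score yields a smaller `efficiency` value for the same target sensitivity and a larger `sensitivity` value for the same targeting limit.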

In generalized epidemic settings, strategies are needed to prioritize individuals at higher risk of HIV acquisition for prevention services such as pre-exposure prophylaxis. We used population-level HIV testing data from rural Kenya and Uganda to construct HIV risk scores and assessed their ability to identify seroconversions.

75,558 HIV-negative persons contributed 166,723 person-years of follow-up; 519 seroconverted. Machine Learning improved efficiency: to achieve a fixed sensitivity of 50%, the Risk Group strategy targeted 42% of the population, Model-based 27%, and Machine Learning 18%. Machine Learning also improved sensitivity: with at most 45% of the population targeted, the Risk Group strategy correctly classified 58% of seroconversions, Model-based 68%, and Machine Learning 78%.
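As a rough aid to reading these figures, the sketch below does simple arithmetic on the numbers reported above. The derived quantities (crude incidence per 100 person-years, and the relative reduction in the fraction targeted compared with the Risk Group strategy) are illustrative calculations and are not results stated in the abstract; the variable names are hypothetical.

```python
# Figures copied from the results reported above.
seroconversions = 519
person_years = 166_723

# Crude incidence per 100 person-years (derived here, not reported above).
incidence_per_100py = 100 * seroconversions / person_years

# Fraction of the population targeted to reach 50% sensitivity.
targeted_at_50pct_sensitivity = {
    "Risk Group": 0.42,
    "Model-based": 0.27,
    "Machine Learning": 0.18,
}

# Relative reduction in the fraction targeted, versus the Risk Group strategy.
baseline = targeted_at_50pct_sensitivity["Risk Group"]
relative_reduction = {
    name: 1 - fraction / baseline
    for name, fraction in targeted_at_50pct_sensitivity.items()
}

for name, reduction in relative_reduction.items():
    print(f"{name}: {reduction:.0%} fewer persons targeted than Risk Group")
print(f"Crude incidence: {incidence_per_100py:.2f} per 100 person-years")
```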
