Semi-Supervised Record Linkage for Construction of Large-Scale Sociocentric Networks in Resource-limited Settings: An application to the SEARCH Study in Rural Uganda and Kenya


This paper presents a novel semi-supervised algorithmic approach to constructing large-scale sociocentric networks and describes its use to build 32 such networks in rural East Africa. Networks were constructed by applying a semi-supervised record-linkage algorithm to data from census-enumerated residents of the 32 communities included in the SEARCH study (NCT01864603), a community-cluster randomized HIV prevention trial in Uganda and Kenya. Contacts were solicited using a five-question name generator covering emotional support, food sharing, free time, health issues, and money issues. Aggregated across communities, the fully constructed networks include 170,028 nodes and 362,965 edges (ranging from 4,449 to 6,829 nodes and from 2,349 to 31,779 edges per community). On average, our algorithm matched 30% of named contacts in Kenyan communities and 50% of named contacts in Ugandan communities to residents named in the census enumeration. Assortative mixing measures for eight covariates reveal that network residents have a very strong tendency to associate with others who are similar to them in age, sex, and especially village. The SEARCH Study networks will provide a platform for improved understanding of health outcomes in rural East Africa, and the network construction algorithm we present may facilitate future social network research in resource-limited settings.
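The abstract reports assortative mixing measures without stating the statistic. As a hedged illustration, assuming the measure for a categorical covariate such as village is Newman's assortativity coefficient r = (Σ_i e_ii − Σ_i a_i²) / (1 − Σ_i a_i²), and assuming edges are treated as undirected, it could be computed as in the sketch below. The function name `categorical_assortativity` and the toy village labels are hypothetical and not taken from the study.

```python
from collections import Counter


def categorical_assortativity(edges, attribute):
    """Newman's assortativity coefficient r for a categorical node attribute.

    Hypothetical helper for illustration only.
    edges: iterable of (u, v) pairs, treated as undirected (an assumption).
    attribute: dict mapping each node to its category (e.g. village).
    """
    # Each undirected edge contributes both (a, b) and (b, a), so the
    # mixing matrix e_ij is symmetric and row and column marginals agree.
    pair_counts = Counter()
    for u, v in edges:
        a, b = attribute[u], attribute[v]
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())
    if total == 0:
        raise ValueError("at least one edge is required")

    # Sum of e_ii: fraction of edge ends joining nodes of the same category.
    same = sum(c for (a, b), c in pair_counts.items() if a == b) / total

    # a_i: fraction of edge ends attached to category i.
    marginals = Counter()
    for (a, _), c in pair_counts.items():
        marginals[a] += c
    expected = sum((m / total) ** 2 for m in marginals.values())

    if expected == 1:
        # Every edge end has the same category, so r is undefined.
        return float("nan")
    return (same - expected) / (1 - expected)


# Toy example: every contact lives in the same (hypothetical) village.
edges = [(1, 2), (3, 4)]
village = {1: "A", 2: "A", 3: "B", 4: "B"}
print(categorical_assortativity(edges, village))  # 1.0
```

Under this definition, a value near 1 indicates that contacts almost always share the attribute, the pattern the abstract reports for village; 0 indicates mixing no different from chance given the category sizes; and negative values indicate a preference for contacts in other categories.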

MIDAS Network Members