Embedded model discrepancy: A case study of Zika modeling


Mathematical models of epidemiological systems enable investigation of and predictions about potential disease outbreaks. However, commonly used models are often highly simplified representations of incredibly complex systems. Because of these simplifications, the model output (say, the number of new cases of a disease over time, or the timing of an epidemic) may be inconsistent with the available data. In this case, we must improve the model, especially if we plan to base decisions on it that could affect human health and safety, but direct improvements are often beyond our reach. In this work, we explore this problem through a case study of the Zika outbreak in Brazil in 2016. We propose an embedded discrepancy operator: a modification to the model equations that requires modest information about the system and is calibrated by all relevant data. We show that the new enriched model demonstrates greatly increased consistency with real data. Moreover, the method is general enough to apply easily to many other mathematical models in epidemiology.

Potential epidemics of communicable diseases are a major health concern of the modern world, especially as city density, air and water pollution, and worldwide travel steadily increase. A stark example of this is the global coronavirus outbreak, already responsible for more than 2000 deaths around the world at the time of this article’s submission and more than 13 000 at the time of revision, approximately one month later. When faced with a potential outbreak, decision-makers such as health officials and medical professionals rely on mathematical models to aid their decision-making processes. Often, however, these models are not consistent with the dynamical system they are designed to represent.
The discrepancy between the output of a model and the real system is then a serious impediment: it may decrease our confidence in the model, or even invalidate it entirely, so that it can no longer be used to aid in decision-making. When such a discrepancy is observed, we, as modelers, must either improve the model or somehow account for the discrepancy itself. While a direct model improvement is usually the most desirable solution, it may be infeasible because of computational cost, time constraints, or lack of domain knowledge. This paper provides a systematic method to instead account for the discrepancy itself, explored via a case study of the Brazilian Zika epidemic of 2016. The method is not a correction of the model output to data, but rather a modification of the model equations themselves by a so-called embedded discrepancy operator. The operator is designed with three critical properties in mind: interpretability, domain-consistency, and robustness. We show that including the embedded discrepancy operator greatly increases the fidelity of the model, so that the model output and real data are in fact consistent.
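To illustrate what modifying the model equations (rather than correcting the output) can look like, the following is a minimal sketch and not the Zika model or the operator used in this work. It assumes a basic SIR model as the simplified model, and a hypothetical discrepancy term `delta * s` that moves individuals from the susceptible to the infected compartment. The function names, the form of the term, and all parameter values are illustrative assumptions.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of a basic SIR model (stand-in for a simplified model)."""
    s, i, r = state
    n = s + i + r
    new_infections = beta * s * i / n
    return (-new_infections, new_infections - gamma * i, gamma * i)


def enriched_rhs(state, beta, gamma, delta):
    """SIR right-hand side plus a hypothetical embedded discrepancy term.

    The term delta * s is an assumed form chosen for illustration only.
    It is subtracted from dS/dt and added to dI/dt, so the total
    population is still conserved by the enriched equations.
    """
    s, _, _ = state
    ds, di, dr = sir_rhs(state, beta, gamma)
    extra_flow = delta * s
    return (ds - extra_flow, di + extra_flow, dr)


def simulate(rhs, state0, t_end, dt=0.01):
    """Integrate with forward Euler steps and return the final state."""
    state = tuple(state0)
    for _ in range(int(round(t_end / dt))):
        deriv = rhs(state)
        state = tuple(x + dt * dx for x, dx in zip(state, deriv))
    return state


# Illustrative values only; none are taken from the Zika case study.
y0 = (990.0, 10.0, 0.0)
baseline = simulate(lambda y: sir_rhs(y, 0.3, 0.1), y0, 10.0)
enriched = simulate(lambda y: enriched_rhs(y, 0.3, 0.1, 0.01), y0, 10.0)
```

Because the extra term lives inside the differential equations, its effect propagates through the dynamics and the states remain physically meaningful (non-negative, with a conserved total). In practice a coefficient such as `delta` would be calibrated against data rather than fixed by hand as it is here.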
