Tracking the size of the COVID-19 infected population is of great interest to the public health community as it monitors the spread of the infection. There is no consensus in the academic community, however, on how best to estimate population infection totals, especially while testing remains a strained resource, and research groups have taken different approaches to estimating the prevalence of the virus in the general population. While many of these approaches rely on serosurvey methodologies, others leverage mortality time series data, not only to estimate prevalence but also to measure how prevalence has changed over time.
The back calculation method, developed by a group led by Martina Morris at the University of Washington, is one such approach. In addition to mortality data, it relies on estimates of the age-specific infection fatality rate (IFR), an estimate of the lag between infection and mortality, age-specific population data, and, optionally, the age-specific case fatality rate (CFR), which is used to distinguish cases (individuals who are symptomatic) from the remaining infections (individuals who are asymptomatic or show only mild symptoms). In their demonstration, the team estimates the total infection count for King County using COVID-19 mortality data from Washington State’s Department of Health. The methodology may be applied to any COVID-19 mortality time series, provided that the data capture nearly all COVID-19 related deaths for the associated geography. Age-specific population data for the state of Washington were taken from the state Office of Financial Management. Estimates and uncertainty for IFR, CFR, and lag were taken from a recent publication by Verity et al., a study measuring epidemic statistics from Wuhan. At the time of the article’s publication the epidemic in Wuhan had passed, although more recent writings hint at new cases developing and data being updated retroactively.
To arrive at the total number of infections, age-specific population data are used to create proportional weights that sum to one and match the age groupings in the Verity et al. publication. The weights are then multiplied by the age-specific IFR and summed to obtain the population-level IFR for King County. This is done, rather than using the total population IFR from the Verity study, because the age composition of the two populations differs. The mortality time series is then divided by the newly derived population IFR (since deaths are, in expectation, infections multiplied by IFR) and shifted back by the mortality lag to construct a time series of the total number of infections. The study uses several time lags in its analysis; however, the choice of lag has little effect on the end result. The estimated infected population is a single time series vector, but parameter uncertainty can be incorporated by repeating the same process across the uncertainty range of the IFR estimates.
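The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the group's actual code: the function names, the three age groups, the IFR values, the death counts, and the lag are all hypothetical placeholders chosen for easy arithmetic, not figures from Verity et al. or from Washington State data. It also assumes the mortality series is a cumulative daily count.

```python
import numpy as np


def population_ifr(age_population, age_ifr):
    """Weight age-specific IFRs by each age group's share of the population."""
    pop = np.asarray(age_population, dtype=float)
    ifr = np.asarray(age_ifr, dtype=float)
    weights = pop / pop.sum()  # proportional weights that sum to one
    return float(np.sum(weights * ifr))


def back_calculate_infections(cumulative_deaths, ifr, lag_days):
    """Estimate cumulative infections from cumulative deaths.

    Deaths observed on day t are attributed to infections on day t - lag_days,
    so element s of the result is the estimate for day s, and the series is
    lag_days shorter than the mortality series.
    """
    deaths = np.asarray(cumulative_deaths, dtype=float)
    implied_infections = deaths / ifr  # infections = deaths / IFR
    return implied_infections[lag_days:]


# Hypothetical inputs (assumptions for illustration only).
age_population = [300_000, 500_000, 200_000]  # people per age group
age_ifr = [0.001, 0.005, 0.05]                # IFR per age group
cumulative_deaths = [0, 1, 2, 4, 8, 16]       # one value per day
lag_days = 2                                  # infection-to-death lag

pop_ifr = population_ifr(age_population, age_ifr)
infections = back_calculate_infections(cumulative_deaths, pop_ifr, lag_days)
print(pop_ifr)
print(infections)
```

Because the lag only shifts the series in time, changing `lag_days` moves the dates attached to each estimate but not the scale of the estimates, which is set entirely by the population IFR.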
A similar process can optionally be used with CFR to estimate the symptomatic population, which lets us distinguish cases from the remaining infections in the general population. The end result is a time series, with uncertainty, of symptomatic and mild/asymptomatic individuals in King County. As of writing, the time series runs up to April 1st, with an estimated 45,000 individuals infected with COVID-19. Compare this to the much smaller number of 3,000 confirmed cases from testing results, and we see that there is a big discrepancy between what the observed mortality numbers imply and what testing information is providing us.
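As a rough sketch of how the CFR step and the uncertainty range might fit together, the snippet below splits the infection estimate into symptomatic cases and mild/asymptomatic infections, and builds a simple low/high band by rerunning the calculation at the bounds of an IFR interval. The function names and all numeric values are hypothetical, the lag shift is omitted for brevity, and treating the band as "recompute at the interval endpoints" is an assumption about how the uncertainty could be carried through, not a description of the group's procedure.

```python
import numpy as np


def split_infections(cumulative_deaths, ifr, cfr):
    """Split estimated infections into symptomatic cases and the remainder.

    Assumes cfr >= ifr, since symptomatic cases are a subset of all infections.
    """
    if cfr < ifr:
        raise ValueError("CFR must be at least as large as IFR")
    deaths = np.asarray(cumulative_deaths, dtype=float)
    infections = deaths / ifr        # all infections
    cases = deaths / cfr             # symptomatic cases
    mild_or_asymptomatic = infections - cases
    return infections, cases, mild_or_asymptomatic


def ifr_uncertainty_band(cumulative_deaths, ifr_low, ifr_high):
    """Return (lower, upper) infection estimates from an IFR interval.

    A higher IFR implies fewer infections for the same number of deaths,
    so the upper IFR bound gives the lower infection estimate.
    """
    deaths = np.asarray(cumulative_deaths, dtype=float)
    return deaths / ifr_high, deaths / ifr_low


# Hypothetical inputs (assumptions for illustration only).
deaths = [1, 2, 4]
total, cases, mild = split_infections(deaths, ifr=0.01, cfr=0.02)
lower, upper = ifr_uncertainty_band(deaths, ifr_low=0.005, ifr_high=0.02)
print(total, cases, mild)
print(lower, upper)
```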
The group’s work is ongoing, and the latest efforts involve pulling age-specific mortality data in order to reduce the uncertainty of the estimates. You can stay up to date with her team’s work and get a more in-depth overview of the methodology and data from her working group’s website.