Bayesian modeling of COVID-19 cases with a correction to account for under-reported cases


Infect Dis Model. 2020; 5: 699–713.

Published online 2020 Sep 24. doi:10.1016/j.idm.2020.09.005

PMCID: PMC7513875

PMID: 32995681

Anderson Castro Soares de Oliveira,^a Lia Hanna Martins Morita,^a,^∗ Eveliny Barroso da Silva,^a Luiz André Ribeiro Zardo,^a Cor Jesus Fernandes Fontes,^c and Daniele Cristina Tita Granzotto^b


Abstract

The novel COVID-19 disease emerged in late 2019, confronting governments worldwide with a high number of critical cases and deaths, along with the constant fear of collapse of their health systems. Since the beginning of the pandemic, researchers and authorities have been mainly concerned with carrying out quantitative studies (modeling and predictions) while coping with the scarcity of tests, which leads to under-reporting of cases. To address these issues, we introduce a Bayesian approach to the SIR model with a correction for under-reporting in the analysis of COVID-19 cases in Brazil. The proposed model was applied to obtain estimates of important quantities such as the reproductive rate and the average infection period, along with the most likely date of the pandemic peak. Several under-reporting scenarios were considered in the simulation study, showing how strongly the lack of information affects the modeling.

Keywords: COVID-19, Under-reporting, SIR model, Bayesian approach

1. Introduction

The COVID-19 epidemic disease is caused by the new SARS-CoV-2 coronavirus, associated with severe acute respiratory syndrome (SARS), and began in Wuhan, China, in late 2019 (Rodríguez-Morales et al., 2020). After the first case was detected in China, the disease continued to spread globally, with exported cases confirmed on all continents. Within a few months, by early April 2020, the number of reported cases had surpassed 80 thousand. On March 12th, 2020 the World Health Organization (WHO) declared COVID-19 a pandemic, when more than 20 thousand cases and almost a thousand deaths had been registered in the European Region - the center of the pandemic according to Europe’s Standing Committee (WHO, 2020).

There are still many unknowns about COVID-19 and the lack of evidence complicates the design of appropriate response policies - for example, it is impossible to precisely say something about the mortality rate and determine the disease recurrence rate (Lenzer, 2020).

Despite the uncertainties, the frightening speed at which this disease spreads across communities and the collapse it can cause in health systems are facts that must be faced. Exponential growth in cases, and consequently in deaths, was observed within a short period. In mid-January 2020, a few weeks after the first COVID-19 case was detected, countries close to the region where the virus originated, on the Asian continent, as well as countries in the European and American Regions, also began to report cases of the disease. Five months later, more than 200 countries and territories around the world had reported over 3 million confirmed cases of COVID-19 and a death toll of about 200 thousand people.

In Brazil, the first confirmed COVID-19 case occurred on February 25th, 2020. This first case was a 61-year-old male who had stayed from February 9th to February 20th, 2020 in Lombardy - an Italian region where a significant outbreak was ongoing at that time. On March 17th, the health authorities in São Paulo confirmed the first Brazilian death from the new coronavirus. The victim, whose identity has not been disclosed, had been hospitalized in São Paulo city.

Keeping due proportions, COVID-19 is not the first significant outbreak we have experienced among the infections declared Public Health Emergencies of International Concern by the WHO. Over the last decade we have also experienced the Zika and Chikungunya outbreaks, and we continue to face the huge consequences of dengue. Confronting outbreaks in the large Brazilian territory is a twofold problem. The first aspect is the demographic and territorial size of the country, with an estimated population of 210 million according to the Brazilian Institute for Geography and Statistics, and the heterogeneity intrinsic to its extensive territory. The second, highlighted by past epidemics, is the recurring problem of under-reporting (de Oliveira et al., 2017; Stoner et al., 2019).

COVID-19, given its complexity and behavior, exposed the problem of under-reporting of disease occurrence not only in Brazil but in several countries worldwide. As a consequence, the lack of information has raised a warning among researchers around the world concerning models and estimates, since the available data may not reliably reflect what actually occurred.

Focusing on modeling and estimation, and aiming to forecast the behavior and speed of COVID-19 growth, this paper presents an approach to address the problem of under-registration of COVID-19 cases in Brazil, proposing methodologies to handle the inaccuracy of the officially reported cases. We then investigate a general framework for correcting under-reported data that makes it possible to fit a model in a Bayesian framework, which allows great flexibility and leads to complete predictive distributions for the true counts, thereby quantifying the uncertainty in correcting the under-reporting. Several under-reporting scenarios were considered in a simulation study, showing the real impact of the lack of data.

This paper is organized as follows. Section 2 describes the methodology for estimating the reported rates. In Section 3, we introduce the SIR model for modeling epidemics. In Section 4, we introduce the Bayesian framework for the SIR model with a modification to account for under-reporting. In Section 5 we show the model application for COVID-19 cases in Brazil and in Section 6, we present a simulation study of the proposed model. Finally, in Section 7, we give some concluding remarks.

2. Reported rate estimation

Although at first there was a real hunt for the size and timing of the peak of COVID-19 cases, the most important aspect of the outbreak is the growth rate of the infection. Statistical and mathematical models are being used to forecast the rates and analyze the behavior of the growth curve to assist public health managers in decision-making (Cotta et al., 2020).

According to Kim et al. (2020), estimating the case fatality rate (CFR) is a high priority in the response to this pandemic. The CFR is the proportion of deaths among all confirmed patients with the disease, and it has been used to assess and compare the severity of the epidemic between countries. It can also be used to assess healthcare capacity in response to the outbreak. Indeed, several researchers are interested in estimating the CFR at the peak of the outbreak, analyzing its variation among countries, and checking the influence of other features such as age, gender, and physical characteristics on the CFR of COVID-19.

To estimate the CFR, let us first set out the Brazilian scenario of COVID-19 case notification: the Brazilian Ministry of Health collects daily data on all confirmed cases for Brazil and all its states. Although the data presented by the health authorities are official, they cover only patients with COVID-19 confirmed by positive blood and/or swab tests. Given the scarcity of tests for all suspected individuals, the notified patients are only those with severe disease or those requiring hospitalization. It is relevant to highlight that no clinically diagnosed patients, even those with symptoms compatible with the disease, have been officially counted, which is evidence of under-reporting of the case frequency.

Faced with the lack of COVID-19 tests, which naturally leads to under-reported data, before any modeling we want to correct and update the current numbers, bringing them as close as possible to reality.

Following Russel et al. (2020), we rely on a delay-adjusted case fatality ratio to estimate under-reporting, using the incidence of cases and deaths to estimate the proportion of notified cases with known outcomes by

$μ_{t} = \frac{\sum_{i = 0}^{t} \sum_{j = 0}^{\infty} c_{i - j} f_{j}}{\sum_{i = 0}^{t} c_{i}},$

(1)

where $c_{t}$ is the daily incidence of cases at time t, $f_{j}$ is the proportion of cases with a delay of j days between confirmation and death, and $μ_{t}$ represents the underestimation of the proportion of cases with known outcomes (Nishiura et al., 2009).

Then, the corrected CFR is given by

$cCFR_{t} = \frac{m_{t}}{μ_{t} \sum_{i = 0}^{t} c_{i}},$

(2)

where $m_{t}$ is the cumulative number of deaths.

To estimate the potential for under-reporting, we assume a baseline CFR of $1.4 %$ , with a $95 %$ confidence interval from $1.2 %$ to $1.7 %$ , as found in China (Guan et al., 2020; WHO, 2020). Thus, the potential reporting rate is given by

$η = \frac{1.4 %}{cCFR_{t}} .$

(3)
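
As a rough illustration of the delay adjustment described in this section, the sketch below computes the proportion of cases with known outcomes in the form given by Nishiura et al. (2009), then the corrected CFR and the reporting rate against the 1.4% baseline. The function names, the toy case series, and the three-day delay distribution are hypothetical and are not taken from the paper's data or code.

```python
def known_outcome_proportion(cases, delay_pmf):
    """Proportion of cumulative cases whose outcome should already be known.

    cases[i] is the daily incidence c_i; delay_pmf[j] is f_j, the share of
    fatal cases that die j days after confirmation.
    """
    expected_known = 0.0
    for i in range(len(cases)):
        for j, f_j in enumerate(delay_pmf):
            if i - j >= 0:  # incidence before day 0 is taken as zero
                expected_known += cases[i - j] * f_j
    return expected_known / sum(cases)


def reporting_rate(cases, cumulative_deaths, delay_pmf, baseline_cfr=0.014):
    """Reporting rate: baseline CFR divided by the delay-adjusted CFR."""
    mu_t = known_outcome_proportion(cases, delay_pmf)
    corrected_cfr = cumulative_deaths / (mu_t * sum(cases))
    return baseline_cfr / corrected_cfr


# Hypothetical toy inputs, chosen only to make the arithmetic easy to follow.
toy_cases = [10, 20, 40, 80, 160]
toy_delay_pmf = [0.0, 0.5, 0.5]
toy_deaths = 11

mu_t = known_outcome_proportion(toy_cases, toy_delay_pmf)
eta = reporting_rate(toy_cases, toy_deaths, toy_delay_pmf)
print(f"mu_t = {mu_t:.4f}, eta = {eta:.4f}")
```

With these toy numbers, 110 of the 310 cumulative cases are expected to have a known outcome, so the corrected CFR is 11/110 = 10% and the implied reporting rate is 1.4%/10% = 0.14.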

3. The SIR model

Epidemic models are tools widely used to study the mechanisms by which diseases spread, to predict the course of an outbreak, and to evaluate strategies to control an epidemic. Several analyses of epidemic spread can be found in the literature, applying time series models (given the historical data), the log-logistic family of models (Chapman and Richards, among others), and compartmental models (Bjørnstad, 2018).

Kermack and McKendrick (1927) proposed a class of compartmental models that simplified the mathematical modeling of infectious disease transmission. Known as the SIR model, it is a set of general equations that explains the dynamics of an infectious disease spreading through a susceptible population. Essentially, the standard SIR model is a set of differential equations that describes the Susceptible (previously unexposed to the pathogen), Infected (currently colonized by the pathogen), and Removed (either by death or recovery) compartments as follows:

$\frac{d S}{d t} = - β S I,$

$\frac{d I}{d t} = β S I - γ I,$

$\frac{d R}{d t} = γ I .$

It is important to note that

$\frac{d S}{d t} + \frac{d I}{d t} + \frac{d R}{d t} = 0$

and so, the total population, $S (t) + I (t) + R (t)$ remains constant for all $t \geq 0$ .

From a practical point of view, the quantities of greatest interest are $\frac{1}{γ}$ , which determines the average infection period, and the basic reproductive ratio $R_{0}$ . In the simple SIR model, when all individuals in the population are initially susceptible, that is, $S (0) \approx 1$ , $R_{0}$ is defined as the expected number of secondary infections from a single index case and is given by the expression $R_{0} = \frac{β}{γ}$ (Keeling and Rohani, 2011).
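
A minimal sketch of these relations, assuming the removed compartment grows at rate γI (the flow required for the three derivatives to sum to zero). The function names and the numerical values of β, γ, and the state are hypothetical, chosen only for illustration.

```python
def sir_derivatives(s, i, r, beta, gamma):
    """Right-hand side of the SIR equations for proportions s, i, r."""
    new_infections = beta * s * i
    removals = gamma * i
    return -new_infections, new_infections - removals, removals


def basic_reproduction_number(beta, gamma):
    """R0 = beta / gamma for the simple SIR model."""
    return beta / gamma


def average_infection_period(gamma):
    """Mean time spent in the infected compartment, 1 / gamma."""
    return 1.0 / gamma


# Hypothetical values: an almost fully susceptible population.
ds, di, dr = sir_derivatives(0.99, 0.01, 0.0, beta=0.3, gamma=0.1)
total_change = ds + di + dr          # conservation: should be zero
r0 = basic_reproduction_number(0.3, 0.1)
period = average_infection_period(0.1)
print(total_change, r0, period)
```

Because $R_{0} > 1$ in this toy setting, the infected proportion is increasing (`di > 0`) while the total population stays constant.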

4. Bayesian approach

Bayesian methods are used in several works (Gelman et al., 1995; Paulino et al., 2018). The Bayesian approach in the context of the SIR model is a flexible way to account for uncertainty in the parameters and in the form of the disease transmission dynamics. The Dirichlet-Beta state-space model appears in papers such as Osthus et al. (2017) and Song et al. (2020). The target of inference is the a posteriori distribution of the quantities of interest, more specifically β, γ, and $R_{0}$ : the infectious contact rate, the removal rate, and the reproduction number, respectively. This methodology is applied through Markov chain Monte Carlo (MCMC) methods, namely Gibbs sampling and the Metropolis-Hastings algorithm (Geman and Geman, 1984; Chib and Greenberg, 1995).

The use of the Dirichlet distribution for the proportions of susceptible, infected, and removed individuals in the target population is a feasible way to guarantee that these quantities stay within their bounds: each proportion lies between 0 and 1 and the three sum to 1, so that, for example, the proportion of infected individuals is always positive.

4.1. Model specification

In this section, we present a modification to account for under-reporting in the context of the Dirichlet-Beta state-space model from Osthus et al. (2017). This adaptation is based on a reparametrization of the Beta distribution that includes the reported rate estimate, η, from equation (3).

The Beta distribution, as is well known, is very flexible for modeling proportions, since its density can have quite different shapes depending on the values of the two parameters that index the distribution (Ferrari and Cribari-Neto, 2004). For this reason, we reparametrized the Beta model so as to obtain a regression structure for the means of the response variables, together with a precision parameter.

Let $Y_{t}^{I}$ be the reported infected proportion, $Y_{t}^{R}$ be the reported removed proportion and $θ_{t} = (θ_{t}^{S}, θ_{t}^{I}, θ_{t}^{R})$ be the true but unobservable susceptible, infectious, and removed proportions of the population, respectively.

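Hence, we rewrite the SIR model in terms of these unobservable proportions as follows

$\frac{d θ_{t}^{S}}{d t} = - β θ_{t}^{S} θ_{t}^{I}, \quad \frac{d θ_{t}^{I}}{d t} = β θ_{t}^{S} θ_{t}^{I} - γ θ_{t}^{I}, \quad \frac{d θ_{t}^{R}}{d t} = γ θ_{t}^{I} .$

(4)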

Then, the distributions for $Y_{t}^{I}$ , $Y_{t}^{R}$ , and $θ_{t}$ are given below

$Y_{t}^{I} | θ_{t}^{I}, φ ~ Beta (λ_{I} η θ_{t}^{I}, λ_{I} (1 - η θ_{t}^{I})),$

$Y_{t}^{R} | θ_{t}^{R}, φ ~ Beta (λ_{R} η θ_{t}^{R}, λ_{R} (1 - η θ_{t}^{R})),$

$θ_{t} | θ_{t - 1}, φ ~ Dirichlet (κ f (θ_{t - 1}, β, γ)),$

where $φ = (β, γ, θ_{0}, κ, λ_{I}, λ_{R})$ is the parameter vector of the model. Under this Beta parametrization, $E [Y_{t}^{I}] = η θ_{t}^{I}$ and $E [Y_{t}^{R}] = η θ_{t}^{R}$ , and the parameters $λ_{I} > 0$ and $λ_{R} > 0$ control the variances of the distributions. Likewise, the parameter $κ > 0$ controls the variance of the Dirichlet distribution. The function $f (θ_{t - 1}, β, γ)$ is the solution of the differential equations in (4) and propagates the latent state $θ_{t}$ forward by one time step.

Note that it is necessary to obtain the solutions for the proportions $θ_{t}^{S}$ , $θ_{t}^{I}$ , and $θ_{t}^{R}$ . These solutions can be found using the fourth-order Runge-Kutta method, RK4 for short, for solving non-linear ordinary differential equations (Mathews, 1992), and are given in Appendix A.
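
To make the role of η and the precision parameters concrete, the following sketch draws from the Beta observation model and the Dirichlet transition using only Python's standard library. All numerical values (η, λ, κ, the latent proportion, and the propagated mean standing in for $f (θ_{t - 1}, β, γ)$) are hypothetical, and the helper names are illustrative rather than part of the authors' implementation.

```python
import random


def beta_shape_parameters(mean, precision):
    """Shapes (a, b) of a Beta distribution with the given mean and precision."""
    return precision * mean, precision * (1.0 - mean)


def draw_reported_proportion(theta, eta, lam, rng):
    """Draw Y | theta ~ Beta(lam * eta * theta, lam * (1 - eta * theta))."""
    a, b = beta_shape_parameters(eta * theta, lam)
    return rng.betavariate(a, b)


def draw_latent_state(propagated_mean, kappa, rng):
    """Draw theta_t ~ Dirichlet(kappa * f) using normalised Gamma variates."""
    gammas = [rng.gammavariate(kappa * m, 1.0) for m in propagated_mean]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(2020)
eta, lam, kappa = 0.07, 2000.0, 1000.0   # hypothetical values
propagated_mean = [0.97, 0.02, 0.01]     # stands in for f(theta_{t-1}, beta, gamma)

theta_t = draw_latent_state(propagated_mean, kappa, rng)
draws = [draw_reported_proportion(0.02, eta, lam, rng) for _ in range(20000)]
empirical_mean = sum(draws) / len(draws)
print(theta_t, empirical_mean)
```

The empirical mean of the reported proportion settles near η times the latent proportion (here 0.07 × 0.02 = 0.0014), which is how the correction links what is observed to the larger true proportion.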

5. Case study: the COVID-19 Brazilian data

The official Brazilian data consist of daily records, collected by the national health department, of infected individuals and deaths in all states and the national territory, from February 26th, 2020, when the first case of COVID-19 was registered, up to May 20th, 2020.

Brazil has a notable lack of testing, since only severe cases are registered, and consequently COVID-19 cases are under-reported. Taking this into account, we consider in this research not only the official data but also the estimates of the reported rate.

5.1. Reported rate of COVID-19

To obtain the estimate of the reported rate, we assume that the delay from confirmation until death follows the same distribution as the estimated delay from hospitalization until death. Based on COVID-19 data from Wuhan, China, between December 17th, 2019, and January 22nd, 2020, this delay has a lognormal distribution with a mean of 13 days, a median of 9.1 days, and a standard deviation of 12.7 days (Linton et al., 2020). Using the delay from hospitalization until death is reasonable since China was considered one of the countries that tested its population most for the virus and is therefore assumed to have a very small under-reporting rate.
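
One way to turn this delay distribution into the daily proportions $f_{j}$ is to match a lognormal to the quoted mean and standard deviation and difference its cumulative distribution at whole days. This is only a sketch: the daily discretization, the 120-day truncation, and the function names are assumptions, not details given in the text.

```python
import math


def lognormal_params(mean, sd):
    """Log-scale location and scale of a lognormal with the given mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of the lognormal at x."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def daily_delay_pmf(mean, sd, max_days):
    """Probability that the delay falls in [j, j + 1) for j = 0..max_days - 1."""
    mu, sigma = lognormal_params(mean, sd)
    return [
        lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)
        for j in range(max_days)
    ]


mu, sigma = lognormal_params(13.0, 12.7)
implied_median = math.exp(mu)             # median of a lognormal is exp(mu)
delay_pmf = daily_delay_pmf(13.0, 12.7, max_days=120)
print(round(implied_median, 2), round(sum(delay_pmf), 4))
```

Matching only the mean and standard deviation implies a median of about 9.3 days, close to the reported 9.1 days.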

Using the methodology presented in Section 2 and assuming that $c_{t}$ in (1) is the daily incidence of official cases reported by the Brazilian Ministry of Health, the reporting rate in Brazil, η, was estimated to be 0.07, with a $95 %$ confidence interval from 0.06 to 0.08. Prado et al. (2020) obtained a reporting rate of 0.08 with data from Brazil until April 10th, 2020. These results are similar to the analysis of Ribeiro and Bernardes (2020), who present a $7.7 : 1$ under-reporting rate, meaning that the real number of cases in Brazil should be at least seven times the published number.

Table 1 presents the rates for all states of Brazil, from which we can observe that Paraíba has the lowest reported rate, 0.06, while Roraima has the highest, 0.52. Indeed, Prado et al. (2020) found that Paraíba and Pernambuco had low reporting rates compared with other states.

Table 1

Reported rate estimates and $95 %$ confidence interval ( $95 %$ CI) for COVID-19 Brazilian data.

State	Rate ( $\hat{η}$ )	Lower $95 %$ CI	Upper $95 %$ CI
Acre	0.14	0.12	0.17
Alagoas	0.10	0.09	0.12
Amapá	0.20	0.17	0.24
Amazonas	0.08	0.07	0.10
Bahia	0.20	0.17	0.24
Ceará	0.11	0.10	0.13
Distrito Federal	0.40	0.34	0.48
Espírito Santo	0.19	0.16	0.23
Goiás	0.19	0.17	0.24
Maranhão	0.11	0.09	0.13
Mato Grosso	0.22	0.19	0.27
Mato Grosso do Sul	0.24	0.21	0.30
Minas Gerais	0.19	0.16	0.23
Pará	0.10	0.08	0.12
Paraíba	0.06	0.06	0.08
Paraná	0.15	0.13	0.18
Pernambuco	0.07	0.06	0.09
Piauí	0.10	0.09	0.12
Rio de Janeiro	0.08	0.07	0.10
Rio Grande do Norte	0.14	0.12	0.17
Rio Grande do Sul	0.23	0.20	0.28
Rondônia	0.17	0.15	0.21
Roraima	0.52	0.44	0.63
Santa Catarina	0.30	0.25	0.36
São Paulo	0.09	0.07	0.10
Sergipe	0.12	0.10	0.14
Tocantins	0.17	0.15	0.21


5.2. Estimation: Dirichlet-Beta state-space model

For the fitting of the Bayesian model, the priors and hyper-parameters are specified as follows:

• γ - We assume that the average infection period is equal to 15 days. Thus, a priori γ follows a lognormal distribution with mean 0.07 and variance 0.01:

$γ ~ LogN (- 3.215, 1.112) .$

• The average infection period ρ comes directly from the γ parameter, that is, $ρ = \frac{1}{γ}$ .

• β - The reproduction number $R_{0}$ of the disease is given by the ratio $R_{0} = \frac{β}{γ}$ . We assume that a priori $R_{0}$ follows a lognormal distribution with mean 3 and variance 9, and the β values are obtained from $β = R_{0} γ$ :

$R_{0} ~ LogN (0.752, 0.693) .$

• The a priori distributions for κ, $λ_{I}$ , $λ_{R}$ , and $θ_{0}$ were taken from Osthus et al. (2017), that is,

$κ ~ Gamma (2, 0.0001),$

$λ_{I} ~ Gamma (2, 0.0001),$
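
The lognormal hyper-parameters quoted for γ and $R_{0}$ can be recovered by moment matching, as the sketch below shows. It reads the second argument of LogN as the variance on the log scale, an interpretation that reproduces the quoted values; the function name is illustrative.

```python
import math


def lognormal_from_moments(mean, variance):
    """Log-scale mean and variance of a lognormal with the given moments."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, sigma2


# Prior for gamma: mean 0.07 (about 1/15 per day) and variance 0.01.
gamma_mu, gamma_sigma2 = lognormal_from_moments(0.07, 0.01)

# Prior for R0: mean 3 and variance 9.
r0_mu, r0_sigma2 = lognormal_from_moments(3.0, 9.0)

print(round(gamma_mu, 3), round(gamma_sigma2, 3))
print(round(r0_mu, 3), round(r0_sigma2, 3))
```

Both pairs agree with the values in the text to three decimals, which also shows that the prior mean of 0.07 for γ is the rounded value of 1/15.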

Table 2

P-values for Geweke, and Heidelberger and Welch convergence diagnostics.

parameter	Geweke	Heidelberger and Welch
$R_{0}$	0.7222	0.2026
β	0.8900	0.1898
γ	0.8210	0.2611
κ	0.2965	0.1455
$λ_{I}$	0.8205	0.2462
$λ_{R}$	0.1118	0.2568


The parameter estimates are shown in Table 3, in which $\hat{β} = 0.1125$ and $\hat{γ} = 0.0308$ are the main characteristics of the SIR model, and $\hat{κ} = 52,535.34$ , $\hat{λ_{I}} = 217,894.30$ and $\hat{λ_{R}} = 223,431.60$ express the magnitude of the process error for the unknown proportions $(θ)$ in the Bayesian approach.

Table 3

Point estimates and 95% Credible Interval.

parameter	Mode	$95 %$ Credible Interval
		lower	upper
γ	0.0308	0.0272	0.0343
β	0.1125	0.1067	0.1201
$R_{0}$	3.6243	3.3528	4.0335
ρ	32.1667	29.1268	36.7576
κ	52535.34	38384.26	71244.52
$λ_{I}$	217894.30	148822.20	310111.60
$λ_{R}$	223431.60	147997.80	320880.00


The inference results show that $\hat{R_{0}} = 3.6243$ , which expresses a high reproductive rate of the virus. Also, $\hat{ρ} = 32.1667$ days shows that the average infection period is very close to one month.

Furthermore, Fig.B.6 shows the estimated a posteriori densities for the parameters β, γ, $R_{0}$ , ρ, $λ_{I}$ , and $λ_{R}$ , from which we conclude that the curves are symmetric around their modes.

Using the parameter estimates from Table 3 and the latent proportions $(θ)$ , we obtained information about the peak of the SIR curve for COVID-19 transmission in Brazil, that is, the time when the proportion of infected individuals reaches its maximum. The peak estimate is June 18th, 2020, with an interval from June 12th to June 22nd, 2020, as shown in Fig.1.


Fig.1

Estimated SIR curves for COVID-19 Brazilian data from February 26th to May 20th, 2020.

Finally, to evaluate the goodness of fit of the SIR model, Fig.B.5 in Appendix B shows the observed COVID-19 cases and the cases estimated by the SIR model, along with the corresponding highest probability density (HPD) intervals. The cases fitted by the SIR model account for under-reporting, and the HPD intervals are obtained through the Chen and Shao algorithm (Chen and Shao, 1999). From Fig.B.5 we can observe that the SIR estimates for reported cases are close to the observed cases reported by the Brazilian Ministry of Health.
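
The idea behind a sample-based HPD interval in the spirit of Chen and Shao can be sketched as follows: sort the posterior draws and keep the shortest interval between order statistics that spans the required share of them. The implementation below is an illustrative simplification with hypothetical function names, applied to simulated skewed draws rather than to the paper's MCMC output.

```python
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval between order statistics spanning `mass` of the draws."""
    ordered = sorted(samples)
    n = len(ordered)
    span = int(mass * n)
    if span < 1 or span >= n:
        raise ValueError("not enough draws for the requested mass")
    start = min(range(n - span), key=lambda j: ordered[j + span] - ordered[j])
    return ordered[start], ordered[start + span]


def equal_tailed_interval(samples, mass=0.95):
    """Central interval with the same span, for comparison."""
    ordered = sorted(samples)
    n = len(ordered)
    span = int(mass * n)
    start = (n - span) // 2
    return ordered[start], ordered[start + span]


rng = random.Random(0)
draws = [rng.gammavariate(2.0, 1.0) for _ in range(5000)]  # skewed toy posterior

hpd_low, hpd_high = hpd_interval(draws)
et_low, et_high = equal_tailed_interval(draws)
print((hpd_low, hpd_high), (et_low, et_high))
```

By construction the HPD interval is never wider than the equal-tailed interval with the same span, and for skewed posteriors it is typically shifted toward the mode.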

5.3. Estimation: Dirichlet-Beta state-space model considering CFR unknown

Additionally, we conducted a Bayesian analysis treating the case fatality rate (CFR) as unknown and assuming it has a uniform distribution. To this end, we focus on the η parameter, which is the reporting rate. Saying that the CFR varies according to a uniform distribution is treated as equivalent to saying that η varies according to a uniform distribution:

$η ~ U (0.0579, 0.0821)$

The results are shown in Table B.5 in Appendix B. We can observe that the point estimates for the model parameters are very similar to the estimates in Table 3. However, the $95 %$ credible intervals for β and $R_{0}$ are wider than in Table 3 due to the flexibility given to the η parameter. Moreover, the peak estimate under this more flexible Bayesian approach is June 16th, 2020, which is very close to that of the first model, in which the CFR is assumed to be fixed.

6. Simulation study

To evaluate the effect of the notification rate on the model’s estimates, a simulation study was carried out. The model was fitted to the COVID-19 data from Brazil assuming reporting rates between 0.05 and 1.00, in steps of 0.05. From a practical point of view, the study investigates the effects of under-reporting on the parameters of the SIR model and its impact on the behavior of the pandemic curve. For each value of η, a chain of 300,000 iterations was generated, with a burn-in of 10,000 and a thinning interval of 300.
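
The sketch below shows what these settings imply for the number of retained draws and for the grid of reporting rates. It assumes that the burn-in is counted within the 300,000 iterations and that the first draw after burn-in is kept; the text does not state either convention, and the function name is illustrative.

```python
def retained_iterations(total, burn_in, thin):
    """Indices of the iterations kept after burn-in and thinning."""
    return list(range(burn_in, total, thin))


# Grid of reporting rates from 0.05 to 1.00 in steps of 0.05.
eta_grid = [round(0.05 * k, 2) for k in range(1, 21)]

kept = retained_iterations(total=300_000, burn_in=10_000, thin=300)
print(len(eta_grid), len(kept), kept[0], kept[-1])
```

Under these assumptions each of the 20 chains contributes 967 approximately independent draws to the posterior summaries.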

Fig.2 shows the point estimates and $95 %$ credible intervals for β and γ versus the reported rate values. As the reported rate increases, the β estimate becomes lower, which means that the infectious contact rate is underestimated when under-reporting is ignored. Additionally, the removal rate γ remains almost constant as the reported rate increases, which means that it is not influenced by the reporting rate.


Fig.2

Point estimates and 95% credible intervals for β and γ versus reported rates.

The point estimates and $95 %$ credible intervals for $R_{0}$ and the infection period ρ versus the reported rates are shown in Fig.3, from which we observe that $R_{0}$ decreases as the reported rate increases while ρ remains roughly invariant. We can therefore conclude that the reproduction rate is underestimated when under-reporting is ignored, giving the false impression that a primary individual infects only a small mean number of secondary individuals when in fact the number is large.


Fig.3

Point estimates and 95% credible intervals for $R_{0}$ and infection period versus reported rates.

Fig.4 shows the estimated SIR curves for COVID-19 versus reported rate, from which we observe that the lower the reported rate, the earlier the peak is reached with a higher proportion of infected individuals. It is also observed that the contagion curves become similar to each other as the reported rates increase. These results reveal that the peak estimate of the COVID-19 transmission curve in Brazil is compromised when the presence of under-reporting is ignored.


Fig.4

Estimated SIR curves versus reported rate for COVID-19 Brazilian data.

Finally, Table 4 presents the deviance information criterion (DIC) (Spiegelhalter et al., 2002), which indicates the SIR model with a reported rate of 0.1 as the one that best fits the data, since its DIC value is the lowest. These results suggest that the notification rate is very low.

Table 4

DIC values for COVID-19 Brazilian data.

rate $(η)$	DIC	rate $(η)$	DIC
0.05	3197.96	0.55	3260.65
0.10	3183.47	0.60	3266.44
0.15	3208.17	0.65	3272.19
0.20	3237.96	0.70	3277.76
0.25	3241.55	0.75	3279.27
0.30	3248.51	0.80	3283.36
0.35	3249.57	0.85	3295.20
0.40	3258.51	0.90	3296.90
0.45	3259.62	0.95	3306.84
0.50	3259.71	1.00	3307.88
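
The model selection step can be reproduced directly from the values in Table 4. The short sketch below picks the reporting rate with the lowest DIC and reports its gap to the model that ignores under-reporting (η = 1.00); the variable names are illustrative.

```python
# DIC values transcribed from Table 4, keyed by the assumed reporting rate.
dic_by_rate = {
    0.05: 3197.96, 0.10: 3183.47, 0.15: 3208.17, 0.20: 3237.96,
    0.25: 3241.55, 0.30: 3248.51, 0.35: 3249.57, 0.40: 3258.51,
    0.45: 3259.62, 0.50: 3259.71, 0.55: 3260.65, 0.60: 3266.44,
    0.65: 3272.19, 0.70: 3277.76, 0.75: 3279.27, 0.80: 3283.36,
    0.85: 3295.20, 0.90: 3296.90, 0.95: 3306.84, 1.00: 3307.88,
}

# Lower DIC indicates a better trade-off between fit and complexity.
best_rate = min(dic_by_rate, key=dic_by_rate.get)
gap_to_full_reporting = dic_by_rate[1.00] - dic_by_rate[best_rate]
print(best_rate, round(gap_to_full_reporting, 2))
```

The preferred rate of 0.10 is close to the estimate of 0.07 obtained in Section 5.1, and the model with full reporting has the highest DIC in the table.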


7. Concluding remarks

In this paper, we show that the method of adjusting cases by delay can be used to determine the reported rate of COVID-19 cases. With it, we estimated that the rate of cases reported in Brazil is 0.07, so the official counts underestimate the real spread of the pandemic in the country.

We therefore proposed a SIR model with a correction for under-reporting. The Bayesian approach is a feasible way to deal with the parameters inherent to the SIR model.

The methods reached convergence in the application to the Brazilian COVID-19 data set. A reproductive rate of 3.6243 was obtained, indicating that the epidemic is still growing rapidly in Brazil.

The simulation study revealed that the parameter estimates of the SIR model and the peak estimate, which is a concern of several researchers and health authorities, are sensitive to the reporting rate. Future work may consider extended SIR models such as the SEIR model (with compartments for susceptible, exposed, infected, and removed individuals) and different scenarios of isolation and quarantine as strategies for controlling COVID-19 transmission.

Notes

Handling editor: Dr. J Wu

Footnotes


Peer review under responsibility of KeAi Communications Co., Ltd.

Appendix A. Numerical Solution for SIR model

Let $f (θ_{t - 1}, β, γ)$ be the Runge-Kutta RK4 approximation to the SIR model. Thus,

$f (θ_{t - 1}, β, γ) = [\begin{array}{l} θ_{t - 1}^{S} + \frac{1}{6} (k_{t - 1}^{S_{1}} + 2 k_{t - 1}^{S_{2}} + 2 k_{t - 1}^{S_{3}} + k_{t - 1}^{S_{4}}) \\ θ_{t - 1}^{I} + \frac{1}{6} (k_{t - 1}^{I_{1}} + 2 k_{t - 1}^{I_{2}} + 2 k_{t - 1}^{I_{3}} + k_{t - 1}^{I_{4}}) \\ θ_{t - 1}^{R} + \frac{1}{6} (k_{t - 1}^{R_{1}} + 2 k_{t - 1}^{R_{2}} + 2 k_{t - 1}^{R_{3}} + k_{t - 1}^{R_{4}}) \end{array}]$

where

$k_{t}^{S_{1}} = - β θ_{t}^{S} θ_{t}^{I}$

$k_{t}^{S_{2}} = - β (θ_{t}^{S} + \frac{1}{2} k_{t}^{S_{1}}) (θ_{t}^{I} + \frac{1}{2} k_{t}^{I_{1}})$

$k_{t}^{S_{3}} = - β (θ_{t}^{S} + \frac{1}{2} k_{t}^{S_{2}}) (θ_{t}^{I} + \frac{1}{2} k_{t}^{I_{2}})$

$k_{t}^{S_{4}} = - β (θ_{t}^{S} + k_{t}^{S_{3}}) (θ_{t}^{I} + k_{t}^{I_{3}})$

$k_{t}^{I_{1}} = β θ_{t}^{S} θ_{t}^{I} - γ θ_{t}^{I}$

$k_{t}^{I_{2}} = β (θ_{t}^{S} + \frac{1}{2} k_{t}^{S_{1}}) (θ_{t}^{I} + \frac{1}{2} k_{t}^{I_{1}}) - γ (θ_{t}^{I} + \frac{1}{2} k_{t}^{I_{1}})$

$k_{t}^{I_{3}} = β (θ_{t}^{S} + \frac{1}{2} k_{t}^{S_{2}}) (θ_{t}^{I} + \frac{1}{2} k_{t}^{I_{2}}) - γ (θ_{t}^{I} + \frac{1}{2} k_{t}^{I_{2}})$

$k_{t}^{I_{4}} = β (θ_{t}^{S} + k_{t}^{S_{3}}) (θ_{t}^{I} + k_{t}^{I_{3}}) - γ (θ_{t}^{I} + k_{t}^{I_{3}})$

$k_{t}^{R_{1}} = γ θ_{t}^{I}$

$k_{t}^{R_{2}} = γ (θ_{t}^{I} + \frac{1}{2} k_{t}^{I_{1}})$

$k_{t}^{R_{3}} = γ (θ_{t}^{I} + \frac{1}{2} k_{t}^{I_{2}})$

$k_{t}^{R_{4}} = γ (θ_{t}^{I} + k_{t}^{I_{3}})$
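
A compact sketch of one RK4 step for the SIR proportions, with a unit time step as in the expressions of this appendix. It takes the removed compartment to grow at rate $γ θ^{I}$, so that the three proportions keep summing to one. The point estimates of β and γ from Table 3 are used purely as example inputs, while the initial state and the function names are hypothetical.

```python
def sir_rhs(state, beta, gamma):
    """Derivatives of the susceptible, infected, and removed proportions."""
    s, i, _ = state
    new_infections = beta * s * i
    removals = gamma * i
    return (-new_infections, new_infections - removals, removals)


def rk4_step(state, beta, gamma, h=1.0):
    """Advance the SIR proportions by one step of size h with classical RK4."""
    def shift(slope, factor):
        return tuple(x + factor * h * k for x, k in zip(state, slope))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shift(k1, 0.5), beta, gamma)
    k3 = sir_rhs(shift(k2, 0.5), beta, gamma)
    k4 = sir_rhs(shift(k3, 1.0), beta, gamma)
    return tuple(
        x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


state = (0.999, 0.001, 0.0)   # hypothetical initial proportions
trajectory = [state]
for _ in range(200):
    state = rk4_step(state, beta=0.1125, gamma=0.0308)
    trajectory.append(state)

# Day on which the infected proportion is largest along this toy trajectory.
peak_day = max(range(len(trajectory)), key=lambda t: trajectory[t][1])
print(peak_day, trajectory[-1])
```

In the state-space model this deterministic step only supplies the mean of the Dirichlet transition; the latent proportions themselves are sampled around it.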

Appendix B. Figures and tables

Fig.B.6


Estimated a posteriori densities for the parameters.

Fig.B.5


Comparison of observed COVID-19 cases and the cases estimated by SIR model with a correction to account for under-reporting.

Table B.5

Point estimates and 95% Credible Interval considering CFR unknown.

parameter	Mode	$95 %$ Credible Interval
		lower	upper
γ	0.0309	0.0270	0.0341
β	0.1153	0.1078	0.1220
$R_{0}$	3.7320	3.3600	4.1200
ρ	32.300	28.9280	36.4790
κ	48478.87	33976.36	67350.45
$λ_{I}$	219245.20	142043.10	308019.10
$λ_{R}$	219204.00	149087.40	312584.00


References

Bjørnstad O.N. Springer; 2018. Epidemics: Models and data using R.
Chen M.-H., Shao Q.-M. Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of Computational & Graphical Statistics. 1999;8:69–92. http://www.jstor.org/stable/1390921
Chib S., Greenberg E. Understanding the Metropolis-Hastings algorithm. The American Statistician. 1995;49:327–335. http://www.jstor.org/stable/2684568
Cotta R.M., Naveira-Cotta C.P., Magal P. 2020. Parametric identification and public health measures influence on the COVID-19 epidemic evolution in Brazil. medRxiv. https://www.medrxiv.org/content/early/2020/05/12/2020.03.31.20049130.full.pdf
Ferrari S., Cribari-Neto F. Beta regression for modelling rates and proportions. Journal of Applied Statistics. 2004;31:799–815.
Gelman A., Robert C., Chopin N., Rousseau J. 1995. Bayesian data analysis.
Geman S., Geman D. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Transactions on Pattern Recognition. 1984;6:721–741.
Geweke J. Bayesian statistics. University Press; 1992. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments; pp. 169–193.
Guan W-j, Ni Z-y, Hu Y. Vol. 382. New England Journal of Medicine; 2020. pp. 1708–1720. (Clinical characteristics of coronavirus disease 2019 in China). 18, In this issue.
Heidelberger P., Welch P.D. Simulation run length control in the presence of an initial transient. Operations Research. 1983;31:1109–1144.
Keeling M.J., Rohani P. Princeton University Press; 2011. Modeling infectious diseases in humans and animals.
Kermack W.O., McKendrick A.G. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character. 1927;115:700–721.
Kim D.-H., Choe Y.J., Jeong J.-Y. Understanding and interpretation of case fatality rate of coronavirus disease 2019. Journal of Korean Medical Science. 2020;35
Lenzer J. Covid-19: US gives emergency approval to hydroxychloroquine despite lack of evidence. 369:m1335. BMJ Publishing Group Ltd; 2020. In this issue.
Linton N.M., Kobayashi T., Yang Y., et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine. 2020;9:538.
Mathews J.H. 2nd ed. Prentice-Hall International; Englewood Cliffs: 1992. Numerical methods for mathematics, science and engineering.
Nishiura H., Klinkenberg D., Roberts M., Heesterbeek J.A.P. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One. 2009;4
de Oliveira G.L., Loschi R.H., Assunção R.M. A random-censoring Poisson model for underreported data. Statistics in Medicine. 2017;36:4873–4892.
Osthus D., Hickmann K.S., Caragea P.C., Higdon D., Del Valle S.Y. Forecasting seasonal influenza with a state-space SIR model. Annals of Applied Statistics. 2017;11 doi:10.1214/16-AOAS1000.
Paulino C.D., Amaral Turkman M.A., Murteira B., Silva G.L. 2nd ed. Fundação Calouste Gulbenkian; Lisboa: 2018. Estatística bayesiana.
Plummer M. rjags: Bayesian graphical models using MCMC. 2019. R package version 4-10. https://CRAN.R-project.org/package=rjags
Plummer M., Best N., Cowles K., Vines K. Coda: Convergence diagnosis and output analysis for MCMC. R News. 2006;6:7–11. https://journal.r-project.org/archive/
Prado M., Bastos L., Batista A., Antunes B., Baião F., Maçaira P., Hamacher S., Bozza F. Technical Report NOIS; 2020. Análise de subnotificação do número de casos confirmados da COVID-19 no Brasil.
R Core Team. R Foundation for Statistical Computing; Vienna, Austria: 2020. R: A language and environment for statistical computing. https://www.R-project.org/
Raftery A.E., Lewis S.M. [Practical Markov chain Monte Carlo]: Comment: One long run with diagnostics: Implementation strategies for Markov chain Monte Carlo. Statistical Science. 1992;7:493–497. doi:10.1214/ss/1177011143.
Ribeiro L.C., Bernardes A.T. Technical Report UFMG; UFOP: 2020. Estimate of underreporting of COVID-19 in Brazil by acute respiratory syndrome hospitalization reports.
Rodríguez-Morales A.J., MacGregor K., Kanagarajah S., Patel D., Schlagenhauf P. Going global–travel and the 2019 novel coronavirus. Travel Medicine and Infectious Disease. 2020;33:101578. doi:10.1016/j.tmaid.2020.101578.
Russel T., Hellewell J., Abbot S. Using a delay-adjusted case fatality ratio to estimate under-reporting. Available at the Centre for Mathematical Modelling of Infectious Diseases Repository. 2020
Song P.X., Wang L., Zhou Y., He J., Zhu B., Wang F., Tang L., Eisenberg M. 2020. An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China. medRxiv. https://www.medrxiv.org/content/early/2020/03/03/2020.02.29.20029421
Spiegelhalter D.J., Best N.G., Carlin B.P., Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002;64:583–639.
Stoner O., Economou T., Drummond Marques da Silva G. A hierarchical framework for correcting under-reporting in count data. Journal of the American Statistical Association. 2019;(1–17)
WHO. WHO announces COVID-19 outbreak a pandemic. 2020. http://www.euro.who.int/en/health-topics/health-emergencies/coronavirus-covid-19/news/news/2020/3/who-announces-covid-19-outbreak-a-pandemic

Articles from Infectious Disease Modelling are provided here courtesy of KeAi Publishing
