Primary author: David Jorgensen
Olivia Boyd, Lily Geidelberg, David Jorgensen, Manon Ragonnet, Igor Siveroni, Erik Volz and the Imperial College COVID-19 Response Team
Report prepared on 2020-05-18
This report uses full genome sequence data shared publicly by Gujarat Biotechnology Research Centre and a set of international background sequences from GISAID (laboratory acknowledgements)
- This is a preliminary analysis based on a small number of genetic sequences sampled in Gujarat and uploaded to GISAID before 2020-05-13.
- We estimate a number of key epidemiological parameters based on the genetic diversity of these samples alongside a set of closely related sequences from elsewhere which act as a global reservoir.
- We estimate a low initial reproduction number (R0) of 1.99, which falls to 1.55 [95% CrI 0.62-1.95] by May 4th. Because only a small number of sequences are available, it is difficult to estimate changes in R(t) over time, and these estimates are likely to change as more sequence data become available.
This analysis is based on:
- 46 whole genomes sampled from within Gujarat
- 50 whole genomes sampled from outside of Gujarat
- Samples within Gujarat were collected between 2020-04-05 and 2020-05-04
Duplicate sequences and those with likely sequencing errors or significant gaps were removed prior to analysis, so this is a smaller sample than the total number of Gujarat samples uploaded to GISAID. Figure 1 shows the distribution of the included sequences over time, including external sequences. We were not able to obtain information on the sampling strategy used to collect these sequences, so biases may have been introduced where sequences are more closely related than would be expected in the general population (e.g. those collected through contact tracing) or where sampling targeted specific groups who may differ from the general population (e.g. travelers or healthcare workers).
Reported cases for comparison to our model predictions are taken from covid19india.org, a crowdsourced database for SARS-CoV-2 data from India, and so may not exactly match reported cases from government sources. These data are used for comparison purposes only and do not influence the analysis.
Figure 1: Sampling distributions over time of number of sequences included within the region versus sequences included from the international reservoir.
In this preliminary analysis we estimate 12005 [3539-53978] cumulative infections (median [95% CrI]) at the time of the last sample (2020-05-04) by fitting a phylodynamic model to SARS-CoV-2 sequence data. 5804 cases of coronavirus were confirmed
Figure 2: Estimated cumulative infections through time represented by solid black line (median) and 95% CrI (ribbon). Gold points represent reported cases in Gujarat. The dashed line indicates the date of last sample in Gujarat in this analysis.
Figure 3: Estimated daily infections through time represented by solid black line (median) and 95% CrI (ribbon). Gold points represent reported cases in Gujarat. The dashed line indicates the date of last sample in Gujarat in this analysis.
Figure 4: Estimated percentage of daily cases reported in Gujarat. Error bars represent the 95% credible interval.
We estimate the reporting rate for Gujarat over time by comparing our model predictions to reported case data. These estimates have high uncertainty because of the small number of local sequences used in this analysis.
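As a rough illustration of this comparison, the sketch below divides a reported case count by a model estimate of infections. It is not the method used in the report: it uses the cumulative figures quoted above rather than daily values, it assumes the 5804 confirmed cases refer to roughly the same date as the last sample, and it takes the interval endpoints at face value instead of working from the posterior. The function name `reporting_percentage` is hypothetical.

```python
def reporting_percentage(reported, estimated):
    """Percentage of estimated infections that were reported, capped at 100."""
    if estimated <= 0:
        raise ValueError("estimated infections must be positive")
    return min(100.0, 100.0 * reported / estimated)


# Figures quoted in this report (cumulative; dates assumed comparable).
reported_cases = 5804
median_infections = 12005
lower_infections, upper_infections = 3539, 53978

median_pct = reporting_percentage(reported_cases, median_infections)
# A larger infection estimate implies a smaller reported share,
# so the upper infection bound gives the lower percentage and vice versa.
low_pct = reporting_percentage(reported_cases, upper_infections)
high_pct = reporting_percentage(reported_cases, lower_infections)

print(f"Crude reporting percentage: {median_pct:.1f}% [{low_pct:.1f}-{high_pct:.1f}]")
```

The cap at 100% matters here because the lower bound on estimated infections is below the reported count, which is another sign of how wide the uncertainty is with so few sequences.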
Figure 5: Reproduction number through time. The black vertical dashed line indicates the date of last sample in Gujarat in this analysis. The red dashed line indicates the date of lockdown in India.
Reproduction number at last sample (2020-05-04): 1.55 [0.62-1.95] median [95% CrI]
How quickly has the epidemic in Gujarat grown?
| Quantile | Reproduction number | Growth rate (per day) | Doubling time (days) |
Table 1: Reproduction number, growth rate and doubling times
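The three quantities in Table 1 are linked. The doubling time follows from the growth rate r as ln(2)/r. Under a plain SEIR model with exponentially distributed stage durations, the reproduction number and growth rate satisfy R = (1 + r/gamma0)(1 + r/gamma1). The sketch below uses that relation as a simplifying assumption. The model fitted in this report (seijr) has additional structure, so the values in Table 1 need not match. It also assumes that gamma0 and gamma1 in Table 2 are per-year rates of leaving the exposed and infectious stages.

```python
import math

DAYS_PER_YEAR = 365.0
# Assumed interpretation of Table 2: per-year stage exit rates, converted to per day.
GAMMA0_PER_DAY = 73.0 / DAYS_PER_YEAR
GAMMA1_PER_DAY = 121.7 / DAYS_PER_YEAR


def growth_rate_from_r(R, sigma=GAMMA0_PER_DAY, gamma=GAMMA1_PER_DAY):
    """Solve R = (1 + r/sigma)(1 + r/gamma) for the larger root r (per day)."""
    disc = (sigma - gamma) ** 2 + 4.0 * sigma * gamma * R
    return (-(sigma + gamma) + math.sqrt(disc)) / 2.0


def doubling_time(r):
    """Doubling time in days for growth rate r per day; infinite if not growing."""
    if r <= 0:
        return math.inf
    return math.log(2.0) / r


r = growth_rate_from_r(1.55)  # reproduction number at last sample, from this report
print(f"growth rate: {r:.3f} per day, doubling time: {doubling_time(r):.1f} days")
```

A reproduction number of exactly 1 gives a growth rate of 0 and an infinite doubling time, which is a useful sanity check on the relation.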
How has SARS-CoV-2 evolved in Gujarat?
Figure 6: Maximum likelihood phylogeny with the x-axis representing nucleotide substitutions per site. The colour of the tips corresponds to sampling location; red tips were sampled from within Gujarat, grey tips from outside.
Figure 7: Time scaled phylogeny co-estimated with epidemiological parameters. The colour of the tips corresponds to sampling location; red tips were sampled from within Gujarat.
We present here a time scaled phylogeny of SARS-CoV-2 in Gujarat and the included international reservoir. As few nodes have high posterior support, we also present a maximum likelihood phylogeny showing genetic distance between the included sequences. Gujarat sequences tend to cluster closely together, suggesting that the majority of sequenced SARS-CoV-2 results from transmission within the region rather than introductions from elsewhere. This could be the result of significant local transmission or of a local sampling strategy targeted at interconnected individuals.
Molecular clock rate of evolution (substitutions per site per year): 0.00105 [0.000825-0.00134] median [95% CrI]
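To give a sense of scale, the clock rate can be converted into an expected number of substitutions across the whole genome. The sketch below assumes the rate is expressed per site per year and uses the length of the Wuhan-Hu-1 reference genome (29,903 nucleotides) as the genome length. Both are assumptions that this report does not state, and the function name is hypothetical.

```python
GENOME_LENGTH = 29903  # assumed: length of the Wuhan-Hu-1 reference genome


def substitutions_per_genome(rate_per_site_per_year, years=1.0,
                             genome_length=GENOME_LENGTH):
    """Expected substitutions across the genome over the given time span."""
    return rate_per_site_per_year * genome_length * years


# Median and 95% CrI bounds of the clock rate quoted in this report.
for label, rate in [("median", 0.00105), ("lower", 0.000825), ("upper", 0.00134)]:
    per_year = substitutions_per_genome(rate)
    print(f"{label}: {per_year:.1f} per year, {per_year / 12:.1f} per month")
```

At only a few expected substitutions per genome per month, sequences sampled within a single month differ very little, which is consistent with the low posterior support for many nodes noted above.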
Details on methods and priors can be found here.
| Statistic | mean | ESS |
|:-------------------:|:--------:|:-----:|
| posterior | -42946 | 1033 |
| likelihood | -42847 | 13584 |
| prior | -98.92 | 944 |
| treeLikelihood.algn | -42847 | 13584 |
| TreeHeight | 0.3144 | 726 |
| clockRate | 0.00106 | 3352 |
| kappa | 4.49 | 16492 |
| PhydynSEIR | -67.35 | 1036 |
| seir.E | 58.45 | 10373 |
| seir.S | 106394 | 5379 |
| seir.b | 14.98 | 7672 |
| seir.exog | 0.001132 | 820 |
| seir.exogGrowthRate | 28.32 | 146 |
| seir.importRate | 2.444 | 17249 |
| seir.p_h | 0.2161 | 12488 |
| seir.tau | 73.28 | 23737 |
| freqParameter.1 | 0.2978 | 6067 |
| freqParameter.2 | 0.1826 | 6304 |
| freqParameter.3 | 0.1951 | 6319 |
| freqParameter.4 | 0.3245 | 5658 |
| gamma0 | 73 | NA |
| gamma1 | 121.7 | 27 |
Table 2: Effective sample size of model parameters
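One way to read Table 2 is to flag parameters whose effective sample size falls below a chosen threshold. The sketch below uses 200, a common rule of thumb for MCMC output, not a criterion stated in this report. It treats an ESS of NA as not applicable and skips that entry. The helper name `low_ess` is hypothetical.

```python
ESS_THRESHOLD = 200  # assumed rule-of-thumb cut-off, not taken from this report

# ESS values copied from Table 2; None stands for NA.
ess = {
    "posterior": 1033, "likelihood": 13584, "prior": 944,
    "treeLikelihood.algn": 13584, "TreeHeight": 726, "clockRate": 3352,
    "kappa": 16492, "PhydynSEIR": 1036, "seir.E": 10373, "seir.S": 5379,
    "seir.b": 7672, "seir.exog": 820, "seir.exogGrowthRate": 146,
    "seir.importRate": 17249, "seir.p_h": 12488, "seir.tau": 23737,
    "freqParameter.1": 6067, "freqParameter.2": 6304,
    "freqParameter.3": 6319, "freqParameter.4": 5658,
    "gamma0": None, "gamma1": 27,
}


def low_ess(ess_by_param, threshold=ESS_THRESHOLD):
    """Names of parameters with ESS below the threshold, lowest ESS first."""
    flagged = [(value, name) for name, value in ess_by_param.items()
               if value is not None and value < threshold]
    return [name for value, name in sorted(flagged)]


print(low_ess(ess))
```

Under this threshold the flagged parameters are gamma1 and seir.exogGrowthRate, so estimates that depend on them warrant extra caution.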
Model version: seijr_0.1.1_coupled
Report version: 20200518-175634-7ce14862
This work was supported by the MRC Centre for Global Infectious Disease Analysis at Imperial College London.