Primary author: David Jorgensen

Olivia Boyd, Lily Geidelberg, David Jorgensen, Manon Ragonnet, Igor Siveroni, Erik Volz and the Imperial College COVID-19 Response Team

Report prepared on 2020-05-18

This report uses full genome sequence data shard publicly by Gujarat Biotechnology Research Centre and a set of international background sequences from GISAID (laboratory acknowledgements)

Key points

  • This is a preliminary analysis based on a small number of genetic sequences sampled in Gujarat uploaded to gisaid before 2020-05-13.
  • We estimate a number of key epidemiological parameters based on the genetic diversity of these samples alongside a set of closely related sequences from elsewhere which act as a global reservoir.
  • We estimate a low initial R (R0) of 1.99 which falls to 1.55 [95% CrI 0.62-1.95] by May 4th. As only a small number of sequences are available it is difficult to estimate changes in R(t) over time and these estimates are likely to change with more sequence data.

This is analysis is based on :

  • 46 whole genomes sampled from within Gujarat
  • 50 whole genomes sampled from outside of Gujarat
  • Samples within Gujarat were collected between 2020-04-05 and 2020-05-04

As duplicate sequences, those with likely sequencing errors or significant gaps were removed prior to analysis this represents a smaller sample than the total number of Gujarat samples uploaded to gisaid. Figure 1 shows the distribution of the included sequences over time, including external sequences. As we were not able to acquire information on the sampling strategy used when collecting these sequences there may be biases introduced where sequences are more closely related than would be expected in the general population (eg. those collected from contact tracing) or where they are targeted to specific groups in the population who may have different characteristics to the general population (eg. travelers or healthcare workers).

Reported cases for comparison to our model predictions are taken from covid19india.org, a crowdsourced database for SARS-CoV-2 data from India and so may not match reported cases from government sources exactly. These data are used for comparison purposes only and do not influence the analysis.

plot of chunk sampling dist

Figure 1: Sampling distributions over time of number of sequences included within the region versus sequences included from the international reservoir.

Estimated infections

In this preliminary analysis we estimate 12005 [3539-53978] median [95%CI] cumulative infections at the time of the last sample (2020-05-04) by fitting a phylodynamic model to SARS-CoV-2 sequence data. 5804 Cases of coronavirus were confirmed

plot of chunk Cumulative estimated infections through time

plot of chunk Cumulative estimated infections through time log scale

Figure 2: Estimated cumulative infections through time represented by solid black line (median) and 95% CrI (ribbon). Gold points represent reported cases in Gujarat. The dashed line indicates the date of last sample in Gujarat in this analysis.

plot of chunk daily estimated infections through time

plot of chunk daily estimated infections through time log scale

Figure 3: Estimated daily infections through time represented by solid black line (median) and 95% CrI (ribbon). Gold points represent reported cases in Gujarat. The dashed line indicates the date of last sample in Gujarat in this analysis.

plot of chunk reporting

Figure 4: Estimated percentage of daily cases reported in Gujarat. Error bars represent the 95% credible interval.

We estimate reporting rate for Gujarat over time based on comparison of our model predictions to reported case data. These estimates have high uncertainty due to the small number of local sequences used in this analysis.

plot of chunk Rt

*Figure 5: Reproduction number through time. The black vertical dashed line indicates the date of last sample in Gujarat in this analysis. The red dashed line indicates the date of lockdown in India. *

Reproduction number at last sample (2020-05-04): 1.55 [0.62-1.95] median [95% CrI]

How quickly has the epidemic in Gujarat grown?

Quantile Reproduction number Growth rate (per day) Doubling time (days)
50% 1.99 0.104 6.67
2.5% 1.6 0.0671 4.68
97.5% 2.52 0.148 10.3

Table 1: Reproduction number, growth rate and doubling times

How has SARS-CoV 2 evolved in Gujarat?

plot of ml tree

Figure 6: Maximum likelihood phylogeny with the x-axis representing NT substitutions per site. The colour of the tips corresponds to sampling location; red tips were sampled from within Gujarat, grey tips from outside

plot of chunk mcc_tree

Figure 7: Time scaled phylogeny co-estimated with epidemiological parameters. The colour of the tips corresponds to location sampling; red tips were sampled from within Gujarat.

We present here a time scaled phylogeny of SARS-CoV-2 in Gujarat and the included international reservoir. As few nodes have high posterior support we also present a maximum likelhood phylogeny showing genetic distance between the included sequences. Gujarat sequences tend to cluster closely together suggesting that the majority of sequenced SARS-CoV-2 results from transmission within the region rather than introductions from elsewhere. This could be the result of significant local transmission or due to a local sampling strategy targeted at interconnected individuals.

Molecular clock rate of evolution: 0.00105 [0.000825-0.00134] median [95% CrI]

Methods summary

Details on methods and priors can be found here.



|      Statistic      |   mean   |  ESS  |
|:-------------------:|:--------:|:-----:|
|      posterior      |  -42946  | 1033  |
|     likelihood      |  -42847  | 13584 |
|        prior        |  -98.92  |  944  |
| treeLikelihood.algn |  -42847  | 13584 |
|     TreeHeight      |  0.3144  |  726  |
|      clockRate      | 0.00106  | 3352  |
|        kappa        |   4.49   | 16492 |
|     PhydynSEIR      |  -67.35  | 1036  |
|       seir.E        |  58.45   | 10373 |
|       seir.S        |  106394  | 5379  |
|       seir.b        |  14.98   | 7672  |
|      seir.exog      | 0.001132 |  820  |
| seir.exogGrowthRate |  28.32   |  146  |
|   seir.importRate   |  2.444   | 17249 |
|      seir.p_h       |  0.2161  | 12488 |
|      seir.tau       |  73.28   | 23737 |
|   freqParameter.1   |  0.2978  | 6067  |
|   freqParameter.2   |  0.1826  | 6304  |
|   freqParameter.3   |  0.1951  | 6319  |
|   freqParameter.4   |  0.3245  | 5658  |
|       gamma0        |    73    |  NA   |
|       gamma1        |  121.7   |  27   |


Table 2: Effective sample size of model parameters

Model version: seijr_0.1.1_coupled

Report version: 20200518-175634-7ce14862

Acknowledgements

This work was supported by the MRC Centre for Global Infectious Disease Analysis at Imperial College London.

Sequence data were provided by GISAID and these laboratories.