Absence of caution: The European vaccine suspension fiasco

[Repeated from commoninfirmities blog.]

Multiple European countries have now suspended use of the Oxford/AstraZeneca vaccine, because of scattered reports of rare clotting disorders following vaccination. In all the talk of “precautionary” approaches the urgency of the situation seems to be suddenly ignored. Every vaccine triggers serious side effects in some small number of individuals, occasionally fatal, and we recognise that in special systems for compensating the victims. It seems worth considering, when looking at the possibility of several-in-a-million complications, how many lives may be lost because of delayed vaccinations.

I start with the case fatality rate (CFR) from this metaanalysis, and multiply them by the current overall weekly case rate, which is 1.78 cases/thousand population in the EU (according to data from the ECDC). This ignores the differences between countries, and differences between age groups in infection rate, and certainly underestimates the infection rate for obvious reasons of selective testing.

Age group0-3435-4445-5455-6465-7475-8485+
CFR (per thousand)0.040.682.37.52585283
Expected fatalities per week per million population0.071.24.11345151504
Number of days delay to match VFR120070206.

Let’s assume now that all of the blood clotting problems that have occurred in the EEA — 30 in total, according to this report — among the 5 million receiving the AZ vaccine were actually caused by the vaccine, and suppose (incorrectly) that all of those people had died.* That would produce a vaccine fatality rate (VFR) of 6 per million. We can double that to account for possible additional unreported cases, or other kinds of complications that have not yet been recognised. We can then calculate how many days of delay would cause as many extra deaths as the vaccine itself might cause.

The result is fairly clear: even the most extreme concerns raised about the AZ vaccine could not justify even a one-week delay in vaccination, at least among the population 55 years old and over. (I am also ignoring here the compounding effect of onward transmission prevented by vaccination, which makes the delay even more costly.) As is so often the case, “abundance of caution” turns out to be fundamentally reckless.

* I’m using only European data here, to account for the contention that there may be a specific problem with European production of the vaccine. The UK has used much more of the AZ vaccine, with even fewer problems.

What’s the matter with Santa Clara?

I can’t get out of my head the instantly infamous paper by Bendavid et al. at Stanford Medical School from a few weeks back, claiming to estimate the prevailing exposure to SARS-Cov-2 in Santa Clara County, California, by antibody tests on a random sample of the population. The study has garnered a lot of attention because of its optimistic outcome: Something between 1.5 and 4 percent of the population were estimated to have been already infected, far more than the official number of cases, which, if true, would imply an infection fatality rate below 0.2%.

The paper has attracted a great deal of criticism, and has already been revised. Much of the critique was targeted at the problematic sampling procedure — which I think is not such a big deal — and the weird ethical lapses. But the gaping hole in the paper is in its treatment of test specificity.

The problem, in a nutshell, is this: 1.5% of the tests they carried out (50 out of 3,330) yielded positive results. In order to estimate the true rate of seropositivity in the population, we need to know approximately how many of those positive test results were correct. They estimate the specificity from their laboratory tests of samples known to be negative (because they were produced before the origin of the virus, or, at least, its entrance into the US), of which 368 out of 371 were indeed negative. (Strangely, the first version of the paper reported 369 out of 371, which is a pretty substantial difference. This figure has not been consistently updated.) This suggests the test specificity is 99.2%, which would make about half the positive results false. But it’s even worse than that, because 3 false results out of 371 might be a randomly low number for a much higher failure rate. The paper recognises this, appealing to the Clopper-Pearson method for computing confidence intervals for binomial probabilities, which yields at a 95% confidence intervals of about (97.6%, 99.9%). If the specificity is anything below 98.5% and they measured 1.5% positives, then the maximum likelihood estimate for the true positive rate is 0. This raises two questions:

  1. They claim to have taken the specificity into account, saying “We use a bootstrap procedure to estimate confidence bounds for the unweighted and weighted prevalence, while accounting for sampling error and propagating the uncertainty in the sensitivity and specificity.” They produced a 95% CI of (0.7%, 1.8%). What did they think they were doing?
  2. How should this be analysed?

The bootstrap is a brilliant tool, but since it resamples from the data, or from the fitted model, it tends to miss out on uncertainty from what could have been observed but wasn’t. It’s not clear, though, whether the bootstrap has anything to do with the problem here. The updated version includes an appendix with a calculation from Bayes’ Theorem that the probability of a sampled person being infected is \frac{q+s-1}{r+s-1} , where s is the specificity, r is the sensitivity (the probability that a seropositive subject receives a positive test) and q is the fraction of the population with a positive test. They claim to have generated a bootstrap estimate sampling their s from the “empirical distribution” of specificity, but they don’t say what they do with the negative estimates for the infection incidence that must inevitably arise. Instead, they give (in the appendix to the updated version of the paper) a long and (to my mind) incoherent argument against critics who say that the confidence interval should include 0 — since the false positive rate has a reasonable chance of being larger than the fraction of positive test results — mainly because we know from other evidence that the infection incidence can’t actually be 0. True enough, but the point is not that the rate is actually 0%, but that these data give us no reason to believe in any particular lower bound above 0.

How should the data be analysed? The problem is, Bayes’ Theorem doesn’t play well with frequentist confidence intervals, so it’s easy to end up in a logical muddle if we try to calculate a confidence interval for the true probability of seropositive, given that we’ve observed 1.5% positive. What does it even mean? It would be quite reasonable to say, values of false-positive rate above 1.5% are perfectly plausible, in which case true positive rates down to 0% are plausible, hence there is no justification for claiming evidence for any positive lower bound on the true incidence of seropositivity, and then call it a day.

The “bootstrap” approach that they seem to have made up without much justification is a kind of incoherent mixed-Bayesian-frequentist approach: Inverting the normal approximation to the sampling distribution of the parameters (I think that’s what they mean by the “empirical distribution”) as though it were a posterior, that they then sample from to generate .

Instead, it would be most natural to analyse this in a purely Bayesian framework. You could start off with a uniform prior on the true seropositive rate π and the specificity s. Then the posterior will be proportional to \bigl( 1 - s(1-\pi)\bigr)^{50}\bigl(s(1 - \pi) \bigr)^{3280} s^{368}(1-s)^3. This doesn’t provide a nice closed-form solution, but it can be calculated numerically with various software tools, such as the Wolfram Alpha double integral calculator. The posterior distribution of π has mean 0.0067, and a symmetric 95% credible interval (0.049%, 1.41%) for the seropositive incidence. This could be repeated for each of their demographic groups, which I’m loath to take the time to do, but it would presumably inflate both ends of the confidence interval by a similar factor, producing something like (0.06%,3.8%).

The updated version of the paper includes a large number of additional tests on pre-Covid samples, bringing the total to 3324, with just 16 false positives, so again about a 99.5% specificity. If we believe this, we get a much more favourable credible interval — though still not as favourable as reported in the paper — of (0.55%,1.52%). (They report a 95% confidence interval for the unweighted incidence (0.7%, 1.8%).)

Should we believe their new results? I’m not sure. One argument against: They report on 13 different sources of negative samples, but these produce wildly variable specificities. The most extreme is a sample of 150 pre-Covid serum samples that yielded 4 false positives, and on the other side 1102 samples that yielded no false positives — both have substantially less than 1% probability if specificity is 99.5%, but they pull in different directions.

For the aficionados: Actually, there’s also a sensitivity that needs to be considered. In tests of 157 known seropositive samples, only 130 produced positive results. This isn’t as problematic as the specificity — which has the potential for wiping out all of the positive tests — but it will increase our estimates of the true positive rate, while widening the uncertainty. Writing r for the sensitivity (the probability of a positive test given that the individual is seropositive) the posterior likelihood now becomes

\bigl( 1-s+ \pi(r+s-1)\bigr)^{50}\bigl(s - \pi(r+s-1) \bigr)^{3280} s^{368}(1-s)^3 r^{130}(1-r)^{27}

with the original sensitivity data, or

\bigl( 1-s+ \pi(r+s-1)\bigr)^{50}\bigl(s - \pi(r+s-1) \bigr)^{3280} s^{3308}(1-s)^{16} r^{130}(1-r)^{27}

with the updated sensitivity data.

The 95% credible intervals for π (without demographic weighting) are then (.06%,1.7%) with the original sensitivity data, and (0.67%, 1.9%) with the updated sensitivity data. This last is fairly close to the reported results in the revised paper. Unsurprising, since with 3000 instead of 300 samples, the uncertainty in the sensitivity is no longer a major issue.

Fluctuations in reports of daily Covid deaths

As we are all watching the daily reports of Covid-19 cases and deaths, one conspicuous fact is that there are enormous fluctuations from day to day. It’s clear that we shouldn’t be too focused on the counts from an individual day to understand the trend. But why do they fluctuate so much?

Here are the numbers for the UK since the start of April (data from the European Centre for Disease Prevention and Control):

DateNo. casesNo. deaths

We have 381 deaths one day, then 743 deaths the next day, then 389, then 684. A natural explanation is that it has to do with delays in reporting weekends and/or holidays. But the 381 and 389 were on Wednesday and Friday, and the next peak came on a Saturday and Sunday. I don’t have an explanation, but I wanted to investigate whether this is a real issue. Are the day-to-day fluctuations more than you would expect purely by chance?

It’s not immediately obvious how many we should expect, since there is clearly a trend. I decided to estimate it in two different ways:

  1. Choose a range s of days — most naturally s=3 — and estimate the predicted cumulative number of events C[x] on day x as the geometric average of the cumulative number of events to day x+s, and the cumulative number to day x-s. This also gives us a natural estimate of the growth rate r over the period, r=(C[x+s]/C[x-s])^(1/2s) -1, and we calculate the Predicted number of events on day x as rC[x]/(1+r). We call this the geometric average method.
  2. As above, but we estimate r by the maximum likelihood estimator, assuming that the number events on day x is Poisson distributed with parameter rC[x-1], where C[x-1] is the observed cumulative number on day x-1. We call this the cumulative MLE method.
  3. An approach that addresses more directly the fluctuations in daily events computes the maximum likelihood estimators for a two-parameter model with parameters λ and r, where the number of events on day x+k is Poisson(λr^k), taking as observations the numbers on days x-s,x-s+1,…,x-1,x+1,…,x+s. We call this the single-day MLE method.


Here are the results for the UK. The discrepancy between observed and predicted has been standardised to standard deviation 1, so we should not expect to see results outside of the range (-2.5,+2.5) by chance.

There’s no obvious pattern. The geometric-average increase-rate is always larger, hence the discrepancies are smaller. Not surprisingly, the growth rates estimated from single-day counts are more variable, but the discrepancies are, surprisingly, less extreme. With any method the discrepancies are extreme in both directions, and don’t seem to have an obvious pattern with respect to the weekends. We see extreme positive deviations on weekends, and extreme negative deviations on weekdays. (I have shaded Sunday and Monday as weekends, supposing that most of the deaths in hospitals on Saturday and Sunday would appear in national totals the following day. According to some estimates, the most common delay is 2 days, suggesting that Monday and Tuesday counts should be expected to be low.)

Here is Germany:

These show, perhaps, a more consistent weekly pattern, with negative deviations on Sundays and Mondays.

The US:


Italy shows an odd alternating pattern:


The dynamics of new cases are expected to be different. Here is the equivalent plot for new cases in the UK

The fluctuations seem to be increasing, with huge excess number of cases reported on Easter, for some reason — 8719 new cases, where the largest reported on any other day was 5903.

The US also shows this intensification of fluctuations

Germany, Italy, and Canada don’t seem to show this pattern. The plots for Germany and Canada each is dominated by a single very extreme day, possibly corresponding to a shift in counting criteria.

Putting Covid-19 mortality into context

[Cross-posted from Common Infirmities blog.]

The age-specific estimates of fatality rates for Covid-19 produced by Riou et al. in Bern have gotten a lot of attention:

Estimated fatality in deaths per thousand cases (symptomatic and asymptomatic)

These numbers looked somewhat familiar to me, having just lectured a course on life tables and survival analysis. Recent one-year mortality rates in the UK are in the table below:

One-year mortality probabilities in the UK, in deaths per thousand population. Neonatal mortality has been excluded from the 0-9 class, and the over-80 class has been cut off at 89.

Depending on how you look at it, the Covid-19 mortality is shifted by a decade, or about double the usual one-year mortality probability for an average UK resident (corresponding to the fact that mortality rates double about every 9 years). If you accept the estimates that around half of the population in most of the world will eventually be infected, and if these mortality rates remain unchanged, this means that effectively everyone will get a double dose of mortality risk this year. Somewhat lower (as may be seen in the plots below) for the younger folk, whereas the over-50s get more like a triple dose.