In science, data is king. However, the problem with this quickly becomes evident to anyone clever enough to start asking questions like ‘What data? How was it collected? When? Where?’ and so on and so forth. Incomplete or incorrect data, or simply the lack of any data, means that a hypothesis cannot be supported, or should not be (although it often is, perched atop a heap of junky data and crowned a winner by those who really ought to know better).
Which brings me, obliquely, to current events. I have said, and will continue to say, that we do not know the true depth and breadth of this pandemic. We cannot know it. We simply do not have sufficient data, and we never will. Going back after the fact and testing for immunity is very important, and hopefully it will begin any day now, for reasons that should be obvious to anyone reading these words. However, antibody testing will never be done on the entire population of the US, for reasons that, while obvious to most, are rooted in some of the things that make Americans great: we are both ferociously independent and maddeningly difficult to get to do anything en masse. People will simply not take part in the studies, or the studies, for lack of funding, will not generate a true picture of the population. The best and only way to have generated an accurate, real-time picture of the pandemic was to have been testing early and often. We have done neither. Therefore: we do not have data. And without data, you cannot know the truth. We can only get glimpses of it, and try to piece those flashes of insight into something that makes sense to us. Which might resemble reality about as well as a Lisa Frank painting does.
I have slim hopes that the coming wave of antibody tests will help light break through the clouds. Derek Lowe, in his usual pithy and wonderfully clear way, lays out how they work:
the tests are looking for two antibody subclasses, IgG and IgM. The IgM ones are the first that get produced in an immune response, mostly coming from the spleen, but they’re also relatively short-lived, with a half-life of five or six days. So detection of IgM against coronavirus antigens indicates a recent (or still active) infection. The IgG antibodies are more numerous in the end, though, and for many infections (measles, chickenpox, mumps, hepatitis B and more) they indicate that a person is now immune to re-infection.
So on that paper strip, the plasma will hit a band of anti-IgM antibodies, bound to the paper, and then a band of anti-IgG antibodies, and finally a band of control antibodies that react with human antibodies in general. Remember, the plasma is carrying the test patient’s antibodies that are holding onto antigens with colloidal gold particles tied to them. When these hit one of those antibody-to-antibodies zones, they’ll come to a halt there, and the colloidal gold particles will pile up enough in that zone to show you a red-pink color. So the test strip can show red lines for either IgG or IgM, both, or neither, but if there’s no red line in the control strip then something has gone wrong and the test needs to be discarded and run again with a fresh kit.
You can realize, then, that if a person shows positive for IgM only then they may well be actively infected. And if they show only IgG, they may well have gone through an infection and could be immune (more about that in a minute). Showing both, well, you’re probably on the back end of an infection? And showing neither (but with a valid control line) could mean that you haven’t been exposed to the virus at all. (Read the rest at In the Pipeline. I also recommend taking a look at his ongoing series about the pandemic and drug/vaccine development.)
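The strip-reading rules in that excerpt boil down to a small lookup table. Here is a minimal Python sketch of it, assuming each band is read simply as present or absent. The function name `interpret_strip` and the wording of its results are my own illustration, not anything from a real test kit or from Lowe’s post, and it is certainly no substitute for a medical diagnosis.

```python
def interpret_strip(igm: bool, igg: bool, control: bool) -> str:
    """Hypothetical helper: map the three bands on a lateral-flow strip
    to the rough readings in the quoted excerpt (rules of thumb only)."""
    if not control:
        # No control line means the run is invalid, whatever else shows up.
        return "invalid: discard and rerun with a fresh kit"
    if igm and igg:
        return "both: probably on the back end of an infection"
    if igm:
        return "IgM only: possibly a recent or still-active infection"
    if igg:
        return "IgG only: possibly a past infection, may be immune"
    return "neither: possibly never exposed"


if __name__ == "__main__":
    # Walk through every combination with a valid control line...
    for igm in (False, True):
        for igg in (False, True):
            print(f"IgM={igm!s:5} IgG={igg!s:5} -> {interpret_strip(igm, igg, control=True)}")
    # ...and one run where the control line failed.
    print(interpret_strip(True, True, control=False))
```

The control check comes first on purpose: as the excerpt says, without a control line none of the other bands can be trusted.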
An article in STAT makes the point about the data bluntly:
The data collected so far on how many people are infected and how the epidemic is evolving are utterly unreliable. Given the limited testing to date, some deaths and probably the vast majority of infections due to SARS-CoV-2 are being missed. We don’t know if we are failing to capture infections by a factor of three or 300. Three months after the outbreak emerged, most countries, including the U.S., lack the ability to test a large number of people and no countries have reliable data on the prevalence of the virus in a representative random sample of the general population.
This evidence fiasco creates tremendous uncertainty about the risk of dying from Covid-19. Reported case fatality rates, like the official 3.4% rate from the World Health Organization, cause horror — and are meaningless. Patients who have been tested for SARS-CoV-2 are disproportionately those with severe symptoms and bad outcomes. As most health systems have limited testing capacity, selection bias may even worsen in the near future.
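To see why “three or 300” matters so much, here is a rough Python sketch of the arithmetic. It assumes, generously, that every death was counted and that the only error is missed infections (the article itself says some deaths are being missed too), so the fatality rate among all infections is just the reported rate divided by the undercount factor. `implied_ifr` is a made-up helper for illustration, not an epidemiological model.

```python
def implied_ifr(reported_cfr: float, undercount_factor: float) -> float:
    """Hypothetical helper: fatality rate among *all* infections, assuming
    every death was counted but only 1 in `undercount_factor` infections
    was ever confirmed."""
    if undercount_factor < 1:
        raise ValueError("undercount_factor must be >= 1")
    return reported_cfr / undercount_factor


WHO_CFR = 0.034  # the official 3.4% case fatality rate quoted above

for factor in (1, 3, 30, 300):
    # The `%` format spec multiplies by 100 and appends a percent sign.
    print(f"undercount x{factor:>3}: implied IFR = {implied_ifr(WHO_CFR, factor):.4%}")
```

With the 3.4% figure, a threefold undercount gives an implied rate of about 1.1%, while a 300-fold undercount gives about 0.01%. Same deaths, a hundredfold difference in the answer.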
You should definitely read the rest of that STAT article.
What does all of this mean? Well, I keep saying it. Don’t panic. Panic does no one any good.
Comments
6 responses to “Data Not Collected is Not Data”
We do have one other important control experiment: Iceland has tested a far larger percentage of its population than anyone else. It has a small population (though two-thirds of it is concentrated around Reykjavik) and a vigorous biotech sector: the local firm deCODE offers free testing to anyone who wants it. https://spinstrangenesscharm.wordpress.com/2020/04/03/covid19-update-april-3-2020-what-does-icelands-unique-dataset-tell-us/
They found that about 50% of those infected are asymptomatic. Their IFR (infection fatality rate) stands at 0.3% right now.
Thanks, Nitay. I knew Iceland had been doing good things with testing, but hadn’t found an article on it. That’s good news about the IFR.
Related to Mr. Arbel’s post, I’ve been getting hardcore twitches when folks switch around the “death rates” for different populations. There’s an insane difference between the various ways of calculating, to pick an easy example, the Diamond Princess numbers (simplified; I’ll note them below).
Ignoring the need to adjust for population differences, you can just as honestly say that they had a 0.2%, a 1%, a 2%, or an 85% death rate, as long as you’re not using the terms of art.
The numbers are 4,000 on the ship, 800 infected, 400 with symptoms, 45 hospitalized, and 8 dead. (Rounded up in all cases except the hospitalized, because I’ve only ever seen exactly 45 noted.)
Obviously, applying “deaths among those who had symptoms” to the entire population is going to produce incredible but inaccurate numbers.
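The point here is that the numerator stays the same while the denominator changes. A quick Python sketch using the rounded figures above (the commenter’s simplified numbers, not official counts) makes the spread explicit; `death_rate` is just a hypothetical helper. Note that the last ratio, 8 deaths out of 45 hospitalized, comes out to roughly 18%, which matches the correction later in the thread.

```python
DEATHS = 8  # rounded Diamond Princess deaths, as given in the comment

# Same deaths, four different denominators (rounded figures from the comment).
DENOMINATORS = {
    "everyone aboard": 4000,
    "infected": 800,
    "symptomatic": 400,
    "hospitalized": 45,
}


def death_rate(deaths: int, denominator: int) -> float:
    """Hypothetical helper: deaths as a fraction of the chosen group."""
    return deaths / denominator


for label, n in DENOMINATORS.items():
    print(f"{label:>16}: {death_rate(DEATHS, n):.1%}")
```

All four numbers are “true”; they just answer four different questions, which is why a headline rate means little until you know which denominator it used.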
Not only do we not have a lot of information, but a lot of the inputs are reported without the context you’d need to calculate a meaningful death rate. Sarah mentioned that Colorado, at least in some places, is diagnosing the kung flu by phone consultation. That’s going to be insanely different from, say, Okanogan County, Washington, where they did about a hundred tests to find one positive.
/sigh
Totally screwed up the hospitalized-to-dead number; it should’ve been 18%.
[…] a blog post titled “Data Not Collected Is Not Data,” Cedar […]
Thanks, Cedar.