Data Not Collected is Not Data


In science, data is king. However, the problem with this becomes quickly evident to anyone clever enough to start asking questions like ‘What data? How was it collected? When? Where?’ and so on. Incomplete or incorrect data, or simply the lack of any data, means that the hypothesis cannot be supported, or should not be (although it often is claimed to be, perched atop a heap of junky data and crowned a winner by those who really ought to know better).

Which brings me, obliquely, to current events. I have said, and will continue to say, that we do not know the true depth and breadth of this pandemic. We cannot know it. We simply do not have sufficient data, and we never will have it. Going back after the fact and testing for immunity is very important, and hopefully will begin any day now, for reasons that should be obvious to anyone reading these words. However, antibody testing will never be done on the entire population of the US for reasons that, while obvious to most, are rooted in some of the things that make Americans great: we are both ferociously independent and maddeningly difficult to get to do anything en masse. People will simply not take part in the studies, or the studies will, for lack of funding, fail to generate a true picture of the population. The best and only way to generate an accurate real-time picture of the pandemic was to test early and often. We have done neither. Therefore: we do not have data. And without data, you cannot know the truth. We can only get glimpses of it, and try to piece those flashes of insight into something that makes sense to us. Which might resemble reality about as well as a Lisa Frank painting does.

I have slim hopes that the coming wave of antibody tests will help light break through the clouds. Derek Lowe, in his usual pithy and wonderfully clear way, lays out how they work as follows:

the tests are looking for two antibody subclasses, IgG and IgM. The IgM ones are the first that get produced in an immune response, mostly coming from the spleen, but they’re also relatively short-lived, with a half-life of five or six days. So detection of IgM against coronavirus antigens indicates a recent (or still active) infection. The IgG antibodies are more numerous in the end, though, and for many infections (measles, chickenpox, mumps, hepatitis B and more) they indicate that a person is now immune to re-infection.

So on that paper strip, the plasma will hit a band of anti-IgM antibodies, bound to the paper, and then a band of anti-IgG antibodies, and finally a band of control antibodies that react with human antibodies in general. Remember, the plasma is carrying the test patient’s antibodies that are holding onto antigens with colloidal gold particles tied to them. When these hit one of those antibody-to-antibodies zones, they’ll come to a halt there, and the colloidal gold particles will pile up enough in that zone to show you a red-pink color. So the test strip can show red lines for either IgG or IgM, both, or neither, but if there’s no red line in the control strip then something has gone wrong and the test needs to be discarded and run again with a fresh kit.

You can realize, then, that if a person shows positive for IgM only then they may well be actively infected. And if they show only IgG, they may well have gone through an infection and could be immune (more about that in a minute). Showing both, well, you’re probably on the back end of an infection? And showing neither (but with a valid control line) could mean that you haven’t been exposed to the virus at all.

(Read the rest at In the Pipeline, and I recommend that you take a look at his ongoing series about the pandemic and drug/vaccine development as well.)
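The reading rules in that passage are simple enough to write down as a small lookup. This is only a sketch of the logic as quoted: the function name `read_strip` and its three boolean parameters are my own invention, and the returned strings paraphrase the hedged wording above rather than any clinical guidance.

```python
def read_strip(igm: bool, igg: bool, control: bool) -> str:
    """Map the red lines visible on the strip to the tentative reading quoted above.

    Hypothetical helper: each argument is True when that band shows a red line.
    """
    if not control:
        # No control line means the strip itself failed, whatever else shows.
        return "invalid: discard and rerun with a fresh kit"
    if igm and igg:
        return "probably on the back end of an infection"
    if igm:
        return "may well be actively infected"
    if igg:
        return "may have gone through an infection and could be immune"
    return "may not have been exposed"


if __name__ == "__main__":
    # Walk every combination of the three bands.
    for igm in (False, True):
        for igg in (False, True):
            for control in (False, True):
                print(igm, igg, control, "->", read_strip(igm, igg, control))
```

Note that the control line is checked first: a strip with no control line tells you nothing, no matter what the other two bands show.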

It’s not just me frustrated at the state of testing here in the US. Or rather, the lack of it. Even before I fell ill with a Flu-Like Illness (FLI) a couple of weeks ago (which I am still recovering from) it offended my inner scientist that we were basing public policies and reactions on very little to no data. We did not know, we do not know, we will never know. And the blame lies, as this NYTimes article lays it out, with the government. “The C.D.C. also tightly restricted who could get tested and was slow to conduct “community-based surveillance,” a standard screening practice to detect the virus’s reach. Had the United States been able to track its earliest movements and identify hidden hot spots, local quarantines might have confined the disease. Dr. Stephen Hahn, the commissioner of the Food and Drug Administration, enforced regulations that paradoxically made it tougher for hospitals, private clinics and companies to deploy diagnostic tests in an emergency. Other countries that had mobilized businesses were performing tens of thousands of tests daily, compared with fewer than 100 on average in the United States, frustrating local health officials, lawmakers and desperate Americans.”
Whatever else we learn from this whole debacle, the takeaway from what data we do have is clear: the bureaucracy failed, and people are dying because of it. “Not until Feb. 25 did state health labs receive permission from the Food and Drug Administration to develop their own tests. Almost immediately, new cases of COVID-19 were identified. Meanwhile, university hospital labs also capable of developing tests were eager to do so, but they were frustrated by FDA red tape that wasn’t eased until Feb. 29. Federal criteria for a coronavirus test were highly stringent at first — only to people with symptoms and connection to travel to China. That restriction has been scaled back. But for whatever reason, state health agencies — whether relying on federal standards or their own, or rationing — have been finding reasons to turn people away, including health care workers.” Despite many rushing to pin the blame on the president, it’s clear that the true faults lie in the entrenched policies, which existed long before the current figurehead, and in regulations that kept states from caring for their own people while changes trickled down from the federal power centers.
Where we do have data, as for instance from the cruise ship Diamond Princess, we find that the numbers are not as bleak as the mainstream media seems so eager to paint them. “Using the Diamond Princess data, a team reports in Eurosurveillance that by 20 February, 18% of all infected people on the ship had no symptoms. “That is a substantial number,” says co-author Gerardo Chowell, a mathematical epidemiologist at Georgia State University in Atlanta. But the passengers included a large number of elderly people, who are most likely to develop severe disease if infected, so the share of asymptomatic people in the general population is likely to be higher, he says.” Which means that by only testing those people who are sick, usually sick enough to be hospitalized, we simply cannot know the scope of the disease. Me? They refused to even see me, much less test me, as I was running a fever. I’ll never know what my FLI was. Probably another virus, although the timing is suspicious, and having had a flu vaccination does not protect you against all strains of influenza. It would be nice to know, and have that data point. Life rarely gives you all the answers, though.

The data collected so far on how many people are infected and how the epidemic is evolving are utterly unreliable. Given the limited testing to date, some deaths and probably the vast majority of infections due to SARS-CoV-2 are being missed. We don’t know if we are failing to capture infections by a factor of three or 300. Three months after the outbreak emerged, most countries, including the U.S., lack the ability to test a large number of people and no countries have reliable data on the prevalence of the virus in a representative random sample of the general population.

This evidence fiasco creates tremendous uncertainty about the risk of dying from Covid-19. Reported case fatality rates, like the official 3.4% rate from the World Health Organization, cause horror — and are meaningless. Patients who have been tested for SARS-CoV-2 are disproportionately those with severe symptoms and bad outcomes. As most health systems have limited testing capacity, selection bias may even worsen in the near future.

You should definitely read the rest of that STAT article.
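To see why the “factor of three or 300” matters so much, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every death is counted and only infections are missed, which the quote itself says is not quite true. The function name `implied_fatality_rate` and the list of factors are mine, not the article’s.

```python
def implied_fatality_rate(reported_cfr: float, undercount_factor: float) -> float:
    """Fatality rate among all infections if true infections are
    `undercount_factor` times the confirmed cases.

    Simplifying assumption: deaths are fully counted, so only the
    denominator grows.
    """
    if undercount_factor < 1:
        raise ValueError("undercount_factor must be at least 1")
    return reported_cfr / undercount_factor


if __name__ == "__main__":
    reported = 0.034  # the 3.4% reported case fatality rate quoted above
    for factor in (1, 3, 30, 300):
        rate = implied_fatality_rate(reported, factor)
        print(f"infections undercounted {factor:>3}x -> {rate:.3%}")
```

Under those assumptions, the same 3.4% headline number corresponds to roughly 1.1% if we are missing two of every three infections, and roughly 0.011% if we are missing 299 of every 300. Without knowing the factor, the headline number cannot tell you which world you are in.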

What does all of this mean? Well, I keep saying it. Don’t panic. Panic does no one any good. 


6 responses to “Data Not Collected is Not Data”

  1. We do have one other important control experiment: Iceland tested a far larger percentage of their population than anyone else. They have a small population (though 2/3 concentrated around Reykjavik) and a vigorous biotech sector: local firm deCODE offers free testing to anyone who wants it.
    They found about 50% of infectees are asymptomatic. Their IFR (infection fatality rate) stands at 0.3% right now.

    1. Thanks, Nitay. I knew Iceland had been doing good things with testing, but hadn’t found an article on it. That’s good news about the IFR.

  2. Related to Mr. Arbel’s post, I’ve been getting hard core twitches when folks switch around the “death rates” for different populations. There’s an insane difference between, to pick an easy example, the Diamond Princess numbers (simplified, will note below) done in different ways.

    Ignoring the need to adjust for population differences, you can just as honestly say that they had a .2%, a 1%, a 2%, and an 85% death rate, when you’re not talking in terms of art.

    Numbers are 4000 on the ship, 800 infected, 400 had symptoms, 45 were hospitalized and 8 died. (Rounded up in all cases except for the hospitalized, because I’ve only ever seen exactly 45 noted.)

    Obviously, using “deaths in those who had symptoms” on the entire population is going to produce incredible but inaccurate numbers.

    Not only do we not have a lot of information, but a lot of the inputs are given without any of the important death rate. Sarah mentioning that Colorado at least in some places is diagnosing the kung flu by phone consultation? That’s going to be insanely different than, say, Okanogan County Washington where they did about a hundred tests to find a positive.

    1. /sigh
      Totally screwed up the hospitalized-to-dead number, should’ve been 18%.

  3. […] a blog post titled “Data Not Collected Is Not Data,” Cedar […]

  4. Thanks, Cedar.
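
The denominator problem raised in the Diamond Princess comment above is easy to check by hand, but a short sketch makes it concrete. It uses only the rounded figures given in that comment (4000 on the ship, 800 infected, 400 with symptoms, 45 hospitalized, 8 deaths); the function name `death_rates` and the dictionary labels are illustrative choices of mine.

```python
def death_rates(deaths: int, denominators: dict[str, int]) -> dict[str, float]:
    """Divide the same death count by each candidate denominator."""
    return {label: deaths / count for label, count in denominators.items()}


# Rounded figures as given in the comment, not official counts.
DIAMOND_PRINCESS = {
    "everyone on the ship": 4000,
    "infected": 800,
    "had symptoms": 400,
    "hospitalized": 45,
}

if __name__ == "__main__":
    for label, rate in death_rates(8, DIAMOND_PRINCESS).items():
        print(f"deaths / {label}: {rate:.1%}")
```

The same eight deaths give 0.2%, 1%, 2%, or about 18% depending on which group goes in the denominator, which matches the commenter’s own correction of the hospitalized figure.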