Accounting for bias when analyzing public data

We tend to overestimate the reliability of authority figures, and this impacts how we should analyze data for public policy.

Public data is an intrinsic appeal to authority

The CDC's WONDER database keeps track of causes of death within the United States. When a death certificate is created, it includes a special code indicating the cause of death. Through a lengthy process, that information makes its way from the funeral home or hospital to a state registry to the National Vital Statistics System and finally to the CDC. The CDC stores that information in WONDER, which can be partially queried by the public.

WONDER is used by scientists, researchers and journalists for all sorts of reasons. It was data from WONDER that largely provided the justification for the claim that the United States has been undergoing an epidemic of heroin addiction. And by any measure, the US has a serious problem with heroin and with the abuse of other opiate drugs. But WONDER can only provide us with a rough indication of what the problem looks like in reality.

Let's say you want to count the number of people who have overdosed on heroin specifically: you want to make sure that overdoses from OxyContin and fentanyl are counted separately, because the chemicals have vastly different chains of custody and responsibility. Heroin in the US in 2021 is supplied largely by drug cartels in Mexico. OxyContin is created by legal drugmakers in the US and diverted or stolen. Fentanyl is somewhat of a combination of these two pictures, with the added wrinkle that the precursors for illegal fentanyl manufacture are supplied primarily from China. Say you want to determine the damages a legal drugmaker is responsible for in one of the recent lawsuits surrounding that issue: there are many situations where knowing the difference between these numbers has profound consequences.
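To make the distinction concrete, here is a minimal sketch of how a "heroin-only" count might be approximated from multiple-cause-of-death records. The ICD-10 drug-poisoning codes are real (T40.1 for heroin, T40.2 for natural and semisynthetic opioids such as oxycodone, and T40.4 for synthetic opioids other than methadone, which covers fentanyl), but the record layout and the example certificates are hypothetical.

```python
# A rough sketch of a "heroin-only" query over multiple-cause-of-death
# records. The T40.x codes are real ICD-10 drug-poisoning codes; the
# record format and example certificates below are hypothetical.

OPIOID_CODES = {
    "T40.1": "heroin",
    "T40.2": "natural/semisynthetic opioids (e.g. oxycodone)",
    "T40.3": "methadone",
    "T40.4": "synthetic opioids other than methadone (e.g. fentanyl)",
}

def opioid_categories(multiple_cause_codes):
    """Return the set of opioid categories listed on one certificate."""
    return {OPIOID_CODES[c] for c in multiple_cause_codes if c in OPIOID_CODES}

def is_heroin_only(multiple_cause_codes):
    """True only if heroin is the sole opioid category on the certificate."""
    return opioid_categories(multiple_cause_codes) == {"heroin"}

# Hypothetical certificates. Many real records list several T40.x codes at
# once, so the "heroin-only" count depends on how those are handled.
records = [
    ["X42", "T40.1"],            # heroin only
    ["X42", "T40.1", "T40.4"],   # heroin and fentanyl together
    ["X44", "T40.2"],            # an oxycodone-type opioid only
]
print(sum(is_heroin_only(r) for r in records))  # 1
```

Even in this toy form, the count hinges on choices like how to handle certificates that list several opioid codes at once, which is where the assumptions below start to matter.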

When researchers search WONDER for cause-of-death information, they tend to make several assumptions:

    1. There are consistent criteria for establishing cause of death.

    2. Those criteria reliably distinguish between different causes of death.

    3. Processing errors are rare enough to be negligible.

All three of those assumptions are wrong as they relate to our test query for heroin-only deaths within WONDER. Of course, that's quite a claim coming from someone who doesn't have a medical degree. What am I talking about? Let's review each assumption in more detail:

There are consistent criteria for establishing cause of death

The information within WONDER is the result of thousands upon thousands of judgments by individuals across the country. Ostensibly, the International Classification of Diseases (ICD-10) provides exactly those consistent criteria. The ICD is a labeling system that allows clinicians to use the same "language" to specify a diagnosis. That should ensure that diagnoses are specific and consistent.

Although ICD-10 codes themselves can be specific, there is a substantial degree of ambiguity around how the codes are applied in any given case. There is no consistent way of resolving that ambiguity, and so we lack a consistent methodology for establishing cause of death.

A related, but distinct, concern is not only how the criteria are applied but who is applying them. There tends to be an assumption that cause of death is ascertained by a physician. That is true in most cases, but in some states the coroner is an elected position with no formal medical requirements. Trained doctors are capable of doing bad work, too. The consequences of incompetence among US coroners are tragic, ongoing, and not widely understood, but we should assume that their impact on the data could be statistically significant.

The criteria reliably distinguish between different causes of death

Let's assume for a moment that the problem with ICD-10 doesn't exist: the criteria are identical and applied the same way across the country. But what if the criteria don't work?


Some drugs are chemically altered by the body (metabolized). The substances that result from metabolism (metabolites) may be inactive, or they may be similar to or different from the original drug in therapeutic activity or toxicity. Some drugs, called prodrugs, are administered in an inactive form, which is metabolized into an active form. The resulting active metabolites produce the desired therapeutic effects. Metabolites may be metabolized further instead of being excreted from the body. The subsequent metabolites are then excreted. Excretion involves elimination of the drug from the body, for example, in the urine or bile.
How does this relate to how we detect overdoses in corpses? If drugs change when humans take them, how can we make the kind of subtle measurements of concentration and causation needed to determine, for example, whether a patient with lung cancer stopped breathing because of their tumor or because of their medication? The abstract of Ferner's 2008 article in the British Journal of Clinical Pharmacology tells us we cannot:
Clinical pharmacology assumes that deductions can be made about the concentrations of drugs from a knowledge of the pharmacokinetic parameters in an individual; and that the effects are related to the measured concentration. Post-mortem changes render the assumptions of clinical pharmacology largely invalid, and make the interpretation of concentrations measured in post-mortem samples difficult or impossible. Qualitative tests can show the presence of substances that were not present in life, and can fail to detect substances that led to death. Quantitative analysis is subject to error in itself, and because post-mortem concentrations vary in largely unpredictable ways with the site and time of sampling, as a result of the phenomenon of post-mortem redistribution. Consequently, compilations of ‘lethal concentrations’ are misleading. There is a lack of adequate studies of the true relationship between fatal events and the concentrations that can be measured subsequently, but without such studies, clinical pharmacologists and others should be wary of interpreting post-mortem measurements.
Counting opiate-related mortality is particularly difficult because opiates tend to metabolize into other opiates: heroin, for example, is metabolized into morphine, and so is codeine. How can you count the number of people who died from a specific opiate if users of different drugs will possess metabolites of the drug you're looking for?
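As a toy illustration, the sketch below shows how shared metabolites under-determine the parent drug. The pharmacology is standard (heroin breaks down into 6-acetylmorphine and then morphine, and codeine is also metabolized into morphine), but the data structure and the assumption of a single parent drug are simplifications of my own.

```python
# A toy model of metabolite ambiguity. The parent/metabolite relationships
# are standard pharmacology; the structure and single-drug assumption are
# simplifications for illustration.

# metabolite -> parent drugs that could have produced it
POSSIBLE_PARENTS = {
    "6-acetylmorphine": {"heroin"},                 # specific to heroin, but short-lived
    "morphine": {"heroin", "morphine", "codeine"},  # ambiguous on its own
    "codeine": {"codeine"},
}

def candidate_parents(detected_metabolites):
    """Parent drugs consistent with every metabolite found, assuming
    (as a simplification) that only one parent drug was taken."""
    candidates = None
    for metabolite in detected_metabolites:
        parents = POSSIBLE_PARENTS.get(metabolite, set())
        candidates = parents if candidates is None else candidates & parents
    return candidates or set()

# Toxicology showing only morphine is consistent with heroin, morphine,
# or codeine use; the certifier has to pick one.
print(candidate_parents({"morphine"}))
# Finding 6-acetylmorphine narrows the field to heroin, but it is often
# gone from the body by the time post-mortem samples are taken.
print(candidate_parents({"morphine", "6-acetylmorphine"}))
```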

This error compounds with the previous one, as individual doctors make their own determinations about how to resolve the ambiguity.

Processing errors are rare enough to be negligible

This concern is somewhat different from the last two. The prior two concerns were about how the data within WONDER is collected; solving those problems would involve changing WONDER's sampling process. When we talk about processing errors, we are talking about problems with the administration and storage of the data rather than its initial collection.

Individual states have different rules for determining cause of death and for who can make that determination. They also have very different ways of collecting and storing that data at the state level. I have only reviewed data at this level of aggregation from a few states. In one state, I identified a miscount in the death data for a single county: apparently a typo had added a digit to the number of deaths for that county.

This example stuck with me years after the fact because of how easily this type of error could have been detected with simple programming tools. The state data contained several tables with overlapping values, which meant the value produced by the typo wasn't just unusual but inconsistent with the rest of the data. I didn't need to write a sophisticated model of this information, and the value wasn't obviously wrong in a mean/median/mode sort of way, either. But it didn't match other instances in the same dataset, and when it was assumed to be true it invalidated pre-calculated column sums. Looking for this sort of error is the bare minimum in terms of minimizing errors in process.
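As a rough sketch of what I mean, the snippet below recomputes a published total from its component rows and flags the mismatch. The county names, counts, and table layout are all hypothetical; the point is only that a one-line consistency check would have surfaced the typo.

```python
# Hypothetical county-level death counts alongside a pre-calculated state
# total, standing in for the overlapping tables described above.
county_deaths = {
    "Adams": 112,
    "Baker": 1340,   # suppose a stray extra digit turned 134 into 1340
    "Clark": 98,
}
published_state_total = 344  # total published alongside the county table

recomputed_total = sum(county_deaths.values())
if recomputed_total != published_state_total:
    print(f"Inconsistency: counties sum to {recomputed_total}, "
          f"but the published total is {published_state_total}.")
    # Flag every row for manual review rather than guessing which is wrong.
    for county, count in county_deaths.items():
        print(f"  {county}: {count}")
```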

Ad verecundiam

The appeal to authority is a fundamental logical fallacy: ad verecundiam. It gets all of us from time to time. We tend to assume that information provided by a trusted authority is more reliable than the same information provided by another source. Government is the classic example of such an authority.

None of the individual problems I have listed here are new to statisticians. These are very old problems. 

Because we tend to overestimate the reliability of information provided by authority figures, we tend to underestimate errors in public data. When confronted with evidence of substantial process errors in public data, we tend to assume that those errors have already been "accounted for" in some way.

Statistics provides us with the tools to account for bias introduced through both sampling and non-sampling errors. But to accurately adjust for that bias, we first have to accurately gauge its extent.
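A toy sensitivity check makes the point: the "corrected" figure is only as good as the analyst's estimate of the bias. The observed count and the candidate misclassification rates below are made up for illustration.

```python
# How a bias-adjusted estimate moves with the assumed size of the bias.
# The observed count is illustrative, not a real WONDER figure.
observed_heroin_deaths = 10_000

for assumed_misclassification_rate in (0.00, 0.05, 0.15, 0.30):
    # If a fraction r of true heroin deaths were coded as something else,
    # a naive correction scales the observed count up by 1 / (1 - r).
    adjusted = observed_heroin_deaths / (1 - assumed_misclassification_rate)
    print(f"assumed rate {assumed_misclassification_rate:.0%}: "
          f"adjusted estimate {adjusted:,.0f}")
```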

This isn't about the CDC

I've picked on a database administered by the CDC here. The CDC is just an example, one whose data I happen to have worked with. The problem I am trying to illustrate isn't with the CDC but with the people who rely on information provided by the government: to some extent, that includes everyone.

None of the concerns here assume any sort of malevolence on the part of anyone in authority. Whether or not people in government are immoral isn't germane. Bias must be accounted for any time we deal with statistics. My argument is precisely that sources of government information are not special in this respect. There would be bias problems whether the data was created by butchers or bakers or candlestick makers.

It is possible to underestimate reliability, as well. Even taking into account the issues I outlined above, the WONDER database is by far the most reliable source for national cause of death information. Weighting its reliability alongside sources with even less reliable processes does not get us closer to a clear representation of reality. The analysis of data compiled by the government can be a force for enormous social utility, but that capacity for utility is contingent upon our ability to truly listen to the numbers.