It may not seem fair to pick on Doctor’s Data (DDI), since they are probably no worse (and no better) than any of their mail-order lab competitors. But Doctor’s Data has been the “lab of choice” for a number of mercury-autism “studies”, so its reliability has more bearing on the validity of the “research” supporting the mercury-autism connection.
Well, the first thing that pops up on a careful examination of the DDI website is that their reference ranges (sometimes called “normal ranges”) are different from (lower than) those used by the more standard clinical laboratories (for urine mercury, 0 – 5 mcg/day versus 0 – 15 mcg/day). Now, I’ve heard it argued that their assays are more “sensitive”, and so a lower reference range is appropriate, but this doesn’t hold up to careful scrutiny.
To begin with, DDI is using the exact same equipment that the clinical lab at the local university hospital is using, and also the same as used by nationwide clinical laboratories (where clinics and smaller hospitals send their urine mercury samples). Besides, greater sensitivity only lowers the minimum detectable level; it doesn’t change the values measured in healthy people, and it is those values that set the reference range.
Some of you are probably asking yourselves, “What does it matter? Mercury is bad, so any amount is bad…right?”
Mercury is a ubiquitous element – it is found everywhere on the planet, even in pristine alpine lakes. It is in the air you breathe, the food you eat and the water you drink. It comes from burning fossil fuels, industrial processes, broken fluorescent tubes (less with the newer ones), volcanic eruptions and deep sea hydrothermal vents. In short, everybody has mercury in their bodies from the time they are conceived until (and after) they die.
As a result, it is expected that any urine sample (or blood or hair or…) from any person will have some amount of mercury in it. Analyses that fail to show mercury are simply not sensitive enough – it’s there, they just aren’t seeing it. And none of this justifies DDI’s decision to use an upper limit of “normal” that is one third of the generally accepted value.
I suppose we ought to first talk a bit about how these “normal ranges” are determined. To begin with, they are not simply “agreed upon” or “generally accepted” values (unlike some of the EPA and OSHA limits). The “normal” or reference range is determined by measuring the values of at least one thousand “healthy, normal” volunteers. I put “healthy” and “normal” inside quotation marks because there is some debate about how these characteristics are defined and determined. The general practice is to use college students, often athletes, who report no health problems. For tests which may show age related changes (e.g. testosterone), volunteers of the appropriate age (and/or sex) are recruited. This is called “normalization” or “norming”.
In the end, all of the values for the thousand or more “healthy, normal” volunteers are collated and the “normal” or reference range is set to include 95% of the values obtained. This leaves 5% of the presumably “healthy, normal” population outside of the “normal” range, which is why I put “normal” in quotes and why the preferred term is “reference range”.
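To make the 95% rule concrete, here is a minimal Python sketch of the simplest nonparametric version: sort the volunteers’ results and trim 2.5% from each end. The `reference_range` function and the volunteer values are hypothetical illustrations (simulated random numbers, not real lab data), and the sketch skips refinements a real normalization study would add.

```python
import random
import statistics


def reference_range(values, coverage=0.95):
    """Return the central `coverage` interval of the values.

    For coverage=0.95 this is roughly the 2.5th to 97.5th percentile:
    (1 - coverage) / 2 of the values are cut off at each end.
    """
    tail = (1 - coverage) / 2            # 0.025 for a 95% range
    n_slices = round(1 / tail)           # 40 slices of 2.5% each
    cuts = statistics.quantiles(values, n=n_slices)
    return cuts[0], cuts[-1]


# Hypothetical, simulated results for 1,000 "healthy, normal" volunteers.
# These are random numbers for illustration, not real measurements.
rng = random.Random(42)
volunteers = [rng.gauss(100.0, 10.0) for _ in range(1000)]

low, high = reference_range(volunteers)
outside = sum(1 for v in volunteers if v < low or v > high)
print(f"reference range: {low:.1f} to {high:.1f}")
print(f"volunteers outside the range: {outside} of {len(volunteers)}")
```

With 1,000 simulated values, about 50 of them (5%) land outside the computed range, which is exactly the built-in “abnormal” share described above.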
One of the conundrums in clinical laboratory medicine is this 5% of the “normal” population that – by definition – is outside of the reference range. This is particularly problematic when you are running a battery or “panel” of tests, since each one of them has a reference range that excludes 5% of the “healthy, normal” population. It is a relatively simple exercise in probability to find how many tests you have to do to reach the point where there is a greater than 50% chance that at least one of the results will be “abnormal” (i.e. outside the reference range) in a person who is “healthy” and “normal” [Answer: 14].
Good, responsible clinicians are aware of this fact and guide their interventions accordingly. Less responsible clinicians may see this as an opportunity. After all, if you order a “panel” of 30 tests, there is a 78.5% chance that a “healthy, normal” person will have at least one abnormality. And this applies when the tests have been properly normalized (tested with over one thousand “healthy, normal” volunteers) and have the appropriate reference ranges used. What happens if you change those assumptions?
Let’s start with proper normalization. One technique used by labs when they are performing a laboratory test that has never had a formal normalization process is to use a much smaller group of “healthy, normal” volunteers (up to one hundred) and create a provisional normalization. Because of the potential for errors and random variation with such a small test group, such tests are usually marked as “experimental”, “provisional” or other such verbiage – to warn the unwary that these tests should not be given too much credence.
Another, less valid approach (often used by mail-order or “do-it-yourself” labs) is to use the specimens provided as a source of the values for the normalization process. This is fraught with errors, since the specimens are usually being sent because the client suspects that they have a health problem. For the mail-order or “do-it-yourself” laboratories, though, this may be less of a problem, since they seem to cater to the “worried well” segment of the population.
One rather less savory technique seen at some laboratories (of the mail-order or “do-it-yourself” variety, exclusively) is to set the reference range of a test as the mean (“average”) of the “normal” values (see above), plus and minus one standard deviation. For those not familiar with the standard deviation, it is a measure of the variation within a group – the bigger the number, the wider the range of values around the mean. A handy feature of the standard deviation is that – for a population with a “normal” (this time, “normal” means “bell shaped distribution curve”) distribution – the mean plus and minus two standard deviations encompasses about 95% of the population (95.5%). Plus and minus one standard deviation from the mean only encompasses 68.3% of the population, which would leave 31.7% of the “healthy, normal” population outside of the reference range.
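The 68.3% and 95.5% figures come straight from the bell-shaped curve. For a normal distribution, the share of the population within k standard deviations of the mean is erf(k/√2), which Python’s standard library can evaluate directly; this is a quick check, not anything a particular lab runs.

```python
import math


def coverage_within(k):
    """Fraction of a normal (bell-shaped) population within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    inside = coverage_within(k)
    print(f"+/-{k} SD: {inside:.2%} inside, {1 - inside:.2%} outside")
```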
Now, if you use the +/- one standard deviation stratagem, it takes a lot fewer tests to get an abnormal result from a “healthy, normal” person. In fact, the chance of having one “abnormal” result is over 50% after only two tests. A meager panel of five tests raises the chance to over 85%. This can be a real boon to those looking for an abnormal result – any abnormal result – to show their worried-well patients.
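Plugging the roughly 68.3% per-test coverage of a ±1 SD range into the same “at least one abnormal result” calculation reproduces the figures above. As before, this minimal sketch treats the tests as independent, which is an assumption rather than something measured; `chance_of_abnormal` is an illustrative name.

```python
import math

# Share of a bell-shaped population within mean +/- 1 SD (about 68.3%).
ONE_SD_COVERAGE = math.erf(1 / math.sqrt(2))


def chance_of_abnormal(n_tests, coverage=ONE_SD_COVERAGE):
    """Chance that a healthy person has at least one result outside a
    +/- 1 SD range on a panel of n_tests (tests assumed independent)."""
    return 1 - coverage ** n_tests


for n in (1, 2, 5):
    print(f"{n} test(s): {chance_of_abnormal(n):.1%}")
```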
This brings us back to DDI and their… “individualistic” reference ranges. Let’s see if we can estimate how changing the reference range might impact their reports.
Looking at urine mercury, DDI advertises that they use ICP-MS (inductively coupled plasma mass spectrometry) for their analysis. Perkin-Elmer, a manufacturer of those machines, lists the limit of detection (LOD, the lowest concentration it can detect) of its ICP-MS as 0.1 ppb (parts per billion, or mcg/liter), which works out to around 0.2 mcg/day, given average urine output. Compared to DDI’s upper limit of 5 mcg/day, this is close enough to zero to make no significant difference, and treating it as zero gives DDI a bit more leeway.
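The unit conversion is simple enough to show directly. The 2 liters per day of urine is an assumed round figure chosen to match the “around 0.2 mcg/day” estimate; actual daily output varies from person to person, and `daily_excretion_mcg` is a hypothetical helper name.

```python
def daily_excretion_mcg(concentration_mcg_per_l, urine_l_per_day):
    """Convert a urine concentration (mcg/L, i.e. ppb) to mcg/day."""
    return concentration_mcg_per_l * urine_l_per_day


LOD_PPB = 0.1            # quoted ICP-MS detection limit, mcg/L
URINE_L_PER_DAY = 2.0    # assumed daily urine volume; real output varies
DDI_UPPER = 5.0          # DDI's upper limit for urine mercury, mcg/day

lod_per_day = daily_excretion_mcg(LOD_PPB, URINE_L_PER_DAY)
print(f"LOD ~ {lod_per_day:.1f} mcg/day, "
      f"{lod_per_day / DDI_UPPER:.0%} of DDI's upper limit")
```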
Now, the distribution of urine mercury values – even in the “healthy, normal” population – is not going to be a normal distribution (the nice “bell-shaped curve” of basic statistics). The values will be clustered nearer zero, as has been shown by a number of studies. In fact, the distribution can be modeled as a normal distribution (“bell-shaped curve”) folded in half at the mean (the peak of the curve). This assumption greatly simplifies the math and, in the process, gives DDI yet another break, since it places the mean at zero and thus broadens the standard deviation (mathematical tricks – don’t try this at home, or on your homework).
If we do this, and assume that the reference range used by the vast majority of clinical laboratories is two standard deviations from the mean, this gives us the following:
Mean: 0 mcg/day (yes, I know this is unrealistic, but it gives the advantage to DDI)
Standard deviation: 7.5 mcg/day
In a two-tailed normal curve, the mean plus and minus two standard deviations encompasses 95.5% of the population. In this “folded”, one-tailed distribution, the mean (0 mcg/day) plus two standard deviations (no minus, since that would lead to negative numbers, which would be really ridiculous) will encompass 95.5% of the population and should define the reference range.
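Under this “folded” model (statisticians call a normal curve folded at zero a half-normal distribution), the share of the population between 0 and some upper limit is erf(limit / (SD × √2)). A minimal sketch that recovers the 7.5 mcg/day standard deviation and the roughly 95.5% coverage from the 0 – 15 mcg/day range; `folded_coverage` is an illustrative name.

```python
import math


def folded_coverage(upper_limit, sd):
    """Fraction of a normal curve 'folded' at zero (a half-normal
    distribution) that lies between 0 and upper_limit."""
    return math.erf(upper_limit / (sd * math.sqrt(2)))


STANDARD_UPPER = 15.0          # mcg/day, typical clinical upper limit
sd = STANDARD_UPPER / 2        # limit = 0 + 2 SD, so SD = 7.5 mcg/day
print(f"SD = {sd} mcg/day, coverage = {folded_coverage(STANDARD_UPPER, sd):.2%}")
```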
With these assumptions, which clearly give the advantage to DDI, the upper reference limit used by DDI (5 mcg/day) is the mean plus 2/3 of a standard deviation. This range only encompasses 49.5% of the “healthy, normal” population, so a single test already carries a 50.5% chance of a “false positive” (an abnormal result in the absence of any disease).
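The same half-normal calculation, applied to DDI’s 5 mcg/day limit, reproduces the 49.5% figure. This is a sketch of the simplified model above, not an analysis of any real patient data.

```python
import math


def folded_coverage(upper_limit, sd):
    """Share of a half-normal ('folded' at zero) population between 0 and upper_limit."""
    return math.erf(upper_limit / (sd * math.sqrt(2)))


SD_MERCURY = 7.5    # mcg/day, from the 0-15 mcg/day standard range (15 = 2 SD)
DDI_UPPER = 5.0     # mcg/day, DDI's upper limit for urine mercury

inside = folded_coverage(DDI_UPPER, SD_MERCURY)
print(f"{DDI_UPPER / SD_MERCURY:.2f} SD -> {inside:.1%} inside, "
      f"{1 - inside:.1%} flagged as abnormal")
```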
Remember, the assumptions used to construct this simple model were grossly biased in favor of showing DDI in a good light, so it is likely that their reference range is even worse than the model shows.
To be fair, DDI’s reference ranges are close to or identical to the “standards” on some of their tests (cadmium, for instance). However, their reference ranges for arsenic and antimony are three times higher than the standard ranges used by most clinical labs. As my son would say, “Go figure.”
DDI steps “over the line” again with their lead reference range, which is given as 0 – 20 mcg/day, against 0 – 31 mcg/day for the other laboratories. Using the same model as above (a standard deviation of 15.5 mcg/day, which puts DDI’s limit at about 1.29 standard deviations), we get the same kind of answer – DDI’s reference range only encompasses about 80.3% of the “healthy, normal” population.
If DDI had stuck strictly to the established reference ranges for its “Urine Toxic Metals” panel (15 metals, from aluminum/aluminium to uranium), there would be a 53.7% chance of at least one false positive result (an abnormal result in the absence of disease), treating the tests as independent. By changing the reference ranges on just mercury and lead (two “hot topics” in autism, by the way), and even giving them credit for raising the reference ranges on antimony and arsenic (assuming that these tests now have a 2% false positive rate), the overall risk of at least one false positive goes to over 75% [78.3%].
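Putting the pieces together, here is a sketch of the whole-panel calculation in the same half-normal model. Like the figures in the text, it treats the 15 tests as independent and takes the 95% and 2% rates as given assumptions; the variable names are illustrative only.

```python
import math


def folded_coverage(upper_limit, sd):
    """Share of a half-normal population between 0 and upper_limit."""
    return math.erf(upper_limit / (sd * math.sqrt(2)))


# Per-test chance that a healthy person's result is "normal", under the
# assumptions in the text (standard upper limit = 2 SD of a folded normal).
mercury = folded_coverage(5.0, 15.0 / 2)    # DDI 5 vs standard 15 mcg/day
lead = folded_coverage(20.0, 31.0 / 2)      # DDI 20 vs standard 31 mcg/day
raised = 0.98                               # antimony, arsenic: assumed 2% false positives
standard = 0.95                             # each of the other 11 metals

p_all_normal_standard = standard ** 15
p_all_normal_ddi = mercury * lead * raised ** 2 * standard ** 11

print(f"lead coverage: {lead:.1%}")
print(f"standard ranges: {1 - p_all_normal_standard:.1%} chance of >= 1 false positive")
print(f"DDI ranges:      {1 - p_all_normal_ddi:.1%} chance of >= 1 false positive")
```

Changing `raised` or the per-metal limits shows how sensitive the panel-level figure is to each assumption.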
Let me say that again. By reducing the reference ranges on lead and mercury, and even if we assume that they are using valid reference ranges for the others (such as tin, platinum and thallium, for which large normative studies have not been done), the chance that a perfectly healthy, normal person would get back at least one abnormal test result goes to over 75%. And that abnormal result is most likely to be either mercury or lead.
Is it any wonder that DDI is the “lab of choice” for people trying to find a connection between mercury and autism?