Strange Loops - Conditional Probability: Why A Cancer Diagnosis Isn't All Bad

If you test positive for a cancer that only half of one percent of the population has, using a test that is 98 percent accurate, what are the chances you actually have cancer? The test's error rate is only 2 percent, but the chance of a misdiagnosis in your case is actually around 80 percent! Chances are pretty good that you do not have cancer, despite the rather accurate test.

John Allen Paulos, in a cool little book titled Innumeracy: Mathematical Illiteracy and Its Consequences, explains this strange result:

Assume that there is a test for cancer which is 98 percent accurate; i.e. if someone has cancer, the test will be positive 98 percent of the time, and if one doesn't have it, the test will be negative 98 percent of the time. Assume further that .5 percent - one out of two hundred people - actually have cancer.
Now imagine that 10,000 tests for cancer are administered. Of these, how many are positive? On the average, 50 of these 10,000 people (.5 percent of 10,000) will have cancer, and so, since 98 percent of them will test positive, we will have 49 positive tests. Of the 9,950 cancerless people, 2 percent of them will test positive, for a total of 199 positive tests (.02 x 9,950 = 199). Thus, of the total 248 positive tests (199 + 49 = 248), most (199) are false positives, and so the conditional probability of having cancer given that one tests positive is only 49/248, or about 20 percent! (This relatively low percentage is to be contrasted with the conditional probability that one tests positive, given that one has cancer, which by assumption is 98 percent.)
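Paulos's counting argument is easy to check by machine. The following is a minimal Python sketch of it; the function name `positive_test_breakdown` and its parameters are my own labels, not anything from the book, and it assumes, as the quote does, that the test is equally accurate for people with and without cancer.

```python
def positive_test_breakdown(population, prevalence, accuracy):
    """Count true and false positives, following the counting argument above.

    Assumes a symmetric test: `accuracy` is both the chance a sick person
    tests positive and the chance a healthy person tests negative.
    """
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * accuracy             # sick people correctly flagged
    false_positives = healthy * (1 - accuracy)   # healthy people wrongly flagged
    total_positives = true_positives + false_positives
    return true_positives, false_positives, true_positives / total_positives


tp, fp, p_cancer = positive_test_breakdown(10_000, 0.005, 0.98)
print(f"true positives:  {tp:.0f}")
print(f"false positives: {fp:.0f}")
print(f"P(cancer | positive) = {p_cancer:.3f}")
```

With the numbers from the quote this gives 49 true positives, 199 false positives, and a probability of roughly 0.198, matching the "about 20 percent" figure.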

The reason for this unintuitive result is that our intuition often fails to take conditional probability into account. We tend to latch on to the initial numbers we are presented with ("98 percent accurate") and make the simplest possible inference (i.e. the test has a 2 percent chance of being wrong on any given person's diagnosis). The math shown above (which can be generalized to a population of any size) demonstrates that the situation is more complicated, since most of the mistakes land on the most numerous segment of the population. In the cancer case, the majority of people do not have cancer, so most of the test's mistakes will actually happen to those who do not have cancer (giving them false positives).
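The generalization to any population size can be written down with Bayes' theorem. In the sketch below, the symbols are my own notation rather than Paulos's: C stands for "has cancer" and + for "tests positive". The population size never appears, because it cancels from the numerator and the denominator of the counting argument.

```latex
\[
P(C \mid +)
  = \frac{P(+ \mid C)\,P(C)}{P(+ \mid C)\,P(C) + P(+ \mid \neg C)\,P(\neg C)}
  = \frac{0.98 \times 0.005}{0.98 \times 0.005 + 0.02 \times 0.995}
  = \frac{0.0049}{0.0248}
  \approx 0.198
\]
```

The second term in the denominator is the false positive contribution: a small error rate (0.02) multiplied by a very large healthy fraction (0.995) still outweighs the true positives.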

If, on the other hand, a test with 98 percent accuracy were used to test for a common condition which a third of the population has, a positive result would likely be the correct diagnosis. Of 10,000 people tested, on the average, 3,333 of them will have the condition, and so, since 98 percent of them will test positive, we will have 3,266 true positives. Of the 6,667 people without the condition, 2 percent of them will test positive, for a total of 133 false positives. Thus, of the total 3,399 positive tests, only about 4 percent are false positives. The false positive effect is therefore greatest when testing for something which is fairly rare in the tested population (or when the test is inaccurate).
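To see how strongly the result depends on how common the condition is, here is a small Python sketch that computes the share of positive results that are false for a few prevalences. The function name `false_positive_share` is my own, the test is assumed to be equally accurate in both directions, and the middle prevalence of 5 percent is a made-up value added only to show the trend between the two cases discussed in the text.

```python
def false_positive_share(prevalence, accuracy):
    """Fraction of all positive results that are false, for a symmetric test."""
    true_pos = prevalence * accuracy                # have it, test positive
    false_pos = (1 - prevalence) * (1 - accuracy)   # don't have it, test positive
    return false_pos / (true_pos + false_pos)


# 0.5 percent and one third come from the text; 5 percent is illustrative only.
for prevalence in (0.005, 0.05, 1 / 3):
    share = false_positive_share(prevalence, 0.98)
    print(f"prevalence {prevalence:6.1%}: {share:5.1%} of positives are false")
```

The rare cancer case comes out at about 80 percent false positives and the one-in-three case at about 4 percent, in line with the figures above.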

This simple math shows a major problem with things like widespread mandatory drug tests. Even if the tests are highly accurate, when the thing tested for is rare enough in the general population (rarer than the test's error rate, for a test like the one above that is equally accurate in both directions), they will falsely blame more innocent people than they correctly catch guilty ones. The same applies even more powerfully to less accurate tests, such as polygraphs for detecting lies. With accuracy estimates ranging between 70 and 90 percent, the false positive problem can be through the roof: were lie detectors admissible in court, they could incorrectly convict many more innocent people than they correctly convicted guilty ones.
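The polygraph case can be sketched the same way. The 70 to 90 percent accuracy range comes from the text, but the share of tested people who are actually lying is not given anywhere, so the 5 percent used below is purely a hypothetical assumption, as are the function and constant names. The sketch also assumes the polygraph is equally accurate on liars and truth-tellers.

```python
def flagged_counts(tested, share_lying, accuracy):
    """Return (liars correctly flagged, truthful people wrongly flagged)."""
    liars = tested * share_lying
    truthful = tested - liars
    caught = liars * accuracy                     # liars the test flags
    wrongly_flagged = truthful * (1 - accuracy)   # truthful people the test flags
    return caught, wrongly_flagged


ASSUMED_SHARE_LYING = 0.05  # hypothetical value, not from the article

for accuracy in (0.70, 0.80, 0.90):
    caught, wrongly = flagged_counts(1_000, ASSUMED_SHARE_LYING, accuracy)
    print(f"accuracy {accuracy:.0%}: {caught:.0f} liars caught, "
          f"{wrongly:.0f} truthful people wrongly flagged")
```

Under that assumption the wrongly flagged outnumber the correctly flagged at every accuracy in the range. A different assumed share of liars would change the counts, and once the share of liars exceeds the test's error rate the balance tips the other way.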

Next time you are calculating the probability of something, be sure to take into account any relevant prior conditions, especially how common the thing is to begin with. Conditional probability, where applicable, gives a much more accurate answer than the straightforward intuitive probability we often assign to events when we ignore vital conditional information.

Originally Written: 12-29-04
Last Updated: 12-29-04