Paul's pages

First published Thursday, July 12, 2007

The prosecutor's fallacy

There are various forms of the "prosecutor's paradox" or "the prosecutor's fallacy," in which probabilities are used to assign guilt to a defendant. But probability is a slippery subject.

For example, a set of circumstances may seem or be highly improbable. But defense attorneys might wish to avail themselves of something else: the more key facts there are in a string of facts, the higher the probability that at least one fact is false. Of course, that probability is difficult to establish unless one knows either the witnesses' rates of observational error or some standard rates of observational error, such as the rate typical of an untrained observer versus an error rate typical of a police officer.

(For a non-rigorous but useful example of likelihood of critical misstatement, please see the post Enrico Fermi and a 9/11 plausibility test. In that post we test plausibility, which is far different from ironclad guilt or innocence. Also, for a discussion of probabilities of wrongful execution, please search Fatal flaws at Znewz1.blogspot.com.)

Suppose an eyewitness is tested for quick recall and shows a success rate of 96 percent and a 4 percent error rate. If the witness is testifying to 7 things he saw or heard with no prior knowledge concerning these things, the likelihood that the testimony is completely accurate is about 75 percent. So does the 25 percent probability of error constitute reasonable doubt, especially if no fact can be expunged without forcing a verdict of not guilty? (Of course, this is why the details on which several witnesses agree tend to be more accurate; individual errors tend to cancel out.)
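The 75 percent figure comes from multiplying the per-fact success rate by itself once for each fact. Here is a minimal Python sketch of that arithmetic. Like the text, it assumes that each of the seven recalled facts is an independent trial with the same 4 percent error rate. The function name `prob_all_correct` is hypothetical, chosen for illustration.

```python
def prob_all_correct(error_rate: float, n_facts: int) -> float:
    """Probability that every one of n_facts is recalled correctly,
    assuming each fact is an independent trial with the same error rate."""
    return (1.0 - error_rate) ** n_facts


p_ok = prob_all_correct(0.04, 7)
print(f"All 7 facts correct: {p_ok:.1%}")      # about 75 percent
print(f"At least one error:  {1 - p_ok:.1%}")  # about 25 percent
```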

The prosecutor's paradox is well illustrated by People v. Collins, a case from 1964 in which probabilities were incorrectly treated as independent and multiplied, with the result that the conviction was reversed on appeal.

To summarize, a woman was shoved to the ground and her purse snatched. She and a nearby witness gave descriptions to Los Angeles police that led to the arrest of a white woman and a black man. I do not intend to treat the specifics of this case, but only to look at the probability argument.

The prosecutor told the jury that the arrested persons matched the description given to police so closely that the probability of their innocence was about 1 in 12 million.

The prosecutor gave these probabilities:

Yellow auto, 1/10; mustached man, 1/4; woman with ponytail, 1/10; woman with blonde hair, 1/3; black man with beard, 1/10; interracial couple in car, 1/1000. With a math professor serving as an expert witness, these probabilities were multiplied together, and the resulting figure of 1 in 12 million was presented as an astoundingly high probability of "guilt."
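The multiplication the jury heard can be reproduced in a few lines of Python. Using `fractions.Fraction` keeps the arithmetic exact. The dictionary keys are just descriptive labels for the prosecutor's figures. The product is only meaningful if the six traits are statistically independent, and that assumption is exactly what the critics disputed.

```python
from fractions import Fraction
from math import prod

# The prosecutor's claimed frequencies for each trait.
traits = {
    "yellow auto": Fraction(1, 10),
    "mustached man": Fraction(1, 4),
    "woman with ponytail": Fraction(1, 10),
    "woman with blonde hair": Fraction(1, 3),
    "black man with beard": Fraction(1, 10),
    "interracial couple in car": Fraction(1, 1000),
}

# Multiplying is valid only if the traits are independent of one another.
joint = prod(traits.values())
print(joint)  # 1/12000000
```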

However, the prosecutor did not conduct a comparable test of the witnesses' error rate. Suppose the witnesses had an average observational error rate of 5 percent. Then the probability that at least one of the six facts is wrong is about 26 percent. Even so, if one fact is wrong, the computed probability of a correct match remains very high. Yet if that fact was essential to the case, a not guilty verdict is still forced, probability or no.
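The sketch below shows how the chance of at least one wrong fact grows as more facts are strung together, which is the point the defense would want to press. It is a minimal Python sketch that assumes, as the text does, a 5 percent error rate applying independently to each fact. The helper name `prob_some_wrong` is hypothetical.

```python
# Assumed average observational error rate per fact (from the example).
ERROR_RATE = 0.05


def prob_some_wrong(n_facts: int, error_rate: float = ERROR_RATE) -> float:
    """Chance that at least one of n_facts is wrong, assuming each fact
    independently carries the same error rate."""
    return 1 - (1 - error_rate) ** n_facts


for n in range(1, 11):
    marker = "  <- the six Collins traits" if n == 6 else ""
    print(f"{n:2d} facts: {prob_some_wrong(n):5.1%}{marker}")
```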

But this is not the only problem with the prosecutor's argument. As the appellate court wrote, there seems to be little or no justification for the cited statistics, several of which appear imprecise. On the other hand, the conclusion that such reasoning is never useful in a legal matter doesn't tell the whole story.

Among criticisms leveled at the Los Angeles prosecutor's reasoning was that conditional probabilities weren't taken into account. However, I would say that conditional probabilities need not be taken into account if a method is found to randomize the collection of traits or facts and to limit the intrusion of confounding bias.

But the circumstances of arrest are also critical in such a probability assessment. If the couple were stopped in a yellow car within minutes of the robbery and a few blocks from it, a probability assessment might make sense (though of course jurors would then use their internalized probability calculators, or "common sense"). However, if the couple were picked up on suspicion miles away and hours later, the probability of a match might still be high, but the probability of error increases with time and distance.

Here we run into the issue of false positives. A test can be 99 percent accurate, and yet the probability that a particular positive result is a true match can be very low. Take an example given by mathematician John Allen Paulos. Suppose a terrorist profile program is 99 percent accurate, and let's say that 1 in a million Americans is a terrorist. In a population of about 300 million, that makes 300 terrorists. The program would be expected to catch 297 of those terrorists. However, the program has an error rate of 1 percent. One percent of 300 million Americans is 3 million people. So a data-mining operation would turn up some 3 million "suspects" who fit the terrorist profile but are innocent nonetheless. So the probability that a positive result identifies a real terrorist is 297 divided by about 3 million, or about one in 10,000, a very low likelihood.
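The false-positive arithmetic is an application of Bayes' rule. Here is a minimal Python sketch using the same round numbers. The variable names are illustrative. The sketch assumes that "99 percent accurate" means both a 99 percent hit rate on terrorists and a 1 percent false-positive rate on everyone else, which is how the example treats it.

```python
# Round numbers from the example (illustrative, not real data).
population = 300_000_000
terrorists = population // 1_000_000   # 1 in a million -> 300
innocents = population - terrorists

sensitivity = 0.99          # share of terrorists the program flags
false_positive_rate = 0.01  # share of innocent people it also flags

true_hits = terrorists * sensitivity            # 297
false_hits = innocents * false_positive_rate    # about 3 million

# Bayes' rule: what share of flagged people are actually terrorists?
ppv = true_hits / (true_hits + false_hits)
print(f"P(terrorist | flagged) = {ppv:.2e}, about 1 in {1 / ppv:,.0f}")
```

The program itself is quite accurate. The tiny base rate is what swamps it: the true hits are outnumbered roughly ten thousand to one by innocent people who also fit the profile.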

But data mining isn't the only issue. Consider biometric markers, such as a set of facial features, fingerprints or DNA patterns. The same rule applies. It may be that if a person was involved in a specific crime or other event, the biometric "print" will finger him or her with 99 percent accuracy. Yet context is all-important. If that's all the cops have got, it isn't much. Without other information, the odds are still tens of thousands to one that the cops or Border Patrol have the wrong person.

The probabilities change drastically, however, if the suspect is connected to the crime scene by other evidence. But weighing those probabilities, if they can be weighed at all, requires a case-by-case approach. Best to beware of any one-size-fits-all method.

Turning back to People v. Collins: if the police stopped an interracial couple in a yellow car near the crime scene within a few minutes of the crime, we might be able to come up with a fair probability assessment. It seems certain that statistics were available, or could have been gathered, about hair color, facial hair, car color, hair style, and race. (Presumably the bandits would have had the presence of mind to toss the rifled purse immediately after the robbery.)

So let us grant the probabilities for yellow car at 0.1; woman with ponytail, 0.1; and woman with blonde hair, 0.333. Further, let us replace the "interracial couple in car" event with an event that might be easier to quantify. Instead we estimate the probability that a man and a woman paired at random would be a white woman and a black man. We'd need to know the racial composition of the neighborhood in which they were arrested. Let's suppose it's 60 percent white, 30 percent black, 10 percent other. If we were to check man-woman pairs in such a neighborhood at random, the probability that the woman is white and the man is black is 0.6 x 0.3 = 0.18, or 18 percent. Not a big chance, but certainly not negligible either.

Also, we'll replace the two facial hair events with a single event: Man with facial hair, using a 20 percent estimate (obviously, the actual statistic should be easy to obtain from published data or experimentally).

So, the probability that the police stopped the wrong couple near the crime scene shortly after the crime would be 0.1 x 0.1 x 0.333 x 0.18 x 0.2 = about 1.2 x 10^-4, or about 1 chance in 8,300 of a misidentification. Again, this probability requires that all the facts given to police were correct.
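The revised product can be checked with a short Python sketch. The inputs are the post's own working estimates, including the 20 percent facial-hair placeholder and the assumed 60/30 neighborhood split. As with the prosecutor's figures, multiplying them assumes the traits are independent.

```python
from math import prod

# Working estimates for a stop made near the scene shortly after the crime.
revised = [
    0.1,        # yellow car
    0.1,        # woman with ponytail
    0.333,      # woman with blonde hair
    0.6 * 0.3,  # white woman paired with black man (assumed 60/30 split)
    0.2,        # man with facial hair (placeholder estimate)
]

# Independence assumed, so the joint probability is the product.
p_match = prod(revised)
print(f"{p_match:.2e}, about 1 chance in {1 / p_match:,.0f}")
```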

But even here, we must beware the possibility of a fluke. Suppose one of the arrestees had an enemy who used lookalikes to carry out the crime near a point where he knew his adversary would be. Things like that happen. So even in a strong case, the use of probabilities is a dicey proposition.

However, suppose the police picked up the pair an hour later. In that situation, the probability of guilt may still be high, but perhaps that probability is based in part on inadmissible evidence. Possibly the cops know the suspects' modus operandi and other traits, and so their profiling made sense to them. But if for some reason the suspects' past behavior is inadmissible, then the profile is open to a strong challenge.

Suppose that the witnesses are tested and their averaged error rate is used. Suppose they are remarkably keen observers and their rate of observational error is an amazingly low 1 percent. Let us, for the sake of argument, say that 2 million people live within an hour's drive of the crime scene. How many people are there who could be mistakenly identified as fitting the profile of one of the assailants? One percent of 2 million is 20,000. So, absent other evidence, the odds of a wrongful prosecution are in the ballpark of 20,000 to 1.

Of course, it's also possible that only one member of the arrested pair is guilty, so that an innocent person is paired with a guilty one.

It's possible the crime was committed by two people who did not normally associate, which again throws off the probability analysis. But let's assume the witnesses had reason to believe that the two assailants were well known to each other. We would then look at the number of heterosexual couples among the 2 million. Let's put it at 500,000. A 1 percent error rate turns up about 5,000 couples who could be wrongly matched, so the odds are in the vicinity of 5,000 to 1 in favor of a wrong identification of the pair. Even supposing only one interracial couple for every 1,000 residents, that's 2,000 interracial couples. A one percent error rate turns up roughly 20 couples wrongly identified as suspects.
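All three figures in this scenario come from the same step: the size of the candidate pool times the witnesses' error rate. The sketch below runs that step in Python. The pool sizes are the hypothetical ones used above, and the helper name `expected_false_matches` is illustrative. It also assumes that each member of a pool is misidentified independently at the same rate, which is a simplification.

```python
def expected_false_matches(pool_size: int, error_rate: float) -> float:
    """Expected number of innocent pool members mistakenly matched to the
    profile, assuming each is independently misidentified at error_rate."""
    return pool_size * error_rate


ERROR_RATE = 0.01  # the "amazingly low" 1 percent observational error

residents = 2_000_000                      # within an hour's drive (assumed)
couples = 500_000                          # heterosexual couples (assumed)
interracial_couples = residents // 1_000   # one per 1,000 residents (assumed)

for label, pool in [("individuals", residents),
                    ("couples", couples),
                    ("interracial couples", interracial_couples)]:
    n = expected_false_matches(pool, ERROR_RATE)
    print(f"{label}: about {n:,.0f} false matches")
```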

Things can get complicated here. What about "fluke" couples passing through the area? Any statistics about them would be shaky indeed, tossing all probabilities out the window, even if we were to find two people among the 20 who fit the profile perfectly and went on to multiply the individual probabilities. The astoundingly low probability number may be highly misleading -- because there is no way to know whether the real culprits escaped to San Diego.

If you think that sounds unreasonable, you may be figuring in the notion that police don't arrest suspects at random. But we are only using what is admissible here.

On the other hand, if the profile is exacting enough (police have enough specific details of which they are confident), then a probability assessment might work. However, these specific details have to be grounded somehow in random sampling.

After all, fluke events really happen and are the bane of statistical experiments everywhere. Not all probability distributions conform to the normal curve (bell curve) approximation. Some data sets contain extraordinarily improbable "outliers." These flukes may be improbable, but they are known to occur in that kind of data.

Also, not all events belong to a set containing ample statistical information. In such cases, an event may intuitively seem wonderfully unlikely, but the data are insufficient to do a statistical analysis. For example, the collapse on the same day of three World Trade Center buildings, designed to withstand very high stresses, intuitively seems unlikely. If we consider only fire as the cause of collapse, we can gather all recorded cases of U.S. skyscraper collapses and all recorded cases of U.S. skyscraper fires. Suppose that in the 20th Century, there were 2,500 skyscraper fires in the United States. Prior to 9/11, essentially none collapsed from top to bottom as a result of fire. So the probability that three trade center buildings would collapse as a result of fire is (1/2,500)^3, or about one chance in 15.6 billion.
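Cubing the per-building figure is easy to get wrong by a factor of ten, so here is the calculation done exactly in Python. The 2,500 count is the hypothetical one supposed above. Like the text, the sketch treats 1/2,500 as the per-building chance of a fire-induced collapse and treats the three buildings as independent.

```python
from fractions import Fraction

fires = 2_500                 # hypothetical count of 20th-century U.S. skyscraper fires
p_one = Fraction(1, fires)    # per-building estimate used in the text
p_three = p_one ** 3          # three buildings, treated as independent

print(p_three)                                  # 1/15625000000
print(f"1 chance in {int(1 / p_three):,}")      # 1 chance in 15,625,000,000
```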

Government scientists escape this harsh number by saying that the buildings collapsed as a result of a combination of structural damage and fire. Since few steel frame buildings have caught fire after being struck by aircraft, the collapses can be considered flukes and the proposed probabilities discounted.

Nevertheless, the NIST found specifically that fire, and not the jet impacts, caused the principal structural damage. The buildings were well designed to absorb jet impact stresses, and did so, the NIST found. That leaves fire as the principal cause. So if we ignore the cause of the fires and only look at data concerning fires, regardless of cause, we are back to odds of billions to one in favor of demolition by explosives.

Is this fair? Well, we must separate the proposed causes. If the impacts did not directly contribute significantly to the collapses, as the federal studies indicate (at least for the twin towers), then jet impact is immaterial as a cause and the issue is fire as a cause of collapse. Causes of the fires are ignored. Still, one might claim that fire cause could be a confounding factor, introducing bias into the result. Yet, I suspect such a reservation is without merit.

Another point is that the design of the twin towers was novel, meaning that they might justly be excluded from a set of data about skyscrapers. The NIST found that the towers handled the jet impacts well; still, it is possible that the buildings were well designed in one respect but poorly designed to withstand fire. Again, the NIST can use the disclaimer of fluke events by saying that there was no experience with fireproofing (reputedly) blown off steel supports prior to 9/11.


Monday, November 11, 2013
