Edward Ungvarsky doesn’t implicitly trust expert witnesses. As a criminal defense attorney, his clients are often evaluated by psychologists, who then present their findings in court. The results of the forensic psychology tests they administer can have a big effect on the outcome of a trial. They may determine whether someone should have custody of a child, or whether defendants can understand the legal system and aid in their own defense, a critical component of evaluating whether a person is competent to stand trial. Others determine whether a defendant was sane when they committed a crime, or whether they are eligible for the death penalty.
So Ungvarsky, who practices in Virginia, Maryland, and DC, tries to make sure the psychologists use the right tools to get at those questions. “I always ask what test they’re doing and why,” he says. For example, he’ll make sure the exams were developed by testing populations similar to his clients—a way of trying to avoid racial or class biases that might be inherent to the test—and that they address the specific issues he’s facing in court.
But Ungvarsky was surprised by a study published in February in the journal Psychological Science in the Public Interest by a group of lawyers and psychologists that found that many tests used in courtrooms are considered scientifically unreliable; in the headline, the journal dubbed them “junk science.” “This study shows that the rest of us need to catch up fast,” says Ungvarsky, “because the stakes are high.”
The team, led by Tess Neal, a psychology professor at Arizona State University, reviewed 364 assessment exams commonly used in the legal system, and came to the conclusion that a third of them were not “generally accepted in the field” of forensic mental health, which means that psychologists who had reviewed those tools didn’t think they were scientifically reliable or accurate. About 10 percent of those tools had not been subjected to empirical testing at all. The study also showed that it was rare for lawyers to challenge the scientific validity of those tests in court, and even rarer for them to succeed, meaning evidence from those questionable tests was used at trial.
“Each year, hundreds of thousands of psychological assessments are conducted and used in court to help judges make legal decisions that profoundly affect people’s lives,” Neal said when she presented the findings at this year's American Association for the Advancement of Science meeting in February. “There are some excellent psychologists doing scientifically informed, evidence-based assessments. However, many others are not.”
As a clinical psychologist, Neal had wanted to work in courts from the start of her career, and had long believed that psychologists had the right tools to help the legal system answer crucial questions about factors like sanity and competency. But as she worked her way through her doctoral training, she started to have doubts. "I was always kind of questioning and worried about some of what people were doing," she says.
In a survey she conducted during her graduate residency, Neal asked practicing forensic psychologists which tools they had used during their two most recent evaluations. The responses varied widely. In many cases, Neal and her coauthor, Thomas Grisso, had never heard of the tests psychologists mentioned. “This finding surprised—and worried—us both,” she says.
Those findings led to the current paper. Of the tests Neal and her coauthors reviewed, they found that only 67 percent were considered “generally accepted,” meaning that clinical psychologists believed they were effective tools. As a measure of acceptance, Neal mainly used a resource called the Mental Measurements Yearbook (MMY), in which volunteer reviewers assess the scientific merits of tests. The MMY asks thousands of psychologists to research how the tests were created and what their technical characteristics are, and then to give their professional opinions of the tests. Of the tests commonly used in court, Neal found that one third hadn’t been reviewed at all in the yearbook or other similar resources. Of those that had been reviewed, only 40 percent had favorable ratings, meaning that the volunteer reviewers concluded that they had strong scientific underpinnings.
There’s no evidence yet that any of these tests have directly contributed to unjust rulings—though Neal thinks that may be because nobody has scrutinized this issue yet. But she and other forensic psychologists are particularly concerned about what are known as “projective” tests, in which a person is asked to respond to an ambiguous prompt, and their reaction is supposed to reflect their cognitive style, personality, or other psychological traits.
For example, in the Kinetic Family Drawing test, which is used to assess how kids feel about themselves and their families, a subject is asked to draw themselves or their family members and then answer questions about the drawing. Clinicians then draw conclusions based on the child’s answers and on the drawing itself: where people are placed, their relative sizes, or the amount of detail included about a specific person. A few other famous examples of projective tests include the Rorschach ink blots; the Thematic Apperception Test, which measures personality by asking subjects to make up stories based on a series of black-and-white images; and the House-Tree-Person test, which requires subjects to draw those three objects and then answer questions about their drawing. Neal says tests like these are worrisome because they leave room for subjectivity and bias in the psychologist’s interpretation.
She says most forensic psychologists prefer “objective” tests, which ask a series of true or false or multiple choice questions and have objective scoring criteria for the answers. These tests, like the Minnesota Multiphasic Personality Inventory, which uses 550 true or false questions to evaluate a variety of psychological problems including depression, schizophrenia, and paranoia, have rubrics that help ensure that even if different psychologists administer the test, they will all still reach the same conclusion.
Most worrisome of all, she says, are the roughly 25 percent of clinicians who don’t use any tests at all to structure their forensic psychological assessments, another finding from her 2014 survey coauthored with Grisso. “They’ll go and talk to the person and they’ll use no tool at all and just write a report based on their opinion and introduce that as evidence in a court,” she says. “That happens.”
Other experts have been worried for years about the issues raised by Neal’s study. “None of these issues was surprising to me,” says Eric Drogin, a forensic psychologist and lawyer, who teaches courses on mental health and the law at Harvard Law School and was not involved in the study. “It should not come as a surprise to any attorney or scientist who works in this area.”
Drogin says experts not only worry about which tests are used, but also about how those tests are interpreted by psychologists and how those results are understood by judges and juries. The first issue, he says, is that psychologists need to pick the right tools to answer a specific question. For example, it’s no good using a tool that measures family relationships when you actually need to understand whether someone is competent to stand trial. Then, he says, tests must satisfy two requirements to be scientifically accurate. One is reliability: Does the test give the same results every time, and can those results be interpreted the same way by different experts? The other is validity: Does the test actually measure what it’s supposed to measure?
Tests can meet one standard while still failing the other. For example, Drogin mentions a now-infamous early “forensic” test once used to determine a suspect’s guilt: dunking during the Salem witch trials of the 1690s. Women were thrown into bodies of water. If they floated, it was considered evidence that they were witches. If they sank, they were exonerated. In a sense, the test was very reliable. “The doctors were likely to agree: ‘drowned’ or ‘not-drowned,’” says Drogin. “We could nail that down pretty easily.” But it wasn’t valid. The test, he says, “was measuring buoyancy and not witchcraft.”
In forensic psychology, tests are supposed to be put through a battery of studies both before and after they are used in court to measure whether they are reliable and valid. But Neal’s study found that some 10 percent of tests hadn’t undergone any empirical testing at all, meaning the authors couldn’t find any evidence that the test’s approach had been written about in peer-reviewed journals. And among those that had passed peer review, many still had negative reviews in the Mental Measurements Yearbook. Neal says their paper isn’t attempting to be the arbiter of what is or isn’t a good tool; its purpose is to show that tests that weren’t accepted as good tools by professionals in the field were still making their way into court.
Some tests endure simply because psychologists keep using them even after they are out of date or are no longer considered the best tool. “Old habits die hard sometimes,” says Kirk Heilbrun, a psychology professor at Drexel University. Those habits can have long-lasting repercussions, because precedent plays such a big part in the American legal system. If a test has been used in court before, it’s likely it will be allowed again and again.
In part two of their study, Neal and her coauthors showed that not only are some psychologists using unreliable tests, but lawyers rarely challenge their use as evidence. Of the 372 cases Neal’s team reviewed, only 19, or about 5 percent, included a challenge to a test’s scientific validity. And even those challenges were often unsuccessful: of the 19, only six succeeded. Neal urges lawyers and judges to be much more skeptical about the psychological tests that are used as evidence: “Don’t just assume that they’re valid.”
For lawyers unfamiliar with forensic psychology, that can be hard, especially for public defenders working with limited resources. Psychologists will sometimes use multiple tests to arrive at their conclusions. Researching the validity of each test can take up precious time. Ungvarsky says that except for some very experienced lawyers, most don’t challenge the underlying scientific merit of mental health evidence. “And because the lawyers don’t do it, then the judges don’t address it,” he says.
The American legal system is an adversarial one in which lawyers fight for their clients and judges act more like umpires policing behavior than like active players on the field. That structure also makes it difficult for judges to raise objections, says Judge Kevin Burke, who sits on the Hennepin County Court in Minnesota and who has written about the use of social sciences in the courtroom. Burke says that while judges do have an obligation to make sure trials are fair, he would also be reluctant to tell a new judge to interject themselves too aggressively into proceedings: “If the lawyers don’t object, then why are you?”
Neal adds that while judges are supposed to be the gatekeepers who keep bad science out of court, many are ill-prepared to handle the technical specifics of the evidence put before them, whether that’s psychological tests or other forensic disciplines like bloodstain pattern analysis. “We’re putting the onus on the non-scientifically trained expert to evaluate the credibility of any possible discipline that might come to court and offer their opinion,” says Neal. “So this is a kind of a deep systemic problem.”
Heilbrun says there have been efforts to better train forensic psychologists through dedicated doctoral programs, and Drogin adds that national organizations like the American Psychological Association have started publishing ethics codes that tell psychologists what to do when they’re working with the legal system. He also suggests that lawyers who aren’t sure about which tests to trust can hire trial consultants in addition to expert witnesses. The psychologists who testify as expert witnesses have to remain unbiased so they can administer the tests fairly. But trial consultants could advise the attorney on which tests are useful and which should be challenged. Yet as Burke points out: “That’s for people who have lots and lots of money.”
Neal advises lawyers to do their own research, and to actually look at the primary source documents and peer-reviewed research on these tests. She also suggests that clinical and research psychologists create a database that people could use to easily look up which tests are recommended for which uses in court. That resource could be updated frequently so that it reflects the best tools the field can offer at any given time, she suggests.
She says forensic psychologists will have to work harder to improve the field as a whole. “The law has to answer these questions,” she says. “There’s reason for the field to exist, and I believe in it, and I believe we have a lot to offer. I just think we could be doing better.”