Wednesday, January 23, 2019

The History of Challenges Relevant to the Interpretation of DNA Mixtures


As an expert consultant in forensic biology/DNA, I am frequently asked by defense attorneys (I am rarely contacted by counsel for the prosecution) to educate them on basic aspects of the emerging technology referred to as ‘probabilistic genotyping’ (PG). I explain that, while I am quite familiar with the fundamentals of PG technology, they would be best served by also seeking insights from forensic DNA mathematicians and/or experts who have actually developed PG software. I also emphasize that a good legal education on PG technology should include close scrutiny of the history of DNA mixture interpretation.

When individuals working within the criminal justice system ask for advice on the fundamentals of forensic biology/DNA, I recommend the following resources. Any documentation published by the Scientific Working Group on DNA Analysis Methods (SWGDAM) will prove extremely useful. Link: https://www.swgdam.org/publications
Also useful is DNA for the Defense Bar (June 2012), from the U.S. Department of Justice. Link: https://www.ncjrs.gov/pdffiles1/nij/237975.pdf
Two more recent resources are available. The first is Forensic DNA Analysis: A Primer for Courts (November 2017), published by the Royal Society of Edinburgh. Link: https://www.semanticscholar.org/paper/Forensic-DNA-analysis%3A-a-primer-for-courts/b503fc9202c4b6cc2ac783d326430b9402450987
The second is The Litigator’s Handbook of Forensic Medicine, Psychiatry, and Psychology, which is slated for publication in September 2019. Chapter 8 of this reference set is entitled “Forensic Use of DNA”. Requests for an advance copy of Chapter 8 can be directed to Michael J. Spence, Ph.D., at mike@spenceforensics.com

Analysis of evidence items can lead to the characterization of DNA mixtures with endless gradations of complexity and variability. From locus to locus, no two DNA mixture electropherograms (e-grams) will exhibit an identical ‘allelic landscape’. When Short Tandem Repeat (STR) typing first became the basis of forensic human identification, law enforcement labs were not provided with sufficient centralized standards for the appropriate interpretation of DNA mixtures, and forensic DNA laboratory managers and their analysts lacked direction. As this scientific obstacle became increasingly recognized, and efforts were made to implement various guidelines, controversies and legal debates ensued.

It is useful to explore the evolution of efforts to address the misinterpretation of DNA mixtures. In 2005, forensic DNA scientists worked with the National Institute of Standards and Technology (NIST) to organize a widespread inter-laboratory study of DNA mixture assessment, referred to as the NIST MIX05 Study. Sixty-nine forensic DNA laboratories were provided with identical two-person mixture data. Alarmingly, forty of the reporting labs characterized the DNA mixtures as “inconclusive”. Among the twenty-nine labs that provided a statistic based upon non-exclusions, the calculated statistics ranged from 1 in 31,000 to 1 in 213,000,000,000,000.
   
In recognition of this astounding lack of consistency, a DNA mixture interpretation workshop was held in Washington, D.C., in February 2008. This was one of many similar workshops focusing, in part, on the results of the MIX05 Study. Dr. John Butler, the acting chairman of the D.C. workshop, summarized the disturbing results in his presentation entitled “A High Degree of Variability Currently Exists with Mixture Interpretation.”

Dr. Butler’s presentation included a comment from the highly recognized forensic scientist Dr. Peter Gill: “If you show 10 colleagues a mixture, you will probably end up with 10 different answers.” During the years following MIX05, many scientists expressed concerns regarding the use of a simple inclusion/exclusion threshold for Relative Fluorescence Units (RFU). This approach carried a high potential for failure, because it encouraged what is affectionately referred to as the ‘Texas Sharpshooter Fallacy’ in forensic DNA interpretation. The name comes from an old tale about a man (perhaps a Texan?) who test-fires several bullets at the outer wall of an old barn. A thought dawns on him. He grins widely and rushes off to grab some red paint and some white paint. A few hours later, he leads an assembly of friends and neighbors out to the old barn. The group is genuinely amazed at all of the painted targets on the barn wall, each with a bullet hole perfectly centered in it. Our protagonist savors the outpouring of admiration.

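His delight comes to an abrupt end when a savvy neighbor inspects the barn more closely and notices that the paint is still sticky. The charade is completely exposed when the neighbor also realizes that some of the paint has been splashed through the obviously pre-existing bullet holes and is dripping down the inside of the barn wall. In DNA mixture interpretation, the analogous error is deciding which ambiguous peaks ‘count’ only after seeing the suspect’s reference profile: the target is painted around the match.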

From 2008 to 2011, as scientists continued to wrestle with the DNA mixture interpretation version of this fallacy, a chasm formed between many forensic biologists and mathematicians on one side and the FBI/SWGDAM on the other. Refer to a 2011 article in Science and Justice by Itiel Dror and Greg Hampikian, entitled “Subjectivity and bias in forensic DNA mixture interpretation”. In their conclusion, these authors addressed the forensic community as follows:

“…while this is the first published empirical study of potential DNA bias, Butler of the NIST laboratories has conducted extensive studies of mixture analysis over several years, wherein he supplies a large number of volunteer laboratories identical DNA mixture data and asks for their analysis. The results of these excellent studies have been presented at conferences and are available at the NIST webpages, but have never been published in a peer-reviewed journal. An interesting and perhaps the most critical point for this paper is that Butler’s research findings show that inclusion statistics for the same profiles (using the same data) varied over 10 logs, that is from 1 in 434,600 to 1.18 x 10^15, using the exact same electropherograms.”
  
While the NIST MIX05 study was illuminating, and may have helped nudge rational DNA mixture guidelines forward, harsh inconsistencies persisted for years to come. Around 2009-2010, the FBI and SWGDAM set out to resolve this misinterpretation crisis by implementing a second, ‘stochastic’ RFU threshold. Apparently, the rationale was that if one threshold had proven ineffective at resolving DNA mixture headaches, perhaps two thresholds might deliver a tangible improvement.
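To make the ‘binary threshold’ idea concrete, here is a minimal Python sketch of how a two-threshold rule might sort peaks. The threshold values (50 and 200 RFU) and the function names are hypothetical illustrations; actual values are set by each laboratory’s own validation studies, and real interpretation protocols include many more rules (stutter, heterozygote balance, mixture ratios) than are shown here.

```python
# Hypothetical threshold values; each laboratory sets its own through validation.
ANALYTICAL_THRESHOLD = 50   # RFU: peaks below this are not called as alleles
STOCHASTIC_THRESHOLD = 200  # RFU: below this, a sister allele may have dropped out


def classify_peak(height_rfu):
    """Place a single peak into one of the three zones defined by the two thresholds."""
    if height_rfu < ANALYTICAL_THRESHOLD:
        return "not called"
    if height_rfu < STOCHASTIC_THRESHOLD:
        return "allele, stochastic zone"
    return "allele, above stochastic threshold"


def single_peak_is_ambiguous(called_heights):
    """True when a locus shows exactly one called peak, and it sits below the stochastic threshold.

    Such a peak could come from a homozygote or from a heterozygote whose
    sister allele dropped out, so the genotype cannot be treated as certain.
    """
    return len(called_heights) == 1 and called_heights[0] < STOCHASTIC_THRESHOLD


if __name__ == "__main__":
    for height in (35, 120, 850):
        print(height, "RFU ->", classify_peak(height))
    print(single_peak_is_ambiguous([150]))  # True: lone peak in the stochastic zone
    print(single_peak_is_ambiguous([450]))  # False: lone peak above the stochastic threshold
```

Peaks falling between the two thresholds are still reported as alleles in this sketch; the second threshold only governs how confidently a lone low peak can be read as a homozygote.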

In 2013, NIST assessed this ‘binary threshold’ strategy by organizing a new series of inter-laboratory surveys, referred to as the NIST MIX13 Study. Unfortunately, the outcome of the MIX13 study cast an unfavorable light on the binary approach. One hundred study participants were asked to assess a DNA mixture that included three contributors. One reference sample came from an individual who was known to be absent from this three-person mixture. Seventy of the one hundred participants incorrectly included this known individual. In addition to this appalling 70% rate of false inclusions, the reported inclusion statistics ranged from 1 in 9 to 1 in 344,000. Twenty-four of the one hundred participants reported the comparison to the known reference as inconclusive. Only six participants correctly excluded the ‘innocent’ known individual. One of those participants used a PG software system that is currently marketed as TrueAllele®.

Results from the NIST MIX05 and MIX13 studies were summarized in an August 1, 2018, publication in Forensic Science International: Genetics, with the following citation: John M. Butler, Margaret C. Kline, and Michael D. Coble, NIST interlaboratory studies involving DNA mixtures (MIX05 and MIX13): Variation observed and lessons learned, Forensic Science International: Genetics, Volume 37, Pages 81-94.

In the first paragraph of the “Conclusions” section of this landmark publication, Dr. Butler and his co-authors issued the following warning:
“The results described in this article provide only a brief snapshot of DNA mixture interpretation as practiced by participating laboratories in 2005 and 2013. Any overall performance assessment is limited to participating laboratories addressing specific questions with provided data based on their knowledge at the time. Given the adversarial nature of the legal system, and the possibility that some might attempt to misuse this article in legal arguments, we wish to emphasize that variation observed in DNA mixture interpretation cannot support any broad claims about ‘poor performance’ across all laboratories involving all DNA mixtures examined in the past.”

This caution from the authors is well taken. However, there is a flip side to this warning. Those who manage law enforcement labs, as well as forensic DNA outsource labs, must be willing to acknowledge openly that the comparative interpretation of complex DNA mixtures can present a formidable challenge to forensic biologists. They must also acknowledge that PG software technology has been developed, for the most part, with the objective of conquering the scientific concerns raised by the troubling, lengthy history of DNA mixture misinterpretation.

Here in early 2019, despite increasingly intense efforts to establish guidelines for reliable mixture interpretation, a clear consensus continues to elude forensic labs. The 2018 DNA interpretation guidelines of one accredited crime lab (Source: Phoenix PD Crime Lab, Forensic Biology Procedures, DNA: Interpretation Guidelines) open with a statement acknowledging this reality:

“The interpretation of results in casework is a matter of professional judgement and expertise. Not every situation can or should be covered by a predetermined rule.”

The Modern-Day Arrival of PG Software Systems: STRmix™

The STRmix™ website states: “STRmix™ is expert forensic software that can resolve previously unresolvable mixed DNA profiles. Developed by global leaders in the field, it uses a fully continuous approach for DNA profile interpretation, resolving complex DNA mixtures worldwide.” The makers of this software system assure their readers that “STRmix™ can easily be understood and explained in court by DNA analysts.”

STRmix™ can be used to resolve relatively simple DNA mixtures, as well as complex mixtures, before the data from any known reference samples are factored in. Using well-established statistical methods, the software builds millions of conceptual DNA profiles. It grades these profiles against the evidence sample, finding the genotype combinations that best explain the observations. Only after this step is complete are comparisons made to known reference profiles, using a range of likelihood ratio (LR) options. An LR compares the probability of the evidence under two competing propositions, for example that the person of interest is, or is not, a contributor. Specifically, STRmix™ uses a Markov Chain Monte Carlo (MCMC) engine to model the peak heights of potential alleles. The software also models various types of stutter peaks and factors in the possibility of allelic drop-out. STRmix™ performs all of these functions rapidly.
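The structure of a likelihood ratio is easiest to see in the simplest possible case. The minimal Python sketch below assumes a single-source evidence profile at three hypothetical loci, made-up allele frequencies, Hardy-Weinberg genotype frequencies, independent loci, and no population-substructure correction; every locus name, allele, and number is hypothetical. It is not how STRmix™ computes its LRs: continuous systems evaluate mixtures by weighting many candidate genotype combinations with a peak-height model.

```python
from math import prod

# Hypothetical allele frequencies for three hypothetical loci (not real database values).
ALLELE_FREQ = {
    "LocusA": {"12": 0.10, "14": 0.20},
    "LocusB": {"8": 0.05, "11": 0.30},
    "LocusC": {"17": 0.15},
}

# Hypothetical single-source evidence profile; the reference is assumed to match it.
EVIDENCE = {"LocusA": ("12", "14"), "LocusB": ("8", "11"), "LocusC": ("17", "17")}


def genotype_frequency(locus, genotype):
    """Hardy-Weinberg genotype frequency: 2pq for a heterozygote, p^2 for a homozygote."""
    a, b = genotype
    p = ALLELE_FREQ[locus][a]
    q = ALLELE_FREQ[locus][b]
    return p * p if a == b else 2 * p * q


def single_source_lr(profile):
    """LR = Pr(E | Hp) / Pr(E | Hd), multiplied across (assumed independent) loci.

    Hp: the person of interest is the source, so Pr(E | Hp) = 1.
    Hd: an unrelated unknown person is the source, so Pr(E | Hd) is the
        population frequency of the evidence genotype.
    """
    return prod(1.0 / genotype_frequency(locus, g) for locus, g in profile.items())


print(round(single_source_lr(EVIDENCE)))  # 37037
```

With these made-up frequencies, the LR is about 37,000: the evidence would be about 37,000 times more probable if the person of interest were the source than if an unrelated unknown person were.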

The MCMC approach provides a mechanism for drawing samples from complicated probability distributions. Complicated distributions, such as those describing the myriad peak heights in a DNA mixture e-gram, can make direct probability calculations enormously challenging. Because the performance of STRmix™ is supported by comprehensive validation studies, and the underlying mathematics is readily accessible to forensic DNA experts, the effectiveness of the software can be adequately summarized for jurors.
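The core MCMC idea can be sketched in a few lines. The Python example below is a toy random-walk Metropolis-Hastings sampler, not STRmix™’s actual model: it estimates a single unknown, the mixture proportion w of contributor 1 in a two-person mixture at one locus, from four hypothetical peak heights. It assumes known, non-overlapping heterozygous genotypes, a fixed expected total peak height, log-normal peak-height variation, a flat prior on w, and no stutter, degradation, drop-out, or drop-in. All names and numbers are hypothetical.

```python
import math
import random

# Toy, hypothetical data: four allele peaks (RFU) at one locus of a two-person mixture.
# Assumed: contributor 1 is heterozygous A/B, contributor 2 is heterozygous C/D,
# with no shared alleles, stutter, degradation, drop-out, or drop-in.
OBSERVED = {"A": 1450.0, "B": 1310.0, "C": 620.0, "D": 540.0}
CONTRIBUTOR = {"A": 0, "B": 0, "C": 1, "D": 1}
TOTAL_HEIGHT = 4000.0  # assumed expected total peak height at this locus (RFU)
LOG_SD = 0.15          # assumed spread of log peak heights around their expectation


def log_likelihood(w):
    """Log-likelihood (up to a constant) of the peaks given contributor 1's proportion w."""
    if not 0.0 < w < 1.0:
        return -math.inf  # proportions outside (0, 1) are impossible
    share = (w, 1.0 - w)
    total = 0.0
    for allele, height in OBSERVED.items():
        # A heterozygous contributor splits its share evenly between its two alleles.
        expected = TOTAL_HEIGHT * share[CONTRIBUTOR[allele]] / 2.0
        z = (math.log(height) - math.log(expected)) / LOG_SD
        total -= 0.5 * z * z  # log-normal kernel; terms not involving w are dropped
    return total


def metropolis_hastings(n_steps=20000, step_sd=0.05, burn_in=5000, seed=1):
    """Random-walk Metropolis sampler for w under a flat prior on (0, 1)."""
    rng = random.Random(seed)
    w = 0.5
    current = log_likelihood(w)
    samples = []
    for _ in range(n_steps):
        proposal = w + rng.gauss(0.0, step_sd)  # symmetric proposal
        candidate = log_likelihood(proposal)
        # Accept with probability min(1, L(proposal) / L(current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            w, current = proposal, candidate
        samples.append(w)
    return samples[burn_in:]  # discard the burn-in period


samples = metropolis_hastings()
print(f"posterior mean of w: {sum(samples) / len(samples):.3f}")
```

With these made-up peaks, the posterior mean of w should land near 0.7, reflecting the taller A and B peaks. Because each proposal is accepted or rejected using only a ratio of likelihoods, the sampler never needs the posterior’s normalizing constant, which is what makes MCMC practical when that constant is intractable.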

It is vital to recognize that this technology can be most effectively explained to jurors by forensic DNA mathematicians and/or PG software developers. Clearly, the vast majority of crime lab analysts are not mathematical experts, nor are they likely to have developed sophisticated software systems of any kind. It is a mistake to conclude that a forensic DNA specialist can be magically transformed into a mathematical expert by a brief PG training program, coupled with the experience of plugging capillary electrophoresis data into a PG software system such as STRmix™ or TrueAllele®.

If that were the case, any ardent couch potato could proceed as follows: 1) Skim through an ordinary television manual; 2) Become proficient with the TV remote control; 3) Grasp that an LED screen is composed of millions of pixels; 4) Study information on how images are captured during live broadcasts; 5) Review how images are transmitted via satellites and funneled through dish receivers attached to homes; and voilà, the couch potato can now be passed off as a qualified expert on all aspects of the technology, including the mechanisms by which it is deployed.

On January 31, 2018, a webinar was held under the title “From Training to Trial: A Reflection on Nearly Three Years of Probabilistic Genotyping.” This webinar was sponsored by Promega Corporation (the makers of PowerPlex® Fusion, among many other biotechnology products), Forensic Magazine, and the Institute of Environmental Science and Research (the birthplace of STRmix™). The two speakers were technical leaders at DNA Labs International (DLI) in Deerfield Beach, Florida. DLI was the first private laboratory in the U.S. to validate STRmix™. During this presentation, Ms. Rachel Oefelein (Quality Assurance Manager and Senior DNA Analyst at DLI) offered the following advice to DNA experts testifying on the topic of probabilistic genotyping:

“Consider recommending an additional expert. We are not all John Buckleton. We are also not computer software programmers. So, there is no reason why you should have to explain—in great detail—what went into the developmental validation. You can only speak to what you know that you read from publications on the developmental validation…”


In contrast to this advice, a movement appears to be emerging that asserts PG-trained crime lab analysts should face no restrictions on their testimony regarding the reliability of this technology, whereas independent DNA consultants should be barred from offering any commentary whatsoever. The rationale is that independent DNA consultants have not received sufficient training in the theory or the hands-on use of PG software. These assertions are not new. Similar movements have emerged over the years, pointing out that many independent consultants have not performed casework in a forensic DNA lab for a decade or so. The argument holds that such experts should be restricted from testifying in criminal proceedings until they have achieved substantial training and experience in methodologies that might include PG, real-time PCR, Y-STR typing, GlobalFiler™, and PowerPlex® Fusion.

A profound irony is emerging here. A subset of individuals were at ground zero of the DNA mixture misinterpretation calamity that has plagued our criminal justice system over the past two decades. It is reasonable to suspect a meaningful overlap between those individuals and the individuals who have pushed most emphatically for endless rounds of re-training for independent consultants in forensic biology/DNA.

Posted by Michael J. Spence, Ph.D. on January 23, 2019, and edited on February 5, 2019.