Thank you, ladies and gentlemen. Please be seated. All right. Good afternoon, ladies and gentlemen.
THE JURY: Good afternoon.
It's always my policy to explain to you why it is we get late starts. We started this morning at 9 o'clock with a hearing regarding the evidence that you're about to hear. It took longer than I thought it would as just about everything I guess these days. So I apologize to you for having kept you so long. And without further ado, Mr. Clarke, you may call the Prosecution's next witness.
Thank you, your Honor. Dr. Bruce Weir, please.
Bruce Weir, called as a witness by the People, was sworn and testified as follows:
Raise your right hand, please. You do solemnly swear that the testimony you may give in the cause now pending before this court, shall be the truth, the whole truth and nothing but the truth, so help you God?
Please have a seat on the witness stand and state and spell your first and last names for the record.
Thank you, your Honor. Good afternoon, ladies and gentlemen.
THE JURY: Good afternoon.
DIRECT EXAMINATION BY MR. CLARKE
Could you describe--well, let me rephrase that. As a professor of--and I'm sorry. You said statistics--
Statistics and genetics. I'm housed in the statistics department, but I have a courtesy appointment in genetics, which gives me certain responsibilities and rights to both those two departments.
Could you describe for the jury, please, as far as your position at North Carolina State in that department, what are your duties?
Well, they're primarily to conduct research in the appropriate statistical methods to interpret DNA data and genetic data more generally. So primarily to conduct research. I think my appointment is something like a 75 percent research appointment. The remainder of the time I teach. I teach courses in this area and I supervise graduate students.
I have a bachelors degree in mathematics. I took that in New Zealand. I then came to this country in 1965 to North Carolina State. I did a Ph.D. degree in statistics with a minor in genetics. So I was studying both subjects. At the conclusion of that in 1969, I went to the University of California at Davis as a post-doctoral genetics student.
Could you describe, please, a little bit about what statistics mean as far as your particular area of interest?
Well, statistics means data in essence. So it's the study of data, how we interpret data, how we might go about drawing inferences from data; and for genetic data, anything, anything that has to do with the transmission of a trait from parent to offspring, if it's a thing like height or IQ or something as simple as one of these DNA markers we've been hearing about. Any genetic entity, if it can be measured or counted, I can analyze it.
You've described the fact that you received your Ph.D. in statistics; is that correct?
Yes. The minor regarded the formal training, the course work I took and the examinations I had to pass in both statistics and genetics. My Ph.D. research was a theoretical piece of statistics, but on the genetic problem. It was to do with the way that we expect gene frequencies to behave for example. So I think the motivation of all my work is genetic, but the tools I use to do that work are statistical.
What relationship if any does this area that you've referred to as "Genetic" have with DNA?
Well, of course, DNA underlies all of genetics. All of genetic information is DNA information. So when I say I analyze genetic data, all I'm really saying is I analyze data which essentially depends on the DNA.
Could you describe for us, please, after--and I believe you said that while you were at the University of California at Davis, you did some post-graduate work?
Yes. I went back to my native New Zealand as a senior lecturer in the mathematics department and I was there for six years teaching mathematics--teaching primarily statistics actually, but continuing my research in genetics. I remained there six years and then came back to North Carolina State when I joined the faculty in 1976.
The time you were back at school as a lecturer in mathematics and statistics in Australia, was that at a particular school?
When you returned to the United States to North Carolina State, what position did you take there?
I came as an associate professor and I've subsequently been promoted to full professor of statistics and genetics.
Let's start with your position as an associate professor. What does that include? What types of duties in that position?
Oh, well, just the same as they are now essentially. At that time, my appointment was a hundred percent research, and I gradually picked up some teaching responsibilities.
Until 1992 when I got a further promotion if you like, and I have--it's a named professorship. So I have the title of a William Neal Reynolds professor of statistics and genetics.
What does that mean, to go from a professor position to a named professor position?
Oh, it's a sign that I'm doing good work. It's a mark of honor at our institution. It's--it's a good thing.
All right. Excuse me, gentlemen. Dr. Weir, if I could ask you to allow Mr. Clarke to finish answering his--asking his question before you start to answer since the court reporter is having difficulty if you talk over each other.
During this time period from your education, that is your formal education through to the present day, have you received any honors or awards?
Yes, there have been some. I think the one I'm most proud of outside the university is the Guggenheim fellowship I received in 1983. That's a fellowship which carries a sum of money, and you get it by entering into a competition essentially. And they're quite prestigious, and we were very pleased to get that. And that enabled me to take a year away from North Carolina, and I went to the University of Edinburgh in Scotland on a year sabbatical.
Well, it was a real luxury. I didn't have any responsibilities. So I used that year to do some learning about the current DNA methodology. I needed to learn some new words as I'm sure you've been hearing here. The current DNA methods, the current molecular biology has changed over the last 10 years, and I needed to--at that time, I needed to learn a new language.
Did that coincide with any events going on in the DNA world in England at that time approximately?
It turned out too, although I didn't realize at the time, not very far from where I was having my year of luxury, a Professor Jeffreys was just down the road at the University of Leicestershire developing the RFLP markers that we use in forensic applications. He was in essence developing DNA fingerprints at about that time.
As far as professional organizations or societies, are you members of any of those?
Yes. Part of my job requires me to keep abreast of developments in the field, and going along with that, I need to belong to the organizations that contain similar people.
Yes. I think the most relevant ones are the American Statistical Association, the Genetics Society of America and the American Society of Human Genetics.
Are you also members of other organizations as well in the areas of statistics and genetics?
Yes. There are several others. There's one called the Biometric Society. Biometrics means the mathematics of biology. So that's very germane to this. There's a Society for the Study of Evolution, for example, and I think there may be some others.
I'd like to turn your attention if I can to the scientific literature. Do you hold any positions as far as evaluating for journals or magazines whether or not scientific publications in your field should even be published in the literature?
Yes, I do. That's primarily for the journal Genetics, which is the official journal of the Genetics Society of America. If somebody writes a paper in my area, a paper that I would understand, they may submit it to me, and then I in turn send it out to other people to review it. If the reviews are favorable and if I agree with the reviews, then I can accept that paper and see that it's published. If the reviews or myself or both don't like the paper, we think it's not correct or appropriate, then we--then I reject it.
So is this in the position of your being an editor of a particular journal such as that one?
In that position then as I understand it, you not only review scientific papers in your field of expertise to determine if they are or should be published, but you also decide basically what other experts to send these manuscripts or articles to so that they can give you their input as well?
Yes. It requires me to be familiar with the work of a great many people. So I feel I have to have a good sense of who is working in my field so that I can choose them as reviewers, and it helps me to evaluate their own work. So this is not unusual. It's the way the peer review system works.
Can you describe for the jury, please, some of the publications that you either act or have acted as an editor for or assistant editor?
Well, the one we just talked about in particular, it's the journal Genetics. The journal of the--the journal of--the American Journal of Human Genetics, I acted in a somewhat similar role though I don't have the final authority to accept or reject. The journal Theoretical Population Biology, I am responsible for all the genetics papers in that journal and I have the authority to accept or reject papers there. The Journal of Heredity, I'm an associate editor. I get to choose reviewers and make recommendations, although I don't have the final authority.
I'd like to shift your attention--and we'll return to scientific publications in just a few moments, but what I'd like to shift your attention, Dr. Weir, to is the area of presentations and lectures. First of all, in your area of expertise of statistics and genetics, have you been invited to give various presentations in the area of statistics as well as their application in the field of genetics?
Yes, I have. I think the most important way of disseminating my research is through publications. A more immediate way is to present them orally at scientific meetings, and I am invited to attend many such meetings at a local--all the way from the local high schools if you'd like on the way up to international conferences.
And have you in the past lectured at international meetings in various locations around the world in this field?
Yes. That's one of the benefits of my job, that I get to go to many countries. That's a benefit.
As far as this area of, again, invited presentations and meetings, do you organize or chair any meetings of that sort or individual fields within meetings such as that?
Yes. I--I organized a very large international meeting at my university in 1987 in genetics, the second international conference on quantitative genetics. It was a major undertaking. We had over 500 people from around the world. In February of this year, I organized and chaired a meeting in this state on quantitative genetics with a lot of statistics in it. Quantitative genetics is a little different from what we're talking about here. It's the kind of genetics that go along with things that we measure like height or grain yield or height of trees as opposed to things that we count like the DNA types we'll be talking about. So that was a conference I organized and chaired this year.
Are there any further conferences, whether this year or next year, at international locations that you are involved in organizing?
Well, not organizing, but I've been invited to talk on the statistical aspects of forensic DNA at an international conference in Spain in September and again in Edinburgh in July of next year.
In your role as a professor--and you may have described this a little bit earlier--do you supervise individuals or people who are trying to obtain their own Ph.D.s?
Yes. That's probably the most important aspect of my teaching. Teaching classes is of course very important, but the one-on-one teaching, working with a graduate student on a thesis project, working with that student from the concept of the idea through to final publication, it's time consuming, it's very important and I enjoy it very much.
As far as--and if I can return to the area of publications in your field, have you published in the scientific literature a number of articles about statistics and population genetics?
I think all of my--all of my publications are in this general area of applying statistics to genetic data, yes.
It's the process that we described, that I am on one end of as an editor, but when I want to publish my work, I have to go through the process from the other end. So I select a journal which I think would be appropriate, send the paper to the editorial board of that journal and hope they reach the right decision.
And how many times or how many publications have you in fact either authored or co-authored in this field of statistics and population genetics?
Are there other types of things that you write that may be published in the scientific literature other than, for instance, peer-reviewed articles?
Well, there is another avenue for expressing our thoughts or giving some results and there are letters to the editor or commentaries or reviews. Sometimes I have been invited to write reviews of a field. For example, the population genetics issues in the forensic uses of DNA. There's a very prestigious journal, the Proceedings of the National Academy of Sciences. The editor of that journal asked me to write a review of the issues in late 1992. So there are review articles like that and then there are letters and other--other what I call commentaries. These are pieces of writing which are published at the whim of the editor if you like, but without going through a peer review process.
Is it correct to say that the approximately 100 scientific publications that you have had through the peer review process that you've personally authored, that it's that type of publication that forms the backbone of science?
Oh, yes. That's the way science proceeds. Science cannot proceed--we scientists cannot reach opinions other than formulating themselves and publishing and then reading other publications. The way that our body of knowledge, everything we know has to be accepted by other scientists. Otherwise it's lost, and it's only accepted on the basis of the published literature.
Yes. I'm very proud of my 1990 textbook, Genetic Data Analysis. This is a compilation of a whole host of statistical methods for handling genetic data, the type of counting data that we'll be talking about here. It's a collection of recipes if you like with the underlying methods.
I have in my hand what appears to be a book with a gray cover entitled genetic DNA analysis. Is that the book you've just referred to?
I'm sorry, your Honor. Just perhaps for clarification, he said genetic DNA analysis--
Have you been involved in any recent books or other publications that are--well, let me rephrase that. As far as books are concerned, have you been involved in the writing of any other books or as an editor of any of those books?
The first book I edited was involved in the genetic--in the statistical analysis of DNA sequence data. I think that was 1988. I edited the proceedings from the conference we held in Raleigh. I helped edit the proceedings of a conference held at the University of California at Davis. I'm currently preparing the second edition of Genetic Data Analysis. Last week I had published a set of papers in the forensic uses of DNA. The collection of papers is called Human Identification. That was a set of papers written by authors I had selected from around the world who work in this area and included a couple of my own papers. So that was published as a special issue of a journal last week and will appear later this summer as a book.
Is that what I have in my hand now entitled at the top Genetica, an International Journal of Genetics?
The title of the journal is Genetica. The title of the issue is Human Identification.
That's right. That's an issue of--these journals come out periodically and they contain many articles. All the articles in that issue are the ones that I solicited, had reviewed, accepted and got published.
Oh, yes. That's the other side of the coin. I--I'm often asked by journal editors to write a review of a book to inform other scientists about books in this area, yes.
Yes. It gets you a chance to express your feelings without going through the need to be nice to reviewers.
Now, I would like to shift your attention if I can, Dr. Weir, to the actual testing in forensic casework in cases such as this case before this jury. Have you previously consulted in any cases--that is consulting using your expertise in the area of statistics and genetics. Have you consulted in any other cases other than this case today?
Well, I'm frequently asked by attorneys to help them with the interpretation of the DNA evidence, with the statistical interpretation. I have worked with some Prosecuting attorneys. I choose not to work with Defense attorneys because I don't feel able to mount an adequate Defense. In other words, an attack on these DNA matches. So the attorney--the Prosecuting attorney will send to me the reports in the case, the reports from the forensic scientists. When I receive those reports, they always contain a lot of information about the DNA matches and along with some numbers, things like the frequency of this profile is 1 in a million, that kind of statement. So it's my job to make sure that 1 in a million is an appropriate figure.
Let me stop you for a moment. When you said you consult with Prosecuting attorneys, but not with Defense attorneys because you feel you can't mount an attack on DNA; is that right?
Yes. I don't--I'm convinced that this methodology is sound and I should say that I'm convinced that the statistical methodology is sound. So I can't think of a way to attack that. It could be of course that if the analysis was done incorrectly, then it would be appropriate for me to appear for the Defense.
Could you describe for us approximately how many cases that you've acted as a consultant in where you've advised attorneys about this type of technology and their specific results?
I'm not sure of the number, but I have given testimony in I think 16 cases before today.
Can you tell us what states those have involved, if it has been more than your home state?
Yes. That's another opportunity to travel of course. I've been in this state and I've been in this city giving testimony. I've been in the states of Oregon and Washington, Illinois, Pennsylvania, Virginia, North and South Carolina, Florida.
And in those instances, did you in fact or were you allowed to give opinions about whether or not frequency data in, for instance, DNA typing cases was presented appropriately?
As far as DNA typing itself, can you describe for the jury, please, how is it you're familiar, with for instance, the technologies that have been previously testified in this case, including both the RFLP typing technique and the technique that uses PCR as well?
Well, the technologies really go back to my days in graduate school. I've spent my career developing statistical methods for genetic data. Now, up until the early 80's, the types of genetic data we had were different to some extent from these DNA types. That doesn't really matter. All that is necessary in the first place for our analyses is that we talk about things, genetic entities which are transmitted from parent to child, and they obey all the laws of genetics. These laws are fairly simple as it turns out, but they can still lead to a very complex situation. So the theories I've been working on apply to blood groups or eye color or anything that we could count and any discrete character which we could say yes, the person has the character or no, the person does not have the character. So these methods were in place, and it turned out in the late 80's, when we had the DNA types introduced to the forensic arena, we already had the methods pretty much in place to handle that kind of data.
In other words, methods were being developed even before DNA typing was available to evaluate characteristics that were transmitted from parents to children; is that right?
Yes. The time I was a post-doctorate at Davis, the data I was working at were the protein electrophoretic variants.
And these technologies--well, these DNA typing technologies then allowed you and other experts in your field then to look at the types of information or data that they were able to provide science as well; is that right?
You described developing some methods. Was this in your work as a scientist, you developed some statistical methods to look at this type of information?
Does that constitute a portion--and perhaps you can tell us what--you know, how much of that portion--of the approximately 100 scientific publications that you've had published?
Well, I think all of my publications have to do with the interpretation of genetic data.
As far as your publications, do they include articles about DNA typing results in forensics, that is in cases like this case in this courtroom?
Yes. I think since about 1990 when I became involved in this area, I've been working on appropriate methods, and my publications started I believe 1991, and it's been about a dozen I think, peer review papers, laying out statistical methods and the resulting analyses of data for the forensic DNA markers.
Can you describe for the jury, please--of--I'm sorry--approximately a dozen or so of your publications, what did they generally deal with as far as forensic cases?
Well, the issues have changed over time as they always do in science. At the beginning, the issue was--the issue of some doubt was, can we form the frequency of a profile by multiplying together a series of numbers, is this so-called product rule an appropriate thing to do. Well, it is appropriate if all these bits of information are independent of each other, if they really are telling us a series of different things. So we needed to address at the early stage whether or not these DNA profile components were in fact independent. So we set up some tests to see if the data supported that independence.
In a nutshell, can you describe what you mean when you use the term "We looked at independence and whether or not we could use the product rule"?
Well, it's very simple. It's embarrassing how simple it really is in concepts. The question is, we have as a single probe or single locus two pieces of information. It might be a DQ-Alpha 1.1 and 1.2, two types that we see. We would like to know how often those two types occur together in the same person. One way would be just to take a sample of people and see how many of them have that, and that's what we do.
The other way which we'll need to use in the next step is to say, well, a good estimate of that number is the number of people who have a 1.1 times the number who have a 1.2. So these are two independent bits. So they sort of exist in their own right. This one has a frequency of 10 percent, this one has a frequency of say five percent, and we can multiply them together. Now, the testing we do, it's very simple. We just go and get a sample of people and say how many people have the two types, how many would we expect have the two types if they're independent. And we compare what we observe and what we expect. And they won't be exactly the same, numbers being what they are, but providing they're not too far apart, then we'll say it looks like the data are consistent with independence.
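The comparison of observed and expected counts described above can be sketched in a few lines of Python. This is only an illustration, not the witness's actual procedure: the function name `observed_vs_expected` and the 100-person sample are hypothetical, and the expected count uses the standard Hardy-Weinberg form (a factor of 2 for a pair of two different types), which the testimony describes more simply as multiplying the two frequencies.

```python
from collections import Counter

def observed_vs_expected(genotypes, a, b):
    """Return (observed, expected) counts of people who carry types a and b together.

    genotypes: list of 2-tuples, one tuple of two types per person (hypothetical data).
    The expected count assumes the two types in a person are independent.
    """
    n = len(genotypes)
    # Frequency of each type among all 2n copies in the sample.
    counts = Counter(t for person in genotypes for t in person)
    p = counts[a] / (2 * n)
    q = counts[b] / (2 * n)
    # Observed: people whose pair is exactly {a, b}, in either order.
    observed = sum(1 for person in genotypes if sorted(person) == sorted((a, b)))
    # Expected under independence: 2pq for two different types, p*p for the same type twice.
    factor = 1 if a == b else 2
    expected = factor * p * q * n
    return observed, expected

# Hypothetical sample of 100 people: type 1.1 at 10 percent, type 1.2 at 5 percent.
sample = (
    [("1.1", "1.2")] * 1
    + [("1.1", "1.1")] * 1
    + [("1.1", "4")] * 17
    + [("1.2", "4")] * 9
    + [("4", "4")] * 72
)

obs, exp = observed_vs_expected(sample, "1.1", "1.2")
print(f"observed {obs}, expected about {exp:.1f}")
```

If the two numbers are close, as they are in this made-up sample, the data are described as consistent with independence; how close is close enough is the statistical question the tests address.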
Is that the type of work you do or the type of analysis you perform when you're asked to look at an individual case?
Yes. I always have to do that because in the cases, there's always going to come a point where some numbers are multiplied together to end up with astonishing looking numbers. So I want to make sure that this multiplication process is valid in that case.
Or if you're performing research for purposes of one of your scientific publications, is this the same type of information you look at that leads ultimately to your publishing an article?
Well, my articles have dealt with sort of two aspects, one with setting up the tests, you know, trying to make up some rules by which the observed and the expected numbers could differ and us not be worried. So it's a little more complicated, but the idea is simple. We have to decide how close do they have to be for us to be comfortable. So that's the statistical methodology. That's the theory. So we develop and make sure those tests behave nicely. Then the second aspect is to apply them to some sets of data.
Now, I'd like to shift your attention if I could, Dr. Weir, to Cellmark Diagnostics. Are you familiar with that laboratory?
And could you describe for the jury how is it that you are familiar with Cellmark?
Well, I've met with people from Cellmark, Dr. Cotton and some of her colleagues over the past several years at scientific meetings. Along I think it must have been 1991, when I was developing the test for independence, I was in touch with Dr. Cotton. And I'm not sure now whose idea it was, but I guess together, we came up with the idea that I should examine their databases. So I discussed with Dr. Cotton and Dr. Lisa Foreman at Cellmark details of their database, and they sent me a computer disk which was just the DNA types of some hundreds of people that they had collected and typed. I analyzed the data and found--
As far as just in general at this point, your relationship with Cellmark Diagnostics, is it in any formal manner? Is there any formal relationship?
At this point, it's extremely formal. My university has a contract with Cellmark to analyze their data.
And when you say "Analyze their data," could you tell us a little bit more about what that means?
As a result of your relationship with Cellmark in terms of looking at their data, you've described how they share information with you; is that right?
And I believe you described they shared a disk with one or more of their databases?
The way of transmitting the data is on a computer disk. The list of data is too extensive for me to have to type it into a computer.
Well, it's the only accurate way. They also give me a computer printout. And I suppose I could type it in, but there is considerable chance of making an error if I did that.
Several. In 1991, they provided what was their current forensic databases consisting of people classified as Caucasian, African American or Hispanic. They--it was just the African American and Caucasian first. The Hispanic came a little later. Last year, they sent me further databases they had collected through their paternity testing program. Each paternity case of course involves three people, a mother, a child and an alleged father. So there's a lot of information they get in the course of their paternity business. And the paternity databases are specified according to Caucasian and African American. They've also sent me their PCR databases. Oh, excuse me. The--the first ones I described were the RFLP, and they've also sent me PCR databases with the expressed request I test them for independence.
When you stated that you received in 1991 databases for I believe Caucasians and African Americans and then later received Hispanic database material from Cellmark--
--is there a title that we can apply with the year perhaps to that set of three databases?
Well, I've come to call those the '92 databases, the 1992. And the--the results of my analyses on the Caucasian and African American were published in a peer review paper in 1992 in the American Journal of Human Genetics.
All right. Perhaps you can describe that a little bit more fully. You ended up writing a scientific publication as a result of your looking at the '92 databases from Cellmark?
And is that one of the publications that is one of the approximately a hundred that you've had published in the peer-reviewed scientific literature?
You also described receiving in 1994 additional databases from paternity cases; is that right?
And then I believe you also said that you received--and I'm not sure you gave the year--PCR databases from Cellmark?
Now, you are familiar, are you not, with Cellmark's methods for calculating population frequencies in cases such as this one?
Because they've told me. I have held discussions with them going back to 1991, and then I've acted as an expert witness in some cases where their results were presented in court. I think the first one of those was in 1991. It was a case in this city.
As far as these databases that you've described, you've already used the concept and you've described independence and use of the product rule. Is that important to be able to use a database?
The--the problem we face is to attach some kind of a number to a DNA profile, we are talking about an event which is so rare that we generally haven't seen it before. And I'm talking about the case, for example, when there's a DNA profile based on say 6 or 7 RFLP probes. When we have that many probes involved, there are so many possibilities, so many permutations of the types of the 6 or 7 probes, there's so many different profiles out there in the world that we can't expect to ever see them all. We certainly have never seen the matching one before in any specific case. But we have a problem. We would like to attach a number to that. We would like to say this is a very rare profile. And by the way, that means it occurs only once in a million people or whatever the number is. We haven't observed a million people. So we have a problem. We solve the problem by thinking a little more clearly and saying, well, this profile, this single profile which is being declared to match actually consists of several components. If it's 6 probes each with two bands, I'm talking RFLPs, there are really 12 bits to the profile. It's like a 12-point pattern. Each of those points, we have a good idea of how frequently they occur in the population based on these databases. A database of a few hundred is perfectly adequate to estimate the frequency of one of the points. And what we find as a good rule of thumb, each one of these points occurs about 10 percent, in about 10 percent of the population. Their frequencies are about 10 percent.
That's rough. It's often less than that, occasionally more, but it's a nice convenient figure to think of. 10 percent is the chance of getting one of these 12 or 14, whatever it is, points. But we have a profile of the whole 12. We want a number for the set of 12. And it's not an amazing thing. We're going to multiply those 12 frequencies together. And suddenly we've gone from something we can understand very easily--
1 out of 10, 10--every 10 people we come across, there's a--there will be a couple of copies of any one of these bands, any one of the specific bands. But now we're asking, how often does the whole package come together. And it's kind of embarrassing. We suddenly end up with numbers that we find very hard to understand. If you multiply 10 by itself 12 times, you've got--you've got 12 zeroes there, and we don't have a very good experience in anything with that--well, apart from the national debt I guess--any--any numbers this big. So we have gone--now, this is--it's an important number obviously and it depends crucially on being able to multiply.
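The rule-of-thumb arithmetic in this answer can be written out directly. The sketch below assumes, as the testimony does for illustration, that each of the 12 bands has a frequency of exactly 10 percent and that the bands are independent, so that the frequencies may simply be multiplied. The function name `profile_frequency` is hypothetical, and real casework calculations use the measured frequency of each band rather than a flat 10 percent.

```python
from fractions import Fraction
from math import prod

def profile_frequency(band_frequencies):
    """Multiply the individual band frequencies together (the product rule)."""
    return prod(band_frequencies, start=Fraction(1))

# Rule of thumb from the testimony: 12 bands, each at about 1 in 10.
bands = [Fraction(1, 10)] * 12
freq = profile_frequency(bands)

# Exact fractions avoid floating-point rounding on very small numbers.
print(f"about 1 in {freq.denominator:,}")
```

The denominator is a 1 followed by 12 zeroes, which is the number the witness compares to the national debt.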
If this independence of these different characteristics didn't exist, would that mean then that the number may not be accurate, that the estimate may not be accurate?
Is that the type of examination that you have developed tests for, statistical tests to determine if this independence exists?
Is it also the case that you use tests, that is scientific and mathematical tests that have been developed by others for this type of analysis?
Oh, yes. I don't insist on doing it all myself. Some of the tests--tests that I've used in this case, for example, the tests are very old. They go back to a very famous statistician, an R.A. Fisher, writing in the 1930's. The tests he devised were good, but they were very difficult to implement. And another very prominent statistician in this country, Dr. Elizabeth Thompson, gave us in essence a trick, a way of doing the tests on the computer specifically for these databases.
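One general way to carry out a test of this kind on a computer is by repeated shuffling. The sketch below is a generic permutation test and is offered only as an illustration; it is an assumption that it resembles the approach referred to in the testimony, and it is not presented as Dr. Thompson's or Fisher's actual procedure. The function names, the choice of statistic (the number of people with two different types) and the sample data are all hypothetical.

```python
import random

def count_heterozygotes(genotypes):
    """Count people whose two types differ."""
    return sum(1 for a, b in genotypes if a != b)

def permutation_p_value(genotypes, n_shuffles=999, seed=0):
    """Estimate how often random pairing gives as few mixed pairs as observed.

    Shuffling the types across people mimics independence; a small
    p-value suggests the observed sample departs from independence.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = count_heterozygotes(genotypes)
    pool = [t for person in genotypes for t in person]
    as_few = 0
    for _ in range(n_shuffles):
        rng.shuffle(pool)
        shuffled = list(zip(pool[0::2], pool[1::2]))  # re-pair the shuffled types
        if count_heterozygotes(shuffled) <= observed:
            as_few += 1
    # Add 1 to numerator and denominator so the observed sample counts as one arrangement.
    return (as_few + 1) / (n_shuffles + 1)

# Hypothetical extreme sample: no one carries both types, which independence would rarely produce.
sample = [("A", "A")] * 20 + [("B", "B")] * 20
p = permutation_p_value(sample)
print(f"estimated p-value: {p}")
```

A sample like this one, with no mixed pairs at all, yields a very small p-value; a sample that matches what independence predicts would yield a large one.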
Now, you mentioned that you have actually had published, that is a scientific publication about the 1992 databases used by Cellmark?
What did that publication involve? Can you capsulize or give us a summary of that?
Well, it was--once again, it was very simple. I took the two databases I had at that time and I think there were four RFLP probes at that time. Cellmark had collected types from 2- or 300 African Americans and 2- or 300 Caucasians.
So we had those databases. And then I examined the question, does each--each pair of bands at a probe, does that occur as often in people in the databases as I would expect under the assumption of independence. And then--that's one probe. Then I went to two probes, is this--when I look at two probes, each with--I can't work my fingers here. We've got two probes, one with two bands, the other one with two bands. Is a band of this probe, is it independent of a band of that probe, because we're going to end up multiplying the whole four together. So I did some tests--actually, I conducted tests that we had published previously--that weren't Dr. Thompson's tests. I had applied tests we had published about previously, applied them to the data and presented the results.
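The comparison of observed and expected counts for a pair of bands at one probe might look roughly like the sketch below. It is not the published test the witness refers to. It assumes the standard expectation under independence (the proportion 2pq for two different bands and p squared for two copies of the same band), and the toy database and the name `observed_and_expected` are hypothetical.

```python
from collections import Counter

def observed_and_expected(genotypes, band_a, band_b):
    """Count people carrying the pair (band_a, band_b) and the count expected
    if the two bands a person carries were independent of each other."""
    n_people = len(genotypes)
    band_counts = Counter(band for pair in genotypes for band in pair)
    total_bands = 2 * n_people
    p = band_counts[band_a] / total_bands
    q = band_counts[band_b] / total_bands
    target = sorted((band_a, band_b))
    observed = sum(1 for pair in genotypes if sorted(pair) == target)
    if band_a == band_b:
        expected = n_people * p * p
    else:
        expected = n_people * 2 * p * q
    return observed, expected

# Hypothetical toy database of four people, two bands each.
toy_database = [("A", "B"), ("A", "B"), ("A", "A"), ("B", "B")]
print(observed_and_expected(toy_database, "A", "B"))  # (2, 2.0)
```

In this toy case the observed and expected counts agree exactly. With real data they never agree exactly, and a formal test is needed to decide whether the gap is larger than chance alone would produce.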
Were you able to reach any conclusions from this analysis that you published in the scientific literature?
Yes. The conclusions were that by and large, Cellmark's databases were consistent with independence.
And by "Consistent with independence," meaning that the results obtained from each of these probes could in fact be multiplied together?
All right. We'll return to that in a little bit, but what I'd like to ask you about now, Dr. Weir, is, have you had any previous experience--let me rephrase that. You're familiar with the California Department of Justice; is that right?
All right. Can you tell us a little bit about your experience with the Department of Justice? And I'm referring to the California Department of Justice DNA laboratory.
Well, over the last six months, I've had several telephone conversations with a Mr. Gary Sims and obtained from him a fairly good understanding of how it is that DOJ attaches numbers to their matching profiles.
Is there any--and you're familiar with the fact that the Federal Bureau of Investigation in Washington, D.C. also has a DNA laboratory, correct?
To what extent if any does the Department of Justice in California rely on work previously done by the FBI? And I'm only referring right now to the area of population frequencies.
Oh, they rely--it's a totally--all the numbers that DOJ generates in a case are based on data collected by the FBI.
We'll return to this case a little bit later as I mentioned, but for purposes of this question, to your knowledge, was the FBI or were the FBI databases used by DOJ in calculating their frequencies of results in this case?
Now, you've mentioned the FBI. What type of relationship or working relationship, can you tell us a little bit more about that, have you had with them?
It's been a very close and harmonious relationship since the end of 1989. Dr. Budowle asked me to work with him in examining their databases. So he once again explained the procedures they arrived--that they used to construct their databases. Theirs was a little different from Cellmark's. They had a somewhat wider base of people at that time. They had blood--they had DNA collected from our blood banks around the country and constructed a database, just a listing of the profiles of some hundreds of people. Dr. Budowle provided me with those databases. Those were three racial groups, Caucasian, African American and Hispanic and further subdivided into geographic region, Florida, California and Texas. He provided me with those data. I performed some different tests on those because the FBI has a different procedure from Cellmark of deciding when bands match, for example. So I analyzed those data with Dr. Budowle's cooperation and published those results in a paper published in 1992 in the journal Genetics.
We'll return to that in just a second. But this process of obtaining databases from the FBI, was that done in a similar fashion to when the databases were provided to you by Cellmark?
And that led to your publication--and I'm sorry. What year was it that you reviewed, that is authored a scientific publication about the FBI's databases?
The paper was published in 1992. It went through several revisions if you like. It was written actually I believe early in 1991.
Just to be clear, both the publication that you authored about Cellmark as well as the 1992 publication about the FBI, do those relate to RFLP data only?
As far as your review of the FBI's databases, can you give us again a brief summary of your findings?
Yes. I found what I expected to find and I think what any population geneticist would expect to find. The databases were consistent with the various components of these DNA profiles, all being independent one from another.
Incidentally, have there been publications by other scientists about the FBI's databases as well?
Oh, there have been several, several other people who have performed a similar role to myself, a Dr. Chakraborty for example. Should I spell that?
C-H-A-K-R-A-B-O-R-T-Y, from the University of Texas has performed even more extensive analyses than I have.
Now, as far as--and I'd like to take you into individual casework such as this case. Have you before this case been involved in evaluating this data or population frequency information of other laboratories--and actually let me rephrase that if I may. I think that's confusing. As far as your looking at databases from both the FBI and Cellmark, have you looked at other laboratories' databases?
Well, we've had the private Cellmark mentioned. Also, there's Lifecodes, which is a private company, Genetic Design, which is in North Carolina, and Roche Biomedical, private companies. State agencies, the State Bureau of Investigation in North Carolina, the South Carolina Law Enforcement Division, crime labs in two counties in Florida, crime labs in Minnesota, Oregon, Toronto, Australia, New Zealand and United Kingdom.
So in terms of looking at a database, that's something that you've had experience with on many occasions?
Turning your attention now to this individual case--and you've spoken already of having conversations with I believe Robin Cotton and Gary Sims about this case; is that right?
Yes. I believe I've received all their written reports, at least the ones containing frequency estimates.
Did you also receive any raw data? And by raw data, I mean something other than a formal written report.
Well, I already had the databases that both labs use, Cellmark's own databases, the FBI's databases that DOJ uses for RFLPs. And then as part of their reports, I received information on the detailed fragment lengths, the sort of the nuts and bolts of the matching profiles.
When you say "Fragment lengths," you're referring to--and you've used the term "Nuts and bolts." Are these the actual approximate sizings of each of these bands that the jury has seen from the x-rays that we looked at some weeks ago?
Yeah. That's right. Each of those bands has a number attached to it like 1369, the estimated fragment length. So I got all those numbers.
Were you also provided with each laboratory's estimate of the approximate frequency or approximately how often characteristics found in the various items of evidence in this case are found in various populations?
Were you also provided with the data about the results of PCR testing in this case as well?
And that included the marker and I think you've already given a short example of DQ-Alpha?
And also, were you given the actual results, the typing results from use of the marker D1S80?
From a statistical standpoint or a population frequency standpoint, when you make determinations in an individual case, do you use the actual types reported by the DNA analyst or the laboratory that's involved?
In other words, do you ask for and want to look at, for instance, the x-rays or the photographs of typing strips so that you can evaluate what the types are that are shown?
No. I'm not qualified to do that. I have to start with the--with the information written down, the fragment length or the DQ-Alpha type from the strip. I'm not qualified to do the determination myself. I must accept what the forensic scientists say they found.
In other words, you rely on the expertise and experience and training of the DNA analyst to determine what types and what matches exist?
As far as an exclusion, do you have any interest or can you assign any frequency to an exclusion?
No. An exclusion is an exclusion and we don't want--we don't need to attach a number.
As far as your work in this case, did you also have--and you've described a little bit about discussions with Gary Sims and Robin Cotton. Did you have discussions when you had a question about a particular result or a database used and so forth?
If there was something that you couldn't determine from the actual materials you were provided, did you have one of these telephone or in-person contacts?
As far as this case is concerned, did you then perform certain calculations and look at databases and results?
Ultimately, did you reach certain opinions or conclusions about various factors or various areas of the frequencies that are involved in this case?
Well, I don't have opinions. I have estimated the frequencies of all the matching profiles in this case.
As far as population frequency data--and let's take a forensic case just like this one. When a number is presented in court, whether that's a number of 1 out of 10 or 1 out of 10 million or whatever, is that a precise number?
It's our attempt to attach some meaning to the match. The 1 in 10 will obviously be more precise than the 1 in 10 million. We are--we are attempting a very difficult task, to attach a number to a very rare event. An argument could be made that maybe we should stop doing that for 7 RFLPs or 11 RFLPs if we have a match with 22 fragments all lining up. It's inconceivable that we could come up with a number that would make any sense, meaning that we just have no experience of such an enormous number. So--but the other thing is that we will never know for sure how many people have any particular profile. We expect that by the time we get to 7 RFLP profiles, there will be very few if any other people in the world with that type. But we'll never know that.
We can't type everybody in the world. We can't even count all the people in Los Angeles. That's just not possible. We have to deal with estimates, and our estimates have to be based on the information we have available to us. And that means we have to go out and type as many people as possible. And just the practicality of that means, we'll be talking about a few hundred or a few thousand people who will be typed. Now, Cellmark in 1992 has a Caucasian database of--I don't know--300 people, and you can base an estimate on that. And we know if we had a different database of 300 people even collected under the same conditions, we would get a different answer because the answers we get are estimates of the true frequency, whatever that is, which is not known and we never will know it. So we have to be very sure to interpret these estimates as simply that. They are good estimates of a quantity, which we will never know for sure.
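The point that two databases of 300 people give two different estimates of the same unknown frequency can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the "true" frequency of 10 percent, the seeds, and the function name `estimate_band_frequency`. In real casework the true frequency is exactly the thing that is not known.

```python
import random

def estimate_band_frequency(true_frequency, n_people, rng):
    """Estimate a band's frequency from a simulated database.
    Each person contributes two bands, drawn independently."""
    n_bands = 2 * n_people
    hits = sum(1 for _ in range(n_bands) if rng.random() < true_frequency)
    return hits / n_bands

# Two simulated databases of 300 people drawn under the same conditions.
first = estimate_band_frequency(0.10, 300, random.Random(1))
second = estimate_band_frequency(0.10, 300, random.Random(2))
print(first, second)
```

Each run with a different seed plays the role of a different database. The estimates cluster around the assumed 10 percent without having to coincide with it, which is the sense in which they are estimates rather than exact figures.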
As far as the databases used in this case, did you in fact look at those databases and perform some of the tests that you've described earlier?
Just to make sure that it was legitimate to multiply the numbers together to end up with the profile frequencies.
Did you also look at the additional database that you received from Cellmark for 1994?
As far as the data used by the Department of Justice, can you tell us what databases you looked at there? And I'm only talking about RFLP at the moment.
Yeah. The RFLP databases are collected by the FBI. The FBI's databases have been expanded by the addition of additional probes over time. When I wrote my 1992 paper, I think there were five probes at that time, and now there are seven in the database. So the databases have been expanded. So I examined the current version, the databases I received from the FBI this year and used them--examined the databases, checked for independence and applied them to the profiles in this case.
As far as the PCR databases--and let's refer to those, and if you would, just describe again as you did with regard to the RFLP--did you test those databases as used by the laboratories in this case, Cellmark and Department of Justice also?
Yes. Cellmark has their own PCR databases, and they asked me the summer of 1994 to analyze them, which I did. And the DOJ's a little more complicated. They rely on several different sources of information about PCR types. They don't generate their own databases. I'm--I chose to examine an FBI database for PCR types. Those data--the database of the FBI has been described in publications by the FBI which I've reviewed. But in my--from my perspective, more importantly, I've examined that database myself. It contains the whole seven PCR types, that's DQ-Alpha, D1S80 and the five polymarkers, the whole seven loci typed on the same people so that I could do tests for independence of those types.
Does it make any difference in--and referring to these PCR markers--that you looked at a different database than Cellmark used to report their frequencies from the PCR results in this case?
Well, the difference is that I'll get a different number, just as Cellmark will get a different number if they added a single person to their database tomorrow. Every database gives its characteristic number. What we have found in this whole business over the last five years, once you start getting several loci, several probes, we can reach conclusions that a matching profile is rare, no matter which database we use. It sounds kind of strange at first sight, that we should get numbers from different--from different racial backgrounds which, although they're different, they're not often wildly different. So we get differences between racial groups, between different labs, between databases collected in different years. They're all different numbers. They're all addressing the same issue, and I don't think it matters, providing we keep in mind that what we're trying to do is to establish whether or not a matching profile is rare.
As far as these tests that you performed on the various databases that you've just described, is there a way that you can summarize the results? And I'm talking about your examination of the databases.
Well, there's a general statement I would make. That by and large, the components of the DNA profiles in this case are independent. There are exceptions. Occasionally I will find some evidence of an apparent association, some lack of independence between parts of some of the profiles.
Oh, association, it's--refers to the lack of independence. If the piece of DNA a person receives from one parent is somehow associated with that from the other parent, meaning that if I tell you this person has a 1.2 DQ-Alpha, if that gives you any information at all about the other type, if that makes it more likely that the next one, the other one is a 1.4, then we have associated fragments. Now, we wouldn't expect that to be the case in these DNA markers, which as far as we know, don't have any effect on us. They don't affect our health or well-being. So we expect them to be independent. We don't expect people to marry based on their DNA types, we don't expect the two parents to have associated types and we don't expect them to transmit associated types to their children.
So when we get a database, we expect to find that the two bits are independent. And by and large, that's what we find. Not a hundred percent of the time. Occasionally we find that a particular pair of types will occur more often than we expect. Sometimes it will occur less often. And when I say "More often," I mean statistically, significantly more often.
Now, was that the case--as far as the conclusions that you've just described where you've described sometimes you see this association, was that the case in your analysis of the databases used in this case?
It happens occasionally. Roughly speaking, it happens about five percent of the time, when I do a test, I will find an apparent departure from independence.
Well, I don't think it means anything. It's the nature of what we do. When we do a test, when we look for independence, we don't know the answer. We don't know if they're independent or not. We don't know for sure. We're going to go through a procedure. We're going to do a test as I described, comparing observed and expected and end up making a statement.
Before we even do it, we know there's a chance that we're going to make a wrong call. We devise our tests to make that chance of an error small. And for this particular kind of error, saying there's an association when there really isn't, but saying there is, we devise things to make sure that we'll have an error, which is small, and we choose the small number of five percent. So we do what is called a five-percent test. It sounds like we're making life hard for ourselves, but that's the way it is. We do a procedure. Most of the time it's fine. Five percent of the time, it says, oh-oh, there's an association here. And we don't know--when we look at that, we don't know if that's the five percent--the 1 in 20 that we would expect or if it's something really biological going on and these two bits of DNA really are--have an affinity one for the other and they're going to be transmitted to--we just don't know by looking. So we have--we don't have a problem. We just say here are two bits of DNA which are associated. They're not telling us two bits of information. We shouldn't pretend we have two matches, one at each band.
We've sort of got somewhere between one and two. So I choose to just use one of those two bands. I'm going to throw away the other band for my calculations because I can't be sure that it's another bit of information. So I'll just use one of them.
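The behavior of a five-percent test can be seen in a simulation where the two characteristics are independent by construction, so every flagged association is a false alarm. The sketch below is a simplification: it uses an ordinary chi-square test on a two-by-two table as a stand-in, not the tests used in this case, and the sample size of 300, the yes/no characteristics, and the function names are hypothetical.

```python
import random

# Critical value of the chi-square distribution with 1 degree of freedom
# that cuts off the upper five percent.
CRITICAL_VALUE_5_PERCENT = 3.841

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty; no evidence either way
    return n * (a * d - b * c) ** 2 / denominator

def false_alarm_rate(n_tests, n_people, rng):
    """Fraction of tests that flag an association between two characteristics
    that are generated independently of each other."""
    flagged = 0
    for _ in range(n_tests):
        counts = [[0, 0], [0, 0]]
        for _ in range(n_people):
            has_first = int(rng.random() < 0.5)
            has_second = int(rng.random() < 0.5)
            counts[has_first][has_second] += 1
        statistic = chi_square_2x2(counts[0][0], counts[0][1],
                                   counts[1][0], counts[1][1])
        if statistic > CRITICAL_VALUE_5_PERCENT:
            flagged += 1
    return flagged / n_tests

rate = false_alarm_rate(2000, 300, random.Random(0))
print(rate)
```

The printed rate should come out near 0.05, roughly 1 test in 20, even though no real association exists in the simulated data. That is why a single flagged pair cannot, by inspection alone, be told apart from the expected false alarms.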
In this particular case, did you perform calculation process--I'm sorry--perform a calculation process on various pieces of evidence?
In doing that, did you use this approach that you just described; that when you occasionally found this association, you would eliminate one of those pieces of information to make sure you weren't overstating basically how frequent or how--approximately how frequent a match may be?
By not considering a particular fragment or a particular allele as that term has been used in this court, would that be an example of using a very conservative approach to making a frequency estimate?
Yes. It's conservative in that by--in essence, discarding, throwing away one of those matching items, we're not figuring that into the calculation. So the frequency looks to be less rare.
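The effect of discarding one band can be shown with the same rough figures used earlier in the testimony, 12 bands at about 10 percent each. The numbers are illustrative only and are not the frequencies calculated in this case.

```python
from fractions import Fraction

band_frequency = Fraction(1, 10)

# Estimated profile frequency using all 12 bands, and using only 11
# after one band of an apparently associated pair is discarded.
all_twelve = band_frequency ** 12
eleven_only = band_frequency ** 11

print(all_twelve)                # 1/1000000000000
print(eleven_only)               # 1/100000000000
print(eleven_only / all_twelve)  # 10
```

Dropping the band makes the estimated frequency ten times larger, so the profile is reported as less rare than it would be with all 12 bands counted.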
When you discard or not use that one band, you're not reaching the conclusion, are you, that those items or bands are actually associated, are you?
No. It's my feeling that they're not. I can't--I don't know of a biological reason why they would be, but I don't know--I don't know what the reason is. It's just--it might be just that these particular set of 300 people in this database happen to have this particular pair of alleles more often than not. I don't--I don't attach much meaning to it, but nor can I discount that finding nor can I ignore that finding.
The calculations that you made in this case based on the various fragment lengths if the test was RFLP or the different alleles if the test was PCR in origin, were those calculations made based on methods that you've developed?
As far as the methods you used as well as the methods developed by others, have they been published in the scientific literature?
Oh, the methods have been published by myself and others over a long period of time, yes.
Are there places along the calculation process where you have to make decisions about things?
Oh, well, there are. For example, when we find associations, we have to decide what to do. And I've described what I do is discard one of those bands.
Are there other examples of decisions that need to be made along this process of calculating a profile frequency or calculating the frequency of matching characteristics?
Well, there are--through the whole chain of things right from the time when the forensic scientist has to declare a match up to the final answer comes, there are a series. So there's a matching criteria. And that's really beyond my expertise. I'm sure you've had it described. But the labs have a rule by which they'll say two RFLP fragments are close enough together that they match. They won't be exactly the same length, but they'll be so close that they look to be matching. And now--so the question, as I said, these types occur 10 percent of the time, that's overstating--oversimplifying it a little bit. Supposing we had a fragment length 1369 bases and we went out and looked at a thousand people, looked at all their bands. Chances are, we won't see any band of 1369. The possible range of band lengths is so enormous, there are so many thousands of different types, we won't see them all. So what we do is, we'll say well, how many times are the bands close to that one occur in the database. How many--Cellmark says how many occur in a floating bin around that band. So we have--it's--it's a decision on how to attach a frequency. There are other ways of doing it. That's what I'm trying to say. There are other ways. This is a conservative procedure.
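The idea of counting database bands that fall close to the evidence band can be sketched as follows. The window width of 5 percent on either side, the toy list of band lengths, and the function name `floating_bin_frequency` are all hypothetical. The testimony does not state the window any laboratory actually uses.

```python
def floating_bin_frequency(database_bands, target_length, window_fraction):
    """Fraction of database bands whose length falls within a window
    centered on the target band length."""
    half_width = target_length * window_fraction
    low = target_length - half_width
    high = target_length + half_width
    in_bin = sum(1 for length in database_bands if low <= length <= high)
    return in_bin / len(database_bands)

# Hypothetical band lengths (in bases) from a toy database.
toy_bands = [1300, 1350, 1369, 1400, 1500, 2000, 900, 1380, 1420, 1440]

# Assumed window: 5 percent on either side of the 1369-base band.
print(floating_bin_frequency(toy_bands, 1369, 0.05))  # 0.5
```

Because the bin moves with the evidence band rather than sitting at fixed boundaries, every database band that could plausibly be called a match is counted, which tends to raise the frequency assigned to the band rather than lower it.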
In other words, there isn't simply one method that's necessarily or necessarily needs to be followed by every laboratory?
Are there other steps along the way wherein you, for instance, in your own calculation method have to make decisions? I'm referring to the approach.
All right. Your Honor, if we could take a break at this time, perhaps the witness' memory will be no longer blank about my question.
All right. All right. Ladies and gentlemen, we're going to take a brief recess at this time. Please remember all my admonitions to you; don't discuss the case amongst yourselves, don't form any opinions, don't allow anybody to communicate with you, don't conduct any deliberations. We'll be in recess for about 15 minutes. All right. Doctor, you can step down.