[University Introductory Statistics] DNA crime scene

Nadeko

Hello,


I am stuck on a problem I have been given in school. The problem is the following:


"DNA of type S has been found at a crime scene. We assume a total population of N = 5,000,000 potential contributors to the trace. Next, assume there is a DNA database consisting of n = 30,000 individuals. Also assume that M = 50 individuals in the whole population have DNA of type S."


There are six sub-questions (a)-(f), and I am stuck on (d)-(f). I will simply explain what questions (a)-(c) are about, and then write up questions (d)-(f).


In (a) I let X = the number of individuals with type S in the database. The sample space was x = {0, 1, ..., 50}. Then I used the hypergeometric distribution to calculate the probability distribution of X. In (b) I am asked to explain why a binomial distribution is also a good approximation, and to calculate the probability distribution. In (c) I am asked to calculate P(X = 1), which is approximately 0.22.
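For reference, the calculations in (a)-(c) can be checked with a minimal Python sketch. It assumes the database is a simple random sample of n individuals from the population, and the function names `hypergeom_pmf` and `binom_pmf` are only illustrative. The hypergeometric formula is written from the point of view of the M type-S individuals (how many of them land among the n database members), which keeps the binomial coefficients small.

```python
from math import comb

N = 5_000_000  # total population
n = 30_000     # size of the DNA database
M = 50         # individuals in the population with DNA of type S


def hypergeom_pmf(k):
    """Exact P(X = k): k of the M type-S individuals are among the n database members."""
    return comb(n, k) * comb(N - n, M - k) / comb(N, M)


def binom_pmf(k):
    """Binomial approximation: each type-S individual is in the database with probability n / N."""
    p = n / N
    return comb(M, k) * p**k * (1 - p) ** (M - k)


# Both values are close to 0.22, matching the answer to (c).
print(f"hypergeometric P(X = 1) = {hypergeom_pmf(1):.4f}")
print(f"binomial       P(X = 1) = {binom_pmf(1):.4f}")
```

The two results agree closely because M is tiny compared with N, so whether one type-S individual is in the database has almost no effect on the others.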


Now here are the next three questions:


"(d) Assume that every individual in the population has the same probability of being the contributor. Let A be the event that the contributor is one of the individuals in the database. Calculate P(A).


(e) Find P(X = 1 | A).


Hint: When we know that the contributor is in the database, there are M - 1 = 49 other individuals with type S for whom we do not know whether they are in the database or not. Argue that we are then interested in the probability that none of these are in the database.


(f) Find P(A | X = 1). Argue that this corresponds to the probability that the individual in the database with the matching DNA profile is the culprit."


I hope someone can give me some tips on how to solve this. Please excuse my language if anything is unclear; English is my second language, and I have translated the problem from my first language.
 
"DNA of type S has been found at a crime scene. We assume a total population of N = 5,000,000 potential contributors to the trace. Next, assume there is a DNA database consisting of n = 30,000 individuals. Also assume that M = 50 individuals in the whole population have DNA of type S."

(d) Assume that every individual in the population has the same probability of being the contributor. Let A be the event that the contributor is one of the individuals in the database. Calculate P(A).

(e) Find P(X = 1 | A).

Hint: When we know that the contributor is in the database, there are M - 1 = 49 other individuals with type S for whom we do not know whether they are in the database or not. Argue that we are then interested in the probability that none of these are in the database.

(f) Find P(A | X = 1). Argue that this corresponds to the probability that the individual in the database with the matching DNA profile is the culprit.
What are your thoughts? What have you tried? Where are you stuck?

For other viewers, be sure to check (place 1), (place 2), (place 3), and (place 4) before replying, to reduce confusion. Thank you! ;)
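
As a starting point for (d)-(f), here is a minimal Python sketch of one way the hint might be turned into numbers. It is not a definitive solution. It assumes the database is a simple random sample of the population, that the contributor is equally likely to be any of the M type-S individuals, and that (f) is obtained with Bayes' rule. The variable names are only illustrative.

```python
from math import comb

N = 5_000_000  # total population
n = 30_000     # size of the DNA database
M = 50         # individuals in the population with DNA of type S

# (d) Under the random-sample assumption, any given individual
# (including the contributor) is in the database with probability n / N.
p_A = n / N

# (e) Given that the contributor is in the database, X = 1 means that none of
# the other M - 1 type-S individuals is among the remaining n - 1 database
# members. Of the N - 1 other people, N - n are outside the database.
p_X1_given_A = comb(N - n, M - 1) / comb(N - 1, M - 1)

# P(X = 1) from the hypergeometric distribution, as in (c).
p_X1 = comb(n, 1) * comb(N - n, M - 1) / comb(N, M)

# (f) Bayes' rule: P(A | X = 1) = P(X = 1 | A) * P(A) / P(X = 1).
p_A_given_X1 = p_X1_given_A * p_A / p_X1

print(f"(d) P(A)         = {p_A:.4f}")
print(f"(e) P(X = 1 | A) = {p_X1_given_A:.4f}")
print(f"(f) P(A | X = 1) = {p_A_given_X1:.4f}")
```

Under these assumptions the result in (f) simplifies to 1/M = 0.02: the single match in the database is one of 50 equally likely type-S individuals, so a match on its own is weak evidence that this person is the culprit.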