Copyright © University of Cambridge. All rights reserved.

## 'Is Your DNA Unique?' printed from http://nrich.maths.org/

As you may know, DNA is made up of four different bases:

- Adenine (A)
- Cytosine (C)
- Guanine (G)
- Thymine (T)

Suppose that the bases are randomly distributed along a single strand of DNA:

i) If a single strand of my DNA is 10 bases long, what is the probability that it contains exactly one adenine?

ii) If a single strand of my DNA is 150 bases long, what is the probability that its cytosine content is exactly 30%, that is, 45 cytosines?

iii) If a single strand of my DNA is 1000 bases long, what is the probability that it contains at least one run of 5 or more consecutive thymines?

iv) The human genome is approximately 6 billion bases in length. What is the probability that another individual has the same genetic composition as me?

v) The bacterial restriction enzyme BamHI cuts DNA at the site GGATCC. If I digest my genome with this enzyme, how many cuts would I expect to occur?
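Questions i) to v) can be explored numerically. The following is a minimal Python sketch, assuming (as the problem's "randomly distributed" suggests) that every position is independently A, C, G or T with probability 1/4; the function names are illustrative and not part of the problem. Questions i) and ii) are binomial probabilities; iii) uses a dynamic programme over the length of the current thymine run; iv) is reported as a base-10 logarithm because the probability itself underflows to zero as a float; and v) uses linearity of expectation over every possible starting position of the six-base site.

```python
from math import comb, log10

# Assumption (matching the problem's "randomly distributed"): every position is
# independently A, C, G or T with probability 1/4.
P_BASE = 0.25


def prob_exactly(n: int, k: int, p: float = P_BASE) -> float:
    """i), ii): binomial probability of exactly k copies of one base in n positions."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_run_at_least(n: int, r: int, p: float = P_BASE) -> float:
    """iii): probability of at least one run of r or more consecutive copies of one base.

    state[j] is the probability that no run of length r has appeared yet and the
    strand so far ends in exactly j copies of the base (0 <= j < r).
    """
    state = [1.0] + [0.0] * (r - 1)
    for _ in range(n):
        new = [0.0] * r
        new[0] = sum(state) * (1 - p)  # a different base resets the run to 0
        for j in range(r - 1):
            new[j + 1] = state[j] * p  # the same base extends the run by one
        # state[r - 1] * p completes a run of length r and leaves the "no run yet" states
        state = new
    return 1 - sum(state)


def log10_prob_identical(n: int, p: float = P_BASE) -> float:
    """iv): log10 of the chance that another random strand of length n matches mine
    at every position. The probability itself (p**n) underflows to 0.0 as a float."""
    return n * log10(p)


def expected_site_count(n: int, site: str, p: float = P_BASE) -> float:
    """v): expected occurrences of a fixed site, by linearity of expectation over
    the n - len(site) + 1 possible starting positions."""
    return (n - len(site) + 1) * p ** len(site)


if __name__ == "__main__":
    print(prob_exactly(10, 1))                       # i)   ≈ 0.188
    print(prob_exactly(150, 45))                     # ii)  30% of 150 = 45 cytosines
    print(prob_run_at_least(1000, 5))                # iii)
    print(log10_prob_identical(6 * 10**9))           # iv)  ≈ -3.61e9
    print(expected_site_count(6 * 10**9, "GGATCC"))  # v)   ≈ 1.46 million cuts
```

Question iii) does not reduce to a single binomial term because runs can overlap; tracking only the length of the current run keeps the calculation to O(n·r) steps.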

DNA sequencing is a very laborious task that requires expensive machinery and considerable computational power. DNA fingerprinting is a technique used by forensic scientists to match a sample of DNA to one of a number of suspects, for example to identify which of several people may have been at a crime scene.

However, because sequencing an entire human genome is so difficult, a different approach is needed. Most of the human genome is identical between individuals, except at single bases that vary widely across a population; these variable bases occur roughly once in every 1000 bases. By comparing only these particular sites between two samples of DNA, it is much quicker to determine, to a high degree of accuracy, whether the two samples are identical.

vi) If approximately 1 in 1000 bases is variable, what is the probability that another individual has the same genetic composition as me?

vii) How many of these variable sites should be investigated to identify a suspect with 99.99% probability?

viii) Remembering that DNA occurs as pairs of homologous chromosomes, and that these variable sites occur in the same places on both chromosomes of a pair, how many sites should be investigated so that the probability of a misidentification is smaller than 1 in 1,000,000?
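Questions vi) to viii) can also be explored with a short Python sketch, but it rests on assumptions the problem leaves open, so the numbers are illustrative only. It takes a 6-billion-base genome (from iv) with one variable site per 1000 bases, giving 6,000,000 sites; it assumes each variable site carries A, C, G or T with probability 1/4; and it reads "99.99% probability" as "the chance that an unrelated person matches at every tested site is below 1 in 10,000". For viii) two readings are shown: if the two homologous copies could be told apart, a site matches with probability (1/4)² = 1/16; if only the unordered pair of bases is observed, the match probability is the sum of squared genotype probabilities, which works out to 7/64. The helper names (`sites_needed`, `genotype_match_prob` and so on) are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import log10

BASES = "ACGT"
GENOME_LENGTH = 6 * 10**9   # from question iv)
VARIABLE_EVERY = 1000       # "approximately 1 in 1000 bases is variable"
P_BASE = Fraction(1, 4)     # assumption: a variable site is A, C, G or T with equal probability


def log10_full_match(n_sites: int, match_prob: Fraction) -> float:
    """vi): log10 of the chance that another person matches me at all n_sites."""
    return n_sites * log10(float(match_prob))


def sites_needed(match_prob: Fraction, max_false_match: Fraction) -> int:
    """vii), viii): smallest k such that match_prob**k < max_false_match."""
    if not 0 < match_prob < 1:
        raise ValueError("match_prob must lie strictly between 0 and 1")
    k, p = 0, Fraction(1)
    while p >= max_false_match:
        k += 1
        p *= match_prob  # exact rational arithmetic avoids rounding at the boundary
    return k


def genotype_match_prob(p_base: Fraction) -> Fraction:
    """viii): chance that two unrelated people share the same unordered pair of bases
    at a site, assuming each homologous copy is independently one of BASES with
    probability p_base."""
    genotype = Counter()
    for a, b in product(BASES, repeat=2):
        genotype[frozenset((a, b))] += p_base * p_base
    return sum(q * q for q in genotype.values())


if __name__ == "__main__":
    n_variable = GENOME_LENGTH // VARIABLE_EVERY       # 6,000,000 sites
    print(log10_full_match(n_variable, P_BASE))        # vi)   ≈ -3.61e6
    print(sites_needed(P_BASE, Fraction(1, 10**4)))    # vii)  7
    ordered = P_BASE**2                                # copies distinguishable: 1/16
    unordered = genotype_match_prob(P_BASE)            # copies indistinguishable: 7/64
    print(sites_needed(ordered, Fraction(1, 10**6)))   # viii) 5
    print(sites_needed(unordered, Fraction(1, 10**6))) # viii) 7
```

`sites_needed` accepts any per-site match probability strictly between 0 and 1, so a different model of the variable sites only changes the value passed in.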