# On DNA Testing

The human genome contains sequences of nucleotide base pairs that are repeated over and over again. At each locus of interest, a person has two sets of repeats, one inherited from each parent. Each distinct variant at a locus (for example, a different number of repeats) is an allele. The combination of alleles across multiple loci forms a DNA profile that can be used to tie suspects to a crime scene.
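To make the idea of a profile concrete, here is a minimal Python sketch. The locus names are real markers used in forensic typing, but the repeat counts and the helpers `make_profile` and `profiles_match` are invented for illustration. Real casework has to deal with mixtures, degraded samples, and partial profiles, all of which this ignores.

```python
from typing import Dict, FrozenSet

# A profile maps each locus to the unordered set of alleles (repeat counts)
# inherited from the two parents. A homozygote collapses to a one-element set.
Profile = Dict[str, FrozenSet[int]]


def make_profile(raw: Dict[str, tuple]) -> Profile:
    """Build a profile from {locus: (allele_1, allele_2)}."""
    return {locus: frozenset(alleles) for locus, alleles in raw.items()}


def profiles_match(a: Profile, b: Profile) -> bool:
    """True if the profiles carry the same alleles at every locus they share."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[locus] == b[locus] for locus in shared)


# Hypothetical repeat counts, for illustration only.
crime_scene = make_profile({"D8S1179": (12, 14), "TH01": (6, 9), "vWA": (17, 17)})
suspect_1 = make_profile({"D8S1179": (14, 12), "TH01": (9, 6), "vWA": (17, 17)})
suspect_2 = make_profile({"D8S1179": (12, 14), "TH01": (6, 7), "vWA": (17, 17)})

print(profiles_match(crime_scene, suspect_1))
print(profiles_match(crime_scene, suspect_2))
```

Allele order within a locus does not matter, which is why each locus is stored as a set rather than a tuple.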

## The Accuracy of DNA Testing is Wanting

In a mock study, 74 of 108 crime labs erroneously incriminated a suspect. The reported statistic for how likely the match was to be coincidental varied over 100 trillion-fold.

There are methods (e.g. TrueAllele) that reanalyze old DNA mixture data in software, without the need for new lab testing. This could help correct convictions based on faulty DNA tests.
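One reason match statistics can swing so widely is that they are products of many small numbers. A common textbook approach is the product rule: under Hardy-Weinberg assumptions a genotype has frequency p² if homozygous or 2pq if heterozygous, and the frequencies of independent loci are multiplied together. The sketch below assumes that approach, which is not necessarily what the labs in the study used, and the allele frequencies are invented.

```python
from math import prod
from typing import Iterable, Tuple


def genotype_frequency(p: float, q: float, homozygous: bool) -> float:
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, else 2pq."""
    return p * p if homozygous else 2 * p * q


def random_match_probability(loci: Iterable[Tuple[float, float, bool]]) -> float:
    """Multiply genotype frequencies across loci, assuming independence."""
    return prod(genotype_frequency(p, q, hom) for p, q, hom in loci)


# (allele frequency p, allele frequency q, homozygous?) per locus.
# Hypothetical values, for illustration only.
loci = [
    (0.10, 0.20, False),
    (0.25, 0.25, True),
    (0.05, 0.15, False),
]

rmp = random_match_probability(loci)
print(f"random match probability: {rmp:.3g} (about 1 in {1 / rmp:,.0f})")
```

With only three loci the figure is already tiny. Since real profiles use many more loci, small differences in how each lab interprets a mixed sample compound multiplicatively.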

## Your genome isn’t private. Maybe it never was

Golden State Killer case, 2018: Investigators uploaded a sample from a 30-year-old rape kit to GEDmatch, a genealogy site for finding relatives. They then pieced together family trees and used rough age and location to narrow the candidates down to a single man. Neither he nor any of his immediate relatives had made their DNA public.

Consumer genetics companies keep growing, and 1 in 25 American adults is already in some company’s database. As a result, anyone with access to your genetic material (think of a strand of hair) can track down your identity.

Furthermore, $$\approx$$ 38 genetic variants give good predictions of facial structure.

## Advancements in Genetic Sequencing (and the Role in COVID-19)

The advent of commercial genome sequencing has been compared to the invention of the microscope. The National Human Genome Research Institute invested $200m over 15 years in startups trying to lower the cost and raise the speed of whole-genome sequencing. Many failed, but Solexa’s optical sequencing technique proved successful and was eventually absorbed into Illumina, the de facto leader in the industry.

For scale, the HIV genome has 10k letters, the SARS-CoV-2 genome has 30k, and the human genome has 3b. The seminal Sanger sequencing of the 1970s was laborious: ~35 base pairs in a year. Fast forward to 2010, and we could do 500k letters a day. Now, 3b pairs can be done overnight.

Illumina’s NovaSeq 6000s go for ~$1m a pop. In 2014, Illumina announced the $1k genome. In summer 2020, the sequencing cost for a genome was $600. Oxford Nanopore also has a portable sequencer that is electrical rather than optical, albeit less accurate.
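A quick back-of-the-envelope calculation puts the figures above in perspective. This is a rough sketch using only the numbers quoted in these notes. It treats every letter as read exactly once, which ignores the repeated coverage real sequencing needs, and the names `GENOME_SIZES`, `RATE_2010`, and `days_needed` are my own.

```python
# Genome sizes in letters (base pairs), as quoted in the notes.
GENOME_SIZES = {
    "HIV": 10_000,
    "SARS-CoV-2": 30_000,
    "human": 3_000_000_000,
}

RATE_2010 = 500_000          # letters per day, the 2010 figure
COST_2020_USD = 600          # per human genome, summer 2020


def days_needed(letters: int, letters_per_day: int) -> float:
    """Days to read a genome once at a fixed daily throughput."""
    return letters / letters_per_day


for name, size in GENOME_SIZES.items():
    days = days_needed(size, RATE_2010)
    print(f"{name}: {days:,.2f} days at the 2010 rate")

human_days = days_needed(GENOME_SIZES["human"], RATE_2010)
print(f"human genome at the 2010 rate: about {human_days / 365:.1f} years")

cost_per_letter = COST_2020_USD / GENOME_SIZES["human"]
print(f"2020 cost per letter: ${cost_per_letter:.1e}")
```

At the 2010 rate a single pass over a human genome would take 6,000 days, so doing it overnight implies a speed-up of several thousand-fold in about a decade.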

In Jan 2020, a Sydney virologist, with the consent of the Chinese scientist behind the sequence, uploaded the SARS-CoV-2 genome to virological.org. Teams from all over the world could then analyze the sequence, upload new ones, and form hypotheses about mutations and spread.

Genetic surveillance is a popular trend in the scientific community. The idea is to sequence samples over time in order to identify hidden pathways of transmission and curb the spread of infection.
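One building block of this kind of surveillance is comparing viral genomes against a reference and noting where they differ, since samples that share the same mutations are more likely to be linked. The toy sketch below assumes the sequences are already aligned and of equal length. The sequences, sample names, and the helper `point_differences` are invented, and real pipelines rely on proper alignment and phylogenetic tools instead.

```python
from typing import Dict, List, Tuple


def point_differences(ref: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(ref, sample))
        if r != s
    ]


# Hypothetical, very short sequences for illustration only.
reference = "ATGGCTAGCTAAGTC"
samples: Dict[str, str] = {
    "sample_A": "ATGGCTAGCTAAGTC",
    "sample_B": "ATGACTAGCTAAGTC",
    "sample_C": "ATGACTAGCTTAGTC",
}

mutations = {name: point_differences(reference, seq) for name, seq in samples.items()}
for name, diffs in mutations.items():
    print(name, diffs)
```

Here `sample_C` carries the mutation seen in `sample_B` plus one more, which is the kind of pattern that hints at a shared transmission chain.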

