Privacy and DNA Databases

What are the facts and current understanding on privacy with regard to Genetic Genealogy testing? There have been vocal naysayers ever since the service went mainstream and became popular, especially with the growth of online databases held by companies, both non-profit and for-profit. We try to capture information here so that everyone can make a more informed decision. This is a constantly changing landscape, with law enforcement and the court system playing catch-up in a field they used to lead. There are many ethical, medical, scientific, legal, and governance issues.

Much of the recent controversy started with a March 2015 publication in New Orleans that did not garner much attention. The story was picked up and published by Wired in October 2015 and then, as it went viral, by many others shortly after. In short, a prosecutor submitted a crime-scene DNA sample to one of the autosomal testing services and then used weak matches there to implicate others in a crime.

Key Points

  • FBI and similar law enforcement databases (often referred to collectively as CODIS) use ONLY autosomal STR testing of about 20 markers across the 22 autosomal chromosomes. Genetic Genealogy uses 500,000+ SNP markers on the autosomes, as well as SNP markers on the X chromosome, the Y chromosome, and mitochondrial DNA. The only STR markers used in Genetic Genealogy are on the Y chromosome.
  • AncestryDNA and 23andMe both reported, in Summer 2015, that they had reached over 1 million testers in their databases. By default, both included all testers in their match database at that time. As of Summer 2016, AncestryDNA has passed 2 million testers in their database.
  • FamilyTreeDNA, the early pioneer in the field and the only early company still in existence and independent, has traditionally focused on Y DNA testing. They have far fewer testers in their database. Only recently did they expand to autosomal testing, and their match database appears to be made up far more of “transfers” in from AncestryDNA than of testers who used their service directly (to be confirmed).
  • MyHeritage not only offers autosomal testing directly now but also supports “transfers” in from the other testing companies.
  • Y and Mitochondrial DNA matching only helps identify individuals within the patrilineal and matrilineal lines, respectively. Deep Y DNA testing can isolate to within the last few hundred years. Even full sequence mitochondrial DNA testing only helps for those in your line in the last 1,000 years or so. So the real worry here is with autosomal testing, which is the more common and popular test these days.
  • ySearch and mitoSearch are more open databases maintained by FamilyTreeDNA to which testers have the option of uploading. GEDMatch is the closest equivalent to these more public databases and is used for autosomal results. It has become very popular, and its use has been encouraged in this project. Unlike the former two, GEDMatch is run by principals who are or were affiliated with RogersDNA, an ad-hoc surname project similar to this one. GEDMatch has greatly expanded and now even has a revenue stream. It would not be surprising to see it bought by a larger corporate entity that gains rights and access to the database. Other databases, for NGS test results on the Y chromosome, are yfull and ytree.

Best Practices

Many of our project admins are deeply invested in Genetic Genealogy. As a result, we have tested many near and distant family members. So there are practices we have picked up that may help others here.
  • Never use a full or even partial name on a test kit. The optimum would be a totally random sequence of numbers and letters. A compromise might be a project code in place of the surname, followed by initials for the given name. Ancestry.com does not divulge the full names on test kits managed by someone else. Instead, it displays them as “M.I. (managed by user xxxxx)” where M.I. are My Initials (of the tester) and xxxxx is the Ancestry user name managing the kit. We have used H600 as the project code for some testers who have been enlisted as part of the wider project. One admin uses the first letter of the birth surname of the grandparents of an individual to name a project.
  • Never publicly publish, on the open internet or otherwise, the association of a test kit ID with an individual (living or dead). Keep this private to a limited set of individuals. At most, provide an association to an ancestor who has been deceased for 100 years (or more) and is at least 3 generations away.
  • Use a unique email address and other contact info for each database, or at least a set for your genetic genealogy work that is separate from your other personal, work, or genealogy contacts. This way, you maintain a little more privacy and distance between the test kit manager and your otherwise more public footprint left across the internet.
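
The kit-naming practice above can be illustrated with a short Python sketch. It is only an illustration, not a tool this project uses: the function names, the default length of 10, and the "CODE INITIALS" layout of the compromise form are all assumptions made here. It uses Python's standard secrets module so that the random names are not predictable.

```python
import secrets
import string

# Characters allowed in a random kit name (an assumption: uppercase letters and digits).
ALPHABET = string.ascii_uppercase + string.digits


def random_kit_name(length: int = 10) -> str:
    """Return a random sequence of letters and digits to use as a kit name."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def coded_kit_name(project_code: str, given_names: str) -> str:
    """Return the compromise form: a project code followed by given-name initials.

    The "CODE INITIALS" layout is hypothetical; adapt it to whatever
    the testing company's name fields allow.
    """
    initials = "".join(part[0].upper() for part in given_names.split())
    return f"{project_code} {initials}".strip()


if __name__ == "__main__":
    print(random_kit_name())                     # different on every run
    print(coded_kit_name("H600", "Jane Quinn"))  # project code plus initials
```

Whichever form is used, the list linking each kit name to a real person should stay private, as described in the second practice above.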