
Privacy and DNA Databases

What are the facts and current understanding on privacy with regard to Genetic Genealogy testing? There have been vocal naysayers ever since the service went mainstream and became popular, especially with the growth of online databases held by companies, both non-profit and for-profit. We try to capture information here so that everyone can make a more informed decision. This is a constantly changing landscape, with law enforcement and the court system playing catch-up in a field they used to lead. There are many ethical, medical, scientific, legal, and governance issues.

Much of the recent controversy started with a March 2015 publication in New Orleans that did not garner much attention. The story was picked up and published by Wired in October 2015, and by many others shortly after as it went viral. In summary, a prosecutor submitted a DNA crime sample to one of the autosomal testing services and then used weak matches there to implicate others in a crime. The issue exploded in 2018 when a suspect was arrested in the Golden State Killer case, apparently solved in a week in a similar manner with the help of a genetic genealogist trained by DNA Adoption. This led to the creation of a business division at Parabon NanoLabs to support and link these fields, which quickly allowed another 20 cases to be solved in Fall 2018. See also the Science article from October 2018 and Wired's coverage of it.

Given the controversy over using these databases for criminal investigations, especially since they include many residents of European Union member states, where standards are much stricter than in the USA, many of the companies changed their terms of service to require law enforcement users to identify themselves and to allow users to opt out of law enforcement searches. This mostly applied to databases that allow third-party test results to be imported. Ancestry and 23andMe still do their best to keep their databases from being used for such purposes and contest law enforcement requests for access to their data.

Even this two-tier approach came crumbling down in Summer 2019, when a local police investigator obtained a "fishing" court order to break through the "protective" firewall at GEDMatch and gain access to the full database to find matches to his crime scene DNA evidence. This has upset many, again especially in the EU, about how easily the bar was lowered for police investigative access.

The takeaway is: if you are concerned and do not want your data used by parties you are not aware of, do not upload to third-party sites, and perhaps avoid testing altogether. At minimum, stick to the biggest two, who have fought vigorously against giving access to their customer data by any route other than submitting a saliva sample to their laboratory.

Key Points

  • FBI and similar law enforcement databases (often referred to collectively as CODIS) use autosomal STR testing ONLY, covering about 20 markers across the 22 autosomal chromosomes. Genetic Genealogy uses 500,000+ SNP markers on the autosomes, as well as SNP markers on X, Y and mitochondrial DNA. The only STR markers used in Genetic Genealogy are on the Y chromosome.
  • AncestryDNA and 23andMe both reported, in Summer 2015, that they had reached over 1 million testers in their databases. By default, both included all testers in their match database at that time. By Summer 2016, AncestryDNA had passed 2 million testers. As of the summer of 2019, Ancestry reports well over 15 million testers and 23andMe around 10 million.
  • FamilyTreeDNA, the early pioneer in the field and the only early company still in existence and independent, has traditionally focused on Y DNA testing. They have far fewer testers in their database. Only recently did they expand to autosomal testing, and their match database appears to be composed far more of "transfers" in from AncestryDNA than of testers who used their service directly.
  • MyHeritage not only offers autosomal testing directly now but also supports "transfers" in from ALL other testing companies. FTDNA supports transfers in from most.
  • Y and Mitochondrial DNA matching only helps identify relatives within the Patriline and Matriline, respectively. Deep Y DNA testing can isolate to within the last few hundred years. Even full sequence mitochondrial DNA testing only helps for those in your line in the last 1,000 years or so. So the real worry here is with autosomal testing, the more common and popular kind these days.
  • ySearch and mitoSearch were more open databases maintained by FamilyTreeDNA, to which testers had the option of uploading. They were closed down in May 2018 when EU GDPR regulations came into force. mitoYdna.org is a new entrant trying to replicate these services.
  • GEDMatch is the closest equivalent to these more public databases and is used exclusively for atxDNA. It has become very popular and its use has been encouraged in this project. Unlike the former two, GEDMatch is run by principals who are/were affiliated with RogersDNA, an ad-hoc surname project similar to this one. GEDMatch has greatly expanded and now even has a revenue stream. It would not be surprising to find it bought up by a larger, corporate entity that gains rights and access to the database. Other databases for Y sequencing test results outside of the test companies themselves are yfull, ytree, yDNA-Warehouse and FGC.

Best Practices

Many of our project admins are deeply invested in Genetic Genealogy. As a result, we have tested many near and distant family members. While we never expect that our own relatives, or the relatives of the more distant people we test, would be involved in a violent crime, it does happen. So it is better to have some control over the release of information when using public match DNA databases. Practices we have picked up on that may help others are:
  • Never use a full or even partial name on a test kit. The optimum would be a totally random sequence of numbers and letters. A compromise might be a project code followed by another code for the given tester. Ancestry does not divulge full names of test kits managed by someone else. Instead, it displays them as "M.I. (managed by user xxxxx)" where M.I. are My Initials (of the tester) and xxxxx is the Ancestry user name managing the kit.
  • Never publicly publish, on the open internet or otherwise, the association of a test kit ID with an individual (living or dead). Keep this private to a limited set of individuals. At most, provide an association to an ancestor that has been deceased for 100 years (or more) and at least 3 generations away. If you feel you must submit a tree attached to a kit, work to anonymize the individuals born within the last 100 years — no names or geographic locations.
  • Use a unique email address and similar contact info for each kit on each database, or at least for your genetic genealogy work as a whole, so as to keep it separate from other personal, work or genealogy contacts. This way, you maintain a little more privacy and distance between your role as test kit manager and your otherwise more public footprint left across the internet.
  • Always investigate any reach-out match and attempt to verify it is a real relative before divulging any information about your tester.
Note that some researchers and testers wish to be more available and out there. Pages in this project are a testament to that. These guidelines are meant more to protect those who may not be aware of, or may not wish to participate in, everything implied by the activity surrounding consumer-oriented genetic genealogy testing.
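
For the kit-naming practice above, a random alias can be generated rather than invented by hand. The following is only a minimal Python sketch, not something any testing company provides: the function names, the alphabet, the default lengths and the "PRJ" project code are all assumptions made for illustration. Whatever table links an alias back to a tester should be kept private, as described above.

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits only.
ALPHABET = string.ascii_uppercase + string.digits


def random_kit_alias(length: int = 10) -> str:
    """Return a random alias that carries no part of the tester's name."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def project_kit_alias(project_code: str, length: int = 6) -> str:
    """Compromise form: a project code followed by a random per-tester code."""
    return f"{project_code}-{random_kit_alias(length)}"


if __name__ == "__main__":
    # "PRJ" is a hypothetical project code, used here only as an example.
    print(random_kit_alias())
    print(project_kit_alias("PRJ"))
```

The `secrets` module is used instead of `random` so that an alias cannot be predicted from earlier ones. The output differs on every run.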