Human Genome Build

Because the human genome reference build has been getting a lot of attention lately, as genetic genealogy companies change the build their results are reported against, we briefly explain here what a build is.

When the international Human Genome Project effort started in the late 1980s, it was understood that the resulting map of nucleotides on each chromosome needed to be recorded and made available in a common form: that is, a database of the human genome as it was understood at the time. Even though any two people share more than 99% of their DNA, every human has slightly different DNA. Those differences range from a single nucleotide change up to whole runs of nucleotides being inserted, deleted, or rearranged. So the Human Genome Project created what it termed a reference build, which documents the known sequence of nucleotides (or base pairs) in a defined reference human genome. The build has always focused on the genes, with a detailed understanding of each gene's nucleotide sequence and of where it sits on the larger chromosome.

As that work was completed, attention turned to the intergenic (sometimes called “junk” or non-coding) regions, which make up the majority of our DNA. These regions are where the largest and most drastic differences occur, so they are harder to pin down in a reference build. Those large insertions and deletions also mean that any position counted from a fixed reference point, such as the end of the chromosome (the telomere), can shift from one build to the next; a position number stays stable only as long as the underlying reference sequence does.
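To make the point about shifting positions concrete, here is a toy Python sketch, assuming a made-up list of insertions and deletions between an “old” and a “new” build (the `edits` list and the `lift_position` function are hypothetical). Real conversion tools such as UCSC liftOver work from alignment chain files rather than a simple list like this, so treat it only as an illustration of why the same base-pair count can point to different places in different builds.

```python
from typing import List, Optional, Tuple

# Hypothetical edits between an "old" and a "new" build, in old-build
# coordinates: (position, length_change). A positive change means that many
# bases were inserted just before `position`; a negative change means that many
# bases starting at `position` were deleted. Edits are assumed not to overlap.
Edit = Tuple[int, int]


def lift_position(old_pos: int, edits: List[Edit]) -> Optional[int]:
    """Map a 1-based position in the old build to the new build.

    Returns None if the position falls inside a deleted stretch and so has
    no counterpart in the new build.
    """
    shift = 0
    for edit_pos, delta in edits:
        if delta >= 0:
            # Insertion: every position at or after edit_pos moves right.
            if old_pos >= edit_pos:
                shift += delta
        else:
            deleted_end = edit_pos - delta  # first old position after the deletion
            if edit_pos <= old_pos < deleted_end:
                return None
            if old_pos >= deleted_end:
                # Deletion: every position after the deleted stretch moves left.
                shift += delta
    return old_pos + shift


edits = [(1_000, 300), (5_000, -50)]
print(lift_position(500, edits))    # 500: before any change
print(lift_position(2_000, edits))  # 2300: after the 300-base insertion
print(lift_position(5_020, edits))  # None: inside the deleted stretch
print(lift_position(6_000, edits))  # 6250: +300 and -50
```

A position before every change keeps its number, while everything downstream of an insertion or deletion is renumbered, which is exactly why a raw position is only meaningful together with the build it came from.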

As of summer 2017, the Genome Reference Consortium is on Human Genome Build 38 (GRCh38, or hg38 for short). Most autosomal test results from the first half of the 2010s were delivered against Build 36 or Build 37. Some companies, like FamilyTreeDNA, had been delivering their X and Y DNA SNP results in the context of hg19, which is the UCSC name for the same assembly as GRCh37 (Build 37).

STR results are usually given as named STR markers, and those names are generally independent of the reference build that defines where the markers reside. SNP results are often given with dbSNP “rsID” numbers, since the SNP may not have a name yet. The rsID itself does not encode a location; the position reported alongside it is a base-pair count from the end of the chromosome within a specific build, so the same rsID can carry different positions in Build 37 and Build 38. Hence, to do an apples-to-apples comparison of two different test results, you need to make sure the markers are identified the same way and, if comparing by position, that the positions come from the same reference build.
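A minimal Python sketch of that comparison rule, assuming each test result has already been parsed into records of rsID, chromosome, position, and genotype plus a build label. The `Call` record, the function names, and the sample rsIDs are hypothetical and do not reflect any company's file format. Matching on rsID works across builds; matching on position is only attempted when both results use the same build, after treating hg18/hg19/hg38 as aliases for NCBI36/GRCh37/GRCh38.

```python
from typing import Dict, List, NamedTuple, Tuple

# UCSC and GRC names for the same assemblies.
BUILD_ALIASES = {"hg18": "NCBI36", "hg19": "GRCh37", "hg38": "GRCh38"}


class Call(NamedTuple):
    rsid: str      # dbSNP identifier; does not encode a position
    chrom: str     # chromosome, e.g. "1", "X", "Y"
    pos: int       # base-pair position, only meaningful within one build
    genotype: str  # e.g. "AG"


def normalize_build(name: str) -> str:
    """Map UCSC-style names (hg19) to GRC-style names (GRCh37)."""
    return BUILD_ALIASES.get(name, name)


def compare_by_rsid(a: List[Call], b: List[Call]) -> Dict[str, Tuple[str, str]]:
    """Pair genotypes that share an rsID; safe across builds."""
    b_by_id = {c.rsid: c.genotype for c in b}
    return {c.rsid: (c.genotype, b_by_id[c.rsid]) for c in a if c.rsid in b_by_id}


def compare_by_position(
    a: List[Call], build_a: str, b: List[Call], build_b: str
) -> Dict[Tuple[str, int], Tuple[str, str]]:
    """Pair genotypes at the same chromosome and position.

    Raises ValueError if the two results were reported against different
    builds, because their position numbers are then not comparable.
    """
    if normalize_build(build_a) != normalize_build(build_b):
        raise ValueError(f"different builds: {build_a} vs {build_b}")
    b_by_pos = {(c.chrom, c.pos): c.genotype for c in b}
    return {
        (c.chrom, c.pos): (c.genotype, b_by_pos[(c.chrom, c.pos)])
        for c in a
        if (c.chrom, c.pos) in b_by_pos
    }


# Hypothetical records: the same SNP reported at different positions
# by two tests that used different builds.
a = [Call("rs_example_1", "1", 1_000, "AG"), Call("rs_example_2", "1", 2_000, "CC")]
b = [Call("rs_example_1", "1", 1_300, "AA")]

print(compare_by_rsid(a, b))  # {'rs_example_1': ('AG', 'AA')}
print(compare_by_position(a, "hg19", a, "GRCh37"))  # same build under two names
```

Raising an error, rather than silently returning no matches, makes an accidental cross-build comparison visible instead of looking like two people who simply share nothing.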

External References