Y DNA Markers (Simplified)

There is still a lot of confusion in the community about the two types of genetic markers that are tested: STR and SNP. Different test companies and levels of service test a different number, and a different collection, of markers, which makes comparing results between companies a little more difficult. Finally, there is confusion about the accuracy of different tests, mostly about how the number of genetic markers tested and their "quality" relate to determining a match to others or a placement in the phylogenetic tree.

To start, we provide this simplified table and place various tests within it to make a bit of an apples-to-oranges comparison possible. Afterward, we describe where the analogy holds and where it breaks down.

Marker	Fewest Markers	Fewer Markers	Some Markers	More Markers	Most Markers
SNP	MyHeritage (500) Ancestry (1,000)	23andMe (3,000)	NGG (12,000) BigY (30K) (no longer available)	BigY-700 (300K+), WGS (26+ mil)	Sequencing / 30x WGS tests such as those from Dante Labs, FGC, Nebula Genomics, and YSEQ
STR	y12 / y25	y37 / y67	y111	BigY-700 (700+), yFull analysis (900+)	WGS with 3rd-generation long-read sequencing or 2.5D bar-code technology (15K+ segments) (or so is the hope, once it becomes reliable)

Table of Y DNA Markers Covered By Different Tests and Companies

Comparing the SNP and STR tests this way works well for understanding the different numbers of markers, and the likely "precision" of a result, from one test and company relative to another. The analogy breaks down in that the selection of markers in each test is not optimized for a smooth refinement of knowledge about the tester. The markers are just what the company chose, and maybe all that was known when the test was created. For some people, a test with few markers can get fairly deep in the haplogroup tree. For others, it does not get very deep.

The STR markers were carefully selected with the science known at the time. For some surname lines, amazingly and unintentionally, the first y12 markers contain enough distinctive values that a close match (a genetic distance, GD, of 0 or 1) can almost guarantee membership in that line. For others, even the y111 test is not enough to distinguish between surname lines. Unfortunately, even today, the science is not very clear about which markers are best for tracking close or distant surname lines. For STRs, it is still a nascent and evolving science.
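To make the GD figure concrete, here is a minimal sketch of a genetic distance calculation under a simple stepwise assumption: the sum of absolute differences in repeat counts over the markers two testers share. Testing companies may apply their own counting rules, so this is not any company's actual method. The marker names are real y12 markers, but the repeat counts are invented for illustration.

```python
from typing import Dict


def genetic_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Sum of absolute repeat-count differences over markers present in both results.

    Assumption: a plain stepwise model; real match reports may count differently.
    """
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


# Hypothetical results for four y12 markers (values invented for illustration).
tester = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
match = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 11}

print(genetic_distance(tester, match))  # 1: only DYS390 differs, by one repeat
```

Only markers present in both results are compared, which is why a y12 tester and a y111 tester can only be compared at the y12 level.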

The SNP markers were a little better selected to match those needed to clarify the early branching of the phylogenetic tree (in this case, the yDNA tree). It is as if the test designers knew what to look for before deciding which markers to include in the various test levels. As a result, moving to a more refined test more reliably places your haplogroup deeper in the tree; that is, at a marker that changed closer to the present in the history of the human species. An example showing this refinement within one of our surname groups can be viewed here.

With the latest science and analysis, and with SNP testing of both yDNA and mtDNA, the phylogenetic tree is developing into a much better record of anthropological lines through time. So much so that we are beginning to find and track SNP changes within the genealogical time frame, or at least the last 1,000 years. Many surname groups have been able to avoid the expense of casting the wide net of sequencing for some of their members, instead using single-SNP tests to verify leaf haplogroup membership in the phylogenetic tree. As with the few surname studies lucky enough to have the y12 test serve this purpose, this opens up a whole new range of testing and verification possibilities without the expense of deep testing everyone. But as the 3rd generation of genetic genealogy takes hold, WGS testing is making deep testing available to everyone from the outset.
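The way a test's marker selection limits how deep a haplogroup placement can go can be sketched as a walk down a tree. This is only an illustration under stated assumptions: the tree, the branch names (A1, A2, A3, B1), and the panels are all hypothetical, and each branch is assumed to be defined by a single SNP of the same name.

```python
from typing import Dict, List, Set

# Hypothetical tree: each branch maps to its child branches.
CHILDREN: Dict[str, List[str]] = {
    "ROOT": ["A1", "B1"],
    "A1": ["A2"],
    "A2": ["A3"],
    "A3": [],
    "B1": [],
}


def deepest_placement(true_derived: Set[str], covered: Set[str]) -> str:
    """Walk down from the root while the test reports a derived child branch."""
    reported = true_derived & covered  # a test can only report SNPs it covers
    node = "ROOT"
    while True:
        hits = [child for child in CHILDREN[node] if child in reported]
        if not hits:
            return node
        node = hits[0]


TESTER_DERIVED = {"A1", "A2", "A3"}   # what the tester actually carries
SMALL_PANEL = {"A1", "B1"}            # a test with few markers
LARGE_PANEL = {"A1", "A2", "A3", "B1"}

print(deepest_placement(TESTER_DERIVED, SMALL_PANEL))  # A1
print(deepest_placement(TESTER_DERIVED, LARGE_PANEL))  # A3
```

The same tester lands at A1 or A3 depending only on which markers the panel covers. A single-SNP test for A3 alone would confirm the leaf directly, which is the shortcut surname groups use once the leaf is known.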

From https://www.facebook.com/photo.php?fbid=10165496697595377&set=p.10165496697595377&type=3

Y Chromosome Region Overview (6 Jan 2021)

Many also do not realize that the Y chromosome has many regions that are not useful for testing. A rough overview is given here, showing that nearly 50% is not usable for extracting stable SNP and STR values. More detail is in David Vance's documents and charts referenced at the end. Current NGS testing is expanding the usable regions, but many are still not properly modeled in the human genome reference and so are not mapped from NGS tests.

All tests are still a bit of a crap shoot at best. It comes down to whether your derived values have been seen before and match those of others. Of the more than 26 million base-pair values tested on the Y chromosome, often only around 5,000 actually represent derived values in any given tester. So it depends on where your changed values lie, and whether they are distinctive and covered by these existing tests. For our B10DNA group, y12 is enough to identify that you are related within the last 500 years, and most microarray tests reach down to R1b-L20, which is fairly deep in the tree. For others, it is a different story. There is a larger than expected number of cases where y111 results yield tens to a hundred or so "close" matches at that level, all or most of whom are not provable genealogical relatives. There are even some testers with 30 BigY-700 matches and unknown connections. Still worse off are those who have no yDNA STR or SNP matches and therefore learn nothing except that they are unique in the world among those tested to date. The phylogenetic tree, just like autosomal match lists, is built by testing and comparing individuals. If you are not similar enough to somebody, you will have no matches and gain no real information. For some, their only close members in the tree are ancient tested remains or research paper subjects.

For completeness, the microarray tests all cover nearly the same number of markers, but with varying overlap in which markers are covered. So the only real comparison is between a microarray test with approximately 600 thousand values and a WGS test with nearer 3.2 billion, both delivering dual values for the autosomes. Roughly 1-10% of those values represent real SNP and InDel markers that differ from the reference genome (often closer to 1% for WGS and closer to 10% for microarray tests).
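As a rough worked example of these proportions, the sketch below applies the approximate percentages quoted above to the approximate test sizes. The function name is hypothetical and the results are order-of-magnitude estimates only.

```python
def variant_estimate(total_values: int, percent: int) -> int:
    """Estimate how many tested values differ from the reference genome."""
    return total_values * percent // 100


MICROARRAY_VALUES = 600_000      # approximate size of a microarray test
WGS_VALUES = 3_200_000_000       # approximate size of a WGS test

print(variant_estimate(MICROARRAY_VALUES, 10))  # 60000
print(variant_estimate(WGS_VALUES, 1))          # 32000000
print(WGS_VALUES // MICROARRAY_VALUES)          # 5333
```

So although WGS reads several thousand times more values, the smaller share that differs from the reference still amounts to a far larger number of variants than a microarray test can report.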

Additional Reading

Vance, David. Intro to the Y chromosome and where SNPs are found, white paper (archive). Also see David Vance's detailed (spreadsheet-formatted) chart of actual regions in the Y DNA (PDF, with base-pair region definitions), from a Facebook post on 6 Jan 2021. See also his document from two years earlier, Intro to Y-SNPs and Y-SNP Trees, from a Facebook post in Dec 2018.
A simpler diagram, posted by David Howden on 6 Jan in the FTDNA BigY * NGS group, is shown above.
Atlantic Monthly on sequencing the Y Chromosome centromere.
Measuring with different services section on our own B10DNA page
ISOGG comparison chart of microarray test marker overlap (not a direct focus on yDNA but gives a hint overall to different coverages).
