The Y Chromosome Consortium (YCC) was a loose-knit organization founded in 1991 by university researchers, led by the University of Arizona. Its web presence continued into the mid-2000s and had disappeared completely by 2012. ISOGG and its tree effort formed as the YCC's commitment to updating its website and phylogenetic tree of haplogroups waned [1]. ISOGG continues tree development to this day and is the only remaining effort that retains the long-form nomenclature that came out of the YCC.
The YCC published a series of papers that culminated in a seminal 2002 paper (see below), which tried to define a common nomenclature for a yDNA phylogenetic tree. A capture of Figure 1 from that paper is shown here. For more information on the trees in general, see our entry on phylogenetic trees in this glossary. The paper was also the apparent start of the YCC web presence hosted at the University of Arizona.
The trees up to this point were small enough to fit on a single page or diagram, and branches were identified by different means. As shown here, alphabetic letters were used for the main top-level clades, SNPs were used to identify the major branch haplogroups below them, and a path notation beneath the top-level letters was used to identify individual leaves of the tree. These two key nomenclatures survive to this day as the lineage and variant formats. While attributed to the YCC here, they were really developed as a culmination and simplification of many academic papers from the previous 10+ years.
Haplogroup Naming
So the main legacy of the YCC is the haplogroup naming and tree identification process. The two forms of names are now covered in more detail.
The YCC Long or "lineage" form uses alternating letters and numbers to identify the path down the tree to a specific haplogroup. For example, the "lineage" designation of Haplogroup R1b-P312 is R1b1a1a2a1a2. The path starts from the ancient, original tree of mainly single letters, not from the actual root of the tree.
The YCC Short or "variant" form names a haplogroup by an SNP (variant) that defines it. Technically, Haplogroup R1b-P312 should simply be known as P312 in this form. With the short form, you know the haplogroup name but not its lineage. With the long form, you know the path through the branching but not the SNPs that define that branching.
An example of the different forms is taken from our Haplogroup R1b-P312 page that defines this haplogroup path from R down to P312:
R 1 b 1 a 1 a 2 a 1 a 2 (YCC lineage)
M207 > M173 > M343 > L278/P25 > L754 > L388 > P297 > M269 > L23 > L51 > L151-L52-L11-P310-P311 > P312 (YCC variant)
The variant form is normally shown with only a single SNP, one that is unique to a haplogroup in the phylogenetic tree. In the example above, each SNP name corresponds, in order, to the lineage (path) letter or number on the line above it.
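The correspondence between the two lines of the example can be made explicit with a small sketch. The lineage string and SNP path are copied from the example above; the function name is hypothetical, and the code assumes every lineage symbol is a single character, which holds for this example but not for every long-form name.

```python
# Values copied from the R1b-P312 example above; the helper name is hypothetical.
LINEAGE = "R1b1a1a2a1a2"
VARIANT_PATH = (
    "M207 > M173 > M343 > L278/P25 > L754 > L388 > P297 > M269 > "
    "L23 > L51 > L151-L52-L11-P310-P311 > P312"
)


def pair_lineage_with_variants(lineage, variant_path):
    """Return (long-form name, SNP label) for each step down the path.

    Assumes one character per lineage symbol, so the two forms must have
    the same number of steps.
    """
    snps = [label.strip() for label in variant_path.split(">")]
    if len(snps) != len(lineage):
        raise ValueError("lineage and variant path have different depths")
    return [(lineage[: i + 1], snp) for i, snp in enumerate(snps)]


pairs = pair_lineage_with_variants(LINEAGE, VARIANT_PATH)
for long_name, snp in pairs:
    print(f"{long_name:<14}{snp}")
```

Each printed row shows the long-form name reached at that depth next to the SNP label that names the same haplogroup in the short form.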
To help illustrate further, here is the original Figure 2 from the YCC 2002 paper (top portion only). The concept of paragroups is more pronounced there and is included with every haplogroup. See the SNP naming page for more information on paragroups. The key point from this figure and our earlier, more current example is that the single-letter top-level haplogroups also have a mutation or variant associated with them that technically names them. So in the figure shown here, G and M201 are synonymous names for the same haplogroup.
The variant form is unique because each SNP is unique to one haplogroup and one place in the whole tree. The variant or short form does not indicate the lineage, which is helpful: the name is not susceptible to change when the order of haplogroups changes as the tree is revised. In fact, except for the rare case of an SNP being removed from the tree, the SNP-named haplogroup is very stable. Only its placement in the tree may be under constant revision.
Most haplogroups have more than one variant associated with them. Variants also often have aliases, because they were discovered and named simultaneously by different researchers. Academic papers that named them did take years to publish, after all. (This has led to a name grab of late by two main tree developers, which has caused hundreds of thousands of variants to be named that may never meet the criteria for inclusion in the tree.) Some historic documents refer to the "variant" form as a "mutation" form. The term "mutation" is no longer used. See the SNP page for naming conventions of variants.
Early phylogenetic trees (including the mitochondrial DNA tree) used letters to identify key, early haplogroups. As the tree grew, researchers ran out of letters and started adding numbers below the first letters. The YCC Long nomenclature simply grew out of this practice as the tree branching continued and had to be identified and discussed in papers. In the long form, branches are labeled with in-order numbers and letters, alternating at each level below the top level of single alphabetic letters. The long format is therefore a path designation from the simple, single-letter early tree down to the haplogroup being designated. But long-form naming becomes unwieldy deep in the tree. And when the tree is modified and whole sections are moved around, keeping track of where a haplogroup is now or was before becomes an issue. So the variant (or short) form is now preferred for almost all uses.
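As a rough illustration of the alternating pattern, the sketch below splits a long-form name into its path steps and lists the upstream names it implies. The function names are hypothetical, and the rule it checks (capital letters first, then a number, then a single lowercase letter, and so on) is an assumption drawn from the description above, not a formal YCC grammar.

```python
import re


def split_lineage(name):
    """Split a long-form name such as 'R1b1a' into ['R', '1', 'b', '1', 'a'].

    Assumed pattern: capital letter(s), then alternating number and
    single lowercase letter. Raises ValueError for anything else.
    """
    tokens = re.findall(r"[A-Z]+|[a-z]|\d+", name)
    if not tokens or "".join(tokens) != name or not tokens[0].isupper():
        raise ValueError(f"not a long-form name: {name!r}")
    for depth, token in enumerate(tokens[1:], start=1):
        # Odd depths must be numbers, even depths lowercase letters.
        ok = token.isdigit() if depth % 2 == 1 else token.islower()
        if not ok:
            raise ValueError(f"letters and numbers do not alternate in {name!r}")
    return tokens


def ancestors(name):
    """Return the upstream long-form names, top level first."""
    tokens = split_lineage(name)
    return ["".join(tokens[:i]) for i in range(1, len(tokens))]


print(split_lineage("R1b1a1a2a1a2"))
print(ancestors("R1b1a"))
```

Because every step of the path is baked into the name, moving a branch changes the name of everything below it, which is the instability the paragraph above describes.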
Each haplogroup usually has many SNPs associated with it: at least one, often a dozen, and sometimes hundreds. Haplogroups are most commonly known by only one of those SNPs, often the first one found to define it. Some cannot agree on which SNP should name a haplogroup, so you sometimes see more than one listed. Sometimes the additional name is an alias for the same SNP used to name it (as seen with L278/P25 above). Other times they are different SNPs that are in the same haplogroup but well known for various reasons (as seen with L151-L52-L11- ...). (This latter haplogroup has since been broken up, and some of these SNPs are now in sub-clades.) Some define the haplogroup by the first SNP name listed, using an alphabetic order of the list instead of the date first defined. Be prepared to investigate whether two differently named haplogroups from two different trees are really the same haplogroup or not.
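One way to check whether two differently named haplogroups are the same is to index every SNP name and alias by the block it belongs to. The sketch below is only illustrative: the data layout and function names are hypothetical, and the three blocks are taken from the example path above as it is written there (including the L151 block before it was broken up), not from any current tree.

```python
# Each block is one haplogroup; each inner set holds the aliases of one SNP.
# Contents follow the example path in this entry, not a current tree.
BLOCKS = [
    [{"L278", "P25"}],                                 # one SNP, two names
    [{"L151"}, {"L52"}, {"L11"}, {"P310"}, {"P311"}],  # several SNPs, one block
    [{"P312"}],
]


def build_index(blocks):
    """Map every SNP name or alias to the position of its block."""
    index = {}
    for block_id, block in enumerate(blocks):
        for alias_group in block:
            for name in alias_group:
                index[name] = block_id
    return index


def same_haplogroup(snp_a, snp_b, index):
    """True when both SNP names fall in the same block; KeyError if unknown."""
    return index[snp_a] == index[snp_b]


INDEX = build_index(BLOCKS)
print(same_haplogroup("L278", "P25", INDEX))
print(same_haplogroup("L151", "P311", INDEX))
print(same_haplogroup("L11", "P312", INDEX))
```

The answer is only as good as the tree the index was built from, so two trees that split a block differently can legitimately disagree.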
By strict definition, a letter followed by a dash is used only in the variant (or short) form, to prefix an SNP-named haplogroup with the major, "single letter" haplogroup that the SNP is downstream from. Used this way, it is really a quasi-mix of the short and long forms. It retains the stability of the single letter (or first few branches in the path) from the long form's top-level ancient branches, and then simply ends the name with the SNP-named haplogroup of a sub-clade that may be dozens of branches below it. In the above example, R1b- is used as the major name before the SNP in Haplogroup R1b-P312. More formally, the prefix should be just R (a single letter). But most use R1b as a top-level name due to R1b's "weight" in the phylogenetic tree: there are more people in the current tested population below the R1b branch than above it. You may see I1, I2, and R1a prepended for similar reasons. You will see these variations across different sites and at different times.
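Since the prefix varies by site while the SNP does not, comparing such names usually comes down to comparing the part after the dash. A minimal sketch, with hypothetical function names, assuming each input is either a bare SNP name or a single prefix-dash-SNP label (a multi-SNP label such as L151-L52-L11 would be misread):

```python
def split_hybrid_name(name):
    """Split 'R1b-P312' into ('R1b', 'P312'); a bare 'P312' gives (None, 'P312')."""
    prefix, dash, snp = name.partition("-")
    if not dash:
        return None, name
    return prefix, snp


def same_named_haplogroup(name_a, name_b):
    """Compare two labels by their SNP part only, ignoring any letter prefix."""
    return split_hybrid_name(name_a)[1] == split_hybrid_name(name_b)[1]


print(split_hybrid_name("R1b-P312"))
print(same_named_haplogroup("R-P312", "R1b-P312"))
```

This only compares the naming SNP; labels that use different SNPs for the same haplogroup still need an alias lookup.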
This has become even more confusing of late, as the tree surrounding K has been going through a restructuring with more branching. Some haplogroups even have two long names attached to them as the historical, single-letter branch points get pushed around and even split. As long as a haplogroup has more than one SNP attached to it, there is a possibility it will be split as more people test and some of them turn out not to have all the SNPs in the haplogroup.
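The splitting of a multi-SNP haplogroup can be sketched as grouping its SNPs by which testers carry them. Everything here is hypothetical (the SNP names S1 to S3, the tester IDs, and the function name), and the sketch assumes clean, nested results with no recurrent mutations or missing calls, which real tree building cannot assume.

```python
from collections import defaultdict


def split_block(block_snps, tester_results):
    """Group a block's SNPs by the set of testers carrying them.

    tester_results maps a tester ID to the set of SNPs found derived.
    Returns a list of SNP groups, most widely carried (upstream) first.
    A single group means the block does not split on this evidence.
    """
    carriers = defaultdict(set)
    for tester, derived in tester_results.items():
        for snp in block_snps:
            if snp in derived:
                carriers[snp].add(tester)
    groups = defaultdict(list)
    for snp in block_snps:
        groups[frozenset(carriers[snp])].append(snp)
    ordered = sorted(groups.items(), key=lambda item: -len(item[0]))
    return [snps for _, snps in ordered]


# Hypothetical block of three SNPs and three testers.
results = {
    "tester1": {"S1", "S2", "S3"},
    "tester2": {"S1", "S2", "S3"},
    "tester3": {"S1"},  # lacks S2 and S3, so the block splits below S1
}
print(split_block(["S1", "S2", "S3"], results))
```

Before tester3 appears, all three SNPs have the same carriers and stay in one block; the new result is what moves S2 and S3 into a sub-clade.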
Historic Trees versus Current Ones
Historically, as shown above and in textbooks describing phylogenetic trees, the nodes of the tree are unimportant except to indicate the MRCA and to show the branching off the named branch. Each node has a single entry branch and two or more exit branches. The haplogroups are designated on the single branch above a node. The tree starts with a branch and ends with a branch. Today, most trees designate the node as both the MRCA and the haplogroup that defines the branch leading into it. They represent the nodes as blocks containing the variant form name and possibly all the SNPs that are part of the haplogroup. In this form, there are just single edges (or branches) between blocks or haplogroups. The edges lose meaning except to show the connection between haplogroups. Haplogroups, not branches (or edges), are now the root of the tree and the leaves of the tree. Thus the importance shifts from named branches to named nodes that are now blocks.
External Links
- YCC, A Nomenclature System for the Tree of Human Y-Chromosomal Binary Haplogroups, Genome Res. 2002 Feb; 12(2): pp339–348 (NIH, Genome Research) (figure 1 above found at the InternetArchive)
- ISOGG YCC
- Wikipedia YCC
- CeCe Moore's 2012 post calling for the change-over with some nice history and timeline on the tree evolution
References
1. Note on ISOGG formation in their own Wiki Timeline