Single Nucleotide Polymorphism (SNP)

A Single Nucleotide Polymorphism (or SNP) is a specific base-pair locus (location) in a DNA strand where a change has occurred in some portion of the population. Most often the change is a substitution of one base for another (A to T, C to G, etc.). Sometimes the insertion or removal of a single base pair is also defined as an SNP. Less often, a short inserted or deleted sequence is also called an SNP. So an SNP is not always a “single nucleotide” polymorphism, although it is still termed one. (Biology and the science around it is messy and never absolute, it seems.) SNPs are the markers tested in genetic genealogy and in many other fields that use genetic testing. Larger changes in the DNA usually result in a cell that cannot survive, so they are not usually observed. The exception is STRs in the intergenic (“junk”) regions. Most genetic genealogy testing looks at the SNPs in the DNA.
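
As a rough illustration of the three kinds of change described above, the following Python sketch classifies the change from an ancestral to a derived sequence at one locus. The function name `variant_kind` and the convention of writing `-` for “no bases here” are assumptions made for this example. The sketch compares sequence lengths only. It does not align sequences the way real variant-calling tools do.

```python
from typing import Tuple

def variant_kind(ancestral: str, derived: str) -> Tuple[str, int]:
    """Classify the change from the ancestral to the derived sequence at one locus.

    Assumed convention (hypothetical): '-' or an empty string means "no bases",
    so an insertion is written '-' > 'T' and a deletion 'T' > '-'.
    Returns (kind, number_of_bases_involved). Only lengths are compared.
    """
    anc = ancestral.replace("-", "").upper()
    der = derived.replace("-", "").upper()
    if anc == der:
        return ("no change", 0)
    if len(anc) == len(der):
        # Same length, different bases: a substitution (one base for a true SNP)
        return ("substitution", len(anc))
    if len(der) > len(anc):
        return ("insertion", len(der) - len(anc))
    return ("deletion", len(anc) - len(der))
```

For example, `variant_kind("A", "G")` returns `("substitution", 1)`, while a short deletion such as `variant_kind("GATC", "-")` returns `("deletion", 4)`.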

SNP markers are used to compare individuals' DNA with each other (and even to compare the two copies of each chromosome pair within an individual). If a change from the population-defined norm is detected, this is termed a positive marker result (plus, +, derived, changed, and often colored green). If the change has not occurred, it is a negative result (minus, -, ancestral, not changed, and often colored red). The value of an SNP marker in a particular individual is termed an allele. A sequence or group of base pairs and their particular values, which may include multiple SNPs, is also termed an allele. Here and elsewhere in genetic genealogy, we tend to simply say SNP value or marker value to mean the allele.
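
The positive/negative convention can be sketched in Python as a comparison of a tester's call against each marker's ancestral and derived values. The marker names `SNP_A`, `SNP_B`, `SNP_C`, their alleles, and the functions `classify` and `report` are hypothetical placeholders. The sketch assumes single-allele calls, as with Y chromosome SNPs, rather than the two-allele genotypes of autosomal tests.

```python
from typing import Dict, Tuple

# Hypothetical marker definitions: name -> (ancestral allele, derived allele).
# These names and alleles are placeholders, not real SNPs.
MARKERS: Dict[str, Tuple[str, str]] = {
    "SNP_A": ("C", "T"),
    "SNP_B": ("G", "A"),
    "SNP_C": ("A", "G"),
}

def classify(call: str, ancestral: str, derived: str) -> str:
    """Return '+' (positive/derived), '-' (negative/ancestral),
    or '?' for a no-call or an allele that matches neither."""
    if call == derived:
        return "+"
    if call == ancestral:
        return "-"
    return "?"

def report(calls: Dict[str, str]) -> Dict[str, str]:
    """Classify a tester's single-allele calls against MARKERS.
    Markers missing from `calls` are reported as '?'."""
    return {
        name: classify(calls.get(name, ""), anc, der)
        for name, (anc, der) in MARKERS.items()
    }

print(report({"SNP_A": "T", "SNP_B": "G"}))
# {'SNP_A': '+', 'SNP_B': '-', 'SNP_C': '?'}
```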

Because SNPs are inherently more stable and do not tend to change back and forth (as STRs may), they are used to identify different anthropological lines of humans over tens of thousands of years. This is most common with testing of the Y chromosome and mitochondrial DNA, which pass down relatively unchanged from generation to generation. Only recently, with next-generation sequencing (NGS) of a wider set of samples, are we seeing enough recent variation for SNPs to contribute more within the genealogical time frame.

Autosomal SNP testing is the most popular and common form of genetic genealogy testing. While many are drawn in by advertising for its ethnicity-mix pie charts, its real value lies in determining how many SNP values you share with another person in a continuous sequence, termed a matching segment. The longer the matching segments, the more of them there are, and the larger their total length as a percentage of your total DNA, the closer your relationship (termed Consanguinity) to that person. X chromosome SNP testing is often included in Autosomal testing, although how X matches are computed and reported varies. X inheritance has unique properties that can be used to glean more information than the Autosomes alone provide.
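
A simplified way to find matching segments is to scan two people's unphased genotypes in position order and look for runs where they share at least one allele at every SNP (a “half-identical” run). Opposite homozygous genotypes such as AA versus GG end a run. The Python sketch below assumes two-letter genotype strings, treats `--` no-calls as compatible, and uses a hypothetical `min_snps` threshold. Real matching tools measure segments in centimorgans and tolerate genotyping errors, which this sketch does not.

```python
from typing import List, Tuple

def shares_allele(g1: str, g2: str) -> bool:
    """True if two unphased genotypes (e.g. "AG", "GG") share at least one allele.
    No-calls ("--") are treated as compatible so they do not break a segment."""
    if "-" in g1 or "-" in g2:
        return True
    return bool(set(g1) & set(g2))

def matching_segments(pos: List[int], a: List[str], b: List[str],
                      min_snps: int = 5) -> List[Tuple[int, int, int]]:
    """Return (start_pos, end_pos, snp_count) for each run of consecutive
    half-identical SNPs that is at least min_snps long."""
    segments = []
    start = None  # index where the current run began
    for i in range(len(pos)):
        if shares_allele(a[i], b[i]):
            if start is None:
                start = i
        else:
            # Opposite homozygotes end the run; keep it if it is long enough
            if start is not None and i - start >= min_snps:
                segments.append((pos[start], pos[i - 1], i - start))
            start = None
    # Close a run that extends to the last SNP
    if start is not None and len(pos) - start >= min_snps:
        segments.append((pos[start], pos[-1], len(pos) - start))
    return segments

pos = [100, 200, 300, 400, 500, 600, 700, 800]
a = ["AG", "CC", "TT", "AG", "GG", "AA", "CT", "CC"]
b = ["AA", "CT", "TT", "GG", "AG", "GG", "CC", "CT"]
print(matching_segments(pos, a, b, min_snps=3))  # [(100, 500, 5)]
```

The AA/GG pair at position 600 ends the first run. The second run (positions 700 to 800) is only two SNPs long, so it falls below the threshold.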

Y chromosome SNP testing is used to confirm that matching Haplotypes in fact represent two people in the same patrilineal line (as opposed to a match that arose by convergence, where STR values changed over time and happened to arrive at the same values). Testers with the same Haplotype but different Haplogroups are not in the same patrilineal line.
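
The check described above can be sketched in Python by looking for any SNP that both men tested where one is positive and the other negative. Such a conflict places them in different Haplogroups even if their STR Haplotypes match. The function `conflicting_snps`, the `+`/`-` result encoding, and the SNP names are assumptions for illustration, and the sketch ignores the possibility of genotyping errors.

```python
from typing import Dict, List

def conflicting_snps(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return SNPs tested in both men where one result is '+' (derived)
    and the other is '-' (ancestral). Untested SNPs and no-calls are ignored."""
    shared = sorted(set(a) & set(b))
    return [snp for snp in shared if {a[snp], b[snp]} == {"+", "-"}]

# Hypothetical Y-SNP results for two testers with matching STR Haplotypes
tester1 = {"SNP_A": "+", "SNP_B": "+", "SNP_C": "-"}
tester2 = {"SNP_A": "+", "SNP_B": "-", "SNP_D": "+"}
print(conflicting_snps(tester1, tester2))  # ['SNP_B']
```

A non-empty result means the two testers are in different Haplogroups. An empty result is consistent with a shared patrilineal line but does not prove it.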

Mitochondrial DNA testing (note: mtDNA is not a chromosome, as it does not reside in the nucleus) has limited utility in genetic genealogy and is mostly used for ancient anthropological studies. But it can answer one very specific question (whether two people could share the same direct matrilineal line), so some value can be extracted, especially on hard-to-follow matrilineal lines of descent.

Common nomenclature is to identify an SNP by the group or individual that first identified it, followed by a sequence number. NCBI maintains a central database (dbSNP) and coding scheme in which each SNP ID begins with the letters rs (for “reference SNP”) followed by a sequence of numbers. SNPs not yet named are identified by the chromosome (chromosome 1, for example), then a locus (a count of base pairs giving the location on the DNA strand in a reference model), followed by the ancestral and then the derived value. Relying on an exact count is clearly a problem, because insertions and deletions, sometimes fairly large, can occur at different points in different people. Centromere and telomere regions can have large variance as well.
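
To make the naming conventions concrete, here is a small Python sketch that recognizes an rs-style ID and parses a position-based label. The label layout `CHROMOSOME:POSITION ANCESTRAL>DERIVED` (for example `1:12345 A>G`) is an assumed format chosen for this illustration, not a standard defined by any particular database. The helpers `parse_label` and `is_rsid` are hypothetical names, and only single-base substitutions are handled. The position is meaningful only relative to a specific reference model.

```python
import re
from typing import NamedTuple, Optional

class SnpLabel(NamedTuple):
    chromosome: str
    position: int   # base-pair count in an (unspecified) reference model
    ancestral: str
    derived: str

# Assumed label layout: [chr]CHROMOSOME:POSITION ANCESTRAL>DERIVED
LABEL_RE = re.compile(r"^(?:chr)?(\w+):(\d+)\s*([ACGT])>([ACGT])$", re.IGNORECASE)
# rs-style IDs: the letters "rs" followed by digits
RSID_RE = re.compile(r"^rs\d+$")

def parse_label(text: str) -> Optional[SnpLabel]:
    """Parse a position-based label, or return None if it does not fit the format."""
    m = LABEL_RE.match(text.strip())
    if m is None:
        return None
    chrom, pos, anc, der = m.groups()
    return SnpLabel(chrom.upper(), int(pos), anc.upper(), der.upper())

def is_rsid(text: str) -> bool:
    """True if text looks like an rs-style reference SNP ID."""
    return RSID_RE.match(text.strip()) is not None

print(parse_label("chr1:12345 A>G"))
# SnpLabel(chromosome='1', position=12345, ancestral='A', derived='G')
```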

External Resources