1KG: 1000 Genomes

As the decade-long effort to map the first human genome wound down in 2001, at a cost of several billion dollars, the HapMap project and then the 1000 Genomes Project formed. The latter set out to capture and catalog the genomes of at least 1,000 people from populations across the world. The aim was a richer, wider variety of genome sequences with which to understand the human genome better than the first model, built from only a few samples, allowed. This work was the seminal next step in genomic understanding and took place mainly during the 2000s.

During this project, sequencing cost tens to hundreds of thousands of dollars per subject. The main work finished in the early 2010s, although improvements continue. As it was finishing, newer and cheaper sequencing technology became available, so the results were reprocessed to incorporate better, more refined sequencing data.

A decade later, studies of 100,000 and even 1 million subjects have started in various nations and regions, made possible by the cost of a WGS test dropping below a thousand dollars. As of 2018, the price was regularly below five hundred dollars and even dropped to two hundred dollars (US) during flash sales. As of 2020, a 30x WGS was regularly priced at or below that consumer threshold of $500, which has brought WGS testing into the realm of consumer / personal genetic testing for genealogical purposes. This should greatly expand the number of samples available for comparison and analysis. Dante Labs, which led this price drop, claimed in 2021 to have over 35,000 WGS samples available for study.

Some yDNA phylogenetic tree groups have seeded their trees with these initial 1000 Genomes samples, along with DNA samples from ancient remains (often termed aDNA, not to be confused with the similar term atDNA used for autosomal DNA). If a tester has a result matching near one of these samples, the experimental tree may have more depth and more branching as a result. B10DNA is one such example in this study: a Colombian man from the 1000 Genomes Project matches the Devon Hore line that has been tested, with a TMRCA of under 1,000 years. The data files resulting from the 1000 Genomes WGS testing are available online without cost.

Many of the standard tools used in the bioinformatics industry have come out of the 1000 Genomes Project. These include samtools and the HTSlib library that was later split out of it, vcftools (since largely superseded by bcftools), and utilities such as faidx (a samtools subcommand for indexing FASTA files) and bgzip (which writes BGZF, a blocked extension of the gzip format). Alongside them are the file formats created or extended by the project and its tools: FASTQ, BAM and VCF. Also key is the BWA alignment tool, which brought with it the reference models most commonly used with those tools: hs37d5 and hs38. Most analysts still rely on them today. The decoy sequences and alternate contigs in these models help the tools extract the most information from the sequence results.
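To make the "gzip extended format" point concrete, here is a minimal Python sketch, not taken from any of the tools above. It assumes the BGZF block layout described in the SAM/BAM specification: an ordinary gzip header with the extra-field flag set and a "BC" subfield holding the block size minus one. The helper name `bgzf_block_size` is hypothetical. Because each BGZF block is a valid gzip member, the standard `gzip` module can decompress it without knowing anything about BGZF.

```python
import gzip
import struct

# The 28-byte empty block that the SAM/BAM specification defines as the
# BGZF end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00 ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def bgzf_block_size(header: bytes):
    """Return the total block size if `header` starts a BGZF block, else None.

    Hypothetical helper: it only inspects the gzip header and its extra
    subfields; it does not validate the compressed payload.
    """
    if len(header) < 12:
        return None
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack(
        "<BBBBIBBH", header[:12]
    )
    # Must be a gzip member using deflate, with the FEXTRA flag (bit 2) set.
    if (id1, id2, cm) != (0x1F, 0x8B, 8) or not flg & 4:
        return None
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (ord("B"), ord("C"), 2) and pos + 6 <= len(extra):
            (bsize,) = struct.unpack("<H", extra[pos + 4:pos + 6])
            return bsize + 1  # BSIZE stores the total block size minus one
        pos += 4 + slen
    return None


if __name__ == "__main__":
    print(bgzf_block_size(BGZF_EOF))                # size of the EOF block
    print(gzip.decompress(BGZF_EOF))                # plain gzip reads it fine
    print(bgzf_block_size(gzip.compress(b"ACGT")))  # ordinary gzip: no BC field
```

The same check could be applied to the first bytes of a .bam or .vcf.gz file to tell whether it was written by bgzip or by plain gzip. That matters because the indexing tools expect the blocked form.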

External References