One of us has been focused, to the detriment of this project and his own research work, on NGS testing and bioinformatics. Much of that effort has centered on the Consumer WGS Testing Facebook group, and a big part of the output has been a series of documents posted in its Files section. They likely should have been written as wiki pages here, but Google Docs was used instead to allow commenting and editing by others. Below are some of the key documents we write and maintain. We will likely start cataloguing here some of the key posts not yet covered in a document as well. We encourage you to join the group if you are interested in the topic further.
At least for this author, we have purchased only WGS kits since 2019, with the occasional Ancestry or 23andMe kit to get access to their large match databases as well. We are not at the bleeding edge the way the sister Kennedy Surname Project is with pushing long-read NGS testing (aka 3rd generation), but we are definitely trying to bring the masses with us as we transition over.
Documents on WGS testing and bioinformatics
- Bioinformatics for Newbies — the beast that spawns new documents and keeps expanding in content but not scope (v2 in the works)
- BAM Reference Model, Determining your — along with the companion Reference Model Spreadsheet
- Utilizing Reference Models: New and Old — covering how the latest reference model technology of the past decade is still not really benefiting the medical or genetic genealogy communities, and the steps needed to make that happen
- Average Read Depth, Determining the: Quick and Dirty — with a corresponding samtools idxstats spreadsheet example (v2; an early v1 also exists)
- Average Read Depth: Backgrounder — the first document in this series
- Average Read Depth for WES — a new, early draft approach to determining the 130x WES average read depth
- Read Length and Insert Size — a very early draft document covering sequencer read length and the effective read fragment length; most importantly, the gap in the read fragment that may not be read if the two are not set up to match
- Bioinformatics Tools on Win10: Quick and Dirty — a Win10 Container, if you will
- Upgrading Samtools in Ubuntu Linux — dealing with differing release schedules and deprecated functionality
- WGS Extract Beta v5 manual — initially released in June 2021 (best to just see the WGS Extract software site). The v4 (2021-2023), v3 (2020-2021), v2 (Jan-Jun 2020) and v1 (Summer 2019) manuals previously written are now archived. See WGS Extract GitHub Home for the latest information on this tool we help develop.
- Analyzing Low Mapping in BAMs — a problem peculiar to Dante Labs
- Sequencer Run Information — really focused on Illumina NovaSeq 6000 machines and their tags
- Annotating VCF Files with SNP Names — to present the methods available for this key step of adding annotations to your called variant file
- Ordering the HG38 remap from ySeq — tips and techniques for ordering, supplying your data, and downloading the results
- Developing a Nucleotide Base Quality Metric from Sequencing
- InDels in Microarray Files — an early draft as we tackle making WGS Extract properly generate microarray file formats with InDels
- Microarray Files and the WGS Extract Tool Generator — an early draft expanding on and pulling information out of the tool's user manual
- Data Model for Microarray Results DBs
- A Practical WGS Consensus Sequence for Genetic Genealogy — a very early draft of an idea being coded into WGS Extract that may lead to a more standard WGS segment-matching capability not based on microarray file extraction
- Why 30x? — a very early draft document to cover the Average Read Depth and what value is best
- Breadth of Coverage using Bins — the next layer of quality checking, using coverage with many buckets or bins to measure by
- CRAM to BAM: Quick and Dirty, Converting a — for Nebula Genomics customers who receive a CRAM file
- WGS Transfer Methods — describing different methods to get large WGS files off test sites
- Genetic Genealogy and WGS Testing — a very early, in-development draft, now to be retargeted at WGS in general for the community here; it grew out of the Bioinformatics for Newbies document, which was always more focused on installing and running tools from the command line
- Bioinformatics Computers — some tips and notes on selecting a good computer for bioinformatics
- Using WGS Tools on FTDNA BigY Files — some notes on handling FTDNA BigY files in traditional WGS tools
- DNA Test Interpretation — tables of programs and sites to interpret microarray and WGS files
- Installing Microsoft WSL v1 and Ubuntu 18.04
- Installing Microsoft WSL v2 and Ubuntu 20.04
- Installing WSLg and Ubuntu — you must use the latest Developer release of Win10 (or Win11) to get WSLg. This is the best way to get a full Linux under Microsoft Windows: the most maintainable and the easiest to install. It can also be the best way to run WGS Extract on Windows, by using the Ubuntu Linux installation. Note that this forces Hyper-V to be turned on, which disables some other virtualization products such as VMware (currently). Sometimes the natively compiled bioinformatics tools can be more efficient, due to the emulated file system interface in WSL.
- Installing Oracle / Sun VirtualBox — we need to add one for VMware Player as well. VirtualBox for AMD processors; VMware for Intel (to support macOS virtualization; it does not matter for Ubuntu Linux)
- Compiling the Bioinformatic Tools on Microsoft Windows — for the latest information, see the make shell scripts included with the WGS Extract Windows installation
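To give a flavor of what the Upgrading Samtools in Ubuntu document covers: distribution packages often lag well behind samtools releases, so a build from source is the usual fix. The following is only a sketch under assumptions, not the document's exact recipe: the version number, install prefix, and dependency list are illustrative and should be adjusted to the release you actually need.

```shell
# Illustrative sketch: build a newer samtools from source on Ubuntu.
# Version 1.19 and /usr/local are assumed examples, not requirements.
sudo apt-get install -y build-essential libncurses-dev zlib1g-dev \
     libbz2-dev liblzma-dev libcurl4-openssl-dev
wget https://github.com/samtools/samtools/releases/download/1.19/samtools-1.19.tar.bz2
tar -xjf samtools-1.19.tar.bz2
cd samtools-1.19
./configure --prefix=/usr/local
make
sudo make install
samtools --version   # confirm the new build is first on your PATH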
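The Quick and Dirty average read depth idea can be sketched in a few lines of shell: `samtools idxstats` reports, per contig, the name, length, and mapped/unmapped read counts, so mapped reads times read length divided by total contig length approximates average depth without scanning the whole BAM. The sample numbers and the 150 bp read length below are illustrative assumptions; in practice you would pipe `samtools idxstats your.bam` into the awk step.

```shell
# Stand-in for `samtools idxstats your.bam` output (illustrative numbers):
# columns are contig, length, mapped reads, unmapped reads.
cat <<'EOF' > idxstats.txt
chr1	248956422	50000000	12345
chr2	242193529	48000000	11000
*	0	0	120000
EOF
# Approximate depth = mapped reads x read length / total contig length.
# READLEN=150 assumes a typical Illumina paired-end read; $2 > 0 skips
# the unplaced "*" line, whose length is zero.
awk -v READLEN=150 '$2 > 0 { reads += $3; len += $2 }
     END { printf "%.1fx\n", reads * READLEN / len }' idxstats.txt
```

Because idxstats only reads the BAM index, this runs in seconds even on a 100 GB file, which is the whole point of the quick-and-dirty approach.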
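The binned breadth-of-coverage check can likewise be sketched with awk over per-base depth output such as `samtools depth -a` produces (contig, position, depth per line). The bucket boundaries here (0x, 1-9x, 10x and up) and the sample data are illustrative choices, not the document's prescribed bins.

```shell
# Stand-in for `samtools depth -a your.bam` output (illustrative values):
# columns are contig, position, depth at that position.
cat <<'EOF' > depth.txt
chr1	1	0
chr1	2	8
chr1	3	31
chr1	4	30
EOF
# Count positions falling into each coverage bucket; the thresholds
# 0 and 10 are example bin edges.
awk '{ if ($3 == 0) zero++; else if ($3 < 10) low++; else ok++ }
     END { printf "0x: %d  1-9x: %d  >=10x: %d\n", zero, low, ok }' depth.txt
```

Dividing each bucket by the total position count turns these into the breadth-of-coverage percentages the document discusses.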
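For Nebula Genomics customers, the CRAM-to-BAM conversion boils down to a couple of samtools commands. This is a sketch only: the file names are placeholders, and the reference FASTA supplied with `-T` must be the exact build the CRAM was aligned against, or decoding will fail or produce wrong bases.

```shell
# Sketch: convert CRAM to BAM (file names are placeholders).
# -b emits BAM; -T supplies the reference the CRAM was aligned to.
samtools view -b -T GRCh38.fa -o sample.bam sample.cram
samtools index sample.bam
```

Expect the BAM to be several times larger than the CRAM, since CRAM stores reads as compressed differences against the reference.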