
STR Extraction from BAM files

yFull extracts over 400 STR values from the HG19 BigY BAM file. But there are many no-calls, and some values differ from those obtained with FTDNA's Sanger Sequencing technique for STRs. We aim to look at this further here: to understand which STR values come back as no-calls, and eventually why, and to understand which of FTDNA's first 111 STR markers tend to show differences. An ultimate goal is to provide matching further out than the 111 markers currently supported, and maybe even to identify additional, better STRs that would be good candidates for testing, especially in the haplogroups identified as having Ornery STR Matching (OSM).
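As a rough illustration of the comparison described above, here is a minimal Python sketch that sorts markers into no-calls, differences, and matches. Everything in it is an assumption for illustration: the function name `compare_str_values`, the use of plain dictionaries keyed by marker name, `None` as the no-call indicator, and the sample values (which are made up, not real results). Neither yFull nor FTDNA defines such a format; a real run would first need to parse each provider's export into this shape.

```python
from typing import Dict, List, Optional, Tuple


def compare_str_values(
    extracted: Dict[str, Optional[str]],
    reference: Dict[str, str],
) -> Tuple[List[str], Dict[str, Tuple[str, str]], List[str]]:
    """Compare BAM-extracted STR values against reference (e.g. y111) values.

    Values are kept as strings so that multi-copy markers written like
    "11-14" can be compared without special handling.
    Only markers present in `reference` are examined.
    Returns (no_calls, differences, matches).
    """
    no_calls: List[str] = []
    differences: Dict[str, Tuple[str, str]] = {}
    matches: List[str] = []

    for marker, ref_value in sorted(reference.items()):
        value = extracted.get(marker)  # a missing marker is treated as a no-call
        if value is None:
            no_calls.append(marker)
        elif value != ref_value:
            # store (extracted value, reference value) for later review
            differences[marker] = (value, ref_value)
        else:
            matches.append(marker)

    return no_calls, differences, matches


# Hypothetical, made-up values purely for illustration
extracted = {"DYS393": "13", "DYS390": None, "DYS19": "15"}
reference = {"DYS393": "13", "DYS390": "24", "DYS19": "14"}

no_calls, differences, matches = compare_str_values(extracted, reference)
print("No calls:   ", no_calls)
print("Differences:", differences)
print("Matches:    ", matches)
```

Tallying these three lists across many submitted kits would show which markers are most often no-calls and which most often disagree with the y111 values.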

This study / question was initiated with a post to the U152 Yahoo Group on 13 Jan 2016. The study has not been formally developed into a form ready to gather large amounts of data. We got maybe 10 submissions of data back in 2016 but never went further. This is mostly because BigY orders started including the y111 STRs (instead of requiring a separate upgrade). FTDNA also has an undocumented feature to order the upgrade to y111 for old BigY tests for only $29.

Alex Williamson's yTree is now starting to collect STR data sets in addition to VCF files. yFull has now started collecting STR files as well.

As part of its conversion from HG19 to HG38, FTDNA is reportedly now considering doing STR extraction similar to yFull's. If so, this would greatly widen the potential database available and could help solve the OSM problem by identifying new STRs that are more distinctive for those haplogroups.

FTDNA does now do extensive STR extraction from BigY results. But both they and yFull have labeled the STRs with private labels, and neither has defined them anywhere. There is no apparent overlap, as the value sets are so different.

External References