Improving Indel Detection Specificity of the Ion Torrent PGM Benchtop Sequencer

Zhen Xuan Yeo; Maurice Chan; Yoon Sim Yap; Peter Ang; Steve Rozen; Ann Siew Gek Lee; Aerts, Jan
September 2012
PLoS ONE;Sep2012, Vol. 7 Issue 9, Special section p1
Academic Journal
The emergence of benchtop sequencers has made clinical genetic testing using next-generation sequencing more feasible. Ion Torrent's PGM™ is one such benchtop sequencer that shows clinical promise in detecting single nucleotide variations (SNVs) and microindel variations (indels). However, the large number of false positive indels caused by the high frequency of homopolymer sequencing errors has impeded PGM™'s usage for clinical genetic testing. An extensive analysis of PGM™ data from the sequencing reads of the well-characterized genome of the Escherichia coli DH10B strain and sequences of the BRCA1 and BRCA2 genes from six germline samples was done. Three commonly used variant detection tools, SAMtools, Dindel, and GATK's Unified Genotyper, all had substantial false positive rates for indels. By incorporating filters on two major measures we could dramatically improve false positive rates without sacrificing sensitivity. The two measures were: B-Allele Frequency (BAF) and VARiation of the Width of gaps and inserts (VARW) per indel position. A BAF threshold applied to indels detected by UnifiedGenotyper removed ~99% of the indel errors detected in both the DH10B and BRCA sequences. The optimum BAF threshold for BRCA sequences was determined by requiring 100% detection sensitivity and minimum false discovery rate, using variants detected from Sanger sequencing as reference. This resulted in 15 indel errors remaining, of which 7 indel errors were removed by selecting a VARW threshold of zero. VARW specific errors increased in frequency with higher read depth in the BRCA datasets, suggesting that homopolymer-associated indel errors cannot be reduced by increasing the depth of coverage. Thus, using a VARW threshold is likely to be important in reducing indel errors from data with higher coverage.
In conclusion, BAF and VARW thresholds provide simple and effective filtering criteria that can improve the specificity of indel detection in PGM™ data without compromising sensitivity.
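
The two-threshold filter described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes BAF is the fraction of reads at a position that support the indel, and it takes VARW to be the number of distinct indel widths observed at the position minus one, so that VARW is zero exactly when all supporting reads agree on the width; the paper's precise definitions may differ. The `IndelCall` record, the function names, the example data, and the `MIN_BAF` value of 0.3 are all hypothetical; the abstract does not state the threshold that was chosen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndelCall:
    """Hypothetical per-position summary of a candidate indel."""
    position: int
    depth: int               # total reads covering the position
    indel_widths: List[int]  # gap/insert width seen in each indel-supporting read


def baf(call: IndelCall) -> float:
    """Assumed BAF: indel-supporting reads divided by total depth."""
    return len(call.indel_widths) / call.depth if call.depth else 0.0


def varw(call: IndelCall) -> int:
    """Assumed VARW: distinct indel widths minus one (0 = all widths agree)."""
    return max(len(set(call.indel_widths)) - 1, 0)


def passes_filters(call: IndelCall, min_baf: float, max_varw: int = 0) -> bool:
    """Keep a call only if BAF is high enough and VARW does not exceed the cap."""
    return baf(call) >= min_baf and varw(call) <= max_varw


# Made-up example calls, purely for illustration.
calls = [
    IndelCall(position=100, depth=50, indel_widths=[2] * 24),            # consistent width
    IndelCall(position=200, depth=50, indel_widths=[1] * 3),             # low BAF
    IndelCall(position=300, depth=50, indel_widths=[1] * 12 + [2] * 10), # mixed widths
]

MIN_BAF = 0.3  # illustrative value only, not taken from the paper
kept = [c.position for c in calls if passes_filters(c, MIN_BAF)]
print(kept)
```

In this sketch the call at position 300 has enough supporting reads to pass the BAF threshold but is rejected by the VARW threshold of zero, which mirrors the abstract's point that width-inconsistent indels persist at higher coverage and need a separate filter.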


Related Articles

  • A complete bacterial genome assembled de novo using only nanopore sequencing data. Loman, Nicholas J; Quick, Joshua; Simpson, Jared T // Nature Methods;Aug2015, Vol. 12 Issue 8, p733 

    We have assembled de novo the Escherichia coli K-12 MG1655 chromosome in a single 4.6-Mb contig using only nanopore data. Our method has three stages: (i) overlaps are detected between reads and then corrected by a multiple-alignment process; (ii) corrected reads are assembled using the Celera...

  • Analysis of whole genome sequencing for the Escherichia coli O157:H7 typing phages. Cowley, Lauren A.; Beckett, Stephen J.; Chase-Topping, Margo; Perry, Neil; Dallman, Tim J.; Gally, David L.; Jenkins, Claire // BMC Genomics;2015, Vol. 16 Issue 1, p1 

    Background: Shiga toxin producing Escherichia coli O157 can cause severe bloody diarrhea and haemolytic uraemic syndrome. Phage typing of E. coli O157 facilitates public health surveillance and outbreak investigations, certain phage types are more likely to occupy specific niches and are...

  • In Vivo Facilitated Diffusion Model. Bauer, Maximilian; Metzler, Ralf // PLoS ONE;Jan2013, Vol. 8 Issue 1, Special section p1 

    Under dilute in vitro conditions transcription factors rapidly locate their target sequence on DNA by using the facilitated diffusion mechanism. However, whether this strategy of alternating between three-dimensional bulk diffusion and one-dimensional sliding along the DNA contour is still...

  • Origins of the E. coli Strain Causing an Outbreak of Hemolytic–Uremic Syndrome in Germany. Rasko, David A.; Webster, Dale R.; Sahl, Jason W.; Bashir, Ali; Boisen, Nadia; Scheutz, Flemming; Paxinos, Ellen E.; Sebra, Robert; Chin, Chen-Shan; Iliopoulos, Dimitris; Klammer, Aaron; Peluso, Paul; Lee, Lawrence; Kislyuk, Andrey O.; Bullard, James; Kasarskis, Andrew; Wang, Susanna; Eid, John; Rank, David; Redman, Julia C. // New England Journal of Medicine;8/25/2011, Vol. 365 Issue 8, p709 

    Background: A large outbreak of diarrhea and the hemolytic–uremic syndrome caused by an unusual serotype of Shiga-toxin–producing Escherichia coli (O104:H4) began in Germany in May 2011. As of July 22, a large number of cases of diarrhea caused by Shiga-toxin–producing E. coli...

  • Alignment-free detection of local similarity among viral and bacterial genomes. Domazet-Lošo, Mirjana; Haubold, Bernhard // Bioinformatics;Jun2011, Vol. 27 Issue 11, p1466 

    Motivation: Bacterial and viral genomes are often affected by horizontal gene transfer observable as abrupt switching in local homology. In addition to the resulting mosaic genome structure, they frequently contain regions not found in close relatives, which may play a role in virulence...

  • Performance comparison of benchtop high-throughput sequencing platforms. Loman, Nicholas J; Misra, Raju V; Dallman, Timothy J; Constantinidou, Chrystala; Gharbia, Saheer E; Wain, John; Pallen, Mark J // Nature Biotechnology;May2012, Vol. 30 Issue 5, p434 

    Three benchtop high-throughput sequencing instruments are now available. The 454 GS Junior (Roche), MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) are laser-printer sized and offer modest set-up and running costs. Each instrument can generate data required for a draft bacterial genome...

  • Complete genome sequences of T5-related Escherichia coli bacteriophages DT57C and DT571/2 isolated from horse feces. Golomidova, Alla; Kulikov, Eugene; Prokhorov, Nikolai; Guerrero-Ferreira, Ricardo; Ksenzenko, Vladimir; Tarasyan, Karina; Letarov, Andrey // Archives of Virology;Dec2015, Vol. 160 Issue 12, p3133 

    We report the complete genome sequencing of two Escherichia coli T5-related bacteriophages, DT57C and DT571/2, isolated from the same specimen of horse feces. These two isolates share 96 % nucleotide sequence identity and can thus be considered representatives of the same novel species within...

  • Rapid quantification of sequence repeats to resolve the size, structure and contents of bacterial genomes. Williams, David; Trimble, William L; Shilts, Meghan; Meyer, Folker; Ochman, Howard // BMC Genomics;2013, Vol. 14 Issue 1, p1 

    Background: The numerous classes of repeats often impede the assembly of genome sequences from the short reads provided by new sequencing technologies. We demonstrate a simple and rapid means to ascertain the repeat structure and total size of a bacterial or archaeal genome without the need for...

  • Assessing the Effects of Data Selection and Representation on the Development of Reliable E. coli Sigma 70 Promoter Region Predictors. Abbas, Mostafa M.; Mohie-Eldin, Mostafa M.; EL-Manzalawy, Yasser // PLoS ONE;Mar2015, Vol. 10 Issue 3, p1 

    As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more desirable. Promoters are the key regulatory elements, which recruit the transcriptional machinery...

