Evaluating variations of genotype calling: a potential source of spurious associations in genome-wide association studies

Huixiao Hong; Zhenqiang Su; Weigong Ge; Leming Shi; Roger Perkins; Hong Fang; Donna Mendrick; Weida Tong
April 2010
Journal of Genetics;Apr2010, Vol. 89 Issue 1, p55
Academic Journal
Genome-wide association studies (GWAS) examine the entire human genome with the goal of identifying genetic variants, usually single nucleotide polymorphisms (SNPs), that are associated with phenotypic traits such as disease status and drug response. The discordance among significantly associated SNPs identified for the same disease by different GWAS indicates that such results contain false associations. Possible sources of spurious associations such as sample size and population stratification have been investigated and discussed intensively. Concordant GWAS results across studies also require an accurate and reproducible genotype calling algorithm. However, variations in the genotype calls produced by a single algorithm, and their effects on the significantly associated SNPs identified in downstream association analyses, have not been systematically investigated. In this paper, the variations in genotype calling with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) algorithm, and the resulting influence on the lists of significantly associated SNPs, were evaluated by changing algorithmic parameters. The evaluation used the raw data of 270 HapMap samples analysed with the Affymetrix Human Mapping 500K Array Set (Affy500K). Two parameters were varied: the Dynamic Model (DM) call confidence threshold (threshold) and the number of randomly selected SNPs (size). Comparative analysis of the calling results and of the corresponding lists of significantly associated SNPs showed that both the threshold and the size affected the called genotypes and the lists of significantly associated SNPs. The effect of the threshold was much larger than the effect of the size. Moreover, heterozygous calls were less consistent than homozygous calls.
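The kind of comparison the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' code and does not run BRLMM. It assumes two calling runs have already produced genotype calls for the same SNP and sample positions, encoded with hypothetical labels ("AA" and "BB" homozygous, "AB" heterozygous, "NC" no call). The function names, the toy data, and the use of a Jaccard index for list overlap are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Set

# Hypothetical genotype labels for this sketch (not BRLMM's output format):
# "AA" and "BB" are homozygous, "AB" is heterozygous, "NC" is a no-call.
HOMOZYGOUS = {"AA", "BB"}
HETEROZYGOUS = {"AB"}


def call_concordance(calls_a: Sequence[str], calls_b: Sequence[str]) -> float:
    """Fraction of identical calls among positions called in both runs."""
    both = [(a, b) for a, b in zip(calls_a, calls_b) if a != "NC" and b != "NC"]
    if not both:
        return float("nan")
    return sum(a == b for a, b in both) / len(both)


def concordance_by_type(calls_a: Sequence[str], calls_b: Sequence[str]) -> Dict[str, float]:
    """Concordance split by the genotype class assigned in run A."""
    result = {}
    for label, genotypes in (("hom", HOMOZYGOUS), ("het", HETEROZYGOUS)):
        pairs = [(a, b) for a, b in zip(calls_a, calls_b)
                 if a in genotypes and b != "NC"]
        result[label] = (sum(a == b for a, b in pairs) / len(pairs)
                         if pairs else float("nan"))
    return result


def list_overlap(sig_a: Set[str], sig_b: Set[str]) -> float:
    """Jaccard index of two lists of significantly associated SNP IDs."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 1.0


# Toy calls from two runs that differ only in a calling parameter.
run_a = ["AA", "AB", "BB", "AB", "NC", "AA"]
run_b = ["AA", "AA", "BB", "AB", "AB", "AA"]

print(call_concordance(run_a, run_b))
print(concordance_by_type(run_a, run_b))
print(list_overlap({"rs1", "rs2", "rs3"}, {"rs2", "rs3", "rs4"}))
```

Splitting concordance by genotype class mirrors the abstract's comparison of heterozygous and homozygous calls. The SNP IDs above are placeholders, not results from the study.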


Related Articles

  • Mannose binding lectin genes (MBL2) polymorphisms and the periodontal disease in diabetic patients. Araujo, Natalia Costa; Bello, Darcyla Maria de Aguiar; Crovella, Sergio; de Souza, Paulo Roberto Eleuterio; Donos, Nikos; Cimoes, Renata // Revista Odonto Ciencia;2011, Vol. 26 Issue 3, p203

    Purpose: To assess the association between the polymorphism in exon-1 of the MBL2 gene and periodontal disease in type 2 diabetic patients. Methods: The sample comprised 100 patients, who were submitted to a clinical periodontal examination that evaluated in six sites per tooth the...

  • A Human Type 1 Diabetes Susceptibility Locus Maps to Chromosome 21q22.3. Concannon, Patrick; Onengut-Gumuscu, Suna; Todd, John A.; Smyth, Deborah J.; Pociot, Flemming; Bergholdt, Regine; Akolkar, Beena; Erlich, Henry A.; Hilner, Joan E.; Julier, Cécile; Morahan, Grant; Nerup, Jøn; Nierras, Concepcion R.; Chen, Wei-Min; Rich, Stephen S. // Diabetes;Oct2008, Vol. 57 Issue 10, p2858 

    OBJECTIVE--The Type 1 Diabetes Genetics Consortium (T1DGC) has assembled and genotyped a large collection of multiplex families for the purpose of mapping genomic regions linked to type 1 diabetes. In the current study, we tested for evidence of loci associated with type 1 diabetes utilizing...

  • An Arabidopsis Example of Association Mapping in Structured Samples. Keyan Zhao; Aranzana, María José; Sung Kim; Lister, Clare; Shindo, Chikako; Chunlao Tang; Toomajian, Christopher; Honggang Zheng; Dean, Caroline; Marjoram, Paul; Nordborg, Magnus // PLoS Genetics;Jan2007, Vol. 3 Issue 1, p71 

    A potentially serious disadvantage of association mapping is the fact that marker-trait associations may arise from confounding population structure as well as from linkage to causative polymorphisms. Using genome-wide marker data, we have previously demonstrated that the problem can be severe...

  • Fifty-year variations in water ionic composition in small tributaries of the Southern Baikal. Pavlov, V.; Sorokovikova, L.; Tomberg, I.; Khvostov, I. // Water Resources;Sep2014, Vol. 41 Issue 5, p553 

    Concentrations of major ions in surface waters of the rivers of Khara-Murin and Snezhnaya are compared based on data of many-year observations carried out in the 1950s and 2000s. The concentrations of HCO₃⁻, Cl⁻, Ca²⁺, Mg²⁺, Na⁺ + K⁺ are shown to be stable. A considerable increase in SO₄²⁻ concentration was...

  • Background population: how does it affect LR-based forensic voice comparison? Yuko Kinoshita; Shunichi Ishihara // International Journal of Speech, Language & the Law;2014, Vol. 21 Issue 2, p191 

    This article investigates to what extent and in what ways the size of the background population affects the outcome of likelihood ratio (LR) based forensic voice comparison. While sample size is known to affect the accuracy of statistical modelling, specific effects in the context of forensic...

  • A mixture model approach to sample size estimation in two-sample comparative microarray experiments. Jørstad, Tommy S.; Midelfart, Herman; Bones, Atle M. // BMC Bioinformatics;2008, Vol. 9, Special section p1 

    Background: Choosing the appropriate sample size is an important step in the design of a microarray experiment, and recently methods have been proposed that estimate sample sizes for control of the False Discovery Rate (FDR). Many of these methods require knowledge of the distribution of effect...

  • Apparent Survival Rates of Forest Birds in Eastern Ecuador Revisited: Improvement in Precision but No Change in Estimates. Blake, John G.; Loiselle, Bette A. // PLoS ONE;Dec2013, Vol. 8 Issue 12, p1 

    Knowledge of survival rates of Neotropical landbirds remains limited, with estimates of apparent survival available from relatively few sites and species. Previously, capture-mark-recapture models were used to estimate apparent survival of 31 species (30 passerines, 1 Trochilidae) from eastern...

  • Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation. Flannick, Jason; Korn, Joshua M.; Fontanillas, Pierre; Grant, George B.; Banks, Eric; Depristo, Mark A.; Altshuler, David // PLoS Computational Biology;Jul2012, Vol. 8 Issue 7, p1 

    High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize...

  • Revising a Personal Genome by Comparing and Combining Data from Two Different Sequencing Platforms. Kim, Deokhoon; Kim, Woo-Yeon; Lee, Sun-Young; Lee, Sung-Yeoun; Yun, Hongseok; Shin, Soo-Yong; Lee, Jungyoun; Hong, Yoojin; Won, Youngmi; Kim, Seong-Jin; Lee, Yong Seok; Ahn, Sung-Min // PLoS ONE;Apr2013, Vol. 8 Issue 4, p1 

    For the robust practice of genomic medicine, sequencing results must be compatible, regardless of the sequencing technologies and algorithms used. Presently, genome sequencing is still an imprecise science and is complicated by differences in the chemistry, coverage, alignment, and...

