A haplotype inference algorithm for trios based on deterministic sampling

BACKGROUND: In genome-wide association studies, thousands of individuals are genotyped in hundreds of thousands of single nucleotide polymorphisms (SNPs). Statistical power can be increased when haplotypes, rather than three-valued genotypes, are used in analysis, so the problem of haplotype phase i...

Bibliographic details
Main Authors: Iliadis, Alexandros; Watkinson, John; Anastassiou, Dimitris; Wang, Xiaodong
Format: Article
Language: English
Published: BioMed Central 2010
Online access: https://ncbi.nlm.nih.gov/pmc/articles/PMC2939632/
https://ncbi.nlm.nih.gov/pubmed/20727218
http://dx.doi.org/10.1186/1471-2156-11-78
id pubmed-2939632
record_format dspace
spelling pubmed-2939632 2010-09-21 A haplotype inference algorithm for trios based on deterministic sampling Iliadis, Alexandros Watkinson, John Anastassiou, Dimitris Wang, Xiaodong BMC Genet Research Article BACKGROUND: In genome-wide association studies, thousands of individuals are genotyped in hundreds of thousands of single nucleotide polymorphisms (SNPs). Statistical power can be increased when haplotypes, rather than three-valued genotypes, are used in analysis, so the problem of haplotype phase inference (phasing) is particularly relevant. Several phasing algorithms have been developed for data from unrelated individuals, based on different models, some of which have been extended to father-mother-child "trio" data. RESULTS: We introduce a technique for phasing trio datasets using a tree-based deterministic sampling scheme. We have compared our method with publicly available algorithms PHASE v2.1, BEAGLE v3.0.2 and 2SNP v1.7 on datasets with varying numbers of markers and trios. We have found that the computational complexity of PHASE makes it prohibitive for routine use; on the other hand 2SNP, though the fastest method for small datasets, was significantly inaccurate. We have shown that our method outperforms BEAGLE in both speed and accuracy for small to intermediate numbers of trios, for all marker sizes examined. Our method is implemented in the "Tree-Based Deterministic Sampling" (TDS) package, available for download at http://www.ee.columbia.edu/~anastas/tds CONCLUSIONS: Using a Tree-Based Deterministic Sampling technique, we present an intuitive and conceptually simple phasing algorithm for trio data. The trade-off between speed and accuracy achieved by our algorithm makes it a strong candidate for routine use on trio datasets. BioMed Central 2010-08-23 /pmc/articles/PMC2939632/ /pubmed/20727218 http://dx.doi.org/10.1186/1471-2156-11-78 Text en Copyright ©2010 Iliadis et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
institution US NLM
collection PubMed Central
language English
format Article
topic Research Article
spellingShingle Research Article
Iliadis, Alexandros
Watkinson, John
Anastassiou, Dimitris
Wang, Xiaodong
A haplotype inference algorithm for trios based on deterministic sampling
description BACKGROUND: In genome-wide association studies, thousands of individuals are genotyped in hundreds of thousands of single nucleotide polymorphisms (SNPs). Statistical power can be increased when haplotypes, rather than three-valued genotypes, are used in analysis, so the problem of haplotype phase inference (phasing) is particularly relevant. Several phasing algorithms have been developed for data from unrelated individuals, based on different models, some of which have been extended to father-mother-child "trio" data. RESULTS: We introduce a technique for phasing trio datasets using a tree-based deterministic sampling scheme. We have compared our method with publicly available algorithms PHASE v2.1, BEAGLE v3.0.2 and 2SNP v1.7 on datasets with varying numbers of markers and trios. We have found that the computational complexity of PHASE makes it prohibitive for routine use; on the other hand 2SNP, though the fastest method for small datasets, was significantly inaccurate. We have shown that our method outperforms BEAGLE in both speed and accuracy for small to intermediate numbers of trios, for all marker sizes examined. Our method is implemented in the "Tree-Based Deterministic Sampling" (TDS) package, available for download at http://www.ee.columbia.edu/~anastas/tds CONCLUSIONS: Using a Tree-Based Deterministic Sampling technique, we present an intuitive and conceptually simple phasing algorithm for trio data. The trade-off between speed and accuracy achieved by our algorithm makes it a strong candidate for routine use on trio datasets.
author Iliadis, Alexandros
Watkinson, John
Anastassiou, Dimitris
Wang, Xiaodong
author_facet Iliadis, Alexandros
Watkinson, John
Anastassiou, Dimitris
Wang, Xiaodong
author_sort Iliadis, Alexandros
title A haplotype inference algorithm for trios based on deterministic sampling
title_short A haplotype inference algorithm for trios based on deterministic sampling
title_full A haplotype inference algorithm for trios based on deterministic sampling
title_fullStr A haplotype inference algorithm for trios based on deterministic sampling
title_full_unstemmed A haplotype inference algorithm for trios based on deterministic sampling
title_sort haplotype inference algorithm for trios based on deterministic sampling
publisher BioMed Central
publishDate 2010
url https://ncbi.nlm.nih.gov/pmc/articles/PMC2939632/
https://ncbi.nlm.nih.gov/pubmed/20727218
http://dx.doi.org/10.1186/1471-2156-11-78
_version_ 1753971520615481344