Choosing SNPs using feature selection
Tu Minh Phuong, Zhen Lin, and Russ B. Altman, 2004
Abstract
A major challenge for genome-wide disease association studies is the high cost of genotyping large numbers of single nucleotide polymorphisms (SNPs). The correlations between SNPs, however, make it possible to select a parsimonious set of informative SNPs, known as “tagging” SNPs, that captures most of the variation in a population. Considerable research interest has recently focused on the development of methods for finding such SNPs. In this paper we present an efficient method for finding tagging SNPs. The method does not involve a computationally intensive search over SNP subsets; instead, it discards redundant SNPs using a feature selection algorithm. In contrast to most existing methods, the method presented here is not limited to correlations between SNPs in local groups. By using correlations that occur across different chromosomal regions, the method can reduce the number of globally redundant SNPs. Experimental results show that our method selects fewer tagging SNPs than block-based approaches.
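To make the idea of discarding redundant SNPs concrete, the following is a minimal sketch of a greedy correlation filter. It is not the algorithm from the paper or the downloadable software: the redundancy criterion (squared Pearson correlation at or above a user-chosen threshold), the 0/1/2 genotype coding, and the names `SnpColumn`, `squaredCorrelation`, and `selectTagSnps` are all assumptions made for illustration. What it does share with the approach described above is that every SNP is compared against all SNPs kept so far, regardless of chromosomal position, so redundancy is judged globally rather than within local blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: one SNP is a column of genotype codes
// (0, 1, or 2 copies of the minor allele), one entry per individual.
using SnpColumn = std::vector<int>;

// Squared Pearson correlation (r^2) between two SNP columns.
// Returns 0 when either SNP shows no variation in the sample.
double squaredCorrelation(const SnpColumn& a, const SnpColumn& b) {
    assert(a.size() == b.size() && !a.empty());
    const double n = static_cast<double>(a.size());
    double sumA = 0.0, sumB = 0.0, sumAB = 0.0, sumAA = 0.0, sumBB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double x = a[i];
        const double y = b[i];
        sumA += x;
        sumB += y;
        sumAB += x * y;
        sumAA += x * x;
        sumBB += y * y;
    }
    const double cov = sumAB - sumA * sumB / n;
    const double varA = sumAA - sumA * sumA / n;
    const double varB = sumBB - sumB * sumB / n;
    if (varA <= 0.0 || varB <= 0.0) {
        return 0.0;
    }
    return (cov * cov) / (varA * varB);
}

// Greedy filter (an assumed stand-in for a feature selection step):
// scan SNPs in order and keep one only if its r^2 with every SNP kept
// so far is below the threshold. Returns the indices of the kept SNPs.
std::vector<std::size_t> selectTagSnps(const std::vector<SnpColumn>& snps,
                                       double threshold) {
    std::vector<std::size_t> kept;
    for (std::size_t j = 0; j < snps.size(); ++j) {
        bool redundant = false;
        for (std::size_t k : kept) {
            if (squaredCorrelation(snps[j], snps[k]) >= threshold) {
                redundant = true;
                break;
            }
        }
        if (!redundant) {
            kept.push_back(j);
        }
    }
    return kept;
}
```

Calling `selectTagSnps(snps, 0.8)` on a vector of SNP columns returns the indices of the retained SNPs. The result of a greedy scan like this depends on the order in which SNPs are visited, and a SNP with no variation in the sample is never flagged as redundant, so such SNPs would need to be filtered out beforehand.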
Software
The C++ source code is available for download.
Please send questions to Phuong, phuongtu@stanford.edu