ICSI's HAP Software Used to Analyze Genetic Data
October 18, 2005
ICSI's Eran Halperin and UCSD's Eleazar Eskin have used HAP, the haplotype analysis software they developed, to analyze the genetic data in dbSNP. dbSNP is the database maintained by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health. The dataset they analyzed is more than twice the size of previously analyzed datasets.
HAP has already been used to perform genetic analysis on large datasets (see Science, Feb. 18, 2005). However, its algorithms had to be modified to work with dbSNP, which contains all publicly available data on single nucleotide polymorphisms (SNPs). SNPs are locations in the human DNA sequence where variation occurs within a population. The researchers adapted HAP to handle the different types and sources of data in dbSNP. They also extended it to analyze mother-father-child trios, which provide additional genetic information. For this study, HAP analyzed the SNPs in more than 286 million haplotypes and captured most of the genetic variation in the dataset. The analysis took less than 24 hours.

The results of the dbSNP analysis will appear in a special issue of Genome Research, to be published on October 26. Eskin and Halperin hope that other researchers will use the data they have made available in their own genetic research. The two are now designing disease association studies that use both their data and the HAP tool.