SIFTing the human genome for polymorphisms that affect protein function

Pauline Ng, University of Washington


Approximately half of the gene lesions currently known to be responsible for inherited human disease are due to amino acid substitutions. Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and in large-scale random mutagenesis projects; some of these could affect protein function. We have constructed a computational tool that uses sequence homology to predict whether a substitution affects protein function.

SIFT is a program that sorts intolerant from tolerant substitutions. It may be used to identify plausible disease candidates among the SNPs that cause missense substitutions. Assuming that disease-causing amino acid substitutions are damaging to protein function, we applied SIFT to a database of missense substitutions associated with or involved in disease. SIFT predicted 69% of these to be damaging. SIFT also gave predictions for over 3000 nonsynonymous SNPs (nsSNPs) from dbSNP, a database of sequence variants that may or may not be involved in disease. Of these variants, 75% were predicted to be tolerated. Some of the nsSNPs predicted to affect function were variants known to be associated with disease. Others were artifacts of SNP discovery.
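
The general idea of using sequence homology to separate tolerated from damaging substitutions can be illustrated with a toy conservation score. The sketch below is not SIFT's actual scoring scheme, which this abstract does not describe. The function names, the pseudocount, the 0.05 cutoff, and the example alignment are all hypothetical choices made for illustration. It scores a substitution by how often the new residue appears at that position among aligned homologs, relative to the most common residue there.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, position, pseudocount=0.1):
    """Pseudocount-smoothed amino acid frequencies at one alignment column.

    Gap characters and unknown symbols are ignored. The pseudocount value
    is an illustrative assumption, not a parameter taken from SIFT.
    """
    counts = Counter(
        seq[position] for seq in alignment if seq[position] in AMINO_ACIDS
    )
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def predict_substitution(alignment, position, new_residue, threshold=0.05):
    """Classify a substitution as 'tolerated' or 'damaging' (toy model).

    The score is the frequency of the new residue divided by the frequency
    of the most common residue in the column, so it lies in (0, 1].
    The threshold is an assumed cutoff chosen for this example.
    """
    freqs = column_frequencies(alignment, position)
    score = freqs[new_residue] / max(freqs.values())
    label = "tolerated" if score >= threshold else "damaging"
    return label, score


# Hypothetical aligned homologs of equal length (made-up sequences).
homologs = [
    "MKWLA",
    "MRWIA",
    "MKWVA",
    "MKWLS",
    "MRWMA",
]

# Position 2 is invariant (always W), so W -> F is flagged as damaging.
print(predict_substitution(homologs, 2, "F"))

# Position 3 varies among L, I, V and M, so L -> I is predicted tolerated.
print(predict_substitution(homologs, 3, "I"))
```

In this toy model, a substitution at a strictly conserved position scores low, while one at a variable position scores high if the new residue is already seen among the homologs. A realistic predictor would also need to account for alignment diversity and amino acid similarity, which this sketch leaves out.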

A WWW interface and source code for SIFT are available at http://sift-dna.org/.

Abstract Author(s): Pauline C. Ng and Steven Henikoff