Multi-level biological characterization of exomic variants at the protein level significantly improves the identification of their deleterious effects

Title: Multi-level biological characterization of exomic variants at the protein level significantly improves the identification of their deleterious effects
Publication Type: Journal Article
Year of Publication: 2016
Authors: Raimondi, D, Gazzo, AM, Rooman, M, Lenaerts, T, Vranken, WF
Journal: Bioinformatics
Abstract

Motivation: There are now many predictors capable of identifying the likely phenotypic effects of Single Nucleotide Variants (SNVs) or short in-frame Insertions or Deletions (INDELs) in the increasing amount of genome sequence data. Most of these predictors focus on SNVs and use a combination of features related to sequence conservation, biophysical and/or structural properties to link the observed variant to either a neutral or a disease phenotype. Despite notable successes, the mapping between genetic variants and their phenotypic effects is riddled with levels of complexity that are not yet fully understood and that are often not taken into account in the predictions, despite their promise of significantly improving the prediction of deleterious mutants.

Results: We present DEOGEN, a novel variant effect predictor that can handle both missense SNVs and in-frame INDELs. By integrating information from different biological scales and mimicking the complex mixture of effects that lead from the variant to the phenotype, we obtain significant improvements in the variant-effect prediction results. Next to the typical variant-oriented features based on the evolutionary conservation of the mutated positions, we added a collection of protein-oriented features that are based on functional aspects of the gene affected. We cross-validated DEOGEN on 36825 polymorphisms, 20821 deleterious SNVs and 1038 INDELs from SwissProt. The multi-level contextualization of each (variant, protein) pair in DEOGEN provides a 10% improvement of MCC with respect to current state-of-the-art tools.

Availability: The software and the data presented here are publicly available at http://ibsquare.be/deogen.

Contact: wvranken@vub.ac.be

Supplementary information: Supplementary data are available at Bioinformatics online.
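For readers unfamiliar with the metric, the MCC reported in the abstract is the Matthews correlation coefficient, a standard score for binary classifiers such as deleterious-versus-neutral predictors. The following is a minimal sketch of how it is computed from confusion-matrix counts. The function name `mcc` and the example counts are illustrative assumptions and are not taken from the paper or from the DEOGEN software.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn: correctly predicted deleterious/neutral variants
    fp/fn: neutral predicted deleterious / deleterious predicted neutral
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not results from the paper).
score = mcc(tp=90, tn=80, fp=20, fn=10)
print(f"MCC = {score:.3f}")
```

The score ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction correct). Because it uses all four cells of the confusion matrix, it is often preferred over plain accuracy when the classes are unbalanced.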

URL: http://bioinformatics.oxfordjournals.org/content/early/2016/02/18/bioinformatics.btw094.abstract
DOI: 10.1093/bioinformatics/btw094