• Bioinformatics · Jan 2016

    Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data.

    • Pengyi Yang, Sean J Humphrey, David E James, Yee Hwa Yang, and Raja Jothi.
    • Systems Biology Section, Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, RTP, NC 27709, USA.
    • Bioinformatics. 2016 Jan 15; 32 (2): 252-9.

    MotivationProtein phosphorylation is a post-translational modification that underlines various aspects of cellular signaling. A key step to reconstructing signaling networks involves identification of the set of all kinases and their substrates. Experimental characterization of kinase substrates is both expensive and time-consuming. To expedite the discovery of novel substrates, computational approaches based on kinase recognition sequence (motifs) from known substrates, protein structure, interaction and co-localization have been proposed. However, rarely do these methods take into account the dynamic responses of signaling cascades measured from in vivo cellular systems. Given that recent advances in mass spectrometry-based technologies make it possible to quantify phosphorylation on a proteome-wide scale, computational approaches that can integrate static features with dynamic phosphoproteome data would greatly facilitate the prediction of biologically relevant kinase-specific substrates.ResultsHere, we propose a positive-unlabeled ensemble learning approach that integrates dynamic phosphoproteomics data with static kinase recognition motifs to predict novel substrates for kinases of interest. We extended a positive-unlabeled learning technique for an ensemble model, which significantly improves prediction sensitivity on novel substrates of kinases while retaining high specificity. We evaluated the performance of the proposed model using simulation studies and subsequently applied it to predict novel substrates of key kinases relevant to insulin signaling. Our analyses show that static sequence motifs and dynamic phosphoproteomics data are complementary and that the proposed integrated model performs better than methods relying only on static information for accurate prediction of kinase-specific substrates.Availability And ImplementationExecutable GUI tool, source code and documentation are freely available at https://github.com/PengyiYang/KSP-PUEL.Contactpengyi.yang@nih.gov or jothi@mail.nih.govSupplementary InformationSupplementary data are available at Bioinformatics online.Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.

      Pubmed     Full text   Copy Citation     Plaintext  

      Add institutional full text...

    Notes

     
    Knowledge, pearl, summary or comment to share?
    300 characters remaining
    help        
    You can also include formatting, links, images and footnotes in your notes
    • Simple formatting can be added to notes, such as *italics*, _underline_ or **bold**.
    • Superscript can be denoted by <sup>text</sup> and subscript <sub>text</sub>.
    • Numbered or bulleted lists can be created using either numbered lines 1. 2. 3., hyphens - or asterisks *.
    • Links can be included with: [my link to pubmed](http://pubmed.com)
    • Images can be included with: ![alt text](https://bestmedicaljournal.com/study_graph.jpg "Image Title Text")
    • For footnotes use [^1](This is a footnote.) inline.
    • Or use an inline reference [^1] to refer to a longer footnote elseweher in the document [^1]: This is a long footnote..

    hide…