Methods in molecular biology
-
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.
-
Multiplex assays that allow the simultaneous measurement of multiple analytes in small sample quantities have developed into a widely used technology. Their implementation spans multiple assay systems and can provide readouts of similar quality to the respective single-plex measures, albeit at far higher throughput. Multiplex assay systems are therefore an important element of biomarker discovery and development strategies, but analysis of the derived data can face substantial challenges that may limit the possibility of identifying meaningful biological markers. This chapter gives an overview of the opportunities and challenges of multiplexed biomarker analysis, in particular from the perspective of machine learning aimed at identifying predictive biological signatures.
-
Advancements in MS-based phospho-proteomics techniques have helped uncover hundreds of thousands of protein phosphorylation sites in humans and various model organisms. The majority of these sites are uncharacterized. ⋯ Analyzing the phosphorylation and sequence conservation of uncharacterized sites across species can help reveal a subset of the functionally important phosphorylation events. Here, we outline the workflow and provide an overview of publicly available computational resources for conservation analysis of novel phosphorylation sites.
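As a rough illustration of what a sequence-conservation check for a single site might involve, the following minimal Python sketch maps a site in a reference protein onto a column of a multiple sequence alignment and reports how often the residue is identical or still phosphorylatable (S, T, or Y) in the other species. The function names, the toy alignment, and the two summary fractions are assumptions made for illustration only; they are not the chapter's workflow or the interface of any of the resources it reviews.

```python
# Residues that can carry a phosphate group in canonical O-phosphorylation.
PHOSPHO_RESIDUES = {"S", "T", "Y"}


def alignment_column(aligned_ref: str, site: int) -> int:
    """Map a 1-based ungapped position in the reference to a 0-based alignment column."""
    seen = 0
    for col, residue in enumerate(aligned_ref):
        if residue != "-":
            seen += 1
            if seen == site:
                return col
    raise ValueError(f"site {site} is beyond the end of the reference sequence")


def site_conservation(alignment: dict, ref: str, site: int) -> dict:
    """Summarize conservation of one reference site across the other aligned species."""
    col = alignment_column(alignment[ref], site)
    residues = {species: seq[col] for species, seq in alignment.items()}
    ref_residue = residues[ref]
    others = [res for species, res in residues.items() if species != ref]
    return {
        "reference_residue": ref_residue,
        "residues": residues,
        # Fraction of other species with exactly the same residue.
        "identical_fraction": sum(r == ref_residue for r in others) / len(others),
        # Fraction of other species with any phosphorylatable residue.
        "phosphorylatable_fraction": sum(r in PHOSPHO_RESIDUES for r in others) / len(others),
    }


# Hypothetical toy alignment (gaps shown as "-"); not real protein sequences.
toy_alignment = {
    "human": "MKS-PTRA",
    "mouse": "MKSAPTRA",
    "fly": "MRA-PSRA",
}

result = site_conservation(toy_alignment, "human", 5)
print(result)
```

In this toy case, site 5 of the reference is a threonine that is replaced by serine in one species, so the site is not strictly identical everywhere but remains phosphorylatable. A real analysis would start from curated orthologs and a proper alignment, and would also need evidence that the aligned residues are actually phosphorylated.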
-
Despite recent advances in mass spectrometric sequencing speed and improved sensitivity, the in-depth analysis of proteomes still widely relies on off-line peptide separation and fractionation to deal with the enormous molecular complexity of shotgun digested proteomes. While a multitude of methods has been established for off-line peptide separation using HPLC columns, their use can be limited, particularly when sample quantities are scarce. In this protocol, we describe an approach that combines high-pH reversed-phase peptide separation into a few fractions with StageTip micro-columns. ⋯ Here, we provide a step-by-step protocol for TMT6plex labeling of peptides, the construction of StageTips, sample fractionation and pooling schemes adjusted to different types of analytes, mass spectrometric sample measurement, and downstream data processing using MaxQuant. To illustrate the results expected from this protocol, we provide results from an unlabeled and a TMT6plex labeled phosphopeptide sample, leading to the identification of >17,000 phosphopeptides in 8 h (Q Exactive HF) and >23,000 TMT6plex labeled phosphopeptides (Q Exactive Plus) in 12 h of measurement time. Importantly, this protocol is equally applicable to the fractionation of full proteome digests.
-
Post-translational modifications (PTMs) are covalent modifications that proteins might undergo following or sometimes during the process of translation. Together with gene diversity, PTMs contribute to the overall variety of possible protein function for a given organism. Single-nucleotide polymorphisms (SNPs) are the most common form of variation found in the human genome, and have been found to be associated with diseases like Alzheimer's disease (AD) and Parkinson's disease (PD), among many others. ⋯ However, these data are unsystematically distributed across a number of diverse databases. Thus, there is a need for efforts toward data standardization and validation of bioinformatics algorithms that can fully leverage SNP and PTM information for biomedical research. In this chapter, we present some of the commonly used databases for both single-nucleotide variations (SNVs) and PTMs and describe a broad approach that can be applied to many scenarios for studying the impact of nonsynonymous SNVs (nsSNVs) on PTM sites of human proteins.
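To make the idea of relating nsSNVs to PTM sites concrete, the sketch below shows one simple way such a lookup could be set up in Python: variants and PTM sites on the same protein are compared by residue position, and a variant is flagged when it falls on the modified residue or within a chosen flanking window. The record types, the `flank` parameter, and the example positions are hypothetical and chosen for illustration; the sketch does not reproduce the approach described in the chapter or the schema of any particular database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    position: int  # 1-based residue position in the protein
    ref: str       # reference amino acid
    alt: str       # substituted amino acid


@dataclass(frozen=True)
class PtmSite:
    position: int  # 1-based position of the modified residue
    kind: str      # e.g. "phosphorylation"


def variants_near_ptm(variants, sites, flank=0):
    """Return (variant, site, distance) for variants within `flank` residues of a PTM site.

    flank=0 keeps only variants that replace the modified residue itself.
    """
    hits = []
    for variant in variants:
        for site in sites:
            distance = abs(variant.position - site.position)
            if distance <= flank:
                hits.append((variant, site, distance))
    return hits


# Hypothetical example records for a single protein; not taken from any database.
variants = [Variant(15, "S", "A"), Variant(18, "R", "Q"), Variant(100, "G", "D")]
sites = [PtmSite(15, "phosphorylation"), PtmSite(120, "acetylation")]

direct_hits = variants_near_ptm(variants, sites, flank=0)
window_hits = variants_near_ptm(variants, sites, flank=3)
for variant, site, distance in window_hits:
    print(f"{variant.ref}{variant.position}{variant.alt} is {distance} residue(s) from {site.kind} site {site.position}")
```

A direct hit removes the modifiable residue, whereas a flanking hit may alter the recognition motif of the modifying enzyme. The window size is a judgment call, and position-based matching only works if variants and PTM sites are annotated on the same protein isoform.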