An automated proteomic data analysis workflow for mass

Bmc Bioinformatics · Oct 2009

An automated proteomic data analysis workflow for mass spectrometry.

Mass spectrometry-based protein identification methods are fundamental to proteomics. Biological experiments are usually performed in replicates and proteomic analyses generate huge datasets which need to be integrated and quantitatively analyzed. The Sequest search algorithm is a commonly used algorithm for identifying peptides and proteins from two dimensional liquid chromatography electrospray ionization tandem mass spectrometry (2-D LC ESI MS(2)) data. A number of proteomic pipelines that facilitate high throughput 'post data acquisition analysis' are described in the literature. However, these pipelines need to be updated to accommodate the rapidly evolving data analysis methods. Here, we describe a proteomic data analysis pipeline that specifically addresses two main issues pertinent to protein identification and differential expression analysis: 1) estimation of the probability of peptide and protein identifications and 2) non-parametric statistics for protein differential expression analysis. Our proteomic analysis workflow analyzes replicate datasets from a single experimental paradigm to generate a list of identified proteins with their probabilities and significant changes in protein expression using parametric and non-parametric statistics. ⋯ For biologists carrying out proteomics by mass spectrometry, our workflow facilitates automated, easy to use analyses of Bioworks (3.2 or later versions) data. All the methods used in the workflow are peer-reviewed and as such the results of our workflow are compliant with proteomic data submission guidelines to public proteomic data repositories including PRIDE. Our workflow is a necessary intermediate step that is required to link proteomics data to biological knowledge for generating testable hypotheses.

read more… or not…
- Ken Pendarvis, Ranjit Kumar, Shane C Burgess, and Bindu Nanduri.
- Institute for Digital Biology, Mississippi State University, Mississippi State, MS 39762, USA.
- Bmc Bioinformatics. 2009 Oct 8; 10 Suppl 11: S17.
BackgroundMass spectrometry-based protein identification methods are fundamental to proteomics. Biological experiments are usually performed in replicates and proteomic analyses generate huge datasets which need to be integrated and quantitatively analyzed. The Sequest search algorithm is a commonly used algorithm for identifying peptides and proteins from two dimensional liquid chromatography electrospray ionization tandem mass spectrometry (2-D LC ESI MS(2)) data. A number of proteomic pipelines that facilitate high throughput 'post data acquisition analysis' are described in the literature. However, these pipelines need to be updated to accommodate the rapidly evolving data analysis methods. Here, we describe a proteomic data analysis pipeline that specifically addresses two main issues pertinent to protein identification and differential expression analysis: 1) estimation of the probability of peptide and protein identifications and 2) non-parametric statistics for protein differential expression analysis. Our proteomic analysis workflow analyzes replicate datasets from a single experimental paradigm to generate a list of identified proteins with their probabilities and significant changes in protein expression using parametric and non-parametric statistics.ResultsThe input for our workflow is Bioworks 3.2 Sequest (or a later version, including cluster) output in XML format. We use a decoy database approach to assign probability to peptide identifications. The user has the option to select "quality thresholds" on peptide identifications based on the P value. We also estimate probability for protein identification. Proteins identified with peptides at a user-specified threshold value from biological experiments are grouped as either control or treatment for further analysis in ProtQuant. ProtQuant utilizes a parametric (ANOVA) method, for calculating differences in protein expression based on the quantitative measure SigmaXcorr. Alternatively ProtQuant output can be further processed using non-parametric Monte-Carlo resampling statistics to calculate P values for differential expression. Correction for multiple testing of ANOVA and resampling P values is done using Benjamini and Hochberg's method. The results of these statistical analyses are then combined into a single output file containing a comprehensive protein list with probabilities and differential expression analysis, associated P values, and resampling statistics.ConclusionFor biologists carrying out proteomics by mass spectrometry, our workflow facilitates automated, easy to use analyses of Bioworks (3.2 or later versions) data. All the methods used in the workflow are peer-reviewed and as such the results of our workflow are compliant with proteomic data submission guidelines to public proteomic data repositories including PRIDE. Our workflow is a necessary intermediate step that is required to link proteomics data to biological knowledge for generating testable hypotheses.

Pubmed Free full text Copy Citation Plaintext

Add institutional full text...
Notes
Knowledge, pearl, summary or comment to share?

300 characters remaining

help

You can also include formatting, links, images and footnotes in your notes

Simple formatting can be added to notes, such as *italics*, _underline_ or **bold**.

Superscript can be denoted by <sup>text</sup> and subscript <sub>text</sub>.

Numbered or bulleted lists can be created using either numbered lines 1. 2. 3., hyphens - or asterisks *.

Links can be included with: [my link to pubmed](http://pubmed.com)

Images can be included with: ![alt text](https://bestmedicaljournal.com/study_graph.jpg "Image Title Text")

For footnotes use [^1](This is a footnote.) inline.

Or use an inline reference [^1] to refer to a longer footnote elseweher in the document [^1]: This is a long footnote..
hide…

An automated proteomic data analysis workflow for mass spectrometry.

Notes

300 characters remaining

help

You can also include formatting, links, images and footnotes in your notes

Want more great medical articles?

Keep up to date with a free trial of metajournal, personalized for your practice.
1,706,642 articles already indexed!

An automated proteomic data analysis workflow for mass spectrometry.

Notes

300 characters remaining

help

You can also include formatting, links, images and footnotes in your notes

Want more great medical articles?

Keep up to date with a free trial of metajournal, personalized for your practice.1,706,642 articles already indexed!

Keep up to date with a free trial of metajournal, personalized for your practice.
1,706,642 articles already indexed!