Revision 26 as of 2015-12-07 06:13:08
Predicting physiologically relevant SH3 domain mediated protein-protein interactions in yeast
Shobhit Jain and Gary Bader
Motivation
Many intracellular signaling processes are mediated by interactions involving peptide recognition modules such as SH3 domains. These domains bind to short, linear protein sequence motifs, which can be identified using high-throughput experimental screens such as phage display. The resulting binding motif patterns can then be used to computationally predict the protein interactions mediated by these domains. While many protein-protein interaction prediction methods exist, most either do not handle interactions mediated by peptide recognition modules or ignore many of the known constraints governing physiologically relevant interactions between two proteins.
Results
We present a novel method for predicting physiologically relevant SH3 domain-peptide mediated protein-protein interactions in S. cerevisiae using phage display data. Like some earlier methods, it uses position weight matrix models of the linear motif preference of individual SH3 domains to scan the proteome for potential hits, and then filters these hits using a range of evidence sources related to sequence-based and cellular constraints on protein interactions. The novelty of this approach lies in the large number of evidence sources used and in the way sequence-based and protein pair-based evidence sources are combined. By combining different peptide and protein features using multiple Bayesian models, we are able to predict high-confidence interactions with an overall accuracy of 0.97.
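The scanning step described above can be illustrated with a minimal sketch. This is not the DoMo-Pred implementation: the function names `build_pwm` and `scan_sequence`, the uniform background frequencies, the pseudocount, the score threshold, and the example peptides and protein sequence are all assumptions made for illustration. It builds a log-odds position weight matrix from a set of equal-length peptides and slides it along a protein sequence to report candidate hits.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, pseudocount=1.0):
    """Build a log-odds PWM from equal-length peptides.

    Assumption: a uniform background over the 20 amino acids.
    """
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        # Start every residue at the pseudocount to avoid log(0).
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {aa: math.log2((c / total) / background) for aa, c in counts.items()}
        )
    return pwm


def scan_sequence(pwm, sequence, threshold=0.0):
    """Slide the PWM along a sequence and collect windows scoring >= threshold.

    Returns (start, stop, peptide, score) tuples with 1-based inclusive positions.
    """
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(aa not in AMINO_ACIDS for aa in window):
            continue  # skip windows containing non-standard residues
        score = sum(pwm[j][aa] for j, aa in enumerate(window))
        if score >= threshold:
            hits.append((i + 1, i + width, window, score))
    return hits


# Hypothetical phage display peptides and a made-up protein sequence.
example_peptides = ["RTTPL", "RSTPL", "KTTPL", "RTTPV"]
example_pwm = build_pwm(example_peptides)
for hit in scan_sequence(example_pwm, "MAAARTTPLGG", threshold=5.0):
    print(hit)
```

In the method described here, hits of this kind are only the starting point; they are subsequently filtered using the peptide-level and protein-level evidence sources.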
Downloads
Latest Release
Source: DoMo-Pred.zip
Predictions
SH3_PPI_Predictions.zip
Text file format:
Domain | Peptide | Start | Stop | Sequence | Peptide Score | Peptide Count | Protein Score | Protein Count | Score
P11710 | P53861 | 313 | 318 | RTTSH | 0.96 | 4 | 0.01 | 5 | 0.16
P11710 | P34216 | 236 | 240 | RTTPL | 0.53 | 4 | 0.98 | 5 | 0.98
... | ... | ... | ... | ... | ... | ... | ... | ... | ...
Note:
Domain is the UniProt ID of the SH3 domain-containing protein.
Peptide is the UniProt ID of the peptide-containing protein.
Start and Stop are the start and stop positions of the peptide.
Sequence is the predicted peptide sequence.
Peptide Score and Protein Score are the scores of the peptide and protein classifiers, respectively.
Peptide Count and Protein Count are the numbers of peptide and protein features used for the prediction.
Score is the score of the combined classifier.
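As a convenience, the column layout above could be read with a short script along the following lines. This is a sketch under stated assumptions: the file is assumed to be tab-delimited with a header row matching the column names shown above, and the `Prediction` class, the `read_predictions` and `high_confidence` helpers, and the 0.5 score cutoff are hypothetical choices, not part of the released package.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Prediction:
    domain: str          # UniProt ID of the SH3 domain-containing protein
    peptide: str         # UniProt ID of the peptide-containing protein
    start: int
    stop: int
    sequence: str
    peptide_score: float
    peptide_count: int
    protein_score: float
    protein_count: int
    score: float         # combined classifier score


def read_predictions(handle, delimiter="\t"):
    """Yield Prediction records from an open text handle.

    Assumption: tab-delimited text with a header row named as in the table.
    """
    for row in csv.DictReader(handle, delimiter=delimiter):
        yield Prediction(
            domain=row["Domain"],
            peptide=row["Peptide"],
            start=int(row["Start"]),
            stop=int(row["Stop"]),
            sequence=row["Sequence"],
            peptide_score=float(row["Peptide Score"]),
            peptide_count=int(row["Peptide Count"]),
            protein_score=float(row["Protein Score"]),
            protein_count=int(row["Protein Count"]),
            score=float(row["Score"]),
        )


def high_confidence(predictions, min_score=0.5):
    """Keep predictions whose combined score meets an (arbitrary) cutoff."""
    return [p for p in predictions if p.score >= min_score]


# The two example rows from the table, joined with tabs for illustration.
SAMPLE = (
    "Domain\tPeptide\tStart\tStop\tSequence\tPeptide Score\tPeptide Count\t"
    "Protein Score\tProtein Count\tScore\n"
    "P11710\tP53861\t313\t318\tRTTSH\t0.96\t4\t0.01\t5\t0.16\n"
    "P11710\tP34216\t236\t240\tRTTPL\t0.53\t4\t0.98\t5\t0.98\n"
)

for prediction in high_confidence(read_predictions(io.StringIO(SAMPLE))):
    print(prediction.domain, prediction.peptide, prediction.sequence, prediction.score)
```

To read an actual file instead of the inline sample, pass an open file handle, for example `open(path, newline="")`, to `read_predictions`; adjust `delimiter` if the file turns out not to be tab-separated.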
Supplementary material
PRM_PPI_supplementary.pdf
Datasets
Training and test datasets used in the manuscript: Datasets.zip