Diff for "Data/StructurePDZProteomeScanning" - Bader Lab @ The University of Toronto

Differences between revisions 18 and 44 (spanning 26 versions)

Predicting PDZ Domain Mediated Protein Interactions from Structure

Shirley Hui, Xiang Xing, and Gary D. Bader

Website: http://webservice.baderlab.org/domains/POW/

Background

PDZ domains are structural protein domains that recognize ligands containing simple linear amino acid motifs and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity determination and neural development. PDZ domain-peptide interaction predictors have recently been developed that predict peptide ligands from primary amino acid sequence information; however, they are limited to PDZ domains that are similar in sequence to the training domains. Since domain structure is known to influence binding specificity, we hypothesized that using structural information would yield predictions of new interactions and would be less dependent on sequence similarity than the sequence-based predictors.

Results

We developed a novel support vector machine-based predictor of PDZ domain and C-terminal peptide interactions using PDZ domain structure and peptide sequence information. Different cross-validation strategies and blind tests show that the predictor can correctly predict interactions in multiple organisms. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some of which are found only by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and innate immune signalling and suggest new PDZ interactions for other processes including wound healing and Wnt signalling.

SVM Predictions

SVM predictions were validated using known interactions from PDZBase, a domain-peptide interaction database, and known protein-protein interactions (PPIs) from iRefIndex. iRefIndex is a PPI database that consolidates PPIs from different databases, including BIND, BioGRID, CORUM, DIP, HPRD, IntAct and MINT.

The following are structure-based and sequence-based SVM proteome scanning predictions for human, fly and worm PDZ domains. Only domains with predicted interactions are included. For example, the structure-based predictor returned predictions for 215 out of 218 human domains scanned.

Organism	Structure-based	Sequence-based
Human	Human 215 out of 218 (zip)	Human 224 out of 241 (zip)
Fly	Fly 7 out of 7 (zip)	Fly 6 out of 7 (zip)
Worm	Worm 6 out of 6 (zip)	Worm 6 out of 6 (zip)
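
As a rough illustration of how a proteome scan can produce candidate binders, the sketch below collects the last five residues of every protein in a FASTA file and records which protein ids share each peptide. It is a minimal sketch under stated assumptions, not the project's scanning software: it assumes the candidates are C-terminal five-residue peptides (the predictor targets C-terminal peptides and the output files list binders of length five), that the first token of each FASTA header is the protein id, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record id, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Assumption: the id is the first whitespace-delimited token.
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def c_terminal_candidates(lines: Iterable[str], length: int = 5) -> Dict[str, List[str]]:
    """Map each C-terminal peptide of the given length to the protein ids ending in it."""
    candidates: Dict[str, List[str]] = defaultdict(list)
    for protein_id, sequence in read_fasta(lines):
        sequence = sequence.rstrip("*")  # drop a trailing stop symbol if present
        if len(sequence) >= length:      # skip sequences shorter than the peptide
            candidates[sequence[-length:]].append(protein_id)
    return dict(candidates)
```

Grouping by peptide means each distinct C-terminal sequence would only need to be scored once per domain, while the list of ids keeps track of every protein that carries it.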

The format of the output files is:

<indicator> <predicted binder sequence> <decision value> <source> <protein ids>

<indicator> is one of the following symbols:
- * = validated by PDZBase, corresponds to an iRefIndex PPI (human only), or validated by protein microarray experiments (fly and worm only)
- X = false positive as determined by protein microarray experiments (fly and worm only)
- empty = no experiment or other evidence to validate or support this prediction
<predicted binder sequence> is the five-residue sequence of the predicted binder
<decision value> is a real number computed by the SVM to decide whether a given sequence should be predicted as a binder. All values are greater than zero, since the files contain only predicted binders.
<source> is non-empty only if <indicator> is non-empty, and is one of the following codes:
- PB = found in PDZBase
- IR = corresponds to a PPI in iRefIndex (transcript index1, ..., transcript index n)
- PM = found in protein microarray experiment
- - = not found in any of the above sources
<protein ids>
- Ensembl protein ids corresponding to the predicted binder
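
The field layout above can be read with a small parser such as the following. This is a minimal sketch, not code from the project: it assumes the five fields are tab-separated (so an empty indicator shows up as a leading tab) and that multiple protein ids are separated by commas or spaces. Neither delimiter is stated on this page, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Indicator symbols as described in the format above.
INDICATORS = {"*": "validated", "X": "false positive", "": "no supporting evidence"}


@dataclass
class Prediction:
    indicator: str
    peptide: str
    decision_value: float
    source: str            # kept as the raw string, e.g. "PB", "IR ...", "PM" or "-"
    protein_ids: List[str]


def parse_prediction_line(line: str) -> Prediction:
    """Parse one line of a prediction file (assumed tab-separated)."""
    # Only strip line endings: a leading tab marks an empty indicator.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    indicator, peptide, value, source, ids = (field.strip() for field in fields)
    if indicator not in INDICATORS:
        raise ValueError(f"unknown indicator: {indicator!r}")
    if len(peptide) != 5:
        raise ValueError(f"expected a five-residue peptide, got {peptide!r}")
    # Assumption: ids are separated by commas and/or whitespace.
    protein_ids = [pid for pid in re.split(r"[,\s]+", ids) if pid]
    return Prediction(indicator, peptide, float(value), source, protein_ids)
```

For example, `parse_prediction_line("*\tETSVV\t1.25\tPB\tENSP_A,ENSP_B")` (with made-up ids) yields a record whose `indicator` is `*` and whose `protein_ids` has two entries. If the real files use a different delimiter, only the two split calls would need to change.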

Supplementary Information

Cytoscape BiNGO
- BiNGO Enrichment Files (zip)
  - BiNGO enrichment files created by the Cytoscape BiNGO Plugin from sequence-based and structure-based hits. Cytoscape v2.8.1 and the BiNGO Plugin v1.44 were used. Only GO terms with more than 5 and fewer than 300 genes (based on the GMT file below) were used, taken from the GO ontology v1.2 (Dec 7, 2011).
Cytoscape Enrichment Map
- Sequence vs. Structure Enrichment Map (cys)
  - Cytoscape session file for the summary enrichment map comparing sequence-based and structure-based predictions for human PDZ domains. Cytoscape v2.8.1 and the Enrichment Map Plugin v1.2 were used.
- Structure vs. iRefIndex Enrichment Map (cys)
- Sequence vs. iRefIndex Enrichment Map (cys)
- Gene-set File (GMT)
Data Files (zip)
- Domain Structures
  - PDB files for experimentally determined and homology modelled structures for human, mouse, worm and fly
- Proteomes
  - Ensembl proteome files for Human, Worm and Fly
- Experiment Interaction files (in peptide file format)
  - Fly files from Chen
  - Human files from Tonikian
  - Mouse files from Stiffler
  - Worm files from Chen
- Negative Interaction files (in raw format)
  - Human files generated by SVM
  - Mouse files generated by SVM
- Curated Interaction files (flat files)
  - PDZBase for Human (Mouse, Worm and Fly included, but not used)
  - iRefIndex interactions for Human
- Phage codon bias files
- ProteomeScan Files
  - Files required to run the proteome scanning software
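
The gene-set size filter described for the BiNGO enrichment files (terms with more than 5 and fewer than 300 genes, counted from the GMT file) can be reproduced with a few lines of code. The sketch below is illustrative only and is not the authors' script: it assumes the standard GMT layout, in which each line holds a set name, a description, and then the member genes, all tab-separated, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def load_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Read gene sets from GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets: Dict[str, Set[str]] = {}
    for raw in lines:
        fields = raw.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name = fields[0]
        gene_sets[name] = {gene for gene in fields[2:] if gene}
    return gene_sets


def filter_by_size(gene_sets: Dict[str, Set[str]],
                   min_exclusive: int = 5,
                   max_exclusive: int = 300) -> Dict[str, Set[str]]:
    """Keep sets with strictly more than min_exclusive and fewer than max_exclusive genes."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_exclusive < len(genes) < max_exclusive
    }
```

Both bounds are exclusive here to match the wording "greater than 5 and less than 300", so a term with exactly 5 or exactly 300 genes is dropped.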

Source Code

GNU LGP License (txt)
Java Source Code (zip)
Dependency Jars (zip)
- jfreechart 1.0.12 (and dependencies)
- weka 3.9.1
- auc calculator (Davis & Goadrich, 2006)
- BioJava 1.5
- iText 2.1.3
- jmatio
- Bingo 2.3
- Cytoscape 2.6.3
- Cytoscape-task 2.6.3
- BRAIN 1.0.5 (pdzsvm)
- libSVM 2.8.9 (pdzsvmstruct)

Team

Shirley Hui
Xiang Xing
Gary Bader


- Revision 18 as of 2011-09-30 15:52:44
  Size: 5837
  Editor: ShirleyHui
  Comment:
+ Revision 44 as of 2013-03-14 14:10:08
  Size: 6835
  Editor: ShirleyHui
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-== Structure based proteome scanning prediction of PDZ domain peptide interactions ==
+== Predicting PDZ Domain Mediated Protein Interactions from Structure ==
 Line 7:
-Website: URL
+Website: http://webservice.baderlab.org/domains/POW/
 Line 10:
-PDZ domains are peptide recognition domains that are involved in important biological processes that bind their targets through the recognition of simple linear motifs.  The recent availability of high throughput PDZ domain peptide interaction data has prompted the development of sequence based predictors of PDZ domain peptide interactions.  However, the performance of these predictors depends on how similar in sequence a given domain is to the training domains.  On the other hand, domain structure features are known to play roles in determining PDZ domain binding specificity and can also be used for training.  When used for proteome scanning, such a predictor may be able to predict more novel interactions and increase the coverage of PDZ domain mediated protein protein interactions that can be currently predicted.
+PDZ domains are structural protein domains that recognize ligands containing simple linear amino acid motifs and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity determination and neural development. PDZ domain-peptide interaction predictors have recently been developed and can predict peptide ligands using primary amino acid sequence information, however they are limited to PDZ sequences that are similar in sequence to the training domains. Since domain structure is known to influence binding specificity, we hypothesized that the use of structural information should result in the prediction of new interactions and should be less dependent on sequence similarity than the sequence-based predictors.
 Line 13:
-We developed a structure based predictor of PDZ domain peptide interactions.  We use domain structure features for training which are known to facilitate protein folding and stability and protein interactions.  We also computationally generate additional negative interactions for training and show that this reduces the number of potential false positives returned by the predictor.  Through multiple cross validation strategies and a series of blind tests we show that the predictor is estimated to have improved generalization performance and can correctly predict interactions in different organisms.   Through proteome scanning in human we show that the structure based predictions correspond to known PDZ domain peptide interactions and known protein protein interactions in curated databases.  We also show that a large number of validated hits are novel, representing a 53% increase in PDZ domain mediated PPIs that could be predicted before.  A functional enrichment analysis shows that the biological process terms associated with these hits are also novel.
+We developed a novel support vector machine-based predictor of PDZ domain and C-terminal peptide interactions using PDZ domain structure and peptide sequence information. Different cross validation strategies and blind tests show that the predictor can correctly predict interactions in multiple organisms. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and innate immune signalling and suggest new PDZ interactions for other processes including wound healing and Wnt signalling.
 Line 18:
-The following are SVM proteome scanning predictions for 175 human, 7 fly and 6 worm PDZ domains.
+The following are SVM proteome scanning structure-based and sequence-based predictions for human, fly and worm PDZ domains.  Only domains with predicted interactions are included.  For example: The structure-based predictor returned predictions for 215 out of 218 human domains scanned.
 Line 20:
- * [[attachment:HumanPredictions.zip|Human 175 (zip)]]
- * [[attachment:FlyPredictions.zip|Fly 7 (zip)]]
- * [[attachment:WormPredictions.zip|Worm 6 (zip)]]
+|| Organism || Structure-based || Sequence-based ||
+|| Human || [[attachment:HumanStructPredictions.zip|Human 215 out of 218 (zip)]] || [[attachment:HumanSeqPredictions.zip|Human 224 out of 241 (zip)]] ||
+|| Fly || [[attachment:FlyStructPredictions.zip|Fly 7 out of 7 (zip)]] || [[attachment:FlySeqPredictions.zip|Fly 6 out of 7 (zip)]] ||
+|| Worm || [[attachment:WormStructPredictions.zip|Worm 6 out of 6 (zip)]] || [[attachment:WormSeqPredictions.zip|Worm 6 out of 6 (zip)]] ||
-Line 38:
+Line 39:
- * <transcript ids>
-   * Ensembl TRS ids corresponding to the predicted binder
+ * <protein ids>
+   * Ensembl protein ids corresponding to the predicted binder
-Line 41:
+Line 42:
-== Supplementary ==
+== Supplementary Information ==
-Line 43:
+Line 44:
- * Supplementary Doc Link
 Line 46:
-   * BiNGO enrichment files created by the Cytoscape BiNGO Plugin.  Cytoscape v2.8.1 and the BiNGO Plugin v1.44 were used.
- * Cytoscape Enrichment Maps
-  * [[attachment:EnrichmentMapSessionFiles.zip|Enrichment Map Session Files (zip)]]
-   * Cytoscape session files for the enrichment maps for human PDZ domains created for this project.  Cytoscape v2.8.1 and the Enrichment Map Plugin v1.2 were used.
+   * BiNGO enrichment files created by the Cytoscape BiNGO Plugin from sequence-based and structure-based hits.  Cytoscape v2.8.1 and the BiNGO Plugin v1.44 were used.  Only terms with greater than 5 and less than 300 genes (based on the GMT file below) were used from the GO ontology v1.2 (Dec 7, 2011).
+ * Cytoscape Enrichment Map
+  * [[attachment:SummaryEnrichmentMap.cys|Sequence vs. Structure Enrichment Map (cys)]]
+  * [[attachment:StructureiRefIndexERMap.cys|Structure vs. iRefIndex Enrichment Map (cys)]]
+  * [[attachment:SequenceiRefIndexERMap.cys|Sequence vs. iRefIndex Enrichment Map (cys)]]
+  * [[attachment:Human_GOALL_no_GO_iea_UniProt-hui.gmt|Gene-set File (GMT)]]
+   * Cytoscape session file for the summary enrichment map comparing sequence-based and structure-based predictions for human PDZ domains.  Cytoscape v2.8.1 and the Enrichment Map Plugin v1.2 were used.
-Line 51:
+Line 54:
-  * Domain Structure
+  * Domain Structures
-Line 57:
+Line 60:
-    * Human files from Sidhu
+    * Human files from Tonikian
-Line 64:
+Line 67:
-    * PDZBase for Human (Worm and Fly included, but not used)
+    * PDZBase for Human (Mouse, Worm and Fly included, but not used)
-Line 68:
+Line 71:
-    * Files required to run the !ProteomeScan software
+    * Files required to run the proteome scanning software
-Line 72:
+Line 75:
-   *[[attachment:PDZSVMStruct_1.0_src.zip|Java Code (zip)]]
+   *[[attachment:PDZSVMStruct_1.0_src.zip|Java Source Code (zip)]]