Proteome scanning of PDZ domain interactions using support vector machines
Background
An accurate predictor of genomic PDZ domain interactions would allow the proteomes of organisms to be scanned for potential binders. Such an application requires a predictor that is both accurate and precise, because the large number of possible interactors in a given proteome would otherwise produce too many false positive hits. Once validated, these predictions will help to increase the coverage of current PDZ domain interaction networks and further our understanding of the roles that PDZ domains play in a variety of biological processes.
Results
We built an SVM using mouse and human experimental training data to predict PDZ domain interactions. We showed that it correctly predicts known interactions from the proteomes of different organisms and that it is more accurate and precise than published state-of-the-art predictors.
SVM Predictions
SVM predictions were validated using known interactions from PDZBase, a domain-peptide interaction database. To further support our predictions, we counted, for 213 PDZ domains with interactions from iRefIndex, the predicted interactions that also corresponded to known protein-protein interactions (PPIs). iRefIndex is a PPI database that consolidates PPIs from different databases, including BIND, BioGRID, CORUM, DIP, HPRD, IntAct and MINT.
The following are SVM proteome scanning predictions for 13 human, 6 fly and 6 worm PDZ domains with domain-peptide interactions in PDZBase.
- Human 13 (zip)
- Fly 6 (zip)
- Worm 6 (zip)
The following are SVM proteome scanning predictions for the 192 human PDZ domains for which the SVM predicted binders. Of these, 75 PDZ domains had predicted interactions that corresponded to PPIs in iRefIndex. Please see the Supplementary Information for more details.
- Human 192 (zip)
The format of the output is:
<indicator> <predicted binder sequence> <decision value> <source> <transcript ids>
- <indicator> is a symbol:
  - * = validated by PDZBase or corresponds to an iRefIndex PPI
  - X = false positive as determined by protein microarray experiments (fly and worm only)
  - empty = no experimental or other evidence validates or supports this prediction
- <predicted binder sequence> is the five-residue sequence of the predicted binder
- <decision value> is a real number computed by the SVM and used to decide whether a given sequence is predicted as positive or negative
- <source> is non-empty only if <indicator> is non-empty:
  - PB = found in PDZBase
  - IR = corresponds to a PPI in iRefIndex (transcript index1, ..., transcript index n)
- <transcript ids> are the Ensembl TRS ids with tails corresponding to the predicted binder
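The record layout described above can be read with a few lines of Java. The following is a minimal sketch, not the PDZSVM code itself. It assumes that the five fields are separated by tabs (so that an empty indicator or source remains an empty field) and that multiple transcript ids are separated by commas. The page states neither delimiter, and the class name PredictionLine and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for one line of the prediction output.
// ASSUMPTIONS (not stated on this page): fields are tab-separated and
// transcript ids within the last field are comma-separated.
final class PredictionLine {
    final String indicator;           // "*", "X", or "" when there is no evidence
    final String binder;              // predicted binder sequence (length five)
    final double decisionValue;       // SVM decision value
    final String source;              // e.g. "PB" or "IR ..."; "" when indicator is empty
    final List<String> transcriptIds; // Ensembl ids for the predicted binder

    private PredictionLine(String indicator, String binder, double decisionValue,
                           String source, List<String> transcriptIds) {
        this.indicator = indicator;
        this.binder = binder;
        this.decisionValue = decisionValue;
        this.source = source;
        this.transcriptIds = transcriptIds;
    }

    static PredictionLine parse(String line) {
        // A limit of -1 keeps trailing empty fields, so empty columns are preserved.
        String[] f = line.split("\t", -1);
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields, got " + f.length);
        }
        String binder = f[1].trim();
        if (binder.length() != 5) {
            throw new IllegalArgumentException("binder must have length five: " + binder);
        }
        List<String> ids = new ArrayList<>();
        for (String id : f[4].split(",")) {
            if (!id.trim().isEmpty()) {
                ids.add(id.trim());
            }
        }
        return new PredictionLine(f[0].trim(), binder,
                Double.parseDouble(f[2].trim()), f[3].trim(), ids);
    }

    // "*" marks a prediction validated by PDZBase or matching an iRefIndex PPI.
    boolean isSupported() {
        return indicator.equals("*");
    }

    // "X" marks a false positive according to protein microarray experiments.
    boolean isKnownFalsePositive() {
        return indicator.equals("X");
    }

    public static void main(String[] args) {
        // Made-up example record; the sequence and ids are placeholders.
        PredictionLine p = parse("*\tETSVL\t1.25\tPB\tEXAMPLE_ID_1,EXAMPLE_ID_2");
        System.out.println(p.binder + " supported=" + p.isSupported()
                + " ids=" + p.transcriptIds);
    }
}
```

If the files turn out to use a different delimiter, only the two split calls need to change. Filtering on isSupported() would then give the subset of predictions backed by PDZBase or iRefIndex.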
Supplementary
- Models
  - Chen model parameter and binding site encoding files
  - Stiffler model parameter files
- Proteomes
  - Ensembl proteome files for Human, Worm and Fly
- Experiment Interaction files (in peptide file format)
  - Fly files from Chen
  - Human files from Sidhu
  - Mouse files from Stiffler
  - Worm files from Chen
- Curated Interaction files (flat files)
  - PDZBase for Human (Worm and Fly included, but not used)
  - iRefIndex interactions for Human
- Phage codon bias files
- ProteomeScan Files
  - Files required to run the ProteomeScan software
- Models
Source Code
- jfreechart 1.0.12 (and dependencies)
- weka 3.9.1
- auc calculator (Davis & Goadrich, 2006)
- BioJava 1.5
- iText 2.1.3
- jmatio
- Bingo 2.3
- Cytoscape 2.6.3
- Cytoscape-task 2.6.3
- BRAIN 1.0.5 (pdzsvm)
- libSVM 2.8.9 (pdzsvm)
Team
- Shirley Hui
- Gary Bader