Diff for "DomainSpecificityPredictionProject" - Bader Lab @ The University of Toronto

Differences between revisions 85 and 114 (spanning 29 versions)

PDZ Domain-Peptide Interaction Prediction


Background

The human genome contains approximately 26,000 protein-coding genes, which, through alternative splicing, can direct the synthesis of many more distinct proteins. The majority of these proteins interact with other proteins to coordinate a variety of cellular processes, including DNA replication, cell cycle control, and signal transduction. Accurately detecting these interactions enables the assembly of protein interaction networks, which can be used to better understand the biochemistry of the cell.

Computational PPI Prediction

Several computational methods to predict protein-protein interactions (PPIs) have been developed and can be used to support or prioritize experiments. Such methods range from physics-based to statistics-based approaches, but they all face several challenges. Physics-based methods often lack structures for the proteins of interest, or do not take protein flexibility into consideration. Sequence-based methods such as position weight matrices (PWMs) can represent only short binding motifs and often do not account for interdependencies between residues at different positions. In general, the computational prediction of PPIs is considered an extremely difficult problem that no existing method fully addresses.
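To make the PWM limitation concrete, here is a minimal Python sketch of log-odds PWM scoring. The three-column matrix, its residue frequencies, and the uniform background are invented for illustration and are not derived from PDZ data.

The score is a sum of independent per-position terms. Changing the residue at one position therefore shifts the score by the same amount, whatever residues occupy the other positions. That is why a PWM cannot capture interdependencies between positions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency

# Hypothetical 3-position PWM: listed residues carry explicit frequencies and the
# remaining probability mass at each position is spread evenly over unlisted residues.
TOY_PWM = [
    {"S": 0.5, "T": 0.4},
    {"E": 0.3, "D": 0.3},
    {"V": 0.6, "L": 0.2, "I": 0.1},
]


def column_prob(column, residue):
    """Probability of a residue at one PWM position."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue}")
    if residue in column:
        return column[residue]
    leftover = 1.0 - sum(column.values())
    return leftover / (len(AMINO_ACIDS) - len(column))


def pwm_score(pwm, peptide):
    """Log-odds score: a sum of independent per-position terms."""
    if len(peptide) != len(pwm):
        raise ValueError("peptide length must match PWM length")
    return sum(math.log2(column_prob(col, aa) / BACKGROUND)
               for col, aa in zip(pwm, peptide))


if __name__ == "__main__":
    # Additivity: the S-versus-T difference is the same whether position 3 is V or L.
    print(pwm_score(TOY_PWM, "SEV") - pwm_score(TOY_PWM, "TEV"))
    print(pwm_score(TOY_PWM, "SEL") - pwm_score(TOY_PWM, "TEL"))
```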

Many PPIs are mediated by peptide recognition domains (PRDs), which are evolutionarily conserved modular interaction domains that are often combined in different ways to form larger proteins. The cell uses PRD-containing proteins for numerous processes, such as co-localizing proteins, regulating signaling processes, and recognizing protein post-translational modifications. Interactions usually occur through the recognition of short linear sequences in the target protein, such as proline-rich or C-terminal motifs. Because PRDs have simpler binding sites and more straightforward modes of target recognition, peptide-PRD interactions are easier to predict computationally than PPIs in general.

Computational Prediction of PDZ Domain Interactions

The PSD95/DlgA/ZO-1 (PDZ) domain is an ideal model for studying the computational prediction of peptide-PRD interactions. PDZ domains have important biological roles, are well studied, and have one of the simplest binding sites among PRDs. They are found in bacteria, yeast, plants, and metazoans, with 250 in humans. They occur in signaling and scaffolding proteins and often interact with ion channels, adhesion molecules, and neurotransmitter receptors. Their biological roles include maintaining cell polarity, facilitating signal coupling, and regulating synaptic development. Their importance is underscored by the association of mutations in the PDZ domains of different proteins with various diseases.
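As an illustration of C-terminal motif recognition, the sketch below flags proteins whose C terminus matches the canonical class I PDZ-binding motif, S/T-X-Φ-COOH. Here Φ is a hydrophobic residue, assumed for simplicity to be V, I, L, or F.

A fixed rule like this is the kind of baseline that learned predictors aim to improve on. It is not the method used in this project, and the toy proteome is invented.

```python
import re

# Canonical class I PDZ-binding motif at the very C terminus: [S/T]-X-[hydrophobic].
# Treating V, I, L and F as the hydrophobic set is a simplifying assumption.
CLASS_I_MOTIF = re.compile(r"[ST].[VILF]$")


def class_i_candidates(proteome):
    """Return the sorted names of proteins whose C terminus matches the class I motif."""
    hits = []
    for name, seq in proteome.items():
        seq = seq.rstrip("*")  # some proteome FASTA files end sequences with a stop '*'
        if CLASS_I_MOTIF.search(seq):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    toy_proteome = {
        "P1": "MAAKETSV",  # ends in T-S-V: matches
        "P2": "MAAKESAA",  # ends in S-A-A: no hydrophobic C-terminal residue
        "P3": "MKKTTEL*",  # ends in T-E-L once '*' is stripped: matches
    }
    print(class_i_candidates(toy_proteome))  # ['P1', 'P3']
```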

Sequence Based Prediction

Recently, two high-throughput experiments, protein microarray and phage display, have been performed to study different PDZ domains, enabling the development of computational predictors of PDZ domain interactions. This project uses support vector machines (SVMs), a machine learning method, to predict PDZ domain interactions directly from a given proteome. [Read More]
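A minimal sketch of SVM-based proteome scanning for a single PDZ domain is shown below. It is not this project's Java implementation.

The sketch works in two steps:

- It trains a linear SVM with the Pegasos stochastic subgradient method, using one-hot encodings of the last five residues of each peptide.
- It then scores the C terminus of every protein in a toy proteome.

The five-residue window, the toy training peptides, and the zero decision threshold are all assumptions for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
WINDOW = 5  # assumption: use the five C-terminal residues


def encode(peptide):
    """One-hot encode the last WINDOW residues and append a constant bias feature."""
    tail = peptide[-WINDOW:]
    if len(tail) != WINDOW:
        raise ValueError("peptide is shorter than the window")
    x = [0.0] * (WINDOW * len(AMINO_ACIDS) + 1)
    for pos, aa in enumerate(tail):
        x[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1.0
    x[-1] = 1.0  # bias folded into the weight vector (and regularized with it)
    return x


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pegasos(examples, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient descent on the hinge loss).

    examples: list of (peptide, label) pairs, label +1 (binder) or -1 (non-binder).
    """
    rng = random.Random(seed)
    data = [(encode(p), y) for p, y in examples]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge loss is active for this example
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def scan_proteome(w, proteome, threshold=0.0):
    """Score each protein's C terminus; return (name, score) hits, best first."""
    hits = []
    for name, seq in proteome.items():
        tail = seq.rstrip("*")[-WINDOW:]
        if len(tail) < WINDOW or not all(aa in INDEX for aa in tail):
            continue  # skip short sequences and non-standard residues
        score = dot(w, encode(tail))
        if score > threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    training = [
        ("AETSV", 1), ("KESDV", 1), ("RQTEV", 1), ("GSSKV", 1),      # toy binders
        ("PPLNK", -1), ("MKKHK", -1), ("WGRHK", -1), ("LLNPK", -1),  # toy non-binders
    ]
    w = train_pegasos(training)
    toy_proteome = {"novel_binder": "MSAGHKETTV",
                    "novel_nonbinder": "MSAGHDDDNK",
                    "too_short": "KV"}
    print(scan_proteome(w, toy_proteome))
```

Pegasos is used here only to keep the sketch dependency-free. In practice, a kernel SVM library such as libSVM would replace the hand-written trainer.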

Structure Based Prediction

The sequence-based predictor described above can scan the proteomes of different organisms for PDZ domain binders more accurately and precisely than existing predictors. However, its performance depends on the sequence similarity between the testing and training domains. Domain structure is known to play a major role in determining PDZ domain binding specificity. We therefore developed a structure-based predictor of PDZ domain-peptide interactions that is trained using PDZ domain structure features. [Read More]
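The dependence on sequence similarity can be checked by computing, for each test domain, its highest percent identity to any training domain. The sketch below makes three assumptions:

- The domain sequences have already been aligned to equal length, with '-' marking gaps.
- Percent identity is identical non-gap positions divided by the alignment columns that are not gaps in both sequences. This is one common definition; others exist.
- The short sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns that are gaps in both sequences are skipped; identical non-gap
    residues count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def nearest_training_identity(test_domain, training_domains):
    """Highest percent identity between a test domain and any training domain."""
    return max(percent_identity(test_domain, t) for t in training_domains)


if __name__ == "__main__":
    print(percent_identity("GLGF-", "GFGF-"))                 # 75.0
    print(nearest_training_identity("GLGF", ["GFGF", "AAAA"]))  # 75.0
```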

POW! PDZ Domain-Peptide Interaction Prediction Website

POW! is a website that predicts domain-peptide interactions for human, mouse, worm, and fly PDZ domains. Predictions are made using a support vector machine (SVM) trained on experimentally determined PDZ interaction data from protein microarray and phage display experiments on mouse and human PDZ domains [1,2]. Two types of predictors are available:

- a sequence-based predictor, trained using domain and peptide sequence features;
- a structure-based predictor, trained using domain structure and peptide sequence features.
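One way a single SVM can cover many domains is to describe each domain-peptide pair with one feature vector that concatenates domain features and peptide features. The sketch below shows that idea for both predictor types.

The following are hypothetical placeholders, not the features POW! actually uses:

- the choice of binding-site residues, with "GLGF" used only as an example string;
- the numeric structural descriptors;
- the five-residue peptide window.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Concatenated 20-dimensional one-hot blocks, one per residue."""
    vec = []
    for aa in seq:
        block = [0.0] * len(AMINO_ACIDS)
        block[AMINO_ACIDS.index(aa)] = 1.0  # raises ValueError on non-standard residues
        vec.extend(block)
    return vec


def sequence_pair_features(binding_site, peptide, window=5):
    """Sequence-based: one-hot domain binding-site residues + one-hot peptide C terminus."""
    return one_hot(binding_site) + one_hot(peptide[-window:])


def structure_pair_features(structure_descriptors, peptide, window=5):
    """Structure-based: numeric domain descriptors + one-hot peptide C terminus."""
    return [float(v) for v in structure_descriptors] + one_hot(peptide[-window:])


if __name__ == "__main__":
    seq_vec = sequence_pair_features("GLGF", "MAAKETSV")  # hypothetical binding-site residues
    struct_vec = structure_pair_features([0.1, 2.5, -1.0], "MAAKETSV")  # hypothetical descriptors
    print(len(seq_vec), len(struct_vec))  # 180 103
```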

POW! Website: http://webservice.baderlab.org/domains/POW/

A simple command-line user interface (CLUI) for running POW! locally is also available. It is written in Java and can be downloaded below; unzip the file and consult the Readme for details.

POW! CLUI:

Thesis

Team

Shirley Hui
Gary Bader

CategoryProject

- Revision 85 as of 2010-04-27 01:16:10 (size: 8168; editor: ShirleyHui)
+ Revision 114 as of 2013-04-11 14:57:56 (size: 10462; editor: ShirleyHui)
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-#acl BaderLabGroup:read,write,revert,delete All:
+#acl All:read
 Line 3:
-== Proteome scanning of PDZ domain interactions using support vector machines ==
+== PDZ Domain-Peptide Interaction Prediction ==
 Line 5:
-## == Table of Contents ==
## <<TableOfContents>>
+== Table of Contents ==
<<TableOfContents>>
 Line 8:
-== Motivation ==
PDZ domains mediate important biological processes through the recognition of short linear motifs. Two recent independent high through put protein microarray and phage display experiments have been used to detect PDZ domain interactions.  Several computational predictors of PDZ domain interactions have also been developed, however they are trained using only protein microarray data or focus on limited subsets of PDZ domains.  An accurate predictor of genomic PDZ domain interactions would allow the proteomes of organisms to be scanned for potential binders.  Such an application would require not only an accurate but precise predictor due to the thousands of possible interactors in a given proteome.  However, once validated these predictions would increase the coverage of current PDZ domain interaction networks and further our understanding of the biologically processes they mediate.
+=== Background ===
The human genome contains approximately 26,000 protein-coding genes, which through alternative splicing can direct the synthesis of thousands of different proteins. The majority of these proteins interact with other proteins to coordinate a variety of cellular processes including DNA replication, cell cycle control, and signal transduction. The ability to accurately detect these interactions enables the assembly of protein interaction networks which can be used to better understand and study the biochemistry of the cell.
 Line 11:
-== Results ==
We developed a PDZ domain interaction predictor using SVMs trained with both protein microarray and phage display data.  In order to use the phage display data for training, we developed a method to deterministically generate artificial negative interactions for the phage display data since it consisted of positive interactions only.  Through extensive blind testing we showed that the SVM could predict interactions in different organisms.  We then used the SVM to scan the proteomes of different organisms to predict binders for several PDZ domains.   Predictions were validated using PDZBase or protein microarray data and a comparison of F1 measures and FPRs between the SVM and published or commonly used predictors demonstrated the SVM’s improved accuracy and precision.
+=== Computational PPI Prediction ===
Several computational methods to predict protein protein interactions (PPIs) have been developed and can be used to support or prioritize experiments.  Such methods fall into a range of categories from physics to statistics-based method, however they all face several challenges.  For physics-based prediction methods, the structures of the proteins are often unavailable or protein flexibility is not taken into consideration.  Sequence based methods like PWMs can only represent short binding motifs and often do not account for interdependencies between residues and positions. In general, the computational prediction of PPIs is considered an extremely difficult problem that is not fully addressed by any existing method.
 Line 14:
-== Supplementary Data ==
 a. Supplementary Document (Link)
 a. PDZSVM Data Files [[attachment:PDZSVMData.zip]]
  * Models
    * Chen model parameter and binding site encoding files
    * Stiffler model parameter files
  * Proteomes
    * Ensembl proteome files for Human, Worm and Fly
  * Experiment Interaction files (in peptide file format)
    * Fly files from Chen
    * Human files from Sidhu
    * Mouse files from Stiffler
    * Worm files from Chen
  * Curated Interaction files (flat files)
    * PDZBase for Human (Worm and Fly included, but not used)
    * Human Protein Reference Database
  * Phage codon bias files
+Many PPIs are mediated by peptide recognition domains (PRDs), which are evolutionary conserved modular interaction domains often found combined in different ways to form larger proteins.  Proteins containing PRDs are used by the cell for numerous processes such as the co-localization of proteins, regulation of signaling processes or recognition of protein post-translational modifications. Interactions usually occur through the recognition of short linear sequences in the target protein such as proline-rich or C terminal motifs.  Because of their simpler binding sites and straightforward modes of target recognition, it is easier to computationally predict peptide-PRD interactions than it is to predict PPIs more generally.
-Line 32:
+Line 16:
-== Availability and Implementation ==
Source code and dependencies are freely available upon request, implemented in Java.
 * Dependencies: 
   * jfreechart 1.0.12 (and dependencies)
   * weka 3.9.1
   * auc calculator (Davis & Goadrich, 2006)
   * !BioJava 1.5
   * iText 2.1.3
   * jmatio
   * BRAIN 1.0.5 (pdzsvm)
   * libSVM 2.8.9 (pdzsvm)
+=== Computational Prediction of PDZ Domain Interactions ===
The PSD95/DlgA/Zo-1 (PDZ) domain is an ideal model for studying the computational prediction of peptide-PRD interactions since they are have important biological roles, are well studied and one of the simplest binding sites among PRDs. PDZ domains are found in bacteria, yeast, plants, and metazoans with 250 found in humans. They often interact with ion channels, adhesion molecules, and neurotransmitter receptors in signaling and scaffolding proteins.  The biological roles include maintaining cell polarity, facilitating signal coupling, and regulating synaptic development. Their importance is emphasized, as mutations of the PDZ domain in different proteins have been associated with various diseases.

==== Sequence Based Prediction ====
Recently, two high through put experiments have been performed to study different PDZ domains.  This has enabled the development of computational predictors of PDZ domain interactions.  This project focuses on using a machine learning method called support vector machines to computationally predict PDZ domain interactions directly from a given proteome. [[Data/PDZProteomeScanning|[Read More]]]

==== Structure Based Prediction ====
While the previously developed sequence based predictor is able to more accurately and precisely scan proteomes of different organisms for PDZ domain binders, its performance relies on the sequence similarity between testing and training domains.  On the other hand, it is known that the domain structure can play a big role in determining PDZ domain binding specificity.  Therefore, we developed a structure based predictor of PDZ domain peptide interactions which is trained using PDZ domain structure features. [[Data/StructurePDZProteomeScanning|[Read More]]]

==== POW! PDZ Domain-Peptide Interaction Prediction Website ====
POW is a website that allows users to predict domain-peptide interactions for human, mouse, worm and fly PDZ domains. Predictions are made using a support vector machine (SVM) that was trained using experimentally determined PDZ interaction data from protein microarray and phage display experiments for mouse and human [1,2]. Two types of predictors are available for use. The first is sequence-based (trained using domain and peptide sequence features) while the other structure-based (trained using domain structure and peptide sequence features).

POW! Website: http://webservice.baderlab.org/domains/POW/

A simple command line user interface that allows users to run POW! locally on their computer is also available.  It is written in Java and can be downloaded below.  Please unzip the file below and consult the Readme for more details.

POW! CLUI: 

=== Thesis ===
 * [[attachment:Thesis-ShirleyHui-Feb19-2013.pdf|Computational Prediction of PDZ Mediated Protein-Protein Interactions (2013) PDF]]
 * [[attachment:Thesis-ShirleyHui-Feb19-2013.docx|Computational Prediction of PDZ Mediated Protein-Protein Interactions (2013) WORD]]