Diff for "DomainSpecificityPredictionProject" - Bader Lab @ The University of Toronto

Differences between revisions 7 and 68 (spanning 61 versions)

Goals

Computationally predict the specificity of peptide recognition domains from their primary amino acid sequences
Analyze PDZ, WW and then SH3 domains

Background

[wiki:/PDZ PDZ Domains]
[wiki:/MachineLearning Machine Learning]

Strategy/Ideas

[wiki:/Strategy Strategy]

Data

[wiki:/PDZData PDZ Data]

Experiments

[wiki:/Experiment Experiments and Results]

Status

[wiki:/Log Status]

Tools/Resources

[wiki:/ToolsResources Tools and Resources]

Related Literature

[http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea]
[http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell]

Team

Shirley Hui
Gary Bader

CategoryProject

- Revision 7 as of 2007-05-09 19:45:04
  Size: 1313
  Editor: ShirleyHui
  Comment:
+ Revision 68 as of 2008-03-04 21:41:42
  Size: 5100
  Editor: ShirleyHui
  Comment:
 Line 3:
+== Table of Contents ==
[[TableOfContents()]]
-Line 4:
+Line 7:
- * Predict specificity of peptide recognition domain from the primary amino acid sequence.
+ * Computationally predict specificity of peptide recognition domain from the primary amino acid sequences
-Line 7:
+Line 10:
-== Strategy ==
+== Background ==
 * [wiki:/PDZ PDZ Domains]
 * [wiki:/MachineLearning Machine Learning]

== Strategy/Ideas ==
 * [wiki:/Strategy Strategy]

== Data ==
 * [wiki:/PDZData PDZ Data]

== Experiments ==
 * [wiki:/Experiment Experiments and Results]
-Line 10:
+Line 24:
- * [wiki:/Log Status Log]
+ * [wiki:/Log Status]
-Line 12:
+Line 26:
-== Tasks ==
+## == Tasks ==
## 
##  1. --(Learn SVN, Brain code (!ResidueResidueCorrelation))--
##  1. Literature review related to domain specificity (background activity), PDZ domains (from Ioana's project)
##  1. --(Run !ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2  (Requires: PDZ profiles from Gary))--
##  1. MSA subproject
##   1. --(Learn basics of multiple sequence alignment (Baxevanis, chapter 12))--
##   1. Find and evaluate MSA algorithms (compare notes with Stacy) + evaluate Superfamily, PFAM databases of protein family alignments
##   1. Try different multiple sequence alignment algorithms (MSA) on the PDZ domain sequences to see if they affect the correlation results.
##  1. Benchmark/validate correlation subproject
##   1. We know H (PDZ), T @-2 (peptide) correlation
##   1. Look at structures (e.g. 1N7T and 1BE9) to see if correlated residues/positions are close to each other and compatible (physicochemically). We need to focus on PDZ structures that have bound peptides (search in PDB)
##   1. Build set of known true and false correlations for use in evaluating prediction algorithm (Note: also ask Dev Sidhu, when available). See [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=10871264 Baldi et al. review]
##  1. Amino acid group subproject
##   1. Learn about amino acid groups
##   1. Define an initial aa grouping (reasonable grouping from Levy paper)
##   1. Add new feature to !ResidueResidueCorrelation class so it considers grouping + run on PDZ data. This involves implementing the groups as a reduced alphabet (amino acids in a group are considered equivalent)
##   1. Try all groupings to see how it affects the results (from Levy paper)
##   1. See if we can incorporate aa similarity defined by substitution matrix approach (e.g. BLOSUM, PAM, GONNET) into our method, instead of grouping
##   1. Similarly, evaluate aa similarity defined by factor analysis (Atchley et al. paper)
##  1. Think about new PDZ domain features that can be used for prediction.
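
The reduced-alphabet idea from the amino acid group subproject above (amino acids in a group are treated as equivalent) can be sketched as follows. This is a minimal illustration only: the grouping `EXAMPLE_GROUPS` and the helper names are hypothetical, are not taken from the Levy paper, and are not part of the !ResidueResidueCorrelation class.

```python
# Hypothetical grouping for illustration only; NOT the grouping from the Levy paper.
EXAMPLE_GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}


def build_reduced_alphabet(groups):
    """Map each amino acid letter to one representative letter for its group."""
    mapping = {}
    for members in groups.values():
        representative = members[0]
        for aa in members:
            if aa in mapping:
                raise ValueError(f"{aa} appears in more than one group")
            mapping[aa] = representative
    return mapping


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in the reduced alphabet.

    Symbols that are not in the mapping (gaps, X, ...) are passed through unchanged.
    """
    return "".join(mapping.get(ch, ch) for ch in seq.upper())


mapping = build_reduced_alphabet(EXAMPLE_GROUPS)
print(reduce_sequence("KR-DE", mapping))
```

Sequences reduced this way could then be fed to any residue-residue counting routine unchanged, which is one way to try several groupings without modifying the counting code itself.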
-Line 14:
+Line 48:
- 1. Learn SVN, Brain code (ResidueResidueCorrelation)
 1. Literature review related to domain specificity (background activity)
 1. Run ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2  (Requires: PDZ profiles from Gary)
 1. Implement new feature: amino acid groups (learn amino acid groups) + run on PDZ data
 1. Think about new PDZ domain features that can be used for prediction.
+## == Ideas ==
##  * [wiki:/MachineLearning Machine Learning Page]
##  * With the current correlation counting calculation, weight the calculation by how many peptides are in the peptides file (i.e. normalize the correlation calculation in some way)
##  * Build tools to help interpret correlations in the context of multiple sequence alignments (and later structures).
##  * Use of structural data (PDZ domain structures) (may require homology modeling)
##  * Use of machine learning methods (SVM for classification and boosting decision tree for interpretable learning model)
##  * Analysis of correlation within domain and peptide (inter-residue correlation) maybe correspondence analysis
##  * Analysis of SNPs and how they affect domain binding (including correlations between SNPs)
##  * Define the binding site of the PDZ domain based on phage display data.  Given that identical binding sites between two PDZ domains should correspond to identical binding specificities, find the set of PDZ domain sites that correlate perfectly with binding specificity.
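
One possible reading of the normalization idea in the list above (weighting by the number of peptides in each peptides file) is sketched below. It assumes, hypothetically, that the correlation calculation counts co-occurrences of a domain residue and a peptide residue, and that each domain has its own list of bound peptides. The function name, arguments and data layout are illustrative and do not describe the existing code.

```python
from collections import defaultdict


def weighted_pair_counts(domain_residue, peptides_by_domain, peptide_index):
    """Count (domain residue, peptide residue) pairs, weighting each peptide
    by 1 / (number of peptides for its domain).

    domain_residue:     dict, domain name -> residue at one chosen alignment column
    peptides_by_domain: dict, domain name -> list of bound peptide strings
    peptide_index:      Python index into each peptide (-1 is the C-terminal residue)
    """
    counts = defaultdict(float)
    for domain, peptides in peptides_by_domain.items():
        if not peptides:
            continue  # a domain with no peptides contributes nothing
        weight = 1.0 / len(peptides)  # every domain contributes a total weight of 1
        for pep in peptides:
            counts[(domain_residue[domain], pep[peptide_index])] += weight
    return dict(counts)


# Toy data, invented for illustration.
domain_residue = {"d1": "H", "d2": "G"}
peptides_by_domain = {
    "d1": ["ETSV", "ETSV", "ESSV", "ETSV"],
    "d2": ["EFYV"],
}
# Index -3 is the third residue from the C-terminus, i.e. peptide position -2
# if the C-terminal residue is numbered 0.
counts = weighted_pair_counts(domain_residue, peptides_by_domain, -3)
```

With this weighting, a domain with many selected peptides does not dominate the totals simply because its peptides file is larger. Other normalizations are equally plausible.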
-Line 20:
+Line 58:
-== Ideas ==
 * Use of structural data (PDZ domain structures) (may require homology modeling)
 * Use of machine learning methods (SVM for classification and boosting decision tree for interpretable learning model)
 * Analysis of correlation within domain and peptide (inter-residue correlation) maybe correspondence analysis
+## == Courses ==

## === Biology ===
##  * [http://bio250y.chass.utoronto.ca/ BIO250] - Cell and Molecular Biology
##   * Classes: Tues/Thurs - 1-2 PM (Convocation Hall) OR Mon - 6-8 PM (MC 102-Mechanical Engineering Building)
##   * Textbook: [http://www.amazon.com/Molecular-Biology-Fourth-Bruce-Alberts/dp/0815332181/ref=pd_sim_b_1/105-5132391-0345258?ie=UTF8&qid=1188913552&sr=1-4 Molecular Biology of the Cell 4th Ed.] Alberts et al.
## === Protein Structure ===
##  * BCH340H1 - Proteins: from Structure to Proteomics
##   * Classes: Winter 2008
##   * Textbook: ?
##   * Previous Course Web Pages:
##     * [http://arrhenius.med.utoronto.ca/~chan/bch340h04-outline.html 2004 Chan]
##     * [http://xtal.uhnres.utoronto.ca/prive/BCH340/ 2006 Prive]
## === Machine Learning ===
##  * CSC2515 - Machine Learning
##    * Previous Course Web Pages:
##      * [http://www.cs.toronto.edu/~roweis/csc2515/ 2003-2006 Roweis]

## == Committee Meetings ==
##  * [wiki:/Meeting Notes]

== Tools/Resources ==
 * [wiki:/ToolsResources Tools and Resources]

== Related Literature ==
 * [http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea]
 * [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell]
-Line 29:
+Line 90:
-== Documents ==

== Background Literature ==
 * The Structure and Function of Proline Recognition Domains, Zarrinpar et al., 2003 attachment:Structure_Function_Pro_Recog_Domains_Zarrinpar_et_al_2003.pdf