Diff for "DomainSpecificityPredictionProject" - Bader Lab @ The University of Toronto

Differences between revisions 53 and 75 (spanning 22 versions)


Goals

Computationally predict the specificity of peptide recognition domains from their primary amino acid sequences
Analyze PDZ, WW and then SH3 domains

Background

[wiki:/PDZ PDZ Domains]
[wiki:/MachineLearning Machine Learning]

Strategy

[wiki:/Strategy Strategy]

Ideas

[wiki:/Ideas Ideas]

Data

[wiki:/PDZData PDZ Data]

Experiments

[wiki:/Experiments Experiments and Results]

Status

[wiki:/Log Status]

Tools/Resources

[wiki:/ToolsResources Tools and Resources]

Reading Notes

[wiki:/../ShirleyHui/MBCReadings Molecular Biology of the Cell]
[wiki:/../ShirleyHui/PPIReadings Protein-protein Interaction Detection]
Support Vector Machines

Related Literature

[http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea]
[http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell]

Team

Shirley Hui
Gary Bader

CategoryProject

- Revision 53 as of 2008-03-02 15:58:47
  Size: 8172
  Editor: ShirleyHui
+ Revision 75 as of 2008-04-08 19:07:57
  Size: 5329
  Editor: ShirleyHui
Lines prefixed with - are deletions; lines prefixed with + are additions.
 Line 7:
- * Predict specificity of peptide recognition domain from the primary amino acid sequence.
+ * Computationally predict specificity of peptide recognition domain from the primary amino acid sequences
 Line 10:
+== Background ==
 * [wiki:/PDZ PDZ Domains]
 * [wiki:/MachineLearning Machine Learning]
-Line 11:
+Line 15:
-## [wiki:/Strategy Strategy Log]
+ * [wiki:/Strategy Strategy]

== Ideas ==
 * [wiki:/Ideas Ideas]

== Data ==
 * [wiki:/PDZData PDZ Data]

== Experiments ==
 * [wiki:/Experiments Experiments and Results]
-Line 14:
+Line 27:
- * [wiki:/Log Status Log]
+ * [wiki:/Log Status]
-Line 69:
+Line 82:
+== Tools/Resources ==
 * [wiki:/ToolsResources Tools and Resources]

== Reading Notes ==
 * [wiki:/../ShirleyHui/MBCReadings Molecular Biology of the Cell]
 * [wiki:/../ShirleyHui/PPIReadings Protein-protein Interaction Detection]
 * Support Vector Machines

== Related Literature ==
 * [http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea]
 * [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell]
-Line 73:
+Line 98:
-== Tools/Resources ==

=== Domains ===
 * [wiki:/PDZ PDZ Domain]

=== Databases ===
 * [http://www.ensembl.org/ Ensembl]
   * Software system which produces and maintains automatic annotation on selected eukaryotic genomes.
 * [http://www.ebi.ac.uk/interpro/ InterPro]
   * Database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
 * [http://www.biomart.org/ BioMart]
   * Query-oriented data management system that simplifies the creation and maintenance of advanced query interfaces backed by a relational database.  It is particularly suited for providing 'data mining'-like searches of complex descriptive (e.g. biological) data.

=== Sequence Alignment ===

==== Multiple ====
===== Hierarchical Methods =====
 * [http://www.compbio.dundee.ac.uk/Software/Amps/amps.html/ AMPS] 1990
   * Calculates Z-scores through pairwise sequence comparison with randomization
   * Generates alignments without having to generate trees
 * [http://www.ebi.ac.uk/clustalw/ ClustalW] 1997
   * Uses a series of different pair-score matrices
   * Biases location of gaps based on secondary structure mask
   * Allows for realigning to refine the alignment
   * Can infer phylogeny
   * Problems:
     * Time required for the initial all-against-all comparison used to build the guide tree
 * [http://www.drive5.com/muscle/ MUSCLE] 2004
   * MUltiple Sequence Comparison by Log-Expectation
   * Uses a quick hashing comparison based on identical matches (shared k-mers) to estimate pairwise distances
 * [http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/ MAFFT] 2005
   * Calculates the guide tree faster by applying a fast Fourier transform to amino acid properties to identify regions of similarity
   * Uses these regions to guide dynamic programming alignment of the sequences
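
The MUSCLE entry above mentions a quick comparison based on identical matches. A minimal sketch of that kind of alignment-free distance follows, assuming overlapping k-mers over the plain 20-letter alphabet and the hypothetical helper names `kmer_counts` and `kmer_distance`. The distance is 1 minus the fraction of shared k-mers; MUSCLE's actual first stage may use a compressed alphabet and a different normalisation, so treat this as an illustration of the idea rather than MUSCLE's code.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq using a hash table (no alignment needed)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Hypothetical k-mer distance: 1 - shared k-mers / maximum possible shared k-mers.

    Returns 0.0 for identical sequences and 1.0 when no k-mer is shared.
    """
    if min(len(a), len(b)) < k:
        raise ValueError("sequences must be at least k residues long")
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A k-mer occurring n times in a and m times in b contributes min(n, m) matches.
    shared = sum(min(n, cb[kmer]) for kmer, n in ca.items())
    return 1.0 - shared / (min(len(a), len(b)) - k + 1)
```

With made-up toy sequences, `kmer_distance("ACDEFGHIK", "ACDEFGWWW")` shares 4 of 7 possible 3-mers. A matrix of such distances is cheap to compute for every pair, which is why this kind of measure suits building an initial guide tree before any dynamic programming is run.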
 
===== Non-Hierarchical Methods =====

 * [http://www.ncbi.nlm.nih.gov/BLAST/ PSI-BLAST] 1997
   * Searches a database with a single sequence
   * High-scoring sequences are built into a multiple alignment, which is used to derive a search profile for the next search of the database
   * Repeats until no new sequences are added to the profile or a specified number of iterations has been performed
 * [http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi T-Coffee] 2000
   * Builds a library of pairwise alignments for the sequences of interest
   * Uses the library to guide a hierarchical (progressive) method towards a multiple alignment that is consistent with the pairwise alignments
   * Can align sequences of varying lengths
 * [http://baboon.math.berkeley.edu/amap/ AMAP] 2007
   * Multiple sequence alignment by sequence annealing
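
T-Coffee's use of a pairwise library to enforce consistency can be sketched as a library extension step: the weight for aligning residue a with residue b is their direct weight plus, for every residue c of a third sequence, min(W(a,c), W(c,b)). The sketch below assumes residues are keyed as (sequence name, index) tuples and that primary weights are already given; how T-Coffee derives those weights from its pairwise alignments is not reproduced, and the names `symmetrize` and `extend_library` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Residue = Tuple[str, int]                     # (sequence name, residue index)
Library = Dict[Tuple[Residue, Residue], float]


def symmetrize(lib: Library) -> Library:
    """Store every residue pair in both orientations so lookups are order-free."""
    out = dict(lib)
    for (a, b), w in lib.items():
        out.setdefault((b, a), w)
    return out


def extend_library(primary: Library) -> Library:
    """Primary weight plus min(W(a, c), W(c, b)) summed over intermediate residues c."""
    lib = symmetrize(primary)
    partners: Dict[Residue, List[Tuple[Residue, float]]] = defaultdict(list)
    for (a, b), w in lib.items():
        partners[a].append((b, w))
    extended: Dict[Tuple[Residue, Residue], float] = defaultdict(float)
    for pair, w in lib.items():
        extended[pair] += w                   # direct (primary) support
    for a, a_partners in partners.items():
        for c, w_ac in a_partners:
            for b, w_cb in partners.get(c, []):
                if b[0] == a[0]:
                    continue                  # a, c and b must come from three different sequences
                extended[(a, b)] += min(w_ac, w_cb)
    return dict(extended)
```

Pairs supported by several third sequences accumulate weight, and a pair can gain weight through an intermediate even when the direct pairwise alignment missed it. This is how the library "preserves consistency between the pairwise alignments" before the hierarchical alignment step uses the extended weights as its scores.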

===== Probabilistic Methods =====
 * [http://probcons.stanford.edu/ Probcons] 2005
 * [http://probalign.njit.edu/probalign/login ProbAlign] 2006
   * Estimates posterior probabilities of aligned amino acid pairs using a partition function over all alignments.
   * Computes the maximum expected accuracy alignment after applying the probability consistency transformation of Probcons.
   * Improvements are most apparent on datasets of long sequences with variable lengths.
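
The posterior-probability idea behind ProbAlign can be illustrated with a toy pairwise partition function. Every global alignment gets a Boltzmann weight exp(score/T); a forward table F sums the weights over alignments of prefixes, a backward table B does the same over suffixes, and P(x_i aligned to y_j) = F[i][j] · exp(s(x_i, y_j)/T) · B[i+1][j+1] / Z. This is a sketch under simplifying assumptions: a linear gap penalty, a caller-supplied `score` function, and a fixed temperature `T`. ProbAlign's own gap model, substitution matrix and temperature are not reproduced, and `posterior_matrix` is a hypothetical name.

```python
import math
from typing import Callable, List


def posterior_matrix(x: str, y: str, score: Callable[[str, str], float],
                     gap: float = -4.0, T: float = 1.0) -> List[List[float]]:
    """Return P where P[i][j] is the posterior probability that x[i] aligns to y[j]."""
    n, m = len(x), len(y)

    def w(s: float) -> float:
        return math.exp(s / T)                # Boltzmann weight of a score

    g = w(gap)
    # Forward: F[i][j] sums the weights of all alignments of x[:i] with y[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += F[i - 1][j - 1] * w(score(x[i - 1], y[j - 1]))
            if i > 0:
                total += F[i - 1][j] * g
            if j > 0:
                total += F[i][j - 1] * g
            F[i][j] = total
    # Backward: B[i][j] sums the weights of all alignments of x[i:] with y[j:].
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            total = 0.0
            if i < n and j < m:
                total += B[i + 1][j + 1] * w(score(x[i], y[j]))
            if i < n:
                total += B[i + 1][j] * g
            if j < m:
                total += B[i][j + 1] * g
            B[i][j] = total
    Z = F[n][m]                               # partition function over all global alignments
    return [[F[i][j] * w(score(x[i], y[j])) * B[i + 1][j + 1] / Z
             for j in range(m)] for i in range(n)]
```

Because each residue of x is aligned to at most one residue of y in any single alignment, every row of P sums to at most 1. The consistency transformation and maximum expected accuracy step listed above would operate on matrices like this one. A practical implementation would work in log space to avoid overflow on long sequences.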

=== Viewers ===
 * [http://www.jalview.org/ JalView]
   * Multiple alignment viewer/editor written in Java

== Background Literature ==

[http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea]
 
=== Textbook ===
 * [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell]

=== Other ===
 * http://proteinkeys.org
