Line 2: | Line 2: |
== Table of Contents == [[TableOfContents()]] |
|
Line 7: | Line 10: |
== Strategy == | == Background == * [wiki:/PDZ PDZ Domains] * [wiki:/MachineLearning Machine Learning] == Strategy/Ideas == * [wiki:/Strategy Strategy] == Data == * [wiki:/PDZData PDZ Data] == Experiments == * [wiki:/Experiment Experiments and Results] |
Line 10: | Line 24: |
* [wiki:/Log Status Log] | * [wiki:/Log Status] |
Line 12: | Line 26: |
== Tasks == | ## == Tasks == ## ## 1. --(Learn SVN, Brain code (!ResidueResidueCorrelation))-- ## 1. Literature review related to domain specificity (background activity), PDZ domains (from Ioana's project) ## 1. --(Run !ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2 (Requires: PDZ profiles from Gary))-- ## 1. MSA subproject ## 1. --(Learn basics of multiple sequence alignment (Baxevanis, chapter 12))-- ## 1. Find and evaluate MSA algorithms (compare notes with Stacy) + evaluate Superfamily, PFAM databases of protein family alignments ## 1. Try different multiple sequence alignment algorithms (MSA) on the PDZ domain sequences to see if they affect the correlation results. ## 1. Benchmark/validate correlation subproject ## 1. We know H (PDZ), T @-2 (peptide) correlation ## 1. Look at structures (e.g. 1N7T and 1BE9) to see if correlated residues/positions are close to each other and compatible (physicochemically). We need to focus on ## PDZ structures that have bound peptides (search in PDB) ## 1. Build set of known true and false correlations for use in evaluating prediction algorithm (Note: also ask Dev Sidhu, when available). See [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=10871264 Baldi et al. review] ## 1. Amino acid group subproject ## 1. Learn about amino acid groups ## 1. Define an initial aa grouping (reasonable grouping from Levy paper) ## 1. Add new feature to !ResidueResidueCorrelation class so it considers grouping + run on PDZ data. This involves implementing the groups as a reduced alphabet (amino acids in a group are considered equivalent) ## 1. Try all groupings to see how it affects the results (from Levy paper) ## 1. See if we can incorporate aa similarity defined by substitution matrix approach (e.g. BLOSUM, PAM, GONNET) into our method, instead of grouping ## 1. Similarly, evaluate aa similarity defined by factor analysis (Atchley et al paper) ## 1. 
Think about new PDZ domain features that can be used for prediction. |
Line 14: | Line 48: |
1. Learn SVN, Brain code (ResidueResidueCorrelation) 1. Literature review related to domain specificity (background activity) 1. Run ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2 (Requires: PDZ profiles from Gary) 1. Implement new feature: amino acid groups (learn amino acid groups) + run on PDZ data 1. Think about new PDZ domain features that can be used for prediction. |
## == Ideas == ## * [wiki:/MachineLearning Machine Learning Page] ## * With current correlation counting calculation, Weight calculation by how many peptides are in the peptides file (i.e. normalize the correlation calculation in some way) ## * Build tools to help interpret correlations in the context of multiple sequence alignments (and later structures). ## * Use of structural data (PDZ domain structures) (may require homology modeling) ## * Use of machine learning methods (SVM for classification and boosting decision tree for interpretable learning model) ## * Analysis of correlation within domain and peptide (inter-residue correlation) maybe correspondence analysis ## * Analysis of SNPs and how they affect domain binding (including correlations between SNPs) ## * Define the binding site of the PDZ domain based on phage display data. Given that identical binding sites between two PDZ domains should correspond to identical ## binding specificities, find the set of PDZ domain sites that correlate perfectly with binding specificity. |
Line 20: | Line 58: |
== Ideas == * Use of structural data (PDZ domain structures) (may require homology modeling) * Use of machine learning methods (SVM for classification and boosting decision tree for interpretable learning model) * Analysis of correlation within domain and peptide (inter-residue correlation) maybe correspondence analysis |
## == Courses == ## === Biology === ## * [http://bio250y.chass.utoronto.ca/ BIO250] - Cell and Molecular Biology ## * Classes: Tues/Thurs - 1-2 PM (Convocation Hall) OR Mon - 6-8 PM (MC 102-Mechanical Engineering Building) ## * Textbook: [http://www.amazon.com/Molecular-Biology-Fourth-Bruce-Alberts/dp/0815332181/ref=pd_sim_b_1/105-5132391-0345258?ie=UTF8&qid=1188913552&sr=1-4 Molecular Biology of the Cell 4th Ed.] Alberts et al. ## === Protein Structure === ## * BCH340H1 - Proteins: from Structure to Proteomics ## * Classes: Winter 2008 ## * Textbook: ? ## * Previous Course Web Pages: ## * [http://arrhenius.med.utoronto.ca/~chan/bch340h04-outline.html 2004 Chan] ## * [http://xtal.uhnres.utoronto.ca/prive/BCH340/ 2006 Prive] ## === Machine Learning === ## * CSC2515 - Machine Learning ## * Previous Course Web Pages: ## * [http://www.cs.toronto.edu/~roweis/csc2515/ 2003-2006 Roweis] ## == Committee Meetings == ## * [wiki:/Meeting Notes] == Tools/Resources == * [wiki:/ToolsResources Tools and Resources] == Related Literature == * [http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea] * [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell] |
Line 29: | Line 90: |
== Documents == == Background Literature == === More General === * The Structure and Function of Proline Recognition Domains, Zarrinpar A, Bhattacharyya RP, Lim WA., Sci STKE. 2003 Apr 22;2003(179):RE8. * attachment:Structure_Function_Pro_Recog_Domains_Zarrinpar_et_al_2003.pdf * Can we infer peptide recognition specificity mediated by SH3 domains?, Cesareni G, Panni S, Nardelli G, Castagnoli L., FEBS Lett. 2002 Feb 20;513(1):38-44. * attachment:Can_we_infer_PR_specificity_med_by_SH3_Cesareni_et_al_2002.pdf * Domains, motifs, and scaffolds: the role of modular interactions in the evolution and wiring of cell signaling circuits, Bhattacharyya RP, Remenyi A, Yeh BJ, Lim WA., Annu Rev Biochem. 2006;75:655-80. * attachment:Domains_motifs_scaffolds_Bhattacharyya_et_al_2006.pdf * Simplified amino acid alphabets for protein fold recognition and implications for folding, Murphy LR, Wallqvist A, Levy RM, Protein Eng. 2000 Mar;13(3):149-52. === PDZ Related === * PDZ domains-glue and guide., van Ham M, Hendriks W., Mol Biol Rep. 2003 Jun;30(2):69-82. * attachment:PDZ_Domains_Glue_and_Guide_2003.pdf * PDZ domains: structural modules for protein complex assembly., Hung AY, Sheng M., J Biol Chem. 2002 Feb 22;277(8):5699-702. Epub 2001 Dec 10. * attachment:PDZ_Domains_Structural_Modules_2001.pdf |
Table of Contents
Goals
- Predict the binding specificity of a peptide recognition domain from its primary amino acid sequence.
- Analyze PDZ, WW and then SH3 domains
Background
- [wiki:/PDZ PDZ Domains]
- [wiki:/MachineLearning Machine Learning]
Strategy/Ideas
- [wiki:/Strategy Strategy]
Data
- [wiki:/PDZData PDZ Data]
Experiments
- [wiki:/Experiment Experiments and Results]
Status
- [wiki:/Log Status]
Tools/Resources
- [wiki:/ToolsResources Tools and Resources]
Related Literature
- [http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea]
- [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell]
Team
- Shirley Hui
- Gary Bader