Differences between revisions 3885 and 5134
Deletions (left column) | Additions (right column) |
Line 3: | Line 3: |
== Table of Contents == [[TableOfContents()]] |
|
Line 4: | Line 7: |
* Predict specificity of peptide recognition domain from the primary amino acid sequence. | * Computationally predict specificity of peptide recognition domain from the primary amino acid sequences |
Line 7: | Line 10: |
== Background == * [wiki:/PDZ PDZ Domains] * [wiki:/MachineLearning Machine Learning] |
|
Line 8: | Line 15: |
* [wiki:/Strategy Strategy] == Ideas == * [wiki:/Ideas Ideas] == Data == * [wiki:/PDZData PDZ Data] == Experiments == * [wiki:/Experiments Experiments and Results] |
|
Line 10: | Line 27: |
* [wiki:/Log Status Log] | * [wiki:/Log Status] |
Line 12: | Line 29: |
== Tasks == | ## == Tasks == ## ## 1. --(Learn SVN, Brain code (!ResidueResidueCorrelation))-- ## 1. Literature review related to domain specificity (background activity), PDZ domains (from Ioana's project) ## 1. --(Run !ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2 (Requires: PDZ profiles from Gary))-- ## 1. MSA subproject ## 1. --(Learn basics of multiple sequence alignment (Baxevanis, chapter 12))-- ## 1. Find and evaluate MSA algorithms (compare notes with Stacy) + evaluate Superfamily, PFAM databases of protein family alignments ## 1. Try different multiple sequence alignment algorithms (MSA) on the PDZ domain sequences to see if they affect the correlation results. ## 1. Benchmark/validate correlation subproject ## 1. We know H (PDZ), T @-2 (peptide) correlation ## 1. Look at structures (e.g. 1N7T and 1BE9) to see if correlated residues/positions are close to each other and compatible (physicochemically). We need to focus on ## PDZ structures that have bound peptides (search in PDB) ## 1. Build set of known true and false correlations for use in evaluating prediction algorithm (Note: also ask Dev Sidhu, when available). See [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=10871264 Baldi et al. review] ## 1. Amino acid group subproject ## 1. Learn about amino acid groups ## 1. Define an initial aa grouping (reasonable grouping from Levy paper) ## 1. Add new feature to !ResidueResidueCorrelation class so it considers grouping + run on PDZ data. This involves implementing the groups as a reduced alphabet (amino acids in a group are considered equivalent) ## 1. Try all groupings to see how it affects the results (from Levy paper) ## 1. See if we can incorporate aa similarity defined by substitution matrix approach (e.g. BLOSUM, PAM, GONNET) into our method, instead of grouping ## 1. Similarly, evaluate aa similarity defined by factor analysis (Atchley et al paper) ## 1. 
Think about new PDZ domain features that can be used for prediction. |
Line 14: | Line 51: |
1. Learn SVN, Brain code (ResidueResidueCorrelation) 1. Literature review related to domain specificity (background activity) 1. Run ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2 (Requires: PDZ profiles from Gary) 1. Try different multiple sequence alignment algorithms (MSA) on the PDZ domain sequences to see if they affect the correlation results. 1. Implement new feature: amino acid groups (learn amino acid groups) + run on PDZ data 1. Think about new PDZ domain features that can be used for prediction. |
## == Ideas == ## * [wiki:/MachineLearning Machine Learning Page] ## * With current correlation counting calculation, Weight calculation by how many peptides are in the peptides file (i.e. normalize the correlation calculation in some way) ## * Build tools to help interpret correlations in the context of multiple sequence alignments (and later structures). ## * Use of structural data (PDZ domain structures) (may require homology modeling) ## * Use of machine learning methods (SVM for classification and boosting decision tree for interpretable learning model) ## * Analysis of correlation within domain and peptide (inter-residue correlation) maybe correspondence analysis ## * Analysis of SNPs and how they affect domain binding (including correlations between SNPs) ## * Define the binding site of the PDZ domain based on phage display data. Given that identical binding sites between two PDZ domains should correspond to identical ## binding specificities, find the set of PDZ domain sites that correlate perfectly with binding specificity. |
Line 21: | Line 61: |
== Ideas == * Use of structural data (PDZ domain structures) (may require homology modeling) * Use of machine learning methods (SVM for classification and boosting decision tree for interpretable learning model) * Analysis of correlation within domain and peptide (inter-residue correlation) maybe correspondence analysis |
## == Courses == ## === Biology === ## * [http://bio250y.chass.utoronto.ca/ BIO250] - Cell and Molecular Biology ## * Classes: Tues/Thurs - 1-2 PM (Convocation Hall) OR Mon - 6-8 PM (MC 102-Mechanical Engineering Building) ## * Textbook: [http://www.amazon.com/Molecular-Biology-Fourth-Bruce-Alberts/dp/0815332181/ref=pd_sim_b_1/105-5132391-0345258?ie=UTF8&qid=1188913552&sr=1-4 Molecular Biology of the Cell 4th Ed.] Alberts et al. ## === Protein Structure === ## * BCH340H1 - Proteins: from Structure to Proteomics ## * Classes: Winter 2008 ## * Textbook: ? ## * Previous Course Web Pages: ## * [http://arrhenius.med.utoronto.ca/~chan/bch340h04-outline.html 2004 Chan] ## * [http://xtal.uhnres.utoronto.ca/prive/BCH340/ 2006 Prive] ## === Machine Learning === ## * CSC2515 - Machine Learning ## * Previous Course Web Pages: ## * [http://www.cs.toronto.edu/~roweis/csc2515/ 2003-2006 Roweis] ## == Committee Meetings == ## * [wiki:/Meeting Notes] == Tools/Resources == * [wiki:/ToolsResources Tools and Resources] == Related Literature == * [http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea] * [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell] |
Line 30: | Line 93: |
== Documents == == Background Literature == === More General === * Domains, motifs, and scaffolds: the role of modular interactions in the evolution and wiring of cell signaling circuits, Bhattacharyya RP, Remenyi A, Yeh BJ, Lim WA., Annu Rev Biochem. 2006;75:655-80. * attachment:Domains_motifs_scaffolds_Bhattacharyya_et_al_2006.pdf * The Structure and Function of Proline Recognition Domains, Zarrinpar A, Bhattacharyya RP, Lim WA., Sci STKE. 2003 Apr 22;2003(179):RE8. * attachment:Structure_Function_Pro_Recog_Domains_Zarrinpar_et_al_2003.pdf === Specificity Prediction/Inference === * Ab initio prediction of transcription factor targets using structural knowledge, Kaplan T, Friedman N, Margalit H, PLoS Comput Biol. 2005 Jun;1(1):e1. Epub 2005 Jun 24. * attachment:Ab_Initio_Prediction_Transcription_Factor_Targets_Using_Structural_Knowlegde_Kaplan_2005.pdf * Specificity and robustness in transcription control networks, Sengupta AM, Djordjevic M, Shraiman BI, Proc Natl Acad Sci U S A. 2002 Feb 19;99(4):2072-7. * attachment:Specificity_robustness_transcription_control_networks_Sengupta_2002.pdf * Can we infer peptide recognition specificity mediated by SH3 domains?, Cesareni G, Panni S, Nardelli G, Castagnoli L., FEBS Lett. 2002 Feb 20;513(1):38-44. * attachment:Can_we_infer_PR_specificity_med_by_SH3_Cesareni_et_al_2002.pdf === Amino Acid Alphabets === * Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices, Cannata N, Toppo S, Romualdi C, Valle G, Bioinformatics. 2002 Aug;18(8):1102-8. * attachment:Simplifying_AA_alphabets_branch_bound_substit_matrices_Cannata_2002.pdf * Simplified amino acid alphabets for protein fold recognition and implications for folding, Murphy LR, Wallqvist A, Levy RM, Protein Eng. 2000 Mar;13(3):149-52. 
* attachment:Simplified_AA_alphabets_Murphy_2000.pdf * Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases, Wallqvist A, Fukunishi Y, Murphy LR, Fadel A, Levy RM, Bioinformatics. 2000 Nov;16(11):988-1002. * attachment:Iterative_structure_search_for_protein_homologs_Wallqvist_2000.pdf === PDZ Related === * PDZ domains-glue and guide., van Ham M, Hendriks W., Mol Biol Rep. 2003 Jun;30(2):69-82. * attachment:PDZ_Domains_Glue_and_Guide_2003.pdf * PDZ domains: structural modules for protein complex assembly., Hung AY, Sheng M., J Biol Chem. 2002 Feb 22;277(8):5699-702. Epub 2001 Dec 10. * attachment:PDZ_Domains_Structural_Modules_2001.pdf |
Goals
- Computationally predict the specificity of peptide recognition domains from their primary amino acid sequences
- Analyze PDZ, WW and then SH3 domains
Background
- [wiki:/PDZ PDZ Domains]
- [wiki:/MachineLearning Machine Learning]
Strategy
- [wiki:/Strategy Strategy]
Ideas
- [wiki:/Ideas Ideas]
Data
- [wiki:/PDZData PDZ Data]
Experiments
- [wiki:/Experiments Experiments and Results]
Status
- [wiki:/Log Status]
Tools/Resources
- [wiki:/ToolsResources Tools and Resources]
Related Literature
- [http://www.connotea.org/rss/user/s2hui?download=view Literature List on Connotea]
- [http://www.baderlab.org/DomainSpecificityPredictionProject/Reading Molecular Biology of the Cell]
Team
- Shirley Hui
- Gary Bader