Goals
- Predict the specificity of peptide recognition domains from their primary amino acid sequence.
- Analyze PDZ, WW, and then SH3 domains.
Strategy
Status
- [wiki:/Log Status Log]
Tasks
- Learn SVN and the Brain code (ResidueResidueCorrelation).
- Literature review related to domain specificity (background activity).
- Run the ResidueResidue correlation analysis on the PDZ domain data: the 1-1 version first, then try others, e.g. 1-2 (requires PDZ profiles from Gary).
- Try different multiple sequence alignment (MSA) algorithms on the PDZ domain sequences to see whether they affect the correlation results.
- Implement a new feature based on amino acid groups (i.e. learn amino acid groupings) and run it on the PDZ data.
- Think about new PDZ domain features that can be used for prediction.
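The 1-1 correlation task could be prototyped roughly as follows. This is a minimal sketch that assumes mutual information as the correlation measure and assumes the domain and peptide alignment columns are paired row by row (row k of each column belongs to the same domain-peptide interaction); the actual ResidueResidueCorrelation code may use a different statistic. The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two paired alignment columns.

    col_a and col_b are equal-length strings, one residue per aligned row.
    Rows where either column has a gap ('-') are skipped.
    """
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def residue_residue_scores(domain_cols, peptide_cols):
    """Score every (domain position, peptide position) pair: the 1-1 case."""
    return {
        (i, j): mutual_information(d, p)
        for i, d in enumerate(domain_cols)
        for j, p in enumerate(peptide_cols)
    }
```

A perfectly co-varying pair of columns with three equally frequent residue pairs, such as `"KKRREE"` against `"DDEEKK"`, scores log2(3), about 1.58 bits, while independent columns score 0. A 1-2 variant would score one domain position against a pair of peptide positions in the same way, by treating the two peptide residues as a single joint symbol.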
Ideas
- Use structural data (PDZ domain structures); this may require homology modeling.
- Use machine learning methods: SVMs for classification and boosted decision trees for an interpretable model.
- Analyze correlations within the domain and peptide (inter-residue correlation), possibly with correspondence analysis.
Team
- Shirley Hui
- Gary Bader
Tools/Resources
Databases
- [http://www.ensembl.org/ Ensembl]
  - Software system that produces and maintains automatic annotation of selected eukaryotic genomes.
- [http://www.ebi.ac.uk/interpro/ InterPro]
  - Database of protein families, domains and functional sites, in which identifiable features found in known proteins can be applied to unknown protein sequences.
- [http://www.biomart.org/ BioMart]
  - Query-oriented data management system that simplifies creating and maintaining advanced query interfaces backed by a relational database. It is particularly suited to 'data mining'-like searches of complex descriptive (e.g. biological) data.
Sequence Alignment
Multiple
- [http://www.drive5.com/muscle/ Muscle]
  - MUltiple Sequence Comparison by Log-Expectation.
- [http://www.ebi.ac.uk/clustalw/ ClustalW]
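To compare alignments from different programs, each alignment has to be turned into per-position columns. MUSCLE writes aligned FASTA by default; ClustalW's default .aln output would first need converting, e.g. by exporting it as FASTA from an alignment editor such as JalView. The sketch below assumes aligned FASTA input; the function names are hypothetical.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA text into an ordered list of (header, sequence)."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_columns(records):
    """Transpose aligned sequences into columns, one string per position."""
    seqs = [seq for _, seq in records]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    return ["".join(chars) for chars in zip(*seqs)]
```

Running the same correlation scoring on the columns produced from each program's alignment of the same PDZ sequences is one simple way to see whether the MSA choice changes the results.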
Viewers
- [http://www.jalview.org/ JalView]
  - Multiple alignment viewer/editor written in Java.
Background Literature
More General
- The Human and Mouse Complement of SH2 Domain Proteins-Establishing the Boundaries of Phosphotyrosine Signaling, Liu BA, Jablonowski K, Raina M, Arce M, Pawson T, Nash PD, Mol Cell. 2006 Jun 23;22(6):851-68.
- attachment:Human_mouse_complement_of_SH2_domain_proteins_Liu_2006.pdf
- Domains, motifs, and scaffolds: the role of modular interactions in the evolution and wiring of cell signaling circuits, Bhattacharyya RP, Remenyi A, Yeh BJ, Lim WA, Annu Rev Biochem. 2006;75:655-80.
- attachment:Domains_motifs_scaffolds_Bhattacharyya_et_al_2006.pdf
- The Structure and Function of Proline Recognition Domains, Zarrinpar A, Bhattacharyya RP, Lim WA, Sci STKE. 2003 Apr 22;2003(179):RE8.
- attachment:Structure_Function_Pro_Recog_Domains_Zarrinpar_et_al_2003.pdf
Specificity Prediction/Inference
- Ab initio prediction of transcription factor targets using structural knowledge, Kaplan T, Friedman N, Margalit H, PLoS Comput Biol. 2005 Jun;1(1):e1. Epub 2005 Jun 24.
- attachment:Ab_Initio_Prediction_Transcription_Factor_Targets_Using_Structural_Knowlegde_Kaplan_2005.pdf
- Specificity and robustness in transcription control networks, Sengupta AM, Djordjevic M, Shraiman BI, Proc Natl Acad Sci U S A. 2002 Feb 19;99(4):2072-7.
- attachment:Specificity_robustness_transcription_control_networks_Sengupta_2002.pdf
- Can we infer peptide recognition specificity mediated by SH3 domains?, Cesareni G, Panni S, Nardelli G, Castagnoli L, FEBS Lett. 2002 Feb 20;513(1):38-44.
- attachment:Can_we_infer_PR_specificity_med_by_SH3_Cesareni_et_al_2002.pdf
Amino Acid Alphabets
- Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices, Cannata N, Toppo S, Romualdi C, Valle G, Bioinformatics. 2002 Aug;18(8):1102-8.
- attachment:Simplifying_AA_alphabets_branch_bound_substit_matrices_Cannata_2002.pdf
- Simplified amino acid alphabets for protein fold recognition and implications for folding, Murphy LR, Wallqvist A, Levy RM, Protein Eng. 2000 Mar;13(3):149-52.
- attachment:Simplified_AA_alphabets_Murphy_2000.pdf
- Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases, Wallqvist A, Fukunishi Y, Murphy LR, Fadel A, Levy RM, Bioinformatics. 2000 Nov;16(11):988-1002.
- attachment:Iterative_structure_search_for_protein_homologs_Wallqvist_2000.pdf
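The reduced alphabets in these papers replace the 20 amino acids with a smaller set of groups. For the amino acid group feature in the task list, the sketch below shows only the mapping step. The grouping is hypothetical, chosen by broad physicochemical class for illustration, and is not taken from any of the cited papers; a learned grouping (e.g. one found with the branch-and-bound approach of Cannata et al.) would replace the hand-written table. Names are hypothetical.

```python
# Hypothetical grouping for illustration only; not from the cited papers.
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "b": "KRH",     # basic
    "a": "DE",      # acidic
    "g": "G",       # glycine
    "o": "P",       # proline
}


def build_lookup(groups):
    """Invert {label: residues} into {residue: label}, rejecting overlaps."""
    lookup = {}
    for label, residues in groups.items():
        for aa in residues:
            if aa in lookup:
                raise ValueError(f"{aa} is assigned to more than one group")
            lookup[aa] = label
    return lookup


LOOKUP = build_lookup(GROUPS)


def reduce_sequence(seq, lookup=LOOKUP, unknown="x"):
    """Map each residue to its group label; alignment gaps stay as '-'."""
    return "".join(
        "-" if aa == "-" else lookup.get(aa.upper(), unknown) for aa in seq
    )
```

For example, the PDZ carboxylate-binding loop motif GLGF maps to `ghgr`. Reduced sequences can be fed straight into column-wise correlation scoring; pooling similar residues makes the counts less sparse on small alignments, at the cost of hiding differences within a group.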
PDZ Related
- PDZ domains-glue and guide, van Ham M, Hendriks W, Mol Biol Rep. 2003 Jun;30(2):69-82.
- attachment:PDZ_Domains_Glue_and_Guide_2003.pdf
- PDZ domains: structural modules for protein complex assembly, Hung AY, Sheng M, J Biol Chem. 2002 Feb 22;277(8):5699-702. Epub 2001 Dec 10.
- attachment:PDZ_Domains_Structural_Modules_2001.pdf