A Specificity Map for the PDZ Domain Family
This data is supplementary material for:
A Specificity Map for the PDZ Domain Family (submitted)
Raffi Tonikian5,8, Yingnan Zhang1,8, Stephen L. Sazinsky7, Bridget Currell2, Jung-Hua Yeh3, Boris Reva6, Heike A. Held1, Brent A. Appleton1, Marie Evangelista2, Yan Wu4, Xiaofeng Xin5, Andrew C. Chan3, Somasekar Seshagiri2, Laurence A. Lasky1, Chris Sander6, Charles Boone5,*, Gary D. Bader5+,6,* & Sachdev S. Sidhu1,*
Departments of 1Protein Engineering, 2Molecular Biology, 3Immunology, 4Antibody Engineering, Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, USA
5 Banting and Best Department of Medical Research and the Department of Molecular Genetics, University of Toronto, Donnelly CCBR, 160 College Street, Toronto, Ontario M5S 3E1, Canada
6 Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA
7Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E19-563, Cambridge, MA 02139, USA
8These authors contributed equally to this work
+Current Address
PDZ Data
- attachment:HumanPDZ.zip
- attachment:HumanErbinMutantPDZ.zip
- attachment:WormPDZ.zip
This data has been submitted to the [http://mint.bio.uniroma2.it/domino DOMINO (MINT)] and [http://icb.med.cornell.edu/services/pdz/start PDZBase] databases.
File Format
Each zip file contains a set of peptide files as shown in the example below. A peptide file describes a protein containing a specific domain, and provides known peptide ligands of this domain obtained by an experimental technique. These files can be directly loaded into the [http://www.baderlab.org/Software/LOLA LOLA software], which can be used to visualize peptide sets as sequence logos and to create Logo Trees, or the [http://baderlab.org/Software/BRAIN BRAIN software], which is a Cytoscape plugin that can be used to search protein sequence databases for proteins that contain sequence patterns matching a set of peptides.
The peptide file consists of a Header Section that describes the protein and domain sequence, and a Peptide Section that lists and describes the peptide ligands.
Example:
Gene Name	DLG1
Accession	Refseq:NP_004078
Organism	Homo Sapiens (Human)
NCBITaxonomyID	9606
Domain Number	3
Domain Type	PDZ
Interpro ID	IPR001478
Technique	Phage Display High Valency
Domain sequence	KVVLHRGSTGLGFNIVGGEDGEGIFISFILAGGPADLSGELRKGDRIISVNSVDLRAASHEQAAAALKNAGQAVTIVA
Domain Range	466-525
Comment
PeptideName	Peptide	CloneFrequency	QuantData	ExternalIdentifier
1	XLHFWRESSV	66
2	XXRLWKQTSL	3
3	ILKIWRETSL	3
4	KRTIWRETSL	2
A	KNLRSNSMLG	2
6	HLKFWRSTRV	2
7	AHSKWRSTSV	2
8	XXXHRRETTV	1
9	VISRWRQTSL	1
10	TTWLGRQTRV	1
11	SRSSYRETSV	1
12	XXXSRRETSV	1
13	RLFRYRETSL	1
B	PIRKRWTMTL	1
15	XXXNHRETSV	1
16	KIVRWKNTSV	1
17	KHRTWYETSV	1
18	XXXXFKQTSV	1
19	ARPKWRTTRV	1
20	ALPRRRETSV	1
Header Section
Describes the protein, domain, and experiment. Required fields are indicated with a *.
NOTE: This section is in a 2 column format. Field names must be separated from their values with a single TAB character. Multiple TABs, or spaces, are not accepted.
Gene Name:* An identifier that represents the gene or protein sequence. Not required to be unique.
Accession:* A space-separated list of database accession identifiers for the protein or corresponding gene.
Organism: Description of taxon of the protein.
NCBITaxonomyID:* Taxon identifier from NCBI's Taxonomy repository.
Domain Number:* A number that represents the position of the domain sequence within the protein. For proteins containing multiple instances of the domain, this number helps distinguish the position of these instances. Set to "0" if instance information is not known.
Domain Type:* The formal name of the domain, e.g. WW, PDZ, SH3.
Interpro ID: The Interpro database identifier for the domain.
Technique: The experimental method used to identify potential ligands of the protein.
Domain Sequence:* The amino-acid sequence of the domain region.
Domain Range: The amino-acid position range for the domain region within the protein.
Comment: Notes, additional information, personal comments pertaining to this file.
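The two-column layout described above can be read with a few lines of Python. This is only a minimal sketch, not the parser used by LOLA or BRAIN: the function name `parse_header` and the constant `REQUIRED_HEADER_FIELDS` are hypothetical, and the sketch assumes that the Header Section ends at the line beginning with `PeptideName`. Field names are compared case-insensitively because the example file writes "Domain sequence" while the field list writes "Domain Sequence".

```python
# Required fields, as marked with * in the Header Section description.
REQUIRED_HEADER_FIELDS = (
    "Gene Name",
    "Accession",
    "NCBITaxonomyID",
    "Domain Number",
    "Domain Type",
    "Domain Sequence",
)


def parse_header(lines):
    """Parse 'name<TAB>value' header lines into a dict.

    Assumption: the header ends at the first line that starts with
    'PeptideName' (the column header of the Peptide Section).
    """
    header = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("PeptideName"):
            break  # start of the Peptide Section
        parts = line.split("\t")
        if len(parts) > 2:
            # More than one TAB between name and value is not accepted.
            raise ValueError(f"expected a single TAB in header line: {line!r}")
        name = parts[0]
        value = parts[1] if len(parts) == 2 else ""
        header[name] = value

    # Case-insensitive check that every required field has a value.
    present = {name.lower() for name, value in header.items() if value}
    missing = [f for f in REQUIRED_HEADER_FIELDS if f.lower() not in present]
    if missing:
        raise ValueError("missing required header fields: " + ", ".join(missing))
    return header
```

A caller would typically pass an open file object, for example `parse_header(open(path, encoding="utf-8"))`, where `path` points at one peptide file extracted from a zip archive.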
Peptide Section
Describes the experimentally determined peptide ligands. The peptide sequences must be in multiple alignment format. The sequences should contain no gaps, and should be padded with the X symbol on both sides, where required, such that all sequences have identical length.
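As an illustration of the padding rule, the sketch below pads a set of ungapped peptides with X to a common length and checks the result. The helper names `pad_peptides` and `check_alignment` are hypothetical. The sketch assumes the peptides should be aligned at one end (right-aligned by default, which matches the example file, where X appears only on the left); deciding how peptides should actually be aligned is a separate question that this code does not address.

```python
def pad_peptides(peptides, pad="X", align="right"):
    """Pad peptides with `pad` so that all have the same length.

    align="right" adds padding on the left (sequences share their last
    residue position); align="left" adds padding on the right.
    """
    if not peptides:
        return []
    width = max(len(p) for p in peptides)
    if align == "right":
        return [p.rjust(width, pad) for p in peptides]
    if align == "left":
        return [p.ljust(width, pad) for p in peptides]
    raise ValueError("align must be 'right' or 'left'")


def check_alignment(peptides):
    """Raise ValueError if the peptides have gaps or unequal lengths."""
    for p in peptides:
        if "-" in p or "." in p:
            raise ValueError(f"peptide contains a gap character: {p!r}")
    if len({len(p) for p in peptides}) > 1:
        raise ValueError("peptides do not all have the same length")
```

For instance, `pad_peptides(["HRRETTV", "XLHFWRESSV"])` pads the first sequence to `XXXHRRETTV`, the form in which it appears in the example file.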
NOTE: This section is in a 5-column format. Column headers and values must be separated with a single TAB character. Multiple TABs or spaces are not accepted.
Required fields are indicated with a *.
PeptideName:* A numeric identifier assigned to each peptide ligand. To omit a peptide, set this to a non-numeric value (e.g. "A"). Values in this column must be unique, including the non-numeric ones.
Peptide:* The peptide ligand sequence.
CloneFrequency: Applies only to phage display data: the observed frequency of the peptide in the cloning step.
QuantData: A number that quantifies the protein-ligand interaction, either relatively or absolutely, e.g. the optical density (OD) from a protein chip experiment.
ExternalIdentifier: A database identifier for the peptide.
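The Peptide Section can be read in a similar way. The following is a rough sketch under stated assumptions rather than the behaviour of LOLA or BRAIN: the names `Peptide` and `parse_peptides` are hypothetical, the section is assumed to start at the line beginning with `PeptideName`, trailing empty columns are assumed to be optional (as in the example file, where only the first three columns are filled), and rows whose PeptideName is non-numeric are skipped as omitted peptides.

```python
from typing import NamedTuple, Optional


class Peptide(NamedTuple):
    name: str
    sequence: str
    clone_frequency: Optional[int]
    quant_data: Optional[float]
    external_id: Optional[str]


def parse_peptides(lines):
    """Return the non-omitted peptides from a peptide file."""
    peptides = []
    seen_names = set()
    in_section = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not in_section:
            # Skip the Header Section up to and including the column header.
            if line.startswith("PeptideName"):
                in_section = True
            continue
        if not line.strip():
            continue  # ignore blank lines
        cols = line.split("\t")
        cols += [""] * (5 - len(cols))  # fill in missing trailing columns
        name, seq, freq, quant, ext = cols[:5]
        if name in seen_names:
            raise ValueError(f"duplicate PeptideName: {name!r}")
        seen_names.add(name)
        if not name.isdigit():
            continue  # non-numeric name marks an omitted peptide
        peptides.append(
            Peptide(
                name=name,
                sequence=seq,
                clone_frequency=int(freq) if freq else None,
                quant_data=float(quant) if quant else None,
                external_id=ext or None,
            )
        )
    return peptides
```

Applied to the example file, this would return 18 peptides, because the rows named "A" and "B" are treated as omitted.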