This page details how to visualize a biological network (such as a gene-gene interaction network) together with associated chemical information.
Sample Network Based on Synthetic Lethal Data for KRAS
“KRAS_SL_genes_from_publications.txt” is a tab-delimited file that describes a synthetic lethal interaction network between KRAS and 116 genes, derived from two publications. Download here: KRAS_SL_genes_from_publications.txt
Mutations in the KRAS gene, a member of the Ras family of small GTPases, are frequently found in pancreatic, thyroid, colon, lung and liver cancers and are correlated with poor prognosis. In this network, nodes represent genes and edges represent synthetic lethal interactions reported in the literature. Genes that are synthetically lethal with KRAS and have a known inhibitor could represent potential, accessible therapeutic targets for treating tumours with a KRAS mutation.
The file contains five columns:
Screened Entrez Gene: The Entrez Gene ID for KRAS.
SL Entrez Gene: The Entrez Gene ID for the gene that was reported to be synthetically lethal with KRAS.
Screened Gene: The gene symbol for KRAS.
SL Gene: The gene symbol for the gene that was reported to be synthetically lethal with KRAS.
PubMed Link: A link to the publication that reported this interaction.
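As a starting point for building the network outside Cytoscape, the file can be read into a list of edges with Python's standard `csv` module. This is a minimal sketch that assumes the file has a header row using the five column names listed above; the function name `load_sl_edges` and the two sample rows are hypothetical (3845 is the Entrez Gene ID of human KRAS, but the SL genes and links are placeholders, not data from the file).

```python
import csv
import io

def load_sl_edges(handle):
    """Read the KRAS synthetic lethal file into a list of edge tuples.

    Each tuple is (screened Entrez Gene ID, SL Entrez Gene ID, SL gene symbol,
    PubMed link). Assumes a header row using the five column names above.
    """
    edges = []
    for row in csv.DictReader(handle, delimiter="\t"):
        edges.append((
            row["Screened Entrez Gene"].strip(),
            row["SL Entrez Gene"].strip(),
            row["SL Gene"].strip(),
            row["Pubmed Link"].strip(),
        ))
    return edges

# Hypothetical two-row sample for illustration only (placeholder SL genes/links).
sample = io.StringIO(
    "Screened Entrez Gene\tSL Entrez Gene\tScreened Gene\tSL Gene\tPubmed Link\n"
    "3845\t1111\tKRAS\tGENE_A\thttp://example.org/a\n"
    "3845\t2222\tKRAS\tGENE_B\thttp://example.org/b\n"
)
edges = load_sl_edges(sample)
sl_genes = {sl_id for _, sl_id, _, _ in edges}  # distinct synthetic lethal partners
print(len(edges), len(sl_genes))  # 2 2
```

For the real file, open it with `open("KRAS_SL_genes_from_publications.txt", newline="")` and pass the handle to the same function; if the file matches the description above, `sl_genes` should hold the 116 partner genes. If the header names differ, adjust the dictionary keys accordingly.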
Annotation File Based on ChEMBL v6 Data
“Chembl_targets_with_named_compounds.txt” is a tab-delimited file describing 2104 proteins from a variety of species that are reported in the ChEMBL database (v6) to have at least one published, high-confidence, potent interaction with a compound. Download here: Chembl_targets_with_named_compounds.txt
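The inclusion rule (at least one high-confidence, potent interaction) and the two count columns described below can be pictured as a filter-then-count over activity records. This is a minimal sketch under stated assumptions: the `Activity` record, its field names and the sample values are hypothetical and greatly simplified (the real ChEMBL v6 tables are organized differently), and the thresholds are the ones given in the column descriptions (potency below 1 µM, here written as 1000 nM, and a ChEMBL confidence level of 5 or greater).

```python
from dataclasses import dataclass

POTENCY_CUTOFF_NM = 1000.0  # "< 1 uM" expressed in nM
MIN_CONFIDENCE = 5          # ChEMBL confidence level of 5 or greater

@dataclass(frozen=True)
class Activity:
    """Hypothetical, simplified activity record (not the ChEMBL schema)."""
    target: str       # protein identifier, e.g. a UniProt ID
    compound: str     # compound identifier
    value_nm: float   # potency value in nM
    confidence: int   # ChEMBL-style target confidence level
    publication: str  # identifier of the reporting publication

def summarize_targets(activities):
    """Map each target with >= 1 qualifying activity to
    (number of distinct compounds, number of distinct publications)."""
    compounds, publications = {}, {}
    for a in activities:
        if a.value_nm < POTENCY_CUTOFF_NM and a.confidence >= MIN_CONFIDENCE:
            compounds.setdefault(a.target, set()).add(a.compound)
            publications.setdefault(a.target, set()).add(a.publication)
    return {t: (len(compounds[t]), len(publications[t])) for t in compounds}

# Placeholder records for illustration only.
acts = [
    Activity("P1", "C1", 50.0, 9, "D1"),
    Activity("P1", "C2", 200.0, 8, "D1"),
    Activity("P1", "C3", 5000.0, 9, "D2"),  # not potent enough: excluded
    Activity("P2", "C1", 10.0, 3, "D3"),    # confidence too low: excluded
]
print(summarize_targets(acts))  # {'P1': (2, 1)}
```

Only targets with at least one qualifying activity appear in the result, which mirrors the "at least one" criterion for a protein to be listed in the file; `P2` drops out entirely because its only activity fails the confidence threshold.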
This sample file contains 14 columns:
UniProt ID: The UniProt ID for the protein, as listed in ChEMBL.
Entrez Gene ID: The Entrez Gene ID for this protein based on the ID mapping service provided by the UniProt website.
Protein name from ChEMBL: The common name for this protein as listed by ChEMBL.
Number of different compounds reported in ChEMBL: The number of distinct compounds that have an activity value of <1 µM associated with this protein at a ChEMBL confidence level of 5 or greater. Because commercial availability of particular compounds can vary over time, this number could be used as an estimate of how easy it will be to source a compound to inhibit this protein. Additional availability information can be obtained from the ZINC database[15], a free database of commercially available compounds.
Number of publications reporting a compound protein interaction: Based on the same criteria as above, this is the number of distinct publications that report at least one compound with an interaction with this protein. This number could be used as a proxy for how often different binding studies on this protein are carried out.
Top Named Compound: Based on the criteria above, this is the compound with the most publications demonstrating an interaction with this protein among all the ChEMBL compounds that have a name. Only ~4,000 compounds in ChEMBL (version 6) have a name associated with them. Names in ChEMBL are derived from a variety of sources, including commonly used names, research codes, trade names, and unique nonproprietary names assigned to pharmaceuticals marketed in the United States.
SMILES for example compound: The SMILES string in this column represents the chemical structure for an example compound that has been reported in ChEMBL to interact with this protein. The example compound was chosen because it has the most publications demonstrating an interaction with this protein out of all the compounds in ChEMBL. Note: this compound is not necessarily a compound with a name associated with it in ChEMBL.
SMILES for named compound: Same as above but for the named compound.
InChIKey for example compound: InChIKey is an alternative chemical representation of compounds that is optimized for text-based searching of chemical information, such as Internet searches and database queries.
InChIKey for named compound: Same as above but for the named compound.
#Publications reporting interaction for example compound: This column contains the number of publications in ChEMBL in which the example compound has been shown to interact with this protein. It could be used as a measure of how reliable this particular compound interaction is based on the assumption that the more often the interaction is tested and reported the more reliable it is.
#Publications reporting interaction for named compound: Same as above but for the named compound.
ChEMBL link: A link to the ChEMBL webpage for this protein. The webpage contains a wide variety of additional information regarding the reported interaction of this protein with different compounds.
Compound: A flag to identify which proteins/genes have been annotated in ChEMBL. This is useful for changing the visualization of the network in Cytoscape based on the presence or absence of compound information for a particular node.
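To see which genes in the KRAS network have compound information, the annotation file can be joined to the network genes on Entrez Gene ID. Cytoscape can import a tab-delimited file as a node attribute table directly; the sketch below performs the same join in plain Python, assuming the annotation file's header row uses the column names listed above. The function `build_node_table`, the derived "Has compound" column (not one of the 14 columns in the file) and the sample rows are all hypothetical.

```python
import csv
import io

def build_node_table(sl_gene_ids, annotation_handle, out_handle):
    """Join network genes to the ChEMBL annotation file on Entrez Gene ID.

    Writes a tab-delimited node table (one row per gene) to out_handle and
    returns the sorted list of genes that have at least one annotated compound.
    """
    # Assumption: the header row uses the column names described above.
    by_gene = {}
    for row in csv.DictReader(annotation_handle, delimiter="\t"):
        gene = (row.get("Entrez Gene ID") or "").strip()
        if gene:  # skip proteins without an Entrez Gene mapping
            by_gene[gene] = row  # if a gene appears twice, the last row wins

    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    # "Has compound" is a hypothetical derived column for unannotated nodes.
    writer.writerow(["Entrez Gene ID", "Number of compounds", "Has compound"])
    annotated = []
    for gene in sorted(sl_gene_ids, key=int):
        row = by_gene.get(gene)
        if row is not None:
            annotated.append(gene)
            n = row["Number of different compounds reported in ChEMBL"]
        else:
            n = "0"
        writer.writerow([gene, n, "yes" if row is not None else "no"])
    return annotated

# Hypothetical annotation rows (only the columns the sketch reads).
annotation = io.StringIO(
    "UniProt ID\tEntrez Gene ID\tNumber of different compounds reported in ChEMBL\n"
    "P_EXAMPLE_1\t1111\t12\n"
    "P_EXAMPLE_2\t\t3\n"
)
out = io.StringIO()
hits = build_node_table({"1111", "2222"}, annotation, out)
print(hits)  # ['1111']
print(out.getvalue())
```

Writing an explicit "no" for genes without an annotation row gives every node a value, so a discrete style mapping in Cytoscape can color or shape all nodes, not just the annotated ones.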