VisualizingChemicalInformation - Bader Lab @ The University of Toronto

This page describes how to visualize a biological network (such as a gene-gene interaction network) together with associated chemical information.

Sample Network Based on Synthetic Lethal Data for KRAS

“KRAS_SL_genes_from_publications.txt” is a tab-delimited file describing a synthetic lethal interaction network between KRAS and 116 genes, derived from two publications. Download here: KRAS_SL_genes_from_publications.txt

Mutations in the KRAS gene, a member of the Ras family of small GTPases, are frequently found in pancreatic, thyroid, colon, lung and liver cancers and are correlated with poor prognosis. In this network, nodes represent genes and edges represent synthetic lethal interactions reported in the literature. Genes that are synthetically lethal with KRAS and have a known inhibitor could represent accessible therapeutic targets for treating tumours that carry a KRAS mutation.

The KRAS network file contains five columns:

Screened Entrez Gene: The Entrez Gene ID of KRAS.

SL Entrez Gene: The Entrez Gene ID of the gene reported to be synthetically lethal with KRAS.

Screened Gene: The gene symbol for KRAS.

SL Gene: The gene symbol for the gene reported to be synthetically lethal with KRAS.

Pubmed Link: A PubMed link to the publication that reported this interaction.
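The five columns map naturally onto an edge list. As a minimal sketch, assuming the file's first row is a header whose names match the column names above (the exact header text is an assumption), the Python below reads the file with the standard csv module and prints one SIF-style line (source node, interaction type, target node) per edge, a format Cytoscape can import. The sample rows, the placeholder genes GENE_A/GENE_B, their IDs, the example.org links and the "sl" interaction label are all hypothetical; only the KRAS Entrez Gene ID (3845) is real.

```python
import csv
import io

# Hypothetical excerpt in the layout described above. The header text, the
# placeholder genes GENE_A/GENE_B, their IDs and the example.org links are
# made up for illustration; 3845 is the Entrez Gene ID of KRAS.
SAMPLE = (
    "Screened Entrez Gene\tSL Entrez Gene\tScreened Gene\tSL Gene\tPubmed Link\n"
    "3845\t900001\tKRAS\tGENE_A\thttp://example.org/pub1\n"
    "3845\t900002\tKRAS\tGENE_B\thttp://example.org/pub2\n"
)


def read_sl_edges(handle):
    """Return one (screened gene, SL gene, SL Entrez ID) tuple per row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["Screened Gene"], row["SL Gene"], row["SL Entrez Gene"])
        for row in reader
    ]


edges = read_sl_edges(io.StringIO(SAMPLE))
for source, target, _entrez in edges:
    # SIF line: source node, interaction type ("sl" is an arbitrary label), target node
    print(f"{source}\tsl\t{target}")
```

For the real file, pass a handle from `open("KRAS_SL_genes_from_publications.txt", newline="")` instead of the in-memory sample.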

Annotation File Based on ChEMBL v6 Data

“Chembl_targets.txt” is a tab-delimited file describing 2104 proteins from a variety of species that are reported in the ChEMBL database (v6) to have at least one published, high-confidence, potent interaction with a compound. Download here: Chembl_targets_with_named_compounds.txt

This annotation file contains 14 columns:

UniProt ID: The UniProt ID for the protein, as listed in ChEMBL.

Entrez Gene ID: The Entrez Gene ID for this protein based on the ID mapping service provided by the UniProt website.

Protein name from ChEMBL: The common name for this protein as listed by ChEMBL.

Number of different compounds reported in ChEMBL: The number of distinct compounds with a reported activity value below 1 µM against this protein at a ChEMBL confidence level of 5 or greater. Because the commercial availability of particular compounds varies over time, this number can serve as a rough estimate of how easy it will be to source a compound that inhibits this protein. Additional availability information can be obtained from the ZINC database [15], a free database of commercially available compounds.

Number of publications reporting a compound-protein interaction: Using the same criteria as above, the number of distinct publications that report at least one compound interacting with this protein. This number can be used as a proxy for how often binding studies on this protein are carried out.
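The two counts above can be illustrated with a short sketch. It assumes a hypothetical in-memory list of activity records (target, compound, publication, activity value in nM, confidence level); in practice these would come from a ChEMBL v6 query, which is not shown. A record is kept only if its value is below 1 µM (1000 nM) and its confidence level is 5 or greater, and distinct compounds and publications are then counted per target using sets, so repeated reports of the same compound or publication are counted once.

```python
from collections import defaultdict
from typing import NamedTuple


class Activity(NamedTuple):
    """One hypothetical compound-protein activity record."""
    target: str        # UniProt ID of the protein
    compound: str      # compound identifier
    publication: str   # identifier of the reporting publication
    value_nm: float    # activity value, assumed to be expressed in nM
    confidence: int    # ChEMBL confidence level for the assay-target assignment


def summarize(activities, max_nm=1000.0, min_confidence=5):
    """Map each target to (distinct compounds, distinct publications),
    counted over records below max_nm with confidence >= min_confidence."""
    compounds = defaultdict(set)
    publications = defaultdict(set)
    for a in activities:
        if a.value_nm < max_nm and a.confidence >= min_confidence:
            compounds[a.target].add(a.compound)
            publications[a.target].add(a.publication)
    return {t: (len(compounds[t]), len(publications[t])) for t in compounds}


# Made-up records: P1 has two qualifying compounds reported in two publications;
# the 5000 nM record and the confidence-4 record are filtered out.
records = [
    Activity("P1", "C1", "D1", 50.0, 9),
    Activity("P1", "C1", "D2", 200.0, 9),
    Activity("P1", "C2", "D2", 900.0, 8),
    Activity("P1", "C3", "D3", 5000.0, 9),
    Activity("P1", "C4", "D4", 10.0, 4),
    Activity("P2", "C5", "D5", 1.0, 7),
]
print(summarize(records))  # {'P1': (2, 2), 'P2': (1, 1)}
```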

Top Named Compound: Using the criteria above, the compound with the most publications demonstrating an interaction with this protein among all ChEMBL compounds that have a name. Only ~4,000 compounds in ChEMBL (version 6) have an associated name. Names in ChEMBL are derived from a variety of sources, including commonly used names, research codes, trade names, and United States Adopted Names (unique nonproprietary names assigned to pharmaceuticals marketed in the United States).

SMILES for example compound: The SMILES string in this column represents the chemical structure of an example compound reported in ChEMBL to interact with this protein. The example compound is the one with the most publications demonstrating an interaction with this protein out of all compounds in ChEMBL. Note: this compound does not necessarily have a name in ChEMBL.

SMILES for named compound: Same as above but for the named compound.
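The selection rules for the example and named compounds can be sketched as follows, assuming a hypothetical mapping from compound ID to the set of publications reporting its interaction with one protein, plus a dictionary of names for the compounds that have one. The example compound is the overall maximum by publication count; the named compound is the maximum over named compounds only. How ties were broken is not stated on this page, so the first-encountered rule here is an assumption.

```python
def pick_compounds(pubs_by_compound, names):
    """Return (example compound, named compound) for one protein.

    pubs_by_compound: compound ID -> set of publication IDs reporting an
        interaction with the protein (hypothetical input structure).
    names: compound ID -> name, only for compounds that have a name.
    The named compound is None when none of the compounds has a name.
    """
    def n_pubs(compound):
        return len(pubs_by_compound[compound])

    # max() keeps the first compound it meets when counts tie (tie rule assumed)
    example = max(pubs_by_compound, key=n_pubs, default=None)
    named = max((c for c in pubs_by_compound if c in names), key=n_pubs, default=None)
    return example, named


pubs = {"C1": {"D1", "D2", "D3"}, "C2": {"D1", "D4"}, "C3": {"D5"}}
names = {"C2": "compound-two", "C3": "compound-three"}  # made-up names
print(pick_compounds(pubs, names))  # ('C1', 'C2')
```

This mirrors the note that the example compound need not have a name: here C1 wins overall, while C2 is the best-supported named compound.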

InChIKey for example compound: The InChIKey is an alternative representation of a compound's structure, a fixed-length (27-character) hash of its InChI identifier, designed for searching chemical information with text-based search engines, such as Internet searches and database queries.

InChIKey for named compound: Same as above but for the named compound.
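A standard InChIKey has a fixed layout: a 14-letter block hashing the molecular skeleton (connectivity), a hyphen, a 10-letter block covering stereochemistry and isotopes plus standard and version flags, a hyphen, and a final protonation character. The sketch below validates that layout and compares first blocks, which is one simple way to find compounds that share a skeleton but differ in stereochemistry. The helper names are hypothetical; aspirin's InChIKey is used as a real example.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def split_inchikey(key):
    """Split a standard InChIKey into (skeleton block, second block, protonation flag)."""
    key = key.strip().upper()
    if not INCHIKEY.fullmatch(key):
        raise ValueError(f"not a standard InChIKey: {key!r}")
    return key[:14], key[15:25], key[26]


def same_skeleton(key_a, key_b):
    """True if both keys share the first block (same connectivity layer),
    regardless of the stereo/isotope block and the protonation flag."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]


aspirin = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(split_inchikey(aspirin))  # ('BSYNRYMUTXBXSQ', 'UHFFFAOYSA', 'N')
```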

#Publications reporting interaction for example compound: This column contains the number of publications in ChEMBL in which the example compound has been shown to interact with this protein. It can be used as a measure of how reliable this compound interaction is, on the assumption that the more often an interaction is tested and reported, the more reliable it is.

#Publications reporting interaction for named compound: Same as above but for the named compound.

ChEMBL link: A link to the ChEMBL webpage for this protein. The webpage contains a wide variety of additional information regarding the reported interaction of this protein with different compounds.

Compound: A flag to identify which proteins/genes have been annotated in ChEMBL. This is useful for changing the visualization of the network in Cytoscape based on the presence or absence of compound information for a particular node.
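The flag works because only genes that have a row in the annotation file receive a value once that file is joined to the network. The sketch below makes the join explicit, assuming the network nodes are identified by Entrez Gene ID and that annotation rows are dicts keyed by the column names above. It writes a tab-delimited node table that could be imported into Cytoscape; the "yes"/"no" flag values, the placeholder IDs and the compound name are hypothetical.

```python
import csv
import io


def write_node_table(sl_entrez_ids, annotation_rows, out):
    """Write one tab-delimited row per network node with a compound flag.

    sl_entrez_ids: Entrez Gene IDs of the network nodes (as strings).
    annotation_rows: dicts keyed by annotation-file column names.
    The "yes"/"no" flag values are an arbitrary choice for illustration.
    """
    # If several UniProt rows map to one Entrez ID, the last one wins here.
    by_entrez = {row["Entrez Gene ID"]: row for row in annotation_rows}
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Entrez Gene ID", "Top Named Compound", "Compound"])
    for entrez in sl_entrez_ids:
        row = by_entrez.get(entrez)
        if row is None:
            writer.writerow([entrez, "", "no"])
        else:
            writer.writerow([entrez, row["Top Named Compound"], "yes"])


annotation = [{"Entrez Gene ID": "900001", "Top Named Compound": "compound-x"}]
buffer = io.StringIO()
write_node_table(["900001", "900002"], annotation, buffer)
print(buffer.getvalue(), end="")
```

In Cytoscape, a discrete mapping on the Compound column can then give annotated and unannotated nodes different colours or shapes.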