Diff for "VisualizingChemicalInformation" - Bader Lab @ The University of Toronto

Differences between revisions 5 and 11 (spanning 6 versions)

This page details how to visualize a biological network (such as a gene-gene interaction network) with associated chemical information.

Sample Network based on synthetic lethal data for KRAS.

“KRAS_SL_genes_from_publications.txt” is a tab-delimited file that describes a synthetic lethal interaction network between KRAS and 116 genes that was derived from two publications. Download here: KRAS_SL_genes_from_publications.txt

Mutations in the KRAS gene, a member of the Ras family of small GTPases, are frequently found in pancreatic, thyroid, colon, lung and liver cancers and are correlated with poor prognosis. In this network, nodes represent genes and edges represent published synthetic lethal interactions. Genes that are synthetically lethal with KRAS and have a known inhibitor available could represent accessible therapeutic targets for the treatment of tumours with a KRAS mutation.

The network file contains five columns:

Screened Entrez Gene: The Entrez Gene number for KRAS.

SL Entrez Gene: The Entrez Gene number for the gene that was reported to be synthetically lethal with the KRAS gene.

Screened Gene: Gene symbol for KRAS.

SL Gene: Gene symbol for the gene that was reported to be synthetically lethal with the KRAS gene.

Pubmed Link: Link to the publication that reported this interaction.
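
The five-column layout above can be read with a few lines of Python. This is a minimal sketch, assuming the file's header row uses exactly the column names listed above. The function name `read_sl_network` and the sample rows are hypothetical and are not taken from the real file.

```python
import csv
import io


def read_sl_network(handle):
    """Return a list of (screened_gene, sl_gene, pubmed_link) edges.

    Assumes a tab-delimited file whose header row matches the column
    names described on this page.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    edges = []
    for row in reader:
        edges.append((row["Screened Gene"], row["SL Gene"], row["Pubmed Link"]))
    return edges


# Made-up rows for illustration only; gene symbols, IDs and links are placeholders.
sample = (
    "Screened Entrez Gene\tSL Entrez Gene\tScreened Gene\tSL Gene\tPubmed Link\n"
    "3845\t1111\tKRAS\tGENE_A\thttp://example.org/paper1\n"
    "3845\t2222\tKRAS\tGENE_B\thttp://example.org/paper2\n"
)
edges = read_sl_network(io.StringIO(sample))
distinct_sl_genes = {sl_gene for _, sl_gene, _ in edges}
print(len(edges), "edges,", len(distinct_sl_genes), "distinct SL genes")
```

To read the downloaded file instead of the sample, pass an open file handle, for example `open("KRAS_SL_genes_from_publications.txt", newline="")`.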

Annotation File Based on ChEMBL v6 Data

“Chembl_targets.txt” is a tab-delimited file describing 2104 proteins from a variety of species that are reported in the ChEMBL database (v6) to have at least one published, high-confidence, potent interaction with a compound. Download here: Chembl_targets_with_named_compounds.txt

This sample file contains 14 columns:

UniProt ID: The UniProt ID for the protein, as listed in ChEMBL.

Entrez Gene ID: The Entrez Gene ID for this protein based on the ID mapping service provided by the UniProt website.

Protein name from ChEMBL: The common name for this protein as listed by ChEMBL.

Number of different compounds reported in ChEMBL: The number of distinct compounds that have a reported value of <1 µM associated with this protein at a ChEMBL confidence level of 5 or greater. Because commercial availability of particular compounds can vary over time, this number could be used as an estimate of how easy it will be to source a compound to inhibit this protein. Additional availability information can be obtained from the ZINC database [15], a free database of commercially available compounds.

Number of publications reporting a compound protein interaction: Based on the same criteria as above, this is the number of distinct publications that report at least one compound with an interaction with this protein. This number could be used as a proxy for how often different binding studies on this protein are carried out.

Top Named Compound: Based on the criteria above, this is the named compound in ChEMBL that has the most publications demonstrating an interaction with this protein. Only ~4,000 compounds in ChEMBL (version 6) have a name associated with them. Names in ChEMBL are derived from a variety of sources, including commonly used names, research codes, trade names, and unique nonproprietary names assigned to pharmaceuticals marketed in the United States.

SMILES for example compound: The SMILES string in this column represents the chemical structure for an example compound that has been reported in ChEMBL to interact with this protein. The example compound was chosen because it has the most publications demonstrating an interaction with this protein out of all the compounds in ChEMBL. Note: this compound is not necessarily a compound with a name associated with it in ChEMBL.

SMILES for named compound: Same as above but for the named compound.

InChIKey for example compound: The InChIKey of the example compound. InChIKey is an alternative chemical representation of compounds that is optimized for finding chemical information with text-based search, such as Internet searches and database queries.

InChIKey for named compound: Same as above but for the named compound.

#Publications reporting interaction for example compound: This column contains the number of publications in ChEMBL in which the example compound has been shown to interact with this protein. It could be used as a measure of how reliable this particular compound interaction is, on the assumption that the more often an interaction is tested and reported, the more reliable it is.

#Publications reporting interaction for named compound: Same as above but for the named compound.

ChEMBL link: A link to the ChEMBL webpage for this protein. The webpage contains a wide variety of additional information regarding the reported interaction of this protein with different compounds.

Compound: A flag to identify which proteins/genes have been annotated in ChEMBL. This is useful for changing the visualization of the network in Cytoscape based on the presence or absence of compound information for a particular node.
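
One way to combine the two files before loading them into Cytoscape is to look up each synthetic lethal gene in the annotation file by its Entrez Gene ID. The sketch below is illustrative only. It assumes both files carry header rows named as described on this page and that a gene counts as having compound information whenever its Entrez Gene ID appears in the annotation file. The function names and sample rows are hypothetical, and the sample annotation table lists only four of the 14 columns.

```python
import csv
import io


def load_targets(handle):
    """Index annotation rows by Entrez Gene ID.

    If several proteins map to the same Entrez Gene ID, the last row wins.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Entrez Gene ID"]: row for row in reader if row["Entrez Gene ID"]}


def annotate_sl_genes(network_handle, targets):
    """Return one record per network row with compound information attached."""
    reader = csv.DictReader(network_handle, delimiter="\t")
    annotated = []
    for row in reader:
        target = targets.get(row["SL Entrez Gene"])
        annotated.append({
            "gene": row["SL Gene"],
            "has_compound": target is not None,
            "top_named_compound": target["Top Named Compound"] if target else "",
        })
    return annotated


# Made-up rows for illustration only; IDs and names are placeholders.
targets_text = (
    "UniProt ID\tEntrez Gene ID\tTop Named Compound\tCompound\n"
    "P00000\t1111\tCOMPOUND_X\tyes\n"
)
network_text = (
    "Screened Entrez Gene\tSL Entrez Gene\tScreened Gene\tSL Gene\tPubmed Link\n"
    "3845\t1111\tKRAS\tGENE_A\thttp://example.org/paper1\n"
    "3845\t2222\tKRAS\tGENE_B\thttp://example.org/paper2\n"
)
targets = load_targets(io.StringIO(targets_text))
annotated = annotate_sl_genes(io.StringIO(network_text), targets)
for record in annotated:
    print(record["gene"], record["has_compound"], record["top_named_compound"])
```

The resulting records could be written out as a tab-delimited node attribute table, so that nodes with and without compound information can be styled differently.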

-  ⇤ ← Revision 5 as of 2010-10-17 21:53:11 → 
  Size: 3963
  Editor: IainWallace
  Comment:
+   ← Revision 11 as of 2011-07-27 15:03:23 → ⇥
  Size: 5259
  Editor: IainWallace
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
+#acl  IainWallace:read,write,delete,revert All:read
-Line 3:
+Line 6:
-Sample Network based on synthetic lethal data for KRAS.
+----
== Sample Network based on synthetic lethal data for KRAS. ==
“KRAS_SL_genes_from_publications.txt” is a tab-delimited file that describes a synthetic lethal interaction network between KRAS and 116 genes that was derived from two publications. Download here [[attachment:KRAS_SL_genes_from_publications.txt]]
-Line 5:
+Line 10:
-     “KRAS_SL_genes_from_publications.txt” is a tab-delimited file that describes a synthetic lethal interaction network between KRAS and 116 genes that was derived from two publications.

   Mutations in the KRAS gene, a member of the Ras family of small GTPases, are frequently found pancreatic, thyroid, colon, lung and liver cancers and are correlated with poor prognosis.  In this network nodes represent genes, and edges represent a synthetic lethal interaction based on published interactions. Genes that are syntheticly lethal with KRAS that have a known inhibitor available could represent potential and accessible theraputic targets for treatment of tumours with a KRAS mutation.
+Mutations in the KRAS gene, a member of the Ras family of small GTPases, are frequently found pancreatic, thyroid, colon, lung and liver cancers and are correlated with poor prognosis.  In this network nodes represent genes, and edges represent a synthetic lethal interaction based on published interactions. Genes that are syntheticly lethal with KRAS that have a known inhibitor available could represent potential and accessible theraputic targets for treatment of tumours with a KRAS mutation.
-Line 11:
+Line 14:
-  ''Screened Entrez Gene'': The KRAS entrez gene number.
+''Screened Entrez Gene'': The KRAS entrez gene number.
-Line 13:
+Line 16:
-  ''SL Entrez Gene'': The entrez gene number for the gene that was reported to be synthetically lethal with the KRAS gene.
+''SL Entrez Gene'': The entrez gene number for the gene that was reported to be synthetically lethal with the KRAS gene.
-Line 15:
+Line 18:
-  ''Screened Gene'': Gene symbol for KRAS
+''Screened Gene'': Gene symbol for KRAS
-Line 17:
+Line 20:
-  ''SL Gene'': Gene symbol for the gene that was reported to be syntheticly lethal with the KRAS gene.
+''SL Gene'': Gene symbol for the gene that was reported to be syntheticly lethal with the KRAS gene.
-Line 19:
+Line 22:
-  ''Pubmed Link'': Link to the publication that reported this interaction.
+''Pubmed Link'': Link to the publication that reported this interaction.

----
== Annotation File Based on Chembl v6. data ==
“Chembl_targets.txt” is a tab-delimited file describing 2104 proteins from a variety of species that are reported in the Chembl database (v6) to have at least one published, high-confidence, potent interaction with a compound. Download here: [[attachment:Chembl_targets_with_named_compounds.txt]]
-Line 23:
+Line 30:
-Annotation File Based on Chembl v6. data[[attachment:Chembl_targets.txt|]]
+This sample file contains 14 columns:
-Line 25:
+Line 32:
-“Chembl_targets.txt” is a tab-delimited file describing 2104 proteins from a variety of species that are reported in the Chembl database (v6) to have at least one published, high-confidence, potent interaction with a compound. Download here: [[attachment:Chembl_targets.txt]]
+''UniProt ID'': The UniProt ID for the protein, as listed in ChEMBL.
-Line 27:
+Line 34:
-It contains 10 columns:
+''Entrez Gene ID'': The Entrez Gene ID for this protein based on the ID mapping service provided by the UniProt website.
-Line 29:
+Line 36:
-''Uniprot ID'': The Uniprot Id for the protein, as listed in Chembl.
+''Protein name from ChEMBL'': The common name for this protein as listed by ChEMBL.
-Line 31:
+Line 38:
-''Entrez Gene ID'': The Entrez Gene ID for this protein based on the ID mapping service provided by the Uniprot website.
+ ''Number of different compounds reported in ChEMBL'': The number of distinct compounds that have a value of <1uM associated with this protein at a ChEMBL confidence level of 5 or greater.  Because commercial availability of particular compounds can vary over time, this number could be used as an estimate of how easy it will be to source a compound to inhibit this protein. Additional availability information can be obtained from the ZINC database[15], a free database of commercially available compounds
-Line 33:
+Line 40:
-''Protein name from Chembl'': A common name for this protein as listed by Chembl.
+''Number of publications reporting a compound protein interaction'': Based on the same criteria as above, this is the number of distinct publications that report at least one compound with an interaction with this protein. This number could be used as a proxy for how often different binding studies on this protein are carried out.
-Line 35:
+Line 42:
-''Number of different compounds reported in Chembl'': The number of distinct compounds that have a value of <1uM associated with the this protein at a Chembl confidence level of 5 or greater.  As commercial supplies of particular compounds tend to vary over time, this number could be used as an estimate of how easy it will be to source a compound to inhibit this protein.
+''Top Named Compound'': Based on the criteria above, this is the compound in ChEMBL that has the most publications demonstrating an interaction with this protein out of all the compounds in ChEMBL that have a name. Only ~4,000 compounds in ChEMBL (version 6) have a name associated with them. Names in ChEMBL are derived from a variety of sources including commonly used names, research codes, trade names, and  unique nonproprietary names assigned to pharmaceuticals marketed in the United States).
-Line 37:
+Line 44:
-''Number of publications reporting a compound protein interaction'': Based on the same criteria as above, this is the number of distinct publications that report at least one compound with an interaction with the protein. This number could be used as a proxy for how often novel compounds associated with this protein are identified.
+''SMILES for example compound'': The SMILES string in this column represents the chemical structure for an example compound that has been reported in ChEMBL to interact with this protein. The example compound was chosen because it has the most publications demonstrating an interaction with this protein out of all the compounds in ChEMBL. Note: this compound is not necessarily a compound with a name associated with it in ChEMBL.
-Line 39:
+Line 46:
-''Smiles'': The smiles string in this column represents the chemical structure for an example compound that has been annotated with this protein. The example compound was chosen as Chembl reports it has the highest number of distinct publications associating it with this protein.
+''SMILES for named compound'': Same as above but for the named compound.
-Line 41:
+Line 48:
-''InChIKey'': InChIKey is an alternative chemical representation of the example compound that is optimized to search for chemical information using text based search engines, such as internet searches and database queries.
+''InChIKey for example compound'': InChIKey is an alternative chemical representation of compounds that is optimized for searching for chemical information using text based search engines, such as Internet searches and database queries.
-Line 43:
+Line 50:
-''#Publications reporting interaction'': This column contains the number of publications in Chembl in which the example compound has been associated with this protein. It could be used as a measure of how reliable this particular compound interaction is on the grounds that the more often the interaction is tested and reported the more reliable it is.
+InChIKey for named compound: Same as above but for the named compound.
-Line 45:
+Line 52:
-''Chembl link'': A link to the Chembl webpage for this protein. The webpage contains a wide variety of additional information regarding the reported interaction of this protein with different compounds.
+''#Publications reporting interaction for example compound'': This column contains the number of publications in ChEMBL in which the example compound has been shown to interact with this protein. It could be used as a measure of how reliable this particular compound interaction is based on the assumption that the more often the interaction is tested and reported the more reliable it is.
-Line 47:
+Line 54:
-''Compound'': A flag to identify which proteins/genes have been annotated in Chembl.This is useful for visualizing proteins/genes that have an associated compound
+''#Publications reporting interaction for named compound'': Same as above but for the named compound.

''ChEMBL link'': A link to the ChEMBL webpage for this protein. The webpage contains a wide variety of additional information regarding the reported interaction of this protein with different compounds.

''Compound'': A flag to identify which proteins/genes have been annotated in ChEMBL. This is useful for changing the visualization of the network in Cytoscape based on the presence or absence of compound information for a particular node.