Enrichment Map GREAT Tutorial

Contents

Enrichment Map GREAT Tutorial
1. Outline
2. Instructions

Outline

This quick tutorial guides you through generating an Enrichment Map from an analysis performed with the Genomic Regions Enrichment of Annotations Tool (GREAT).

To run this tutorial:

You need to have Cytoscape installed: version 3.1.0 is the minimum, but the latest version of Cytoscape (e.g. version 3.1.1) is preferable.
Install the Enrichment Map App from the Cytoscape App Manager in Cytoscape 3.
You need to download the test data: GREATTutorial.zip

Description of the tutorial files contained in the GREATTutorial folder:

TestRegions_ForGREAT.bed : Example GREAT genomic region input file.
GreatExportAll.tsv : Example of a downloaded GREAT output file.
20140919-public-2.0.2-3vD5MB-hg19-all-gene.txt : Example downloaded GREAT gene to region association file.
geneToRegionExpressionFile.txt : Transformed gene to region association file downloaded from GREAT.
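
For orientation, a BED file such as TestRegions_ForGREAT.bed is plain text whose first three columns are chromosome, start and end. The sketch below is a minimal sanity check under standard BED conventions (whitespace-separated columns, 0-based start, end not before start). The function name parse_bed_line and the sample line are hypothetical and are not taken from the tutorial files.

```python
def parse_bed_line(line):
    """Parse one BED line into (chrom, start, end); return None for headers and blanks."""
    stripped = line.strip()
    # Skip blank lines and the optional comment/track/browser lines.
    if not stripped or stripped.startswith(("#", "track", "browser")):
        return None
    fields = stripped.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns: {line!r}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {line!r}")
    return chrom, start, end


# Hypothetical example region, not taken from the tutorial data.
example = parse_bed_line("chr1\t1000\t2000\tregion_1")
print(example)
```

Running a check like this over every line before uploading can catch malformed rows early, although GREAT performs its own validation on submission.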

Instructions

Step 1: Generate GREAT output files

Screenshot Great Export results

Go to the GREAT website - http://great.stanford.edu/great/public/html/
Select the Species Assembly associated with your data. For this tutorial select Human: GRCh37
In Test regions click on Choose File
Navigate to the files provided and select TestRegions_ForGREAT.bed
Click on Submit
Once the results page has loaded, download all the results: under the Global controls heading click on the down arrow next to Global Export
Select All data as tsv. greatExportAll.tsv will automatically be downloaded to your default Downloads directory. --> This is the file you can use in Enrichment Map (Dataset 1 or 2: Enrichment Results)

Step 1B: Generate Gene to region association file [Optional]

Optional: download the gene-to-region association table used by GREAT and modify it so that it can be used in Enrichment Map (EM) as an expression file.

In the Global controls heading click on the down arrow next to Global Export
Select view all region-gene associations
Next to Gene->genomic region association table [The table on the right hand side of the page] click on Download table as text.
The file will automatically be downloaded to your default Downloads directory. The file name is similar to DATE-public-2.0.2-3vD5MB-hg19-all-gene.txt, where DATE is the date of download; the name also changes depending on the version of GREAT and the genome selected.
Open the downloaded file in Excel.
Add a row to the top of the file.
In the first column enter "Name", and in the second column enter "Description"
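
The same edit can be scripted instead of being done by hand in Excel. The following is a minimal sketch, assuming the downloaded file is tab-delimited text with one gene per line. The function name add_header, the file names used in the demonstration and the sample row are hypothetical.

```python
import os
import tempfile


def add_header(src_path, dst_path, header=("Name", "Description")):
    """Copy src_path to dst_path with a tab-separated header row added on top."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\t".join(header) + "\n")
        for line in src:
            dst.write(line)


# Self-contained demonstration using a made-up one-line input file.
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "all-gene.txt")
dst_file = os.path.join(tmp_dir, "geneToRegionExpressionFile.txt")
with open(src_file, "w", encoding="utf-8") as handle:
    handle.write("GENE_A\tregion_1 (+250)\n")

add_header(src_file, dst_file)
with open(dst_file, encoding="utf-8") as handle:
    result = handle.read()
print(result)
```

Writing to a new file rather than editing in place keeps the original download untouched in case the step needs to be repeated.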

Step 2: Generate Enrichment Map with GREAT Output

Screenshot Great Input Panel

Open Cytoscape
Click on Apps / Enrichment Maps / Load Enrichment Results
Make sure the Analysis Type is set to DAVID/BiNGO/GREAT
Please select the following files by clicking on the respective (...) button and selecting the file in the Dialog:
- NO GMT file is required for GREAT Analysis
- Dataset 1 / Expression: geneToRegionExpressionFile.txt (OPTIONAL)
- Dataset 1 / Enrichments: GreatExportAll.tsv
Tune Parameters
- P-value cut-off 0.001
- Q-value cut-off 0.05
- Check Jaccard Coefficient
  - Jaccard coefficient cut-off 0.25
Build Enrichment Map
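
To make the similarity cut-off concrete: the Jaccard coefficient of two gene sets is the size of their intersection divided by the size of their union. The sketch below illustrates that standard definition with made-up gene sets. It is not Enrichment Map's own code, the names jaccard, set_a and set_b are hypothetical, and it assumes an edge is kept when the score meets the cut-off.

```python
def jaccard(genes_a, genes_b):
    """Return |A intersect B| / |A union B| for two collections of gene names."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0  # Two empty sets share nothing; avoid division by zero.
    return len(a & b) / len(union)


# Made-up gene sets: 2 shared genes out of 5 distinct genes.
set_a = {"G1", "G2", "G3", "G4"}
set_b = {"G3", "G4", "G5"}
score = jaccard(set_a, set_b)
keeps_edge = score >= 0.25  # Cut-off used in this tutorial.
print(score, keeps_edge)
```

With these sets the score is 2/5 = 0.4, which is above the 0.25 cut-off used here.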

Step 3: Filtering GREAT results

Screenshot Great filters

Once the network starts to build, a dialog pops up asking how you would like to filter the GREAT results. There are four options:
1. Use Hypergeometric test p-values and FDR only. --> Hypergeometric
2. Use Binomial test p-values and FDR only. --> Binomial
3. Use both hypergeometric and binomial test p-values and FDR. Enrichment result must pass the threshold for both tests. --> Both
4. Enrichment result must pass at least one of the above tests to be included in the results. --> Either
Select Both
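
The four options amount to simple boolean combinations of the two tests. The following is a minimal sketch of that logic, assuming each enrichment result carries a p-value and an FDR q-value for both tests and that the Step 2 cut-offs are applied to each. The function and field names (passes_filter, hyper_p, hyper_q, binom_p, binom_q) are hypothetical; they are neither GREAT's column names nor Enrichment Map's implementation.

```python
def passes_filter(result, mode, p_cutoff=0.001, q_cutoff=0.05):
    """Decide whether one enrichment result survives the chosen filter mode."""
    hyper_ok = result["hyper_p"] <= p_cutoff and result["hyper_q"] <= q_cutoff
    binom_ok = result["binom_p"] <= p_cutoff and result["binom_q"] <= q_cutoff
    if mode == "Hypergeometric":
        return hyper_ok
    if mode == "Binomial":
        return binom_ok
    if mode == "Both":
        return hyper_ok and binom_ok
    if mode == "Either":
        return hyper_ok or binom_ok
    raise ValueError(f"unknown filter mode: {mode!r}")


# Made-up result that passes the hypergeometric test but not the binomial one.
example_result = {"hyper_p": 0.0001, "hyper_q": 0.01,
                  "binom_p": 0.01, "binom_q": 0.2}
decisions = {mode: passes_filter(example_result, mode)
             for mode in ("Hypergeometric", "Binomial", "Both", "Either")}
print(decisions)
```

In this made-up case the result is kept under Hypergeometric and Either but dropped under Binomial and Both, which shows why Both is the strictest of the four choices.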

Step 4: Examining Results

GREAT EM Result
Legend:

Node (inner circle) size corresponds to the number of genes in dataset 1 within the geneset
Colour of the node (inner circle) corresponds to the significance of the geneset for dataset 1.
Edge size corresponds to the number of genes that overlap between the two connected genesets.
