
A to Z protocol to create an EnrichmentMap from Gene Expression Data and using GSEA (Gene Set Enrichment Analysis)

system requirements to run this workflow:
- TO BE COMPLETED
goal (what is the goal of enrichment analysis)
specific goal: identify all pathways that could be altered between the 2 (or more) conditions that we are testing. This analysis aims to give a global and comprehensive view of what is happening in the cells: a snapshot of the entire cell at the moment the RNA was extracted.
description of the steps:
How to create a rank file (.rnk)
- the rank file contains only 2 columns: the gene names in the first column and the differential expression value for each gene in the second column. In this protocol, we will use the t value from a moderated Student's t-test. Headers (column names) should be removed. The file should be tab delimited (meaning that the columns are separated by tabs) and the file extension should be .rnk.
- the rank file format is described in the GSEA documentation: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
- the rank file will be used to run the gene set enrichment analysis (GSEA).
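
As an illustration of the format described above, here is a minimal Python sketch that writes a two-column, tab-delimited, header-free rank file. The protocol itself computes the t values in R; the gene names, the scores, and the helper name `write_rnk` below are hypothetical and only show the layout of the file. Sorting by score is done for readability and is an assumption, not a stated requirement of the format.

```python
import csv
import io


def write_rnk(scores, handle):
    """Write gene/score pairs as two tab-separated columns with no header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Sort by descending score so the most up-regulated genes come first.
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for gene, score in ordered:
        writer.writerow([gene, score])


# Hypothetical moderated t values, for illustration only.
example_scores = {"GENE_A": 4.2, "GENE_B": -3.1, "GENE_C": 0.5}

buffer = io.StringIO()
write_rnk(example_scores, buffer)
rnk_text = buffer.getvalue()
print(rnk_text)
```

To produce a real file, the same function can be called with a file handle opened for writing, for example `open("my_ranks.rnk", "w", newline="")`, where the file name is a placeholder.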
How to get the pathway database file (.gmt)
- This file contains known, curated biological functions (gene sets). For each function, the names of the genes known to be involved in it are listed beside it. The gene set enrichment analysis checks whether the top differentially expressed genes are included in some of these pathways.
- In this protocol we use a file that includes pathways from several sources (e.g. Gene Ontology, Reactome, KEGG). We observed that using the most comprehensive set of pathways gave more sensitive results. Although the databases included in this file can overlap, they are not 100% identical, so using several of them gives more sensitivity. In addition, when pathways from different sources fall into the same cluster of the enrichment map, this adds confidence that the given biological function is perturbed.
- The database file compiled by the BaderLab and updated monthly can be found at: http://download.baderlab.org/EM_Genesets/ (look for the current release at the bottom of the list). A description of how the file is created is available at: http://baderlab.org/GeneSets
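
To make the structure of a .gmt file concrete, here is a minimal Python sketch. It assumes the tab-delimited GMT layout from the GSEA data formats page: one gene set per line, with the set name, a description, and then the gene names. The pathway names, gene names, and helper functions are hypothetical. The overlap lookup at the end only illustrates the idea of "top genes included in a pathway"; it is not the statistic GSEA computes, which is a running-sum score over the whole ranked list.

```python
def parse_gmt(lines):
    """Return {gene_set_name: set_of_gene_names} from GMT-formatted lines."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any gene
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def sets_containing(gene_sets, top_genes):
    """List, for each gene set, which of the given top genes it contains."""
    top = set(top_genes)
    return {
        name: sorted(genes & top)
        for name, genes in gene_sets.items()
        if genes & top
    }


# Hypothetical GMT content, for illustration only.
example_gmt = [
    "PATHWAY_X\thypothetical source 1\tGENE_A\tGENE_B\tGENE_C\n",
    "PATHWAY_Y\thypothetical source 2\tGENE_C\tGENE_D\n",
]

gene_sets = parse_gmt(example_gmt)
hits = sets_containing(gene_sets, ["GENE_A", "GENE_C"])
print(hits)
```

Because pathways from different sources can share genes, as PATHWAY_X and PATHWAY_Y share GENE_C here, the same perturbed genes can show up in several overlapping gene sets.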
How to run GSEA
- GSEA can be downloaded from http://www.broadinstitute.org/gsea/index.jsp
  - you need to enter a valid e-mail address before going to the download section
How to create an expression file
How to create a map
What is the next step, how to use the map
- How to create a figure
- How to interpret the results
- What next
(How to preprocess the data using R)
(How to preprocess the data using Excel)

FIRST EXAMPLE WITH AFFYMETRIX MICROARRAY DATA

description of the data

Download the data from GEO

Installation

1) install R (http://www.r-project.org/)
2) install RStudio (http://www.rstudio.com/)
3) Go through an online R tutorial (e.g. this one: http://www.cyclismo.org/tutorial/R/)

Revision 26 as of 2014-09-16 14:55:21 (previous: Revision 18 as of 2014-09-16 14:31:11). Editor: VeroniqueVoisin