Differences between revisions 2 and 6 (spanning 4 versions)

A Summary of GSEA (Gene Set Enrichment Analysis)

GSEA goal

The description below is quoted directly from Subramanian et al. (2005):

The goal of GSEA is to determine whether members of a gene set S tend to occur toward the top (or bottom) of the ranked gene list L, in which case the gene set is correlated with the phenotypic class distinction.
Given an a priori defined set of genes S (e.g., genes encoding products in a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout L or primarily found at the top or bottom. We expect that sets related to the phenotypic distinction will tend to show the latter distribution.

GSEA methods

Three key elements
1. Calculation of an enrichment score (ES)
  1. Walk down the ranked list of genes L, increasing a running-sum statistic when a gene is in the gene set S and decreasing it when it is not.
  2. The magnitude of the increment depends on the correlation of the gene with the phenotype (or absolute value of the ranking metric).
  3. The ES is the maximum deviation from zero encountered in the random walk.
2. Estimation of significance level of ES (nominal p-value)
3. Adjustment for multiple hypothesis testing (FDR)
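The running-sum walk in element 1 can be sketched in a few lines of Python. This is a minimal illustration, not the GSEA software's implementation: the function name `enrichment_score`, its arguments, and the toy input in the usage lines are assumptions made for the example. It takes the ranking-metric values in list order, a membership flag per position, and a weighting exponent `p`.

```python
def enrichment_score(ranked_metric, in_set, p=1.0):
    """Return (ES, walk) for one gene set.

    ranked_metric: ranking-metric values, already in the order of the ranked list L
    in_set: booleans, True where the gene at that position belongs to S
    p: weighting exponent (hypothetical parameter name); p = 0 gives equal steps
    """
    n = len(ranked_metric)
    n_hit = sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("S must contain some, but not all, genes in L")

    # Each hit steps up by |r|^p; every miss steps down by the same amount.
    weights = [abs(r) ** p if hit else 0.0 for r, hit in zip(ranked_metric, in_set)]
    n_r = sum(weights)
    if n_r == 0:
        raise ValueError("all genes in S have a zero ranking metric")
    miss_step = 1.0 / (n - n_hit)

    running, walk = 0.0, []
    for w, hit in zip(weights, in_set):
        running += w / n_r if hit else -miss_step
        walk.append(running)

    # ES is the signed point of the walk that lies farthest from zero.
    es = max(walk, key=abs)
    return es, walk


# Toy input (made up for illustration): six genes, S holds the top two.
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
member = [True, True, False, False, False, False]
es, walk = enrichment_score(metric, member)
```

Because the up-steps sum to 1 and the down-steps sum to 1, the walk always returns to zero at the end of the list; only its excursion along the way carries information.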
Mathematical description
- Enrichment score (ES)
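  1. ES is the maximum deviation from zero of Phit – Pmiss, evaluated at each position i of the ranked gene list L.
    - Pmiss(S, i) is the fraction of genes not in the gene set S that occur at or before position i, i.e. the empirical distribution function of the misses along L: Pmiss(S, i) = Σ_{g_j ∉ S, j ≤ i} 1/(N – N_H), where N is the number of genes in L and N_H is the number of genes in S.
    - Phit(S, i) is the weighted fraction of genes in S that occur at or before position i, with each gene weighted by its ranking metric r_j raised to an exponent p: Phit(S, i) = Σ_{g_j ∈ S, j ≤ i} |r_j|^p / N_R, where N_R = Σ_{g_j ∈ S} |r_j|^p.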
  2. ES corresponds to a weighted Kolmogorov–Smirnov-like statistic
    - When p = 0, ES reduces to the standard Kolmogorov–Smirnov statistic.
      - Phit becomes the unweighted empirical distribution function of the genes in S along the ranked gene list L.
      - ES = sup{|Phit - Pmiss|} is the two-sample statistic used to test whether the two underlying probability distributions differ.
      - Under the null hypothesis, the suitably scaled statistic asymptotically follows the Kolmogorov distribution.
    - When p = 1, the null distribution of ES has no known closed form and is estimated by permutation.
- Significance level of a gene set (nominal p-value)
- Significance level for multiple gene sets (FWER and/or FDR)
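A permutation estimate of the nominal p-value might look like the sketch below. It rests on a loudly stated assumption: to stay self-contained it shuffles gene-set membership along the ranked list, a simpler stand-in for permuting phenotype labels and re-ranking the genes as described in Subramanian et al. (2005). The names `es_only` and `nominal_p_value`, the `seed` argument, and the toy input are all hypothetical.

```python
import random


def es_only(ranked_metric, in_set, p=1.0):
    """Signed enrichment score; assumes S is non-empty, proper, and has nonzero weight."""
    n, n_hit = len(ranked_metric), sum(in_set)
    n_r = sum(abs(r) ** p for r, hit in zip(ranked_metric, in_set) if hit)
    miss_step = 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for r, hit in zip(ranked_metric, in_set):
        running += abs(r) ** p / n_r if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def nominal_p_value(ranked_metric, in_set, n_perm=1000, p=1.0, seed=0):
    """Return (observed ES, permutation p-value) for one gene set."""
    rng = random.Random(seed)
    observed = es_only(ranked_metric, in_set, p)

    # Assumption: the null is built by shuffling membership flags, not phenotypes.
    labels = list(in_set)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(es_only(ranked_metric, labels, p))

    # Use only the part of the null with the same sign as the observed ES.
    same_side = [x for x in null if (x >= 0) == (observed >= 0)]
    if not same_side:
        return observed, float("nan")
    extreme = sum(1 for x in same_side if abs(x) >= abs(observed))
    return observed, extreme / len(same_side)


# Toy input (made up for illustration): six genes, S holds the top two.
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
member = [True, True, False, False, False, False]
observed_es, p_value = nominal_p_value(metric, member, n_perm=1000)
```

Restricting the comparison to null scores on the same side as the observed ES follows the paper's description of using the positive or negative portion of the null distribution. Correcting across many gene sets (FWER or FDR) would need a further step that this sketch does not attempt.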

References

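Subramanian et al. (2005): http://www.pnas.org/content/102/43/15545.full
GSEA user guide: http://www.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html
GSEA documentation: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Main_Page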

Revision 2 as of 2014-11-27 18:24:24 (size: 855, editor: ChangjiangXu)
Revision 6 as of 2014-11-27 19:04:12 (size: 2859, editor: ChangjiangXu)