A Summary of GSEA (Gene Set Enrichment Analysis)
GSEA goal
The description below is quoted directly from Subramanian et al. (2005):
- The goal of GSEA is to determine whether members of a gene set S tend to occur toward the top (or bottom) of the ranked gene list L, in which case the gene set is correlated with the phenotypic class distinction.
- Given an a priori defined set of genes S (e.g., genes encoding products in a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout L or primarily found at the top or bottom. We expect that sets related to the phenotypic distinction will tend to show the latter distribution.
GSEA methods
- Three key elements
- Calculation of an enrichment score (ES)
- The ES is calculated by walking down the ranked list of genes, increasing a running-sum statistic when a gene is in the gene set and decreasing it when it is not.
- The magnitude of the increment depends on the correlation of the gene with the phenotype (or absolute value of the ranking metric).
- The ES is the maximum deviation from zero encountered in the random walk.
- Estimation of the significance level of the ES (nominal p-value)
- Adjustment for multiple hypothesis testing (FDR)
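The running-sum walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the GSEA implementation: the function name `enrichment_score`, its parameters, and the toy data are all hypothetical, and the weight exponent `p` is assumed to act on the absolute value of the ranking metric, as in the mathematical description of this summary.

```python
def enrichment_score(ranked_genes, metric, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set (illustrative sketch).

    ranked_genes: gene identifiers, already sorted by the ranking metric (list L)
    metric:       ranking-metric values in the same order as ranked_genes
    gene_set:     the a priori defined gene set S
    p:            weight exponent; p = 0 gives equal steps for every hit
    """
    gene_set = set(gene_set)
    hits = [g in gene_set for g in ranked_genes]
    n_miss = hits.count(False)
    # Normalising constant: total weight of all genes in S.
    norm = sum(abs(r) ** p for r, h in zip(metric, hits) if h)
    if n_miss == 0 or norm == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, es, walk = 0.0, 0.0, []
    for r, h in zip(metric, hits):
        if h:
            running += abs(r) ** p / norm   # step up, weighted by the metric
        else:
            running -= 1.0 / n_miss         # step down by a constant amount
        walk.append(running)
        if abs(running) > abs(es):          # keep the maximum deviation from zero
            es = running
    return es, walk


# Toy example (made-up data): the set sits at the top of the ranked list.
genes = ["a", "b", "c", "d", "e", "f"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top, walk_top = enrichment_score(genes, scores, {"a", "b"})
es_bottom, _ = enrichment_score(genes, scores, {"e", "f"})
```

In this sketch a set concentrated at the top of the list gets a positive ES and a set concentrated at the bottom gets a negative ES; the walk always returns to zero at the end of the list.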
- Mathematical description
- Enrichment score (ES)
- ES is the maximum deviation from zero of Phit – Pmiss.
- Pmiss is the empirical distribution function of the genes not in the gene set S, which is extended into the ranked gene list L.
- Phit is the weighted cumulative distribution function of the genes in S, in which each gene contributes in proportion to the absolute value of its ranking metric raised to the power p, extended into the ranked gene list L.
- ES corresponds to a weighted Kolmogorov–Smirnov-like statistic
- When p = 0, ES reduces to the standard Kolmogorov–Smirnov statistic.
- Phit is the empirical distribution function of the genes in S, extended into the ranked gene list L.
- ES = sup{|Phit - Pmiss|}, used to test whether the two underlying probability distributions differ.
- Under the null hypothesis, the suitably scaled statistic asymptotically follows the Kolmogorov distribution.
- When p = 1, the null distribution of ES is unknown and is estimated by a permutation approach.
- Significance level of a gene set (nominal p-value)
- Significance level for multiple gene sets (FWER and/or FDR)
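The definitions of Phit and Pmiss given in words above can be written out explicitly. The symbols below are not defined in this summary and follow the notation of Subramanian et al. (2005): r_j is the ranking metric of gene g_j, N is the number of genes in L, N_H is the number of genes in S, and i is a position in L.

```latex
\begin{align*}
P_{\mathrm{hit}}(S, i)  &= \sum_{\substack{g_j \in S \\ j \le i}} \frac{|r_j|^{p}}{N_R},
  \qquad N_R = \sum_{g_j \in S} |r_j|^{p}, \\
P_{\mathrm{miss}}(S, i) &= \sum_{\substack{g_j \notin S \\ j \le i}} \frac{1}{N - N_H}, \\
ES(S) &= P_{\mathrm{hit}}(S, i^{*}) - P_{\mathrm{miss}}(S, i^{*}),
  \qquad i^{*} = \arg\max_{i} \bigl| P_{\mathrm{hit}}(S, i) - P_{\mathrm{miss}}(S, i) \bigr|.
\end{align*}
```

Setting p = 0 makes every term of Phit equal to 1/N_H, so that Phit becomes the empirical distribution function of the genes in S.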
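The permutation approach to the nominal p-value can be sketched as follows. This is a simplified illustration under stated assumptions, not the GSEA procedure itself: Subramanian et al. (2005) permute the phenotype labels and recompute the ranking, whereas this sketch shuffles gene-set membership over a fixed ranked list, which is easier to show in a few lines. The function names, parameters, and toy data are hypothetical, and the ranking-metric values are assumed to be non-zero.

```python
import random


def enrichment_score(metric, hits, p=1.0):
    """Signed maximum deviation from zero of Phit - Pmiss (sketch)."""
    n_miss = hits.count(False)
    norm = sum(abs(r) ** p for r, h in zip(metric, hits) if h)
    if n_miss == 0 or norm == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running, es = 0.0, 0.0
    for r, h in zip(metric, hits):
        running += abs(r) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


def nominal_p_value(metric, hits, n_perm=1000, p=1.0, seed=0):
    """Estimate a nominal p-value by shuffling gene-set membership.

    Assumption: membership shuffling stands in for phenotype permutation.
    Only the part of the null distribution with the same sign as the
    observed ES is used, as described by Subramanian et al. (2005).
    """
    rng = random.Random(seed)
    observed = enrichment_score(metric, hits, p)
    labels = list(hits)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)                      # random set of the same size
        null.append(enrichment_score(metric, labels, p))
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, float("nan")            # no null ES to compare against
    extreme = sum(1 for x in same_sign if abs(x) >= abs(observed))
    return observed, extreme / len(same_sign)


# Toy example (made-up data): 20 ranked genes, the set occupies the top 4 ranks.
metric = [float(v) for v in range(10, 0, -1)] + [float(-v) for v in range(1, 11)]
hits = [True] * 4 + [False] * 16
observed_es, p_value = nominal_p_value(metric, hits)
```

Correcting for multiple gene sets (FWER or FDR) would additionally require normalising each ES by the mean of its own null distribution before comparing across sets; that step is not shown here.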
References
Subramanian et al. (2005)
GSEA user guide
GSEA documentation