Enrichment Map Genesets
Summary
Enrichment Map Genesets are a collection of gene set files in GMT format (compatible with GSEA), updated monthly from their original source locations. Each file is available with the following identifier types:
- Entrez gene ids
- UniProt accessions
- Gene symbols
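As a rough illustration of what a GMT file contains, the sketch below reads one into a dictionary. It assumes the standard GMT layout (tab-delimited, one gene set per line: set name, description, then the gene identifiers). The function name read_gmt and the sample set names and ids are made up for this example and do not come from the actual files.

```python
import io


def read_gmt(handle):
    """Parse GMT lines into {set_name: (description, [gene ids])}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines that list no genes
        name, description, genes = fields[0], fields[1], fields[2:]
        # Drop empty cells left behind by trailing tabs
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


# Hypothetical two-set example; names and ids are placeholders.
sample = io.StringIO(
    "PATHWAY_A\texample pathway A\t1017\t1019\t595\n"
    "PATHWAY_B\texample pathway B\t7157\t4193\n"
)
gene_sets = read_gmt(sample)
for name, (description, genes) in gene_sets.items():
    print(name, len(genes))
```

To read a downloaded file instead of the in-memory sample, pass an open file handle, for example `read_gmt(open("Human_KEGG_Entrezgene.gmt"))`.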
Sources
Source | File Origin | File Type | ID extracted |
KEGG | static (July 2011) | gmt | symbol |
IOB | static (July 2011) | biopax | Entrez gene |
MSigDB - c2 | static (needs to be updated manually) | gmt | Entrez gene |
NetPath | website (scripted grab of files numbered 1-25) | biopax | Entrez gene |
HumanCyc | scripted grab of zipped release from password-protected website | biopax | UniProt |
NCI | scripted grab from pathwaycommons | gmt | Entrez gene |
Biocarta | MSigDB - c2 | gmt | Entrez gene |
Reactome | scripted grab of zipped release from website | biopax | UniProt |
GO | scripted grab from ftp site | GAF | UniProt |
Specialty GMTs | grab from MSigDB | gmt | Entrez gene |
File Structure
< > denotes a directory.
<Release> - named according to the date the sets were updated
  <Species>
    <Identifier> - one of Entrez gene, UniProt, Gene symbol
      <GO>
        - BP = biological process
        - MF = molecular function
        - CC = cellular component
        - All = BP + MF + CC
        - no_GO_IEA - indicates that the file excludes GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available), 'RCA' (inferred from reviewed computational analysis)
        - with_GO_IEA - indicates that the file includes GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available), 'RCA' (inferred from reviewed computational analysis)
      <Pathways>
      <miRs>
      <TF>
      <Disease phenotypes>
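The difference between the no_GO_IEA and with_GO_IEA files comes down to filtering annotations on their evidence code. The sketch below shows one way such a filter could look. It is not the script used to build the released files. It assumes GAF input, where header lines start with '!' and the tab-separated columns 2, 5 and 7 hold the object id, GO id and evidence code. The function name filter_gaf and the sample rows are invented for illustration.

```python
import io

# Evidence codes left out of the no_GO_IEA files, as listed above.
EXCLUDED_CODES = {"IEA", "ND", "RCA"}


def filter_gaf(handle, excluded=EXCLUDED_CODES):
    """Yield (object_id, go_id, evidence) for annotations whose code is not excluded."""
    for line in handle:
        if line.startswith("!"):
            continue  # GAF header and comment lines start with '!'
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 7:
            continue  # too short to carry an evidence code
        object_id, go_id, evidence = cols[1], cols[4], cols[6]
        if evidence not in excluded:
            yield object_id, go_id, evidence


# Made-up annotation rows; the ids and references are placeholders.
sample = io.StringIO(
    "!gaf-version: 2.0\n"
    "UniProtKB\tP00001\tGENE1\t\tGO:0000001\tPMID:0\tIDA\t\tP\n"
    "UniProtKB\tP00002\tGENE2\t\tGO:0000002\tPMID:0\tIEA\t\tF\n"
    "UniProtKB\tP00003\tGENE3\t\tGO:0000003\tPMID:0\tRCA\t\tC\n"
)
kept = list(filter_gaf(sample))
print(kept)
```

Passing `excluded=set()` keeps every annotation, which corresponds to the with_GO_IEA variant.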
In each <Identifier> directory there are amalgamated gene set files:
- AllPathways - contains all pathway sources in the Pathways directory
- GOPathways - contains all GO (MF, BP, CC) and all pathway sources in the Pathways directory
Creating customized Genesets
Download the gene set files you would like to include in your customized set (for example Human_IOB_Entrezgene.gmt, Human_NetPath_Entrezgene.gmt, Human_KEGG_Entrezgene.gmt and Human_Reactome_Entrezgene.gmt), then concatenate them into a single file:
cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt Human_KEGG_Entrezgene.gmt Human_Reactome_Entrezgene.gmt > MyCustomizedSet.gmt
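A plain cat keeps every line as-is, so it will glue two gene sets together if a file lacks a trailing newline, and it will keep repeated gene set names if the sources overlap. The sketch below is one possible alternative, not part of the original instructions. It assumes gene set names (the first tab-separated field) could collide across sources, and it keeps only the first occurrence of each name. The function name merge_gmt and the tiny demo files are hypothetical.

```python
import tempfile
from pathlib import Path


def merge_gmt(paths, out_path):
    """Concatenate GMT files, keeping the first occurrence of each gene set name.

    Returns the list of names that were skipped as duplicates.
    """
    seen = set()
    duplicates = []
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    if not line.strip():
                        continue  # ignore blank lines
                    name = line.split("\t", 1)[0]
                    if name in seen:
                        duplicates.append(name)
                        continue
                    seen.add(name)
                    # Make sure every record ends with a newline
                    out.write(line if line.endswith("\n") else line + "\n")
    return duplicates


# Demo with two small made-up files in a temporary directory.
workdir = Path(tempfile.mkdtemp())
(workdir / "a.gmt").write_text("SET_1\tdesc\t1\t2\n")
(workdir / "b.gmt").write_text("SET_2\tdesc\t3\nSET_1\tdesc\t9")
duplicates = merge_gmt(
    [workdir / "a.gmt", workdir / "b.gmt"], workdir / "MyCustomizedSet.gmt"
)
merged_lines = (workdir / "MyCustomizedSet.gmt").read_text().splitlines()
print(len(merged_lines), duplicates)
```

With the real downloads, the call would take the four file names from the example above in place of the demo files.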