Enrichment Map Gene Sets

EnrichmentMap is a Cytoscape plugin developed in the Bader Lab to help visualize, navigate and analyze functional enrichment results generated by programs such as Gene Set Enrichment Analysis (GSEA), BiNGO, or DAVID. Some enrichment programs, such as GSEA, allow users to search against their own gene set database. Because annotation (gene set) sources are regularly updated as new information is discovered, we set up an automated system to update our gene set collections so that we always use the most up-to-date annotations.

If you use these gene sets, please cite our Enrichment Map paper.

Important Note (January 2016): with the latest build of pathways, we have removed KEGG from the main compilation of pathways. If you would like to include KEGG in your analysis, the KEGG gene sets are located in the misc/ directory and can be appended to your GMT file.
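A minimal sketch of appending the KEGG sets to a main GMT file, assuming a POSIX shell with awk. The function name append_gmt and the file names in the usage comment are hypothetical placeholders, not the actual names of the files in misc/. Unlike plain cat, this awk approach drops exact duplicate lines and always ends each line with a newline, so a main file that lacks a trailing newline does not get fused with the first appended gene set.

```shell
#!/usr/bin/env sh
# Hypothetical helper: append one GMT file to another.
# $1 = main GMT, $2 = GMT to append (e.g. a KEGG file from misc/), $3 = output file.
append_gmt() {
  main_gmt="$1"
  extra_gmt="$2"
  out_gmt="$3"
  # awk prints each line only the first time it is seen (input order preserved),
  # and prints every record followed by a newline, even the last line of a file
  # that has no trailing newline.
  awk '!seen[$0]++' "$main_gmt" "$extra_gmt" > "$out_gmt"
}

# Example usage (hypothetical file names):
# append_gmt Human_AllPathways.gmt misc/Human_KEGG.gmt Human_AllPathways_with_KEGG.gmt
```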

Important Note (April 2012): gene set files from December 2011, January 2012, February 2012, and March 2012 had an error in the up-propagation of GO. Up-propagation followed only the is-a relationship and not the part-of relationship, which resulted in missing annotations. This primarily affects gene sets in the GO Cellular Component branch.

Summary

Current Stats

Sources

Human

| Source | File origin | File type | ID extracted | Source update frequency | Number of pathways |
|---|---|---|---|---|---|
| KEGG (1); no longer included in main GMT file as of January 2016 | KEGG ftp site (July 2011) | GMT | Symbol | static as of July 1, 2011 | 236 |
| MSigDB c2 (2) (other + Biocarta) | manual download from MSigDB | GMT | Entrez gene | sporadically | Biocarta: 217; Other: 47 |
| NCI (3) | scripted download of zipped release from website | BioPAX | Entrez gene | sporadically | 219 |
| Institute of Bioinformatics (IOB) | received directly from IOB; static (July 2011) | BioPAX | Entrez gene | sporadically | 35 (10 are the same as CellMap, 1 is the same as NetPath) |
| NetPath (4) [also from IOB] | scripted download of files numbered 1-25 | BioPAX | Entrez gene | static | 25 (12 cancer pathways, 10 of which are CellMap; 13 immunity pathways) |
| HumanCyc (5) | scripted download of zipped release from password-protected website | BioPAX | UniProt | updated periodically | 249 |
| Reactome (6) | scripted download of zipped release from website | BioPAX | UniProt | with each release | 1117 (release 37) |
| GO (7) | scripted download from EBI ftp site (human) | GAF | UniProt | released once a month | 13,034 without IEA (inferred from electronic annotation); 15,181 with IEA |
| MSigDB c3 (2): specialty GMTs (miRs, transcription factors) | manual download from MSigDB | GMT | Entrez gene | sporadically | 221 miRs; 616 TFs |
| Panther (8) | scripted download of BioPAX archive | BioPAX | UniProt | updated periodically | 307 |
| PathBank | scripted download of BioPAX | BioPAX | UniProt | updated periodically | 1001 |
| WikiPathways | scripted download of GMT | GMT | Entrez gene | updated regularly | 864 |
| PFOCR (https://pfocr.wikipathways.org/) | scripted download of GMT | GMT | Entrez gene | updated | 49361 |

Mouse

| Source | File origin | File type | ID extracted | Source update frequency | Number of pathways |
|---|---|---|---|---|---|
| Reactome (6) | scripted download of zipped release from website | BioPAX | UniProt | with each release | 946 (release 37) |
| GO (7) | scripted download from MGI ftp site (mouse) | GAF | MGI | released once a month | 14,563 without IEA; 15,041 with IEA |
| KEGG (1) | translated from human using HomoloGene | GMT | Entrez gene | static as of July 1, 2011 | 236 |
| MSigDB c2 (2) (other + Biocarta) | translated from human using HomoloGene | GMT | Entrez gene | sporadically | 880 total: KEGG 186; Reactome 430; Biocarta 217; Other 47 |
| NCI (3) | translated from human using HomoloGene | GMT | Entrez gene | sporadically | 219 |
| Institute of Bioinformatics (IOB) | translated from human using HomoloGene | GMT | Entrez gene | sporadically | 35 (10 are the same as CellMap, 1 is the same as NetPath) |
| NetPath (4) [also from IOB] | translated from human using HomoloGene | GMT | Entrez gene | static | 25 (12 cancer pathways, 10 of which are CellMap; 13 immunity pathways) |
| HumanCyc (5) | translated from human using HomoloGene | GMT | Entrez gene | updated periodically | 249 |
| Panther (8) | translated from human using HomoloGene | BioPAX | UniProt | updated periodically | 307 |
| PathBank | translated from human using HomoloGene | BioPAX | UniProt | updated periodically | 1001 |
| WikiPathways | scripted download of GMT | GMT | Entrez gene | updated regularly | 202 |
| PFOCR (https://pfocr.wikipathways.org/) | scripted download of GMT | GMT | Entrez gene | updated | 19595 |

Specialty Gene Sets

File Structure

< > denotes directory

Creating customized Gene Sets

  1. Download the gene set files you want to include in your customized set and concatenate them.
    For example, to combine Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt, you can use the following Linux command:

   cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
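After concatenating, it can help to check how many gene sets the combined file contains and to restrict it to a size range before running an enrichment analysis. The sketch below assumes the standard GMT layout (one gene set per line: name, description, then genes, all tab-separated) and a POSIX awk; the function names count_gene_sets and filter_by_size are hypothetical.

```shell
#!/usr/bin/env sh
# GMT layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...

# Hypothetical helper: print the number of gene sets (non-empty lines) in a GMT file.
count_gene_sets() {
  awk -F'\t' 'NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: print only gene sets whose size lies in [min, max].
# Size counts the non-empty fields after the description, so a trailing tab
# at the end of a line is not counted as a gene.
filter_by_size() {
  gmt="$1"; min="$2"; max="$3"
  awk -F'\t' -v min="$min" -v max="$max" '
    { size = 0; for (i = 3; i <= NF; i++) if ($i != "") size++ }
    size >= min && size <= max
  ' "$gmt"
}

# Example usage:
# count_gene_sets MyCustomizedSet.gmt
# filter_by_size MyCustomizedSet.gmt 15 500 > MyCustomizedSet_15_500.gmt
```

The 15 to 500 range in the usage comment matches GSEA's default minimum and maximum gene set sizes; adjust it to suit your analysis.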

References

  1. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2011 Nov 10. PMID: 22080510

  2. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50. PMID: 16199517

  3. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH. PID: the Pathway Interaction Database. Nucleic Acids Res. 2009 Jan;37(Database issue):D674-9. PMID: 18832364

  4. Kandasamy K, et al. NetPath: a public resource of curated signal transduction pathways. Genome Biol. 2010 Jan 12;11(1):R3. PMID: 20067622

  5. Romero P, Wagg J, Green ML, Kaiser D, Krummenacker M, Karp PD. Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 2005;6(1):R2. Epub 2004 Dec 22. PMID: 15642094

  6. Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, Jupe S, Kalatskaya I, Mahajan S, May B, Ndegwa N, Schmidt E, Shamovsky V, Yung C, Birney E, Hermjakob H, D'Eustachio P, Stein L. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 2011 Jan;39(Database issue):D691-7. PMID: 21067998

  7. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000 May;25(1):25-9. PMID: 10802651

  8. Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S, Guo N, Muruganujan A, Doremieux O, Campbell MJ, Kitano H, Thomas PD. The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D284-8. PMID: 15608197

GeneSets (last edited 2024-03-13 13:42:56 by RuthIsserlin)
