= Enrichment Map Genesets =
<<TableOfContents(3)>>

== Summary ==
 * Enrichment Map Genesets are a collection of gene set files in GMT format (compatible with [[http://www.broadinstitute.org/gsea/index.jsp| GSEA]]), updated '''monthly''' from the original source locations. They are available with the following identifiers:
  1. Entrez gene ids
  1. !UniProt accessions
  1. Gene symbols

== Sources ==
 * '''Human'''
|| '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' || '''Number of pathways''' || '''Notes''' ||
|| KEGG || KEGG ftp site (July 2011) || gmt || symbol || static as of July 1, 2011 || 236 || Not available in biopax; available as a flat file, which was translated into gmt files ||
|| Msigdb - c2 || static (needs to be updated manually) || gmt || Entrez gene || sporadically || Biocarta - 217<<BR>> Other - 47 || Only the Biocarta and Other sets are needed, as all other c2 sources are already covered ||
|| NCI || [[http://pid.nci.nih.gov/download.shtml|NCI]] || biopax || Entrez gene || sporadically || 219 pathways || ||
|| {X} Biocarta || [[http://pid.nci.nih.gov/download.shtml|NCI]] || biopax || Entrez gene || static || 386 pathways || '''Biopax level 3 file is a complete mess!''' Currently taken from Msigdb instead ||
|| IOB || directly from IOB - static (July 2011) || biopax || Entrez gene || sporadically || 35 pathways:<<BR>> 10 are the same as !CellMap<<BR>> 1 is the same as !NetPath || Biopax pathways need fixing so that the species information is correct, but the information is still extractable ||
|| !NetPath || [[www.netpath.org/browse]] (scripted grab of files numbered 1-25) || biopax || Entrez gene || static || 25 pathways:<<BR>> 12 are cancer pathways (10 are the same as !CellMap)<<BR>> 13 are immunity pathways || Biopax pathways need fixing so that the species information is correct, but the information is still extractable ||
|| !HumanCyc || scripted grab of zipped release from password-protected website || biopax || Uniprot || updated periodically || 249 pathways || Available in biopax level 2 and level 3 ||
|| Reactome || scripted grab of zipped release from website || biopax || Uniprot || with each release || 1117 pathways (release 37) || The release version cannot be determined from the biopax file ||
|| GO || scripted grab from EBI ftp site (human) || GAF || Uniprot || released once a month || 13,034 (no GO IEA)<<BR>> 15,181 (with GO IEA) || Obtained directly from the original curator of the annotations ||
|| Msigdb - c3<<BR>> Specialty GMTs<<BR>> miRs, transcription factors || grab from Msigdb || gmt || Entrez gene || sporadically || 221 miRs<<BR>> 616 TFs || ||

 * '''Mouse'''
|| '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' || '''Number of pathways''' || '''Notes''' ||
|| Reactome || scripted grab of zipped release from website || biopax || Uniprot || with each release || 946 pathways (release 37) || The release version cannot be determined from the biopax file ||
|| GO || scripted grab from MGI ftp site (mouse) || GAF || MGI || released once a month || 14,563 (no GO IEA)<<BR>> 15,041 (with GO IEA) || Obtained directly from the original curator of the annotations ||
|| KEGG || ''translated from Human using Homologene'' || gmt || Entrez gene || static as of July 1, 2011 || 236 || Not available in a mouse-specific format ||
|| Msigdb - c2 || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || Total 880:<<BR>> KEGG - 186<<BR>> Reactome - 430<<BR>> Biocarta - 217<<BR>> Other - 47 || Only the Biocarta and Other sets are needed, as all other c2 sources are already covered ||
|| NCI || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || 219 pathways || ||
|| IOB || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || 35 pathways:<<BR>> 10 are the same as !CellMap<<BR>> 1 is the same as !NetPath || Biopax pathways need fixing so that the species information is correct, but the information is still extractable ||
|| !NetPath || ''translated from Human using Homologene'' || gmt || Entrez gene || static || 25 pathways:<<BR>> 12 are cancer pathways (10 are the same as !CellMap)<<BR>> 13 are immunity pathways || Biopax pathways need fixing so that the species information is correct, but the information is still extractable ||
|| !HumanCyc || ''translated from Human using Homologene'' || gmt || Entrez gene || updated periodically || 249 pathways || Available as !MouseCyc in biopax, but parsing it yielded only a fraction of the pathways found in human, so the human files were converted instead ||

== File Structure ==
'''< > denotes a directory'''
 * <Release> - directory named after the date the sets were updated
  * <Species>
   * <Identifier> - Entrez gene, !UniProt or Gene symbol
    * <GO>
     * BP = biological process
     * MF = molecular function
     * CC = cellular component
     * All = BP + MF + CC
     * no_GO_IEA - the file '''excludes''' GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available) and 'RCA' (inferred from reviewed computational analysis)
     * with_GO_IEA - the file '''includes''' GO annotations with evidence codes 'IEA', 'ND' and 'RCA'
    * <Pathways>
    * <miRs>
    * <TF>
    * <Disease phenotypes>
 * Each <Identifier> directory also contains amalgamated gene set files:
  * AllPathways - all pathway sources in the <Pathways> directory
  * GOPathways - all GO sets (BP, MF, CC) plus all pathway sources in the <Pathways> directory

== Creating customized Genesets ==
 1. Download the gene set files you would like to include in your customized set, for example Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt. Use files for the same species and identifier type.
 1. Concatenate them into a single GMT file:
{{{
cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
}}}
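Plain `cat` keeps every line, so a gene set name that appears in more than one input file ends up in the combined file more than once. If you would rather keep only the first occurrence and see which names collided, here is a minimal Python sketch. It assumes the standard GMT layout (gene set name, description, then genes, all tab-separated); the function names `parse_gmt`, `merge_gene_sets` and `merge_gmt_files` are hypothetical and not part of any Enrichment Map tool. It only catches identical names: a pathway that two sources publish under different names is not detected.

```python
def parse_gmt(lines):
    """Parse GMT lines (name<TAB>description<TAB>gene1<TAB>gene2...) into tuples."""
    gene_sets = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, *genes = fields
        gene_sets.append((name, description, [g for g in genes if g]))
    return gene_sets


def merge_gene_sets(*sources):
    """Combine lists of gene sets, keeping the first set seen for each name.

    Returns (merged, skipped_names) so repeated names can be reviewed.
    """
    merged, skipped, seen = [], [], set()
    for gene_sets in sources:
        for name, description, genes in gene_sets:
            if name in seen:
                skipped.append(name)
                continue
            seen.add(name)
            merged.append((name, description, genes))
    return merged, skipped


def merge_gmt_files(input_paths, output_path):
    """File-level wrapper: read each GMT file, merge them, write one GMT file."""
    sources = []
    for path in input_paths:
        with open(path, encoding="utf-8") as handle:
            sources.append(parse_gmt(handle))
    merged, skipped = merge_gene_sets(*sources)
    with open(output_path, "w", encoding="utf-8") as out:
        for name, description, genes in merged:
            out.write("\t".join([name, description, *genes]) + "\n")
    return skipped
```

For example, `merge_gmt_files(["Human_IOB_Entrezgene.gmt", "Human_NetPath_Entrezgene.gmt"], "MyCustomizedSet.gmt")` writes the merged file and returns the list of names that were skipped. Keeping parsing and merging separate from file I/O makes the merge logic easy to check on small in-memory examples.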
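The GO files come in two fixed variants, no_GO_IEA and with_GO_IEA. If you need a GO gene set with a different evidence-code filter, one option is to build it yourself from a GO annotation (GAF) file. The Python sketch below is hypothetical and is not the script used to build these files. It assumes the GAF 2.x tab-delimited column layout (qualifier, GO ID, evidence code and aspect in columns 4, 5, 7 and 9, counting from 1), skips NOT annotations, and uses direct annotations only: it does not propagate genes up to parent GO terms. Because term names are not in the GAF file, the GMT description column simply repeats the GO ID here.

```python
from collections import defaultdict

# Evidence codes excluded from the no_GO_IEA files, per the File Structure section.
NO_GO_IEA_EXCLUDED = frozenset({"IEA", "ND", "RCA"})

# 0-based column indexes in an (assumed) GAF 2.x annotation line.
QUALIFIER, GO_ID, EVIDENCE, ASPECT = 3, 4, 6, 8


def go_gene_sets_from_gaf(lines, aspect="P", excluded_evidence=NO_GO_IEA_EXCLUDED,
                          id_column=1):
    """Group gene products by GO term from GAF lines.

    aspect: 'P' (biological process), 'F' (molecular function) or 'C' (cellular component).
    id_column: 0-based column used as the gene identifier
               (1 = DB Object ID, e.g. a UniProt accession; 2 = DB Object Symbol).
    Only direct annotations are used; no propagation to parent terms.
    """
    gene_sets = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) <= max(ASPECT, id_column):
            continue  # malformed line
        if cols[ASPECT] != aspect or cols[EVIDENCE] in excluded_evidence:
            continue
        if "NOT" in cols[QUALIFIER].split("|"):
            continue  # negative annotation: gene is NOT associated with the term
        gene_sets[cols[GO_ID]].add(cols[id_column])
    return {go_id: sorted(genes) for go_id, genes in gene_sets.items()}


def to_gmt_lines(gene_sets, min_size=1):
    """Format {go_id: genes} as GMT lines; the description repeats the GO ID."""
    return [
        "\t".join([go_id, go_id, *genes])
        for go_id, genes in sorted(gene_sets.items())
        if len(genes) >= min_size
    ]
```

Passing `excluded_evidence=frozenset({"IEA"})`, for instance, would keep ND and RCA annotations while still dropping IEA; passing an empty set keeps every evidence code.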
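Several mouse files are marked ''translated from Human using Homologene''. If you need to translate a human GMT file of your own in a similar way, the Python sketch below maps human Entrez gene IDs to the mouse Entrez gene IDs that share a Homologene group. It is hypothetical, not the conversion used for these files, and it assumes the tab-delimited `homologene.data` layout from NCBI (group ID, taxonomy ID, gene ID, gene symbol, protein gi, protein accession), with taxonomy IDs 9606 for human and 10090 for mouse. A human gene maps to every mouse gene in its group, and genes without a mouse homolog are dropped.

```python
from collections import defaultdict

HUMAN_TAXID = "9606"   # NCBI taxonomy ID for Homo sapiens
MOUSE_TAXID = "10090"  # NCBI taxonomy ID for Mus musculus


def human_to_mouse_map(homologene_lines):
    """Map human Entrez gene IDs to mouse Entrez gene IDs in the same Homologene group.

    Assumed tab-delimited columns: group ID, taxonomy ID, gene ID,
    gene symbol, protein gi, protein accession.
    """
    human_by_group = defaultdict(set)
    mouse_by_group = defaultdict(set)
    for line in homologene_lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        group_id, taxid, gene_id = cols[0], cols[1], cols[2]
        if taxid == HUMAN_TAXID:
            human_by_group[group_id].add(gene_id)
        elif taxid == MOUSE_TAXID:
            mouse_by_group[group_id].add(gene_id)

    mapping = defaultdict(set)
    for group_id, human_ids in human_by_group.items():
        mouse_ids = mouse_by_group.get(group_id, set())
        for human_id in human_ids:
            mapping[human_id] |= mouse_ids
    # Drop human genes whose group has no mouse member.
    return {human_id: ids for human_id, ids in mapping.items() if ids}


def translate_gene_set(genes, mapping):
    """Translate one gene set; genes without a mouse homolog are dropped."""
    return sorted({mouse_id for gene in genes for mouse_id in mapping.get(gene, ())})


def translate_gmt_lines(lines, mapping, min_size=1):
    """Translate GMT lines, keeping only sets with at least min_size mouse genes."""
    translated = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, *genes = fields
        mouse_genes = translate_gene_set(genes, mapping)
        if len(mouse_genes) >= min_size:
            translated.append("\t".join([name, description, *mouse_genes]))
    return translated
```

Raising `min_size` is one way to discard pathways that lose most of their genes in translation; the threshold is a choice for the user, not something this page specifies.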