= Enrichment Map Genesets =

== Summary ==
 * Enrichment Map Genesets are a set of gene set files in GMT format (compatible with [[http://www.broadinstitute.org/gsea/index.jsp|GSEA]]), updated '''monthly''' from the original source locations. Each set is available with:
  1. Entrez gene IDs
  1. !UniProt accessions
  1. Gene symbols

== Sources ==
 * '''Human'''

|| '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' || '''Number of pathways''' || '''Notes''' ||
|| KEGG || KEGG ftp site (July 2011) || gmt || symbol || static as of July 1, 2011 || 236 || Not available in biopax; available as a flat file, translated into gmt files ||
|| Msigdb - c2 || static (needs to be updated manually) || gmt || Entrez gene || sporadically || Biocarta - 217<<BR>> Other - 47 || Only Other and Biocarta are needed, as all other sources are currently covered ||
|| NCI || [[http://pid.nci.nih.gov/download.shtml|NCI]] || biopax || Entrez gene || sporadically || 219 pathways || ||
|| {X} Biocarta || [[http://pid.nci.nih.gov/download.shtml|NCI]] || biopax || Entrez gene || static || 386 pathways || '''Biopax 3 - Complete Mess!''' - currently getting from Msigdb ||
|| IOB || directly from IOB - static (July 2011) || biopax || Entrez gene || sporadically || 35 pathways - <<BR>> 10 are the same as !CellMap,<<BR>> 1 is the same as !NetPath || The biopax pathways need to be fixed so that the species info is correct, but the information is still extractable. ||
|| !NetPath || [[www.netpath.org/browse]] (scripted grab of files numbered 1-25) || biopax || Entrez gene || static || 25 pathways - <<BR>> 12 are cancer pathways (10 are !CellMap) <<BR>> 13 are immunity pathways || The biopax pathways need to be fixed so that the species info is correct, but the information is still extractable. ||
|| !HumanCyc || scripted grab of zipped release from password-protected website || biopax || Uniprot || updated periodically || 249 pathways || available in biopax level 2 and level 3 ||
|| Reactome || scripted grab of zipped release from website || biopax || Uniprot || updated release || 1117 pathways (release 37) || No way of getting the release version from the biopax file ||
|| GO || scripted grab from EBI ftp site (human) || GAF || Uniprot || released once a month || 13,034 no GO IEA <<BR>> 15,181 with GO IEA || source is direct from the original curator of the annotations ||
|| msigdb - c3 <<BR>> Specialty GMTs <<BR>> miRs, transcription factors || grab from Msigdb || gmt || Entrez gene || sporadically || 221 miRs <<BR>> 616 TFs || ||

 * '''Mouse'''

|| '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' || '''Number of pathways''' || '''Notes''' ||
|| Reactome || scripted grab of zipped release from website || biopax || Uniprot || updated release || 946 pathways (release 37) || No way of getting the release version from the biopax file ||
|| GO || scripted grab from MGI ftp site (mouse) || GAF || MGI || released once a month || 14,563 no GO IEA <<BR>> 15,041 with GO IEA || source is direct from the original curator of the annotations ||
|| KEGG || ''translated from Human using Homologene'' || gmt || Entrez gene || static as of July 1, 2011 || 236 || Not available in a mouse-specific format ||
|| Msigdb - c2 || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || total 880:<<BR>> Kegg - 186<<BR>> Reactome - 430<<BR>> Biocarta - 217<<BR>> Other - 47 || Only Other and Biocarta are needed, as all other sources are currently covered ||
|| NCI || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || 219 pathways || ||
|| IOB || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || 35 pathways - <<BR>> 10 are the same as !CellMap,<<BR>> 1 is the same as !NetPath || The biopax pathways need to be fixed so that the species info is correct, but the information is still extractable. ||
|| !NetPath || ''translated from Human using Homologene'' || gmt || Entrez gene || static || 25 pathways - <<BR>> 12 are cancer pathways (10 are !CellMap) <<BR>> 13 are immunity pathways || The biopax pathways need to be fixed so that the species info is correct, but the information is still extractable. ||
|| !HumanCyc || ''translated from Human using Homologene'' || gmt || Entrez gene || updated periodically || 249 pathways || Available as Mousecyc in biopax, but parsing it yielded only a fraction of the pathways that are in human, so we chose to convert the human files instead ||

== File Structure ==
'''< > denotes directory'''
 * - directory is named according to the date the sets were updated.
 * 
 * - (either Entrez gene, !UniProt, Gene symbol)
 * 
 * BP = biological process
 * MF = molecular function
 * CC = cellular component
 * All = BP + MF + CC
 * no_GO_IEA - indicates that the file '''excludes''' GO annotations with the evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available) and 'RCA' (inferred from reviewed computational analysis)
 * with_GO_IEA - indicates that the file '''includes''' GO annotations with the evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available) and 'RCA' (inferred from reviewed computational analysis)
 * 
 * 
 * 
 * 
 * In each directory there are amalgamated gene set files:
  * AllPathways - contains all pathway sources in the Pathways directory
  * GOPathways - contains all GO (MF, BP, CC) and all pathway sources in the Pathways directory.

== Creating customized Genesets ==
 1. Download the gene set files you would like to use in your customized set (for example, Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt).
 1. Concatenate them into a single GMT file:
{{{
cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
}}}
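
Plain `cat` keeps every line from every input, so a gene set that appears in two sources ends up in the merged file twice, and an input without a trailing newline gets its last line glued to the first line of the next file. The sketch below is one way to guard against both. It assumes that the first tab-separated field of each line is the gene set name (the usual GMT layout) and that identical names refer to the same set; whether overlapping pathways in the files above actually share a name is not confirmed here. The function names `merge_gmt` and `duplicate_gmt_names` are hypothetical helpers, not part of the Enrichment Map downloads.

```shell
# Hypothetical helper (not shipped with the gene set files):
# merge GMT files, keeping only the first occurrence of each gene set name.
# Assumes tab-separated lines: name, description, then gene identifiers.
# usage: merge_gmt OUTPUT INPUT [INPUT...]
merge_gmt() {
    out="$1"
    shift
    # NF > 0 skips empty lines; !seen[$1]++ is true only the first time
    # a given name (field 1) is encountered across all inputs.
    awk -F '\t' 'NF > 0 && !seen[$1]++' "$@" > "$out"
}

# Hypothetical helper: print gene set names that occur more than once
# in a GMT file (prints nothing if every name is unique).
# usage: duplicate_gmt_names FILE
duplicate_gmt_names() {
    cut -f 1 "$1" | sort | uniq -d
}
```

For the example above this would be called as `merge_gmt MyCustomizedSet.gmt Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt`. Because the first occurrence wins, the order of the input files decides which copy of a duplicated set is kept; running `duplicate_gmt_names` on a file produced with plain `cat` shows whether deduplication is needed at all.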