## page was renamed from UpToDateGeneSets/EmGeneSetsReadme
#acl All:read
= Enrichment Map Genesets =
<<TableOfContents(3)>>

== Summary ==
  * Enrichment Map Genesets are a collection of gene set files in GMT format (compatible with [[http://www.broadinstitute.org/gsea/index.jsp| GSEA]]), updated '''monthly''' from their original source locations. Each file is available with one of the following identifier types:
    1. Entrez gene ids
    1. !UniProt accessions
    1. Gene symbols
  * The GMT File format contains one gene set per line.  Each line contains:
    * Name (tab) Description (tab) Gene (tab) Gene (tab) ...
    * In our format:
      * Name = Gene set Name | Gene set Source | Gene set Source identifier
        * example --> ATP-dependent protein binding|GO|GO:0043008  '''OR'''    arginine biosynthesis IV|HUMANCYC|ARGININE-SYN4-PWY
      * Description = Gene set Name
         * example -->  ATP-dependent protein binding '''OR'''    arginine biosynthesis IV
      * Gene = identified by one of the three possible identifiers (Entrez gene id, !UniProt accession or gene symbol)
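
The line layout described above can be inspected from the command line. The following is a minimal sketch, not part of the distributed files: `gmt_summary` is a hypothetical helper name, and it assumes that the first column always ends in `|Source|Source identifier` as described above.

```shell
# Hypothetical helper: summarise each gene set in a GMT file as
#   name <tab> source <tab> source identifier <tab> number of genes
# Assumes column 1 follows the "Name|Source|Source identifier" layout.
gmt_summary() {
    awk -F'\t' '{
        n = split($1, parts, "|")
        # The last two pieces are source and identifier; anything before
        # them is the gene set name (which may itself contain "|").
        name = parts[1]
        for (i = 2; i <= n - 2; i++) name = name "|" parts[i]
        # Columns 3..NF hold the gene identifiers.
        printf "%s\t%s\t%s\t%d\n", name, parts[n - 1], parts[n], NF - 2
    }' "$1"
}
```

For example, `gmt_summary Human_IOB_Entrezgene.gmt | cut -f2 | sort | uniq -c` would count the gene sets per source in that file.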
   
== Sources ==
 * '''Human''' 
|| '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' ||  '''Number of pathways''' ||
|| [[http://www.genome.jp/kegg/|KEGG]] || KEGG ftp site (July 2011) || gmt || symbol ||  static as of July 1, 2011 || 236 ||
|| [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|Msigdb - c2]] <<BR>> (other + Biocarta) || static (needs to be updated manually) || gmt || Entrez gene || sporadically ||  Biocarta - 217<<BR>> Other - 47 ||
|| [[http://pid.nci.nih.gov/|NCI]] || [[http://pid.nci.nih.gov/download.shtml|NCI]] || biopax || Entrez gene || sporadically || 219 pathways ||
|| IOB || directly from IOB - static (July 2011) || biopax || Entrez gene || sporadically  || 35 pathways -  <<BR>> 10 are the same as !CellMap,<<BR>> 1 is the same as !NetPath||
|| [[http://www.netpath.org/browse/|NetPath]] || [[http://www.netpath.org/browse]] (scripted grab of files numbered 1-25) || biopax || Entrez gene || static ||  25 pathways - <<BR>> 12 are cancer pathways (10 are !CellMap) <<BR>> 13 are immunity pathways ||
|| [[http://humancyc.org/|HumanCyc]] || scripted grab of zipped release from password protected website. || biopax || Uniprot || updated periodically || 249 Pathways  ||
|| [[http://www.reactome.org/ReactomeGWT/entrypoint.html|Reactome]] || scripted grab of zipped release from website || biopax || Uniprot || updated with each release ||  1117 pathways (release 37) ||
|| [[http://www.ebi.ac.uk/GO/|GO]] || scripted grab from EBI ftp site (human) || GAF || Uniprot || released once a month || 13,034 no GO IEA <<BR>> 15,181 with GO IEA  ||
|| [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|Msigdb - c3]] <<BR>> Specialty GMTs <<BR>> mirs, transcription factors || grab from Msigdb || gmt || Entrez gene || sporadically || 221 miRs <<BR>> 616 TFs ||

 * '''Mouse''' 
|| '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' ||  '''Number of pathways''' ||
|| [[http://www.reactome.org/ReactomeGWT/entrypoint.html|Reactome]] || scripted grab of zipped release from website || biopax || Uniprot || updated with each release ||  946 pathways (release 37) ||
|| [[http://www.informatics.jax.org/mgihome/GO/project.shtml|GO]] || scripted grab from MGI ftp site (mouse) || GAF || MGI || released once a month || 14,563 no GO IEA <<BR>> 15,041 with GO IEA  ||
|| [[http://www.genome.jp/kegg/|KEGG]] || ''translated from Human using Homologene'' || gmt || Entrez gene ||  static as of July 1, 2011 || 236 ||
|| [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|Msigdb - c2]] <<BR>> (other + Biocarta)|| ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically ||  total 880:<<BR>> Kegg -186<<BR>> Reactome - 430<<BR>> Biocarta - 217<<BR>> Other - 47 ||
|| [[http://pid.nci.nih.gov/|NCI]] || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || 219 pathways ||
|| IOB || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically  || 35 pathways -  <<BR>> 10 are the same as !CellMap,<<BR>> 1 is the same as !NetPath||
|| [[http://www.netpath.org/browse/|NetPath]] || ''translated from Human using Homologene'' || gmt || Entrez gene || static ||  25 pathways - <<BR>> 12 are cancer pathways (10 are !CellMap) <<BR>> 13 are immunity pathways ||
|| [[http://humancyc.org/|HumanCyc]] || ''translated from Human using Homologene'' || gmt || Entrez gene || updated periodically || 249 Pathways  ||


== File Structure ==
'''< > denotes directory'''
  * <Release> - directory named according to the date the sets were updated.
    * <Species> 
      * <Identifier> - (either Entrez gene, !UniProt, Gene symbol)
        * <GO>
          * BP = biological process
          * MF = molecular function
          * CC = cellular component
          * All = BP + MF + CC
          * no_GO_IEA - indicates that the file '''excludes''' GO annotations with evidence codes - 'IEA' (inferred from electronic annotation), 'ND' (No biological data available), 'RCA' (inferred from reviewed computational analysis)
          * with_GO_IEA - indicates that the file '''includes''' GO annotations with evidence codes - 'IEA' (inferred from electronic annotation), 'ND' (No biological data available), 'RCA' (inferred from reviewed computational analysis)
        * <Pathways>
        * <miRs>
        * <TF>
        * <Disease phenotypes>
        
  * In each <Identifier> directory there are amalgamated gene set files:
    * AllPathways - contains all pathway sources in the Pathways directory
    * GOPathways - contains all GO (mf, bp, cc) and all Pathway sources in the Pathways directory.
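
The difference between the no_GO_IEA and with_GO_IEA files described above can be illustrated on a raw GO annotation (GAF) file. This is only a sketch, not the script used to build the released files: `filter_no_iea` is a hypothetical helper, and it assumes the standard GAF 2.x layout in which column 7 holds the evidence code and comment lines start with `!`.

```shell
# Hypothetical helper: print the annotation lines of a GAF file whose
# evidence code (column 7, assuming GAF 2.x) is not IEA, ND or RCA.
# Comment/header lines starting with "!" are dropped as well.
filter_no_iea() {
    awk -F'\t' '
        /^!/ { next }                                   # skip GAF header comments
        $7 != "IEA" && $7 != "ND" && $7 != "RCA"        # keep all other evidence codes
    ' "$1"
}
```

Building gene sets from the filtered output would correspond to the no_GO_IEA variant, while using the unfiltered file would correspond to with_GO_IEA.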

== Creating customized Genesets ==
  1. Download the gene set files you would like to include in your customized set (for example, Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt).
  1. Concatenate the downloaded files into a single GMT file:

{{{
   cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
}}}
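
Plain concatenation keeps every line, so a gene set that appears in more than one input file (for example when an amalgamated file such as AllPathways is combined with one of its own sources) ends up duplicated. A possible variant, shown here as a sketch with a hypothetical `merge_gmt` helper, keeps only the first line seen for each value of the Name column:

```shell
# Hypothetical helper: concatenate GMT files, keeping only the first
# line encountered for each gene set name (column 1).
merge_gmt() {
    awk -F'\t' '!seen[$1]++' "$@"
}

# Usage with the file names from the example above:
# merge_gmt Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
```

Because the Name column includes the source, the same pathway provided by two different sources is treated as two distinct gene sets and both are kept.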