Enrichment Map Genesets
Summary
Enrichment Map Genesets are a collection of gene set files in GMT format (compatible with GSEA), updated monthly from their original source locations. Each file is available with one of three identifier types:
- Entrez gene ids
- UniProt accessions
- Gene symbols
The GMT file format contains one gene set per line. Each line contains:
- Name (tab) Description (tab) Gene (tab) Gene (tab) ...
In our format:
- Name = Gene set Name | Gene set Source | Gene set Source identifier
  Example --> ATP-dependent protein binding|GO|GO:0043008 OR arginine biosynthesis IV|HUMANCYC|ARGININE-SYN4-PWY
- Description = Gene set Name
  Example --> ATP-dependent protein binding OR arginine biosynthesis IV
- Gene = identified by one of the three possible identifiers (Entrez gene id, UniProt accession or gene symbol)
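The line layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the Enrichment Map tooling: the `GeneSet` type, the `parse_gmt_line` function, and the placeholder genes `GENE1` and `GENE2` are assumptions made for illustration.

```python
from typing import NamedTuple


class GeneSet(NamedTuple):
    name: str          # gene set name, e.g. "arginine biosynthesis IV"
    source: str        # gene set source, e.g. "HUMANCYC"
    source_id: str     # identifier within the source, e.g. "ARGININE-SYN4-PWY"
    description: str   # second GMT column
    genes: list        # remaining GMT columns


def parse_gmt_line(line: str) -> GeneSet:
    """Split one tab-delimited GMT line into its parts."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("a GMT line needs at least a name and a description")
    # Split from the right so a '|' inside the gene set name itself is preserved.
    parts = fields[0].rsplit("|", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected name field: {fields[0]!r}")
    name, source, source_id = parts
    genes = [gene for gene in fields[2:] if gene]  # drop empty trailing columns
    return GeneSet(name, source, source_id, fields[1], genes)


# GENE1 and GENE2 are placeholders, not real members of this pathway.
example = (
    "arginine biosynthesis IV|HUMANCYC|ARGININE-SYN4-PWY"
    "\targinine biosynthesis IV\tGENE1\tGENE2"
)
gene_set = parse_gmt_line(example)
print(gene_set.source, gene_set.source_id, gene_set.genes)
```

Splitting the Name field from the right is a design choice: the source and source identifier are always the last two parts, so a gene set name that happens to contain a `|` still parses.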
Sources
Human
Source | File Origin | File Type | ID extracted | Frequency source is updated | Number of pathways
KEGG | KEGG ftp site (July 2011) | GMT | Symbol | static as of July 1, 2011 | 236
Msigdb - c2 (other + Biocarta) | static (needs to be updated manually) | GMT | Entrez gene | sporadically | Biocarta - 217; Other - 47
NCI | NCI (http://pid.nci.nih.gov/download.shtml) | BioPAX | Entrez gene | sporadically | 219 pathways
IOB | directly from IOB - static (July 2011) | BioPAX | Entrez gene | sporadically | 35 pathways - 10 are the same as CellMap; 1 is the same as NetPath
NetPath | scripted grab of files numbered 1-25 | BioPAX | Entrez gene | static | 25 pathways - 12 are cancer pathways (10 are CellMap); 13 are immunity pathways
HumanCyc | scripted grab of zipped release from password-protected website | BioPAX | UniProt | updated periodically | 249 pathways
Reactome | scripted grab of zipped release from website | BioPAX | UniProt | updated release | 1117 pathways (release 37)
GO | scripted grab from EBI ftp site (human) | GAF | UniProt | released once a month | 13,034 no GO IEA
Msigdb - c3 (specialty GMTs: miRs, transcription factors) | grab from Msigdb | GMT | Entrez gene | sporadically | 221 miRs; 616 TFs
Mouse
Source | File Origin | File Type | ID extracted | Frequency source is updated | Number of pathways
Reactome | scripted grab of zipped release from website | BioPAX | UniProt | updated release | 946 pathways (release 37)
GO | scripted grab from MGI ftp site (mouse) | GAF | MGI | released once a month | 14,563 no GO IEA; 15,041 with GO IEA
KEGG | translated from Human using Homologene | GMT | Entrez gene | static as of July 1, 2011 | 236
Msigdb - c2 (other + Biocarta) | translated from Human using Homologene | GMT | Entrez gene | sporadically | total 880: Kegg - 186; Reactome - 430; Biocarta - 217; Other - 47
NCI | translated from Human using Homologene | GMT | Entrez gene | sporadically | 219 pathways
IOB | translated from Human using Homologene | GMT | Entrez gene | sporadically | 35 pathways - 10 are the same as CellMap; 1 is the same as NetPath
NetPath | translated from Human using Homologene | GMT | Entrez gene | static | 25 pathways - 12 are cancer pathways (10 are CellMap); 13 are immunity pathways
HumanCyc | translated from Human using Homologene | GMT | Entrez gene | updated periodically | 249 pathways
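Most mouse sets are listed as translated from Human using Homologene, but the page does not describe how. The following is only a sketch of one way such a translation could work. It assumes a homology table of (group id, taxonomy id, Entrez gene id) rows and the NCBI taxonomy ids 9606 (human) and 10090 (mouse). The function names and the gene ids in the demo are made up for illustration.

```python
from collections import defaultdict

HUMAN_TAXID = "9606"
MOUSE_TAXID = "10090"


def build_human_to_mouse(rows):
    """Map each human gene id to the mouse gene ids in the same homology group.

    rows: iterable of (group_id, taxonomy_id, entrez_gene_id) string tuples.
    """
    human = defaultdict(set)   # group id -> human gene ids
    mouse = defaultdict(set)   # group id -> mouse gene ids
    for group_id, taxid, gene_id in rows:
        if taxid == HUMAN_TAXID:
            human[group_id].add(gene_id)
        elif taxid == MOUSE_TAXID:
            mouse[group_id].add(gene_id)
    mapping = defaultdict(set)
    for group_id, human_genes in human.items():
        for gene in human_genes:
            mapping[gene] |= mouse.get(group_id, set())
    return mapping


def translate_gene_set(genes, mapping):
    """Return the sorted mouse genes for a human gene list; unmapped genes are dropped."""
    translated = set()
    for gene in genes:
        translated |= mapping.get(gene, set())
    return sorted(translated)


# Made-up ids: group "1" has a human and a mouse member, group "2" has no mouse member.
demo_rows = [("1", "9606", "100"), ("1", "10090", "200"), ("2", "9606", "101")]
demo_mapping = build_human_to_mouse(demo_rows)
print(translate_gene_set(["100", "101"], demo_mapping))
```

Under this approach a translated set can be smaller than its human original, because genes without a mouse homolog are dropped.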
File Structure
< > denotes a directory.
<Release> - the directory is named according to the date the sets were updated.
  <Species>
    <Identifier> - (one of Entrez gene, UniProt, Gene symbol)
      <GO>
        - BP = biological process
        - MF = molecular function
        - CC = cellular component
        - All = BP + MF + CC
        - no_GO_IEA - indicates that the file excludes GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available), 'RCA' (inferred from reviewed computational analysis)
        - with_GO_IEA - indicates that the file includes GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available), 'RCA' (inferred from reviewed computational analysis)
      <Pathways>
      <miRs>
      <TF>
      <Disease phenotypes>
In each <Identifier> directory there are amalgamated gene set files:
- AllPathways - contains all pathway sources in the Pathways directory
- GOPathways - contains all GO (MF, BP, CC) and all pathway sources in the Pathways directory
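The difference between the no_GO_IEA and with_GO_IEA files comes down to an evidence-code filter. The sketch below shows that filter on GAF lines, assuming the standard GAF column order (column 2 = gene id, column 5 = GO id, column 7 = evidence code, column 9 = aspect). The function name and demo lines are made up, and the sketch ignores qualifiers such as NOT. It is not the script used to build the released files.

```python
EXCLUDED_EVIDENCE = {"IEA", "ND", "RCA"}


def filter_gaf_lines(lines, include_iea=False):
    """Yield (gene id, GO id, aspect) for annotations that pass the evidence filter."""
    for line in lines:
        if line.startswith("!"):          # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:                 # not a complete annotation row
            continue
        evidence = cols[6]
        if not include_iea and evidence in EXCLUDED_EVIDENCE:
            continue
        yield cols[1], cols[4], cols[8]


# Made-up annotation rows; only the columns used above carry meaningful values.
demo_lines = [
    "!gaf-version: 2.0",
    "\t".join(["DB", "G1", "sym1", "", "GO:0000001", "REF:1", "IDA", "", "P"]),
    "\t".join(["DB", "G2", "sym2", "", "GO:0000002", "REF:2", "IEA", "", "F"]),
]
print(list(filter_gaf_lines(demo_lines)))                    # no_GO_IEA style
print(list(filter_gaf_lines(demo_lines, include_iea=True)))  # with_GO_IEA style
```

The aspect column (P, F or C) corresponds to the BP, MF and CC groupings listed above.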
Creating customized Genesets
Download the gene set files you would like to combine into your customized set (for example, Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt), then concatenate them into a single file:
cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
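Plain concatenation keeps every line, so a gene set that appears in both inputs under the same Name ends up in the result twice. It can also join two lines if a file lacks a trailing newline. Where that matters, a small script can do the merge instead. The `merge_gmt` function below is a hypothetical helper written for this page, not part of any Enrichment Map tool.

```python
def merge_gmt(input_paths, output_path):
    """Concatenate GMT files, keeping the first line seen for each Name field.

    Returns the number of gene sets written.
    """
    seen = set()
    written = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in input_paths:
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line.strip():
                        continue              # skip blank lines
                    name = line.split("\t", 1)[0]
                    if name in seen:
                        continue              # gene set name already written
                    seen.add(name)
                    out.write(line + "\n")
                    written += 1
    return written
```

For the example above, the call would be `merge_gmt(["Human_IOB_Entrezgene.gmt", "Human_NetPath_Entrezgene.gmt"], "MyCustomizedSet.gmt")`. Because the Name field includes the source, the same pathway from two different sources is still kept as two separate gene sets.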