Enrichment Map Genesets
Summary
Enrichment Map Genesets are a collection of gene set files in GMT format (compatible with GSEA), updated monthly from the original source locations. Each file is available with one of the following identifier types:
- Entrez gene IDs
- UniProt accessions
- Gene symbols
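For readers who want to inspect one of these files programmatically, the following is a minimal sketch of reading the GMT layout used by GSEA: one gene set per line, tab-separated, with the set name first, a description second, and the gene identifiers after that. The function name `read_gmt` and the sample content are hypothetical and only serve as an illustration.

```python
from io import StringIO


def read_gmt(handle):
    """Parse GMT text into {gene_set_name: (description, [gene_ids])}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any gene
        name, description, genes = fields[0], fields[1], fields[2:]
        # Drop empty fields left behind by stray trailing tabs.
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


# Hypothetical two-line GMT content, for illustration only.
example = StringIO(
    "PATHWAY_A\tmade-up description\t1017\t1019\t595\n"
    "PATHWAY_B\tanother description\t7157\t4193\n"
)
sets = read_gmt(example)
print(len(sets), sets["PATHWAY_B"][1])
```

The same function works on a real file by passing an open file handle instead of the `StringIO` object.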
Sources
Human
Source | File Origin | File Type | ID extracted | Frequency source is updated | Number of pathways | Notes |
KEGG | KEGG ftp site (July 2011) | gmt | symbol | static as of July 1, 2011 | 236 | Not available in biopax; available as a flat file, translated into gmt files |
Msigdb - c2 | static (needs to be updated manually) | gmt | Entrez gene | sporadically | Biocarta - 217, Other - 47 | Only Other and Biocarta are needed, as all other sources are currently covered |
NCI | NCI (http://pid.nci.nih.gov/download.shtml) | biopax | Entrez gene | sporadically | 219 pathways | |
Biocarta | NCI (http://pid.nci.nih.gov/download.shtml) | biopax | Entrez gene | static | 386 pathways | Biopax 3 - Complete Mess! - currently getting from Msigdb |
IOB | directly from IOB - static (July 2011) | biopax | Entrez gene | sporadically | 35 pathways - 10 are the same as CellMap, 1 is the same as NetPath | biopax pathways need fixing so that species info is correct, but the information is still extractable |
NetPath | www.netpath.org/browse (scripted grab of files numbered 1-25) | biopax | Entrez gene | static | 25 pathways - 12 are cancer pathways (10 are CellMap), 13 are immunity pathways | biopax pathways need fixing so that species info is correct, but the information is still extractable |
HumanCyc | scripted grab of zipped release from password-protected website | biopax | Uniprot | updated periodically | 249 pathways | Available in biopax level 2 and level 3 |
Reactome | scripted grab of zipped release from website | biopax | Uniprot | updated release | 1117 pathways (release 37) | No way of getting the release version from the biopax file |
GO | scripted grab from EBI ftp site (human) | GAF | Uniprot | released once a month | 13,034 no GO IEA, 15,181 with GO IEA | Source is direct from the original curator of the annotations |
msigdb - c3 (specialty GMTs: miRs, transcription factors) | grab from Msigdb | gmt | Entrez gene | sporadically | 221 miRs, 616 TFs | |
Mouse
Source | File Origin | File Type | ID extracted | Frequency source is updated | Number of pathways | Notes |
Reactome | scripted grab of zipped release from website | biopax | Uniprot | updated release | 946 pathways (release 37) | No way of getting the release version from the biopax file |
GO | scripted grab from MGI ftp site (mouse) | GAF | MGI | released once a month | 14,563 no GO IEA, 15,041 with GO IEA | Source is direct from the original curator of the annotations |
KEGG | translated from Human using Homologene | gmt | Entrez gene | static as of July 1, 2011 | 236 | Not available in a mouse-specific format |
Msigdb - c2 | translated from Human using Homologene | gmt | Entrez gene | sporadically | total 880: Kegg - 186, Reactome - 430, Biocarta - 217, Other - 47 | Only Other and Biocarta are needed, as all other sources are currently covered |
NCI | translated from Human using Homologene | gmt | Entrez gene | sporadically | 219 pathways | |
IOB | translated from Human using Homologene | gmt | Entrez gene | sporadically | 35 pathways - 10 are the same as CellMap, 1 is the same as NetPath | biopax pathways need fixing so that species info is correct, but the information is still extractable |
NetPath | translated from Human using Homologene | gmt | Entrez gene | static | 25 pathways - 12 are cancer pathways (10 are CellMap), 13 are immunity pathways | biopax pathways need fixing so that species info is correct, but the information is still extractable |
HumanCyc | translated from Human using Homologene | gmt | Entrez gene | updated periodically | 249 pathways | Available as MouseCyc in biopax, but parsing it yielded only a fraction of the pathways available for human, so the human files are converted instead |
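Several mouse sets in the table above are described as translated from Human using Homologene. The page does not say how that translation is implemented, so the following is only a sketch of the general idea: look up each human gene in an ortholog table and keep the mouse genes that are found. The function name, the dictionary layout, and the placeholder IDs are all assumptions made for illustration; a real mapping would have to be built from HomoloGene data.

```python
def translate_gene_set(human_genes, human_to_mouse):
    """Map human gene IDs to mouse IDs, dropping genes without an ortholog."""
    mouse_genes = []
    seen = set()
    for gene in human_genes:
        # A human gene may map to zero, one, or several mouse genes.
        for ortholog in human_to_mouse.get(gene, []):
            if ortholog not in seen:  # avoid duplicates from many-to-one mappings
                seen.add(ortholog)
                mouse_genes.append(ortholog)
    return mouse_genes


# Hypothetical ortholog table with placeholder IDs (not real gene identifiers).
human_to_mouse = {"H1": ["M1"], "H2": ["M2", "M3"], "H3": ["M1"]}
translated = translate_gene_set(["H1", "H2", "H3", "H4"], human_to_mouse)
print(translated)
```

Genes without an ortholog (here `H4`) are silently dropped, so a translated set can be smaller than its human original.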
File Structure
< > denotes a directory.
- <Release> - directory named according to the date the sets were updated
  - <Species>
    - <Identifier> - one of Entrez gene, UniProt, or Gene symbol
      - <GO>
        - BP = biological process
        - MF = molecular function
        - CC = cellular component
        - All = BP + MF + CC
        - no_GO_IEA - the file excludes GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available), and 'RCA' (inferred from reviewed computational analysis)
        - with_GO_IEA - the file includes GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available), and 'RCA' (inferred from reviewed computational analysis)
      - <Pathways>
      - <miRs>
      - <TF>
      - <Disease phenotypes>
In each <Identifier> directory there are amalgamated gene set files:
- AllPathways - contains all pathway sources in the Pathways directory
- GOPathways - contains all GO (MF, BP, CC) and all pathway sources in the Pathways directory
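The difference between the no_GO_IEA and with_GO_IEA files can be illustrated with a small filter over GAF lines. This is a sketch, not the code used to build the released files. It assumes the GAF 2.x column layout (column 2 = DB object ID, column 5 = GO ID, column 7 = evidence code, column 9 = aspect); the function name, the helper `make_row`, and the sample rows with placeholder values are hypothetical.

```python
EXCLUDED_EVIDENCE = {"IEA", "ND", "RCA"}  # codes listed for the no_GO_IEA files


def filter_gaf_lines(lines, include_iea):
    """Yield (gene_id, go_id, aspect) tuples, optionally dropping IEA/ND/RCA rows."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete annotation row
        gene_id, go_id, evidence, aspect = cols[1], cols[4], cols[6], cols[8]
        if not include_iea and evidence in EXCLUDED_EVIDENCE:
            continue
        yield gene_id, go_id, aspect


def make_row(gene_id, go_id, evidence, aspect):
    """Build a minimal 9-column GAF-like row with placeholder values."""
    return "\t".join(["DB", gene_id, "SYM", "", go_id, "REF:0", evidence, "", aspect])


rows = [
    "!gaf-version: 2.0",
    make_row("GENE_A", "GO:0000001", "IDA", "P"),
    make_row("GENE_B", "GO:0000002", "IEA", "F"),
]
no_iea = list(filter_gaf_lines(rows, include_iea=False))
with_iea = list(filter_gaf_lines(rows, include_iea=True))
print(len(no_iea), len(with_iea))
```

The aspect value (P, F, or C in GAF) corresponds to the BP, MF, and CC split described above.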
Creating customized Genesets
Download the gene set files you would like to use in your customized set (for example, Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt), then concatenate them into a single GMT file:
cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
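Where `cat` is not available, or where the same gene set name might appear in more than one source file, a short script can do the merge instead. The sketch below is one possible approach, not part of the released tooling: it keeps the first occurrence of each gene set name and adds a missing final newline so that lines from different files are never joined. The function name and the temporary demo files are hypothetical.

```python
import os
import tempfile


def merge_gmt_files(input_paths, output_path):
    """Concatenate GMT files, skipping gene sets whose name was already written."""
    seen = set()
    written = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as handle:
                for line in handle:
                    name = line.split("\t", 1)[0].strip()
                    if not name or name in seen:
                        continue  # blank line or duplicate gene set name
                    seen.add(name)
                    # Guarantee exactly one gene set per output line.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written


# Demo with two small made-up files in a temporary directory.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "first.gmt")
second = os.path.join(workdir, "second.gmt")
merged = os.path.join(workdir, "MyCustomizedSet.gmt")
with open(first, "w") as f:
    f.write("SET_A\tdesc\tG1\tG2\n")
with open(second, "w") as f:
    f.write("SET_A\tdesc\tG1\tG2\nSET_B\tdesc\tG3")
count = merge_gmt_files([first, second], merged)
print(count)
```

Unlike plain `cat`, this drops the repeated `SET_A` entry; whether duplicates should be dropped or kept depends on how the customized set will be used.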