DW ID Mapping: README

The DW subsystem has a command-line tool that ties the DW components together to generate and validate ID mapping files (tables), based on the specs described elsewhere on the GM wiki. To run the ID mapping tool, do the following:

  1. Unzip the file GMDW.zip.
  2. Change directory to GMDW ('cd GMDW').
  3. Run the program from the command line prompt, using any of the available options. For example, run the command:
    • java -Xms120m -Xmx550m -jar ./dist/GeneMania.jar ENSEMBL_ENTREZ Hs false
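
The invocation in step 3 can be wrapped in a small shell helper. The sketch below only prints the commands and never launches the tool. It rests on two assumptions that this README does not state outright: that the third positional argument is the 'saveLocal' option mentioned in the comments further down, and that 'true' is its enabling value. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; they print command lines and never launch the tool.

# Print the command line for one run.
# $1 = mapping type, $2 = species, $3 = assumed to be the saveLocal flag.
gmdw_cmd() {
  printf 'java -Xms120m -Xmx550m -jar ./dist/GeneMania.jar %s %s %s\n' "$1" "$2" "$3"
}

# Print a two-pass plan: a test drive without ID generation first,
# then a rerun with it ('true' is an assumed value, see above).
gmdw_two_pass() {
  gmdw_cmd "$1" "$2" false
  gmdw_cmd "$1" "$2" true
}

gmdw_two_pass ENSEMBL_ENTREZ Hs
```

Once the printed commands look right, each line can be run by hand from the GMDW directory.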

Generating ID Mapping/Validation Files

The tool can be used in two modes:

1. One-by-one mode: In this mode, ID mapping/validation files are generated for a specific species and a specific mapping type.

2. Bulk (All-in-one) mode: In this mode, ID mapping/validation files are generated for all species and all 'required' mappings.

Comments

1. If no parameters are given on the command line, the tool falls back to preset defaults. These defaults can be set in the DW.properties file, separately for each mode.

2. Use the DW.properties file to customize general properties as needed. For example, to generate a copy of the ID mapping tables without the changes that the validation step introduces, while still generating the validation reports, set the DefIVFix property to 'false' (the default is 'true'). The same file can also specify which version of the local mirrors to use in the ID mapping.

3. The default location for the output files is under the DWTools directory.

4. The default name for a mapping file is mappingType + '_' + speciesName. This file name cannot be customized when running in bulk mode.

5. In general, it's faster to 'test drive' the ID mapping first, without the GMID generating mechanism (i.e. with the 'saveLocal' command line option set to false), and then to rerun it while generating IDs.

6. When running the program in one-by-one mode, not every species can be combined with every mapping type. Refer to the wiki for more details on this.

7. The IVReports file names can be partially customized in the DW.properties file.

8. The tool could be packaged in a better way, as part of a 'global' build process. It could also be hooked up to the 'logging machinery' of the build process. The performance can probably be improved, and there are some tips on that (in and out of the code) for the future, but this is not urgent at the moment.
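
Comment 2 above relies on the DefIVFix setting in DW.properties. Before a run it can help to check which value is in effect. The following is a minimal shell sketch, assuming the file uses plain 'key=value' lines (the usual Java properties layout; this README does not show the file itself). The helper name get_prop is hypothetical and is not part of the tool.

```shell
#!/bin/sh
# Print the value of KEY from a key=value properties file, or FALLBACK
# when the file or the key is missing. Hypothetical helper, not part of GMDW.
# Only 'key=value' lines are handled (optional spaces around '=');
# comment lines starting with '#' are ignored because they do not match.
# Usage: get_prop FILE KEY FALLBACK
get_prop() {
  value=$(sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" 2>/dev/null | tail -n 1)
  printf '%s\n' "${value:-$3}"
}

# 'true' is the default for DefIVFix according to comment 2.
get_prop DW.properties DefIVFix true
```

If the key appears more than once, the sketch reports the last occurrence. Whether the tool resolves duplicates the same way is not something this README confirms.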

GeneMania/DWIDMappingREADME (last edited 2009-11-11 20:48:25 by RashadBadrawi)
