GeneMania Data Warehouse (DW)
Build Process & Comments
The following activity diagram is a high-level representation of the DW build process:
- We follow a 'flush all' policy: every new release of a data source overwrites the existing one. We might keep the old flat files, table dumps, or previous schemas for a while. This policy relieves us of the burden of making selective updates.
- In general, we prefer to load a piece of information directly from its primary source, if that source is available in-house, rather than from a secondary source. For example, you might find synonyms for Entrez genes in the core Ensembl database, but since Entrez is one of our primary DW resources, we rely on Entrez itself for Entrez synonyms.
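
The two policies above can be illustrated with a minimal sketch. It is not the actual DW build code: SQLite stands in for whatever database the DW uses, and the table name `entrez_synonym`, the `PRIMARY_SOURCE` registry, and the functions `choose_source` and `flush_and_load` are hypothetical names introduced only for this example.

```python
import sqlite3

# Hypothetical registry: which source counts as primary for each kind of record.
PRIMARY_SOURCE = {"entrez_synonym": "entrez", "ensembl_gene": "ensembl"}


def choose_source(record_type, available_sources):
    """Prefer the primary source when it is available in-house."""
    primary = PRIMARY_SOURCE.get(record_type)
    if primary in available_sources:
        return primary
    # Primary source not in-house: fall back to the first secondary source, if any.
    return available_sources[0] if available_sources else None


def flush_and_load(conn, rows):
    """'Flush all': discard every existing row, then load the new release."""
    # Assumed two-column layout; the real DW schema is not described here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entrez_synonym (gene_id TEXT, synonym TEXT)"
    )
    # The connection context manager commits on success and rolls back on error,
    # so a failed load leaves the previous release's rows in place.
    with conn:
        conn.execute("DELETE FROM entrez_synonym")
        conn.executemany("INSERT INTO entrez_synonym VALUES (?, ?)", rows)
```

Calling `flush_and_load` once per release keeps only the rows of the newest release, with no row-by-row comparison against the old data. `choose_source("entrez_synonym", ["ensembl", "entrez"])` picks `"entrez"` even though Ensembl also carries Entrez synonyms.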