ExpressionCorrelation Documentation
ExpressionCorrelation is a plug-in built for Cytoscape that computes a similarity network from either the genes or conditions in an expression matrix. The ExpressionCorrelation plugin is verified to work with Cytoscape up to version 2.5.2.
1. Introduction
The ExpressionCorrelation plugin computes a similarity network from either the genes or conditions in an expression matrix (where genes are rows and conditions are columns). Nodes in a similarity network represent genes or conditions. Links represent similarity (or correlations) between vectors of the expression levels of genes across all given conditions (gene correlation network) or the similarity between vectors of the expression levels of all genes in a single condition (condition correlation network). The plugin allows the user to select an Expression Matrix of microarray data directly from Cytoscape and convert it to a visible interaction network in Cytoscape. The similarity matrix is computed using the Pearson Correlation Coefficient. A histogram tool is available for choosing a similarity strength threshold, in order to ease creation of a reasonably sized network. No statistical significance is currently implemented for the similarity network.
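The distinction between the gene network and the condition network can be illustrated with a minimal sketch. This is not the plugin's own code: the function names, the toy matrix, and the handling of constant vectors are assumptions made for illustration. Genes are compared row against row; conditions are compared column against column, which amounts to transposing the matrix first.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant vector is treated as uncorrelated with everything.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def similarity_matrix(vectors):
    """Full pairwise Pearson matrix for a list of vectors."""
    n = len(vectors)
    return [[pearson(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

# Hypothetical toy matrix: 3 genes (rows) x 4 conditions (columns).
expression = [
    [1.0, 2.0, 3.0, 4.0],  # geneA
    [2.0, 4.0, 6.0, 8.0],  # geneB, rises with geneA
    [4.0, 3.0, 2.0, 1.0],  # geneC, falls as geneA rises
]

# Gene correlation network: compare rows (3 x 3 matrix).
gene_sim = similarity_matrix(expression)

# Condition correlation network: compare columns (4 x 4 matrix).
conditions = [list(col) for col in zip(*expression)]
condition_sim = similarity_matrix(conditions)
```

In this toy data geneA and geneB are perfectly correlated and geneA and geneC are perfectly anti-correlated, so both pairs would become links under any cutoff.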
2. About
The ExpressionCorrelation plugin is freely available, open-source software for visualizing molecular profiles. Gene expression data (loaded via Cytoscape) can be used to create a gene or a condition correlation matrix. Any correlation above the high cutoff or below the low cutoff is displayed in Cytoscape as an 'edge' between two 'nodes' (the nodes are the two genes or conditions that are correlated). However, a correlation matrix can be very large and often cannot be stored in memory, so this program saves only the relevant correlations as they are calculated. Calculation of the correlation matrix is relatively fast.
One problem with this approach is that the cutoffs cannot be relaxed after the fact without recalculating the entire correlation matrix, because correlations that fall between the cutoffs are discarded. The cutoffs could be made stricter without recalculation, but no method for this is implemented here; instead, Cytoscape can be set up to not display the weaker correlations. A second problem is that Cytoscape begins to have trouble displaying networks with more than several tens of thousands of edges. This means that good cutoff values must be chosen before the network is created: good cutoff values display as much of the network as possible without exhausting memory or creating cluttered networks.
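The idea of keeping only the correlations that pass the cutoffs, rather than the full matrix, can be sketched as follows. This is an illustrative assumption about the approach, not the plugin's implementation: the function names and toy data are hypothetical, and whether the real cutoffs are inclusive is not stated in this document. The default cutoffs of -0.95 and 0.95 are the ones the plugin documents.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant vectors produce no edge
    return cov / math.sqrt(var_x * var_y)

def correlated_edges(names, vectors, low=-0.95, high=0.95):
    """Yield (name_a, name_b, r) one pair at a time, keeping only r <= low or r >= high.

    Nothing is stored for pairs between the cutoffs, so relaxing the
    cutoffs later requires running the whole computation again.
    """
    for i, j in combinations(range(len(vectors)), 2):
        r = pearson(vectors[i], vectors[j])
        if r <= low or r >= high:  # assumption: cutoffs are inclusive
            yield names[i], names[j], r

# Hypothetical toy data: 4 genes measured under 4 conditions.
names = ["geneA", "geneB", "geneC", "geneD"]
vectors = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],  # correlates with geneA at r = 0.8, below the cutoff
]

edges = list(correlated_edges(names, vectors))
kept_pairs = {(a, b) for a, b, _ in edges}
```

Here only three of the six possible pairs are kept; the pairs involving geneD are dropped and cannot be recovered without recomputing.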
To help users choose a good cutoff value, we added a histogram feature, which shows the number of edges associated with particular cutoff values and vice versa. To view the histogram, the correlation matrix must be calculated, so it is calculated twice: once for the histogram and once for the network creation. This could cause the entire process to take up to twice as long. The process takes twice as long when the matrix calculation is the rate-limiting step, which is usually the case when networks contain a few thousand edges or fewer. However, edge and node creation quickly becomes the rate-limiting step when more than a few thousand edges are created (in this case the process could take 100 times longer rather than just twice as long).
Users who do not know the distribution of the correlation values should use the histogram to limit networks to a few thousand edges.
When using the histogram to limit the network size, note that, depending on the cutoffs specified, the number of edges in the generated network may differ from the number of interactions reported in the histogram. This is because the network edges are computed exactly, while the histogram's number of interactions is based on the counts in the bins that correspond to the cutoffs. Since each bin counts the correlations falling within a range of values, extra correlations may be included, leading to a discrepancy.
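The source of this discrepancy can be shown with a small sketch. The bin width, the binning rule, and the function names are assumptions chosen for illustration; the plugin's actual histogram settings are not described here. The point is only that a cutoff falling inside a bin pulls in the whole bin's count.

```python
def bin_counts(values, n_bins, lo=-1.0, hi=1.0):
    """Count values into n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    return counts, width

def histogram_estimate(counts, width, high_cutoff, lo=-1.0):
    """Interactions at or above high_cutoff, as seen by the histogram.

    The bin that contains the cutoff is counted in full, even though
    only part of its range lies above the cutoff.
    """
    first_bin = min(int((high_cutoff - lo) / width), len(counts) - 1)
    return sum(counts[first_bin:])

# Hypothetical correlation values.
correlations = [0.91, 0.93, 0.96, 0.97, 0.99, 0.55, -0.25]

counts, width = bin_counts(correlations, n_bins=20)  # 20 bins of width 0.1

exact = sum(1 for r in correlations if r >= 0.95)    # what the network would contain
estimate = histogram_estimate(counts, width, 0.95)   # what the histogram would report
```

With a high cutoff of 0.95, the exact count is 3, but the bin covering 0.9 to 1.0 also holds 0.91 and 0.93, so the bin-based figure is 5.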
Future directions include using weights in the correlation calculations in order to reduce data noise and down-weight multiple Affymetrix probe set IDs. Also, other similarity metrics and a statistical significance score for similarity links will be considered.
3. Installation Instructions
To use the ExpressionCorrelation Plugin, the user must first obtain a copy of Cytoscape, Version 2.0 or greater (up to version 2.5.2). The user can download a copy from: http://www.cytoscape.org/download_list.php.
Once the user has downloaded Cytoscape and verified that it works, the user can install the ExpressionCorrelation Plugin in one of two ways:
- Download the plugin, ExpressionCorrelation.jar, and copy the ExpressionCorrelation.jar file to the [Cytoscape_Home]/plugins directory.
- Open Cytoscape. Under the 'Plugins' menu, select 'Manage Plugins', which will open the plugin manager. Find and select the 'Expression Correlation' plugin under the 'Network Inference' folder, and click on the 'Install' button.
The Plugin installation is now complete.
4. Using the ExpressionCorrelation Plugin
To use the ExpressionCorrelation Plugin:
- Start Cytoscape. This can be done by double clicking the Cytoscape icon in your [Cytoscape_Home] directory, or via the command line.
- On Unix/Linux or MacOS X, run: cytoscape.sh
- On Windows, run: cytoscape.bat
- From the Main Menu, select "File" ---> "Import" ---> "Attribute/Expression Matrix...", select the desired file and click the "Import" button.
- From the Main Menu, select "Plugins" ---> "Expression Correlation Network" --->
- "Construct Correlation Network"
This option will create the condition network and the gene network simultaneously using the default cutoffs "-0.95 & 0.95" or the user selected cutoffs from the previous run of the ExpressionCorrelation Plugin, but will not create the histogram of the data distribution. The two network file name extensions along with the default cutoffs used will appear in the 'Network' frame of the Cytoscape panel. If the network has fewer than the number of nodes specified in the Cytoscape viewThreshold property, a view will be created automatically and the network will appear in the right frame of Cytoscape. The viewThreshold property can be modified in the "Cytoscape Preferences Editor" from the Main Menu by selecting "Edit" ---> "Preferences" ---> "Properties". Otherwise, a view will not be created. In this case, to view the network: select the network by clicking on its file name extension (it will turn green), and from the Main Menu select "Edit" ---> "Create View".
- "Advanced Options"
"Condition Network: Preview Histogram"
This option will calculate and display the histogram of the condition matrix expression data distribution. In the histogram window the user can select the low and high cutoffs by manually typing them into the appropriate "Cutoff" text boxes. The user can choose to use only one set of cutoffs by deselecting the "low" or "high" checkbox. The user can select the number or percent of interactions to be displayed, rather than selecting cutoffs, by typing the number into the "Enter" text box and choosing "Number of Interactions" or "Percent of Interactions". Select "OK" to create the Condition Network using the parameters specified. The parameters specified will be saved for the duration of the Cytoscape session.
"Condition Network: Using Defaults"
This option will create the condition network using the default cutoffs or the user selected cutoffs from the previous execution of the ExpressionCorrelation Plugin in this Cytoscape session.
"Gene Network: Preview Histogram"
This option will calculate and display the histogram of the gene matrix expression data distribution and create the gene network according to the parameters specified by the user.
"Gene Network: Using Defaults"
This option will create the gene network using the default cutoffs or the user selected cutoffs from the previous run of the ExpressionCorrelation Plugin in this Cytoscape session.
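The "Number of Interactions" and "Percent of Interactions" choices in the histogram window both amount to picking a cutoff from the distribution of correlation values. How the plugin does this internally is not described in this document; the following is one plausible sketch, assuming a single symmetric cutoff on the absolute correlation. The function names and values are hypothetical.

```python
import math

def cutoff_for_count(correlations, n_interactions):
    """Return the |r| cutoff that keeps the n strongest correlations.

    Ties at the cutoff value may keep more than n interactions.
    Returns None when nothing is requested or no data is given.
    """
    if n_interactions <= 0 or not correlations:
        return None
    strengths = sorted((abs(r) for r in correlations), reverse=True)
    n = min(n_interactions, len(strengths))
    return strengths[n - 1]

def cutoff_for_percent(correlations, percent):
    """Same as cutoff_for_count, with the count given as a percentage of all pairs."""
    n = math.ceil(len(correlations) * percent / 100.0)
    return cutoff_for_count(correlations, n)

# Hypothetical correlation values for ten pairs.
correlations = [0.99, -0.97, 0.90, -0.85, 0.40, 0.10, -0.05, 0.72, 0.65, -0.30]

by_count = cutoff_for_count(correlations, 3)        # keep the 3 strongest pairs
by_percent = cutoff_for_percent(correlations, 20)   # keep the strongest 20% (2 pairs)
```

Because this sketch works on exact values rather than histogram bins, the counts it produces may differ from those reported in the histogram window.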
5. Biological Relevance
The ExpressionCorrelation Plugin allows for comparison of multiple networks of similarity relationships between genes that are derived from different subsets of conditions. It may be used to define modules (sets of genes - network nodes - in the simplest form) that can differentiate between stages or types of cancer. The differences between the networks can be computed using the Merge feature in Cytoscape.
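Conceptually, comparing two correlation networks built from different subsets of conditions comes down to set operations on their edges. The sketch below illustrates that idea only; it is not how Cytoscape's Merge feature is implemented, and the gene names and helper function are hypothetical. Edges are treated as undirected, so the order of the two genes does not matter.

```python
def edge_set(pairs):
    """Represent undirected edges as a set of frozensets, so (a, b) equals (b, a)."""
    return {frozenset(pair) for pair in pairs}

# Hypothetical gene networks derived from two different subsets of conditions.
network_a = edge_set([("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")])
network_b = edge_set([("geneB", "geneA"), ("geneC", "geneD"), ("geneD", "geneE")])

only_in_a = network_a - network_b   # edges lost in the second condition subset
only_in_b = network_b - network_a   # edges gained in the second condition subset
shared = network_a & network_b      # edges present under both subsets
```

Genes that appear only in `only_in_a` or `only_in_b` are candidates for modules that differ between the two sets of conditions.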
6. Sample Data
Sample data containing 300 expression experiments from the Rosetta yeast compendium can be downloaded from here: Rosetta.mrna
Sample data containing 158 expression experiments from the human gene atlas project (http://www.pnas.org/content/101/16/6062) can be downloaded from here: human.mrna
7. Known Issues
- If the plugin runs out of memory (e.g. when trying to build too large a network), no out-of-memory message is shown to the user.
- Under Cytoscape 2.6.x, sessions saved via the plugin cannot be reloaded (please use Cytoscape 2.5.2).
8. Contacts
This plugin was originally developed by Elena Potylitsine, Weston Whitaker and Gary Bader in the Sander Group, Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York City and has been updated by Shirley Hui and Laetitia Morrison in the Bader lab.
This software is made available under the LGPL (Lesser General Public License), which means that you can freely use it within your own software, but if you alter the code itself and distribute it, you must make the source code alterations freely available as well.
Source code is available at http://chianti.ucsd.edu/svn/csplugins/trunk/mskcc/summerstudents/ExpressionCorrelation/
This product includes jmathplot, developed by Yann Richet (http://jmathplot.sourceforge.net/).