Differences between revisions 2 and 43 (spanning 41 versions)

Automatically Annotating an Enrichment Map

Automatic Annotation

Enrichment Maps are often large networks with multiple highly inter-connected clusters representing processes and functions that are well annotated in public databases. Summarizing these networks simplifies them, allowing users to focus on the prominent themes. Doing so manually is both subjective and time-consuming. Automatic Annotation makes this process easier by clustering similarly connected terms in a network (using clusterMaker) and labelling each cluster using WordCloud.

Requirements

Cytoscape version at least 3.2.0. At the time of this writing (August 2014) this has not yet been released; it is scheduled for release in late 2014. Until then, the latest working build (UNSTABLE) can be downloaded from here (http://code.cytoscape.org/jenkins/job/cytoscape-3-gui-distribution/lastSuccessfulBuild/org.cytoscape.distribution$cytoscape/). Note that this requires Java 7.
WordCloud version at least 2.0.1
ClusterMaker version at least 0.9.3
EnrichmentMap version at least 2.0.2
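
The "at least" comparisons in the list above are numeric per component, so 0.9.10 counts as newer than 0.9.3 even though it sorts earlier as plain text. The following is a minimal sketch of such a check, not part of any of the apps; the function names and the idea of passing in a dictionary of installed versions are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Minimum versions as listed in the requirements above.
MINIMUM_VERSIONS: Dict[str, str] = {
    "Cytoscape": "3.2.0",
    "WordCloud": "2.0.1",
    "ClusterMaker": "0.9.3",
    "EnrichmentMap": "2.0.2",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.9.3' into (0, 9, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def unmet_requirements(installed: Dict[str, str]) -> List[str]:
    """Return the names of components that are missing or too old.

    `installed` is a hypothetical mapping of component name to version
    string; how you obtain it is outside the scope of this sketch.
    """
    unmet = []
    for name, minimum in MINIMUM_VERSIONS.items():
        found = installed.get(name)
        if found is None or parse_version(found) < parse_version(minimum):
            unmet.append(name)
    return unmet
```

Calling `unmet_requirements` with the versions shown in the Cytoscape App Manager returns an empty list when every requirement is met.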

Creating an annotation set

Once you have created your Enrichment Map, select “Annotate Clusters” from Enrichment Map under the Apps menu at the top of the screen to begin annotating.

This will open an Annotation Panel in the Cytoscape Control Panel, on the left of the window.

Annotation Panel

In this panel, you can specify certain parameters for the annotation. All of the parameters are optional, so you can also annotate using the defaults. Once you are satisfied with the parameters, pressing the 'Annotate!' button will create a new annotation set. This will run clusterMaker to find the clusters in your network, create a WordCloud for each cluster, and create a label for each cluster.

Gene set descriptions

Words contained in the gene set description column will be used to automatically generate labels. Usually these will be under the GS_DESCR column of the

Clustering

You can select a clustering algorithm from the dropdown menu. These algorithms will run in clusterMaker using their default parameters. MCL clustering is the recommended default. For more refined clustering, you can run clusterMaker on your own and then specify the column that it creates with your clusters (these will usually look something like 'mclCluster'). You can also specify your own clusters by entering them into a new column and selecting that column here.

Repositioning nodes

Often there will be significant clutter between the clusters, which can potentially make the labels harder to read. Selecting the 'Layout nodes by cluster' checkbox will rearrange the nodes in your network to isolate each cluster.

Creating Groups

Selecting the 'Create Groups for clusters' checkbox will create Cytoscape Groups from the nodes in each cluster. These will allow you to collapse each cluster into a single node.

WordCloud Parameters

To set the parameters of WordCloud globally for an entire annotation set, adjust them in the 'WordCloud' panel before creating the annotation set. It is recommended to turn on network normalization, under 'Advanced', which will help make labels more specific.

The output for the example network, using the default parameters, will look like this:

Default output

Repositioning clusters

Selecting the 'Layout nodes by cluster' option will rearrange the nodes in your network, grouping them by cluster and applying a Prefuse Force Directed Layout to each group. This reduces the amount of overlap between clusters and makes the annotations easier to read and interpret. Alternatively, repositioning can be done manually by selecting clusters in the table, dragging the nodes to a new location, and pressing the 'Update' button. Multiple clusters can be selected and moved together using Command+click on a Mac or Ctrl+click on other operating systems.

Editing clusters

If you are unsatisfied with the clusters in your network, it is easy to change them by using the buttons at the bottom of the Annotation Panel.

Creating new clusters

Selecting a group of nodes and pressing the 'Extract' button will create a new cluster from the selected node(s). If you are using discrete clustering (rather than fuzzy clustering) then the selected nodes will be removed from their previous clusters.
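
The difference between discrete and fuzzy clustering when extracting can be illustrated with a small sketch. This is not the app's implementation: it assumes clusters are held as a dictionary from a hypothetical integer cluster id to a set of node names, and it makes no claim about what the app does with a cluster that ends up empty.

```python
from typing import Dict, Iterable, Set


def extract_cluster(
    clusters: Dict[int, Set[str]],
    selected: Iterable[str],
    fuzzy: bool = False,
) -> int:
    """Create a new cluster from the selected nodes and return its id.

    Illustrative only: the data model and the id scheme are assumptions.
    """
    selected_nodes = set(selected)
    if not selected_nodes:
        raise ValueError("select at least one node")
    if not fuzzy:
        # Discrete clustering: a node belongs to exactly one cluster, so
        # the selected nodes are removed from their previous clusters.
        for members in clusters.values():
            members -= selected_nodes
    # Fuzzy clustering: previous memberships are left untouched.
    new_id = max(clusters, default=0) + 1
    clusters[new_id] = selected_nodes
    return new_id
```

With `fuzzy=True` a node can appear in both its old cluster and the newly extracted one; with the default it appears only in the new cluster.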

Merging clusters

Selecting two or more clusters from the table (Ctrl/Command + click) and pressing 'Merge' will create one cluster from the selected clusters.
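
Conceptually, merging takes the union of the nodes in the selected clusters. The sketch below shows that idea under the assumption that clusters are stored as a dictionary from a hypothetical integer id to a set of node names; which id the merged cluster keeps is an arbitrary choice made here, not something the app documents.

```python
from typing import Dict, Iterable, Set


def merge_clusters(
    clusters: Dict[int, Set[str]],
    selected_ids: Iterable[int],
) -> int:
    """Replace the selected clusters with one cluster holding all their nodes.

    Illustrative only. Returns the id of the merged cluster, which in this
    sketch is the first id in the selection (an assumption).
    """
    ids = list(dict.fromkeys(selected_ids))  # drop duplicates, keep order
    if len(ids) < 2:
        raise ValueError("select two or more clusters to merge")
    missing = [cid for cid in ids if cid not in clusters]
    if missing:
        raise KeyError(f"unknown cluster ids: {missing}")
    merged: Set[str] = set()
    for cid in ids:
        merged |= clusters.pop(cid)  # remove the old cluster, keep its nodes
    clusters[ids[0]] = merged
    return ids[0]
```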

Deleting clusters

Selecting one or more clusters from the table and pressing 'Delete' will remove the selected clusters from the network.

Display Options

After creating your annotations, there are several visual options that you can adjust in the 'Annotation Display Options Panel' on the right side of the screen in the 'Results Panel'.

Editing labels

Because the labels are computed automatically, they will often need some manual adjustment or reordering of words. To edit a label, double-click the name of a cluster in the table on the Annotation Panel, type a new name, and press Enter. It is recommended to look at the WordClouds when doing this, by selecting WordCloud under 'Show on selection:' in the Display Options menu.

Frequently Asked Questions

How are the labels computed? The labels are computed by gathering the descriptions of all gene sets in a cluster and passing them to WordCloud. WordCloud parses these and assigns a size to each word, proportional to the frequency of the word in the cluster, optionally normalized by how frequently the word occurs in the entire network. The words are then sorted in order of size. The largest remaining word in the cloud is added to the label, one word at a time, until the label reaches the maximum label length (by default four words). A label can also end earlier, when the next largest word is considerably smaller than the most recently added word. ‘Considerably smaller’ is defined as below a specified fraction of the previous word’s size: by default 0.3 from the first word to the second, 0.8 from the second to the third, and 0.9 from the third to the fourth. An advantage is given to words in the same WordCloud groups (colours) as words already added to the label, and to words coming from the most central node in the cluster. All of these parameters can be specified by selecting 'Adjust Label Options' in the 'Annotation Display Options Panel' on the right side of the screen.
Why are some of the WordClouds blank, but still giving labels? Sometimes, WordCloud assigns a font size of 0 to words appearing infrequently. The annotation process will take this into consideration and still create a label out of these words.
Why aren’t all of the clusterMaker algorithms available? Only some of the algorithms in clusterMaker produce output of the format that can be parsed and used to partition the nodes for annotations.
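
The label-building procedure described in the first answer above can be sketched as a greedy loop. This is a simplified illustration, not the app's code: it takes the word sizes as given (it does not reproduce how WordCloud computes them), it reads the maximum label length as an upper bound on the number of words, and it leaves out the advantage given to words in the same WordCloud group or from the most central node. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Default cut-off fractions quoted in the FAQ:
# first->second word, second->third, third->fourth.
DEFAULT_THRESHOLDS = (0.3, 0.8, 0.9)


def build_label(
    word_sizes: Sequence[Tuple[str, float]],
    max_words: int = 4,
    thresholds: Sequence[float] = DEFAULT_THRESHOLDS,
) -> str:
    """Greedily build a label from (word, size) pairs (illustrative only)."""
    # Largest word first; sorted() is stable, so ties keep their input order.
    ranked = sorted(word_sizes, key=lambda pair: pair[1], reverse=True)
    label: List[str] = []
    previous_size: Optional[float] = None
    for word, size in ranked:
        if len(label) >= max_words:
            break  # maximum label length reached
        if previous_size is not None and thresholds:
            # Index 0 applies to the second word, 1 to the third, and so on;
            # the last threshold is reused if the label can grow further.
            idx = min(len(label) - 1, len(thresholds) - 1)
            if size < thresholds[idx] * previous_size:
                break  # next word is "considerably smaller": end the label
        label.append(word)
        previous_size = size
    return " ".join(label)
```

For example, with sizes 10, 9, 5 and 4.9 the second word is kept (9 is not below 0.3 × 10) but the third is rejected (5 is below 0.8 × 9), so the label has two words.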

Downloads

Contact

Arkady Arkhangorodsky (aarkhangorodsky@gmail.com)

Ruth Isserlin (ruth.isserlin@utoronto.ca)

Known bugs

ClusterMaker doesn’t run properly on a second enrichment map

This is a clusterMaker problem that the clusterMaker developers are working on fixing
This also creates a problem when trying to save the session file
To get around this, run clusterMaker through its menu rather than from the annotation panel

Groups are deleted upon saving

This is necessary to work around a bug in Cytoscape

-  ⇤ ← Revision 2 as of 2014-08-15 20:31:35 → 
  Size: 3355
  Editor: ArkadyArk
  Comment:
+   ← Revision 43 as of 2014-08-29 14:45:24 → ⇥
  Size: 8101
  Editor: ArkadyArk
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-= Annotating an Enrichment Map =
+= Automatically Annotating an Enrichment Map =

== Automatic Annotation ==
Enrichment Maps are often large networks with multiple highly inter-connected clusters representing processes and functions that are well annotated in public databases.  Summarizing these networks simplifies it allowing users to focus on the prominent themes. To do so manually is both subjective and time-consuming. Automatic Annotation makes this process easier by clustering similarly connected terms in a network (using clusterMaker) and labelling each cluster using WordCloud.
-Line 4:
+Line 7:
- *Cytoscape version at least 3.2.0
 *WordCloud version at least 2.0.2
 *ClusterMaker version at least 0.9.3
+ *Cytoscape version at least 3.2.0. At the time of this writing (August 2014) this has not yet been released. It is scheduled for release in late 2014. Until then, the latest working build (UNSTABLE) can be downloaded from [[http://code.cytoscape.org/jenkins/job/cytoscape-3-gui-distribution/lastSuccessfulBuild/org.cytoscape.distribution$cytoscape/|here]]. Note that this requires Java 7.
 *WordCloud version at least 2.0.1
 *[[http://apps.cytoscape.org/apps/clustermaker2|ClusterMaker]] version at least 0.9.3
 *EnrichmentMap version at least 2.0.2
-Line 9:
+Line 13:
-Once you have created your Enrichment Map network, selecting “Annotate Clusters” in the EnrichmentMap app menu will let you add annotations. The “Annotate!” button creates a new annotation set. It should be noted that labels are automatically generated using WordCloud, and some manual adjustment will often be necessary.
+Once you have [[http://www.baderlab.org/Software/EnrichmentMap/UserManual|created your Enrichment Map]], select “Annotate Clusters” from Enrichment Map under the Apps menu at the top of the screen to begin annotating.
-Line 11:
+Line 15:
-== Specifying gene set descriptions ==
You will need to select a column from the gene set descriptions drop down menu. This column is used by WordCloud to create a cloud from which the labels for the clusters are generated.
+{{attachment:Annotate Cluster Menu.png|Annotate Clusters Menu}}
-Line 14:
+Line 17:
-== Clustering ==
By default, the MCL (Markov Cluster) algorithm is used by clusterMaker. Alternatively, if you want to tune the parameters of clusterMaker yourself, you can run clusterMaker and select the column outputted by clusterMaker by clicking on the “Select cluster column” radio button in the “Advanced Clustering Options panel. These columns must be either integers, or lists of integers.
+This will open an Annotation Panel in the Cytoscape Control Panel, on the left of the window.
-Line 17:
+Line 19:
-== Laying out clusters ==
Selecting the “Layout nodes by cluster” option will rearrange the nodes in your network, grouping them by network and applying a Prefuse Force Directed Layout to each group. This reduces the amount of overlap between clusters and makes the annotations easier to read and interpret. Alternatively, repositioning can be done manually by selecting clusters in the table, dragging the nodes to a new location, and pressing the “Update” button. Multiple clusters can be selected by doing a Command+click for Mac users or Ctrl+click for other operating systems.
+{{attachment:2 - Annotation Panel.png|Annotation Panel}}
-Line 20:
+Line 21:
-== Changing labels ==
The text labels for each cluster can be manually adjusted. It is recommended to look at the WordClouds when doing this, by selecting “WordCloud” in the “Autofocus Preferences” menu. The parameters of WordCloud, such as specifying words to exclude, delimiters, and stemming, can be adjusted in its panel. For these changes to take effect, you must update both the WordCloud and the annotations. You can also manually edit the labels, by double-clicking on the cluster in the table, and changing the text.
+In this panel, you can specify certain parameters for the annotation. Once you are satisfied with the parameters (optionally need to be specified, you can annotate using defaults), pressing the 'Annotate!' button will create a new annotation set. This will run clusterMaker to find the clusters in your network, create a WordCloud for each cluster, and create labels for each cluster.
 Line 23:
-== Frequency Asked Questions ==
 * '''''How are the labels computed?''''' The labels are computed by gathering all of the descriptions of each gene set and passing them to WordCloud. WordCloud parses these and assigns sizes to each word, proportional to the frequency of the words in the cluster, optionally normalized to consider how frequently the words occur in the entire network. These words are then sorted in order of size. One at a time, the label adds the largest word in the cloud until the label is 4 words long, or the next largest word is considerably smaller than the most recent word.
+=== Gene set descriptions ===
Words contained in the gene set description column will be used to automatically generate labels. Usually these will be under the GS_DESCR column of the 

=== Clustering ===
You can select a clustering algorithm from the dropdown menu. These algorithms will run in clusterMaker using their default parameters. [[http://micans.org/mcl/|MCL clustering]] is recommended by default. For more refined clustering, you can run clusterMaker on your own and then specify the column that it creates with your clusters (these will usually look something like 'mclCluster'). You can also specify your own clusters by entering it into a new column and selecting it here. 

=== Repositioning nodes ===
Often there will be significant clutter between the clusters, which can potentially make the labels harder to read. Selecting the 'Layout nodes by cluster' checkbox will rearrange the nodes in your network to isolate each cluster.

=== Creating Groups ===
Selecting the 'Create Groups for clusters' checkbox will create Cytoscape Groups from the nodes in each cluster. These will allow you to collapse each cluster into a single node.

=== WordCloud Parameters ===
To set the [[Software/WordCloudPlugin/ParameterTutorial|parameters of WordCloud]] globally for an entire annotation set, do this before creating the Annotation Set in the 'WordCloud' panel. It is recommended to turn on network normalization, under 'Advanced', which will help make labels more specific.

The output for the example network, using the default parameters, will look like this:

{{attachment:3 - Output.png|Default output|width=1200}}

== Repositioning clusters ==
Selecting the 'Layout nodes by cluster' option will rearrange the nodes in your network, grouping them by network and applying a Prefuse Force Directed Layout to each group. This reduces the amount of overlap between clusters and makes the annotations easier to read and interpret. Alternatively, repositioning can be done manually by selecting clusters in the table, dragging the nodes to a new location, and pressing the 'Update' button. Multiple clusters can be selected and moved together by doing a Command+click for Mac users or Ctrl+click for other operating systems.

== Editing clusters ==
If you are unsatisfied with the clusters in your network, it is easy to change them by using the buttons at the bottom of the Annotation Panel.

=== Creating new clusters ===
Selecting a group of nodes and pressing the 'Extract' button will create a new cluster from the selected node(s). If you are using discrete clustering (rather than fuzzy clustering) then the selected nodes will be removed from their previous clusters.

=== Merging clusters ===
Selecting two or more clusters from the table (Ctrl/Command + click) and pressing 'Merge' will create one cluster from the selected clusters.

=== Deleting clusters ===
Selecting one or more clusters from the table and pressing 'Delete' will remove the selected clusters from the network.

== Display Options ==
After creating your annotations, there are several visual options that you can adjust in the 'Annotation Display Options Panel' on the right side of the screen in the 'Results Panel'.

=== Editing labels ===
Because the labels are computed automatically, they will often need some manual adjustment or reordering of words. This can be done easily, by double clicking on the name of a cluster in the table on the Annotation Panel, typing a new name, and pressing enter. It is recommended to look at the WordClouds when doing this, by selecting WordCloud under 'Show on selection:' in the Display Options menu.
== Frequently Asked Questions ==
 * '''''How are the labels computed?''''' The labels are computed by gathering all of the descriptions of each gene set and passing them to WordCloud. WordCloud parses these and assigns sizes to each word, proportional to the frequency of the words in the cluster, optionally normalized to consider how frequently the words occur in the entire network. These words are then sorted in order of size. One at a time, the largest word in the cloud is added until the label is longer than the maximum label length (by default four words). The labels also can end when the next largest word is considerably smaller than the most recent word. An advantage is given to words in the same WordCloud groups (colours) as words already added to the label, and words coming from the most central node in the cluster. ‘Considerably smaller’ is defined as below a specified fraction of the previous word’s size - by default set to be 0.3 from the first word to the second, 0.8 from second to third, and 0.9 from third to fourth. All of these parameters can be specified by selecting 'Adjust Label Options' in the 'Annotation Display Options Panel' on the right side of the screen.

 * '''''Why are some of the WordClouds blank, but still giving labels?''''' Sometimes, WordCloud assigns a font size of 0 to words appearing infrequently. The annotation process will take this into consideration and still create a label out of these words.

 * '''''Why aren’t all of the clusterMaker algorithms available?''''' Only some of the algorithms in clusterMaker produce output of the format that can be parsed and used to partition the nodes for annotations.

== Downloads ==
 *[[attachment:EM AutoAnnotate.zip|Enrichment Map, WordCloud, and ClusterMaker]]
 *[[attachment:estrogen_GSEA_Results_12h_24hr.zip|Sample Enrichment Map data set to annotate]]
-Line 36:
+Line 83:
+Groups are deleted upon saving
 *This is necessary to work around a bug in Cytoscape