Line 3: | Line 3: |
= MCODE User's Manual = | = MCODE Documentation = |
Line 10: | Line 10: |
To use the MCODE !PlugIn, you must first obtain a copy of Cytoscape. The compatible MCODE and Cytoscape versions are outlined in the downloads section on the [:Software/MCODE: MCODE website]. The latest MCODE, version 1.2, requires Cytoscape 2.3.2 or later. | To use the MCODE plugin, you must first obtain and install Cytoscape. The compatible MCODE and Cytoscape versions are outlined in the downloads section on the [:Software/MCODE: MCODE website]. The latest MCODE, version 1.2, requires Cytoscape 2.3.2 or later. |
Line 12: | Line 12: |
Once you have downloaded Cytoscape and verified that it works, proceed with the next steps: | Once you have downloaded and installed Cytoscape and verified that it works: |
Line 14: | Line 14: |
1. Start Cytoscape. For example | 1. Start Cytoscape. This can be done by double-clicking the newly created Cytoscape icon or via the command line: |
Line 18: | Line 18: |
* If it does not, then you likely placed the MCODE.jar file in the wrong directory. Repeat step one. You will have to restart Cytoscape to reload the plugin. | * If it does not, then you likely placed the MCODE.jar file in the wrong directory. Verify that you have completed step one. You will have to restart Cytoscape to reload the plugin. |
Line 25: | Line 25: |
1. Go to the Plugins Menu 1. Move the mouse over MCODE |
1. Go to the Plugins Menu and select the MCODE sub-menu |
Line 28: | Line 27: |
* The main MCODE interface will appear as a tab in the left-hand panel of Cytoscape Note: If MCODE does not appear in the Plugins Menu, then the installation of MCODE was not successfull. Please refer to the previous section of the User's Manual. |
* The main MCODE interface will appear as a tab in the left-hand panel of Cytoscape (Note: If MCODE does not appear in the Plugins Menu, then revisit the installation section above.) |
Line 31: | Line 29: |
This menu provides two additional options: | The MCODE sub-menu provides two additional options: |
Line 33: | Line 31: |
* A quick reference to MCODE acknowledgements, citation information, and a link to the MCODE paper. | * Shows the MCODE credits, citation information, and a link to the MCODE paper. |
Line 35: | Line 33: |
* A link that will open the MCODE website at [:Home: www.baderlab.org] in your default browser for quick access to downloads, contact information, and the User's Manual with help and tutorials. | * Links to the MCODE website at [:Home: www.baderlab.org] in your default web browser for quick access to downloads, contact information, and the User's Manual with help and tutorials. |
Line 37: | Line 35: |
''New to Version 1.2: MCODE can now be seen as a console for the underlying algorithm. It can be started independent of network loading. This provides for a more easily accessible, repeatable and modifyable analysis.'' | ''New to Version 1.2: MCODE can now be seen as a console for the underlying algorithm. It can be started independent of network loading. This provides for a more easily accessible, repeatable and modifiable analysis.'' (TODO: rephrase this, I'm not sure what it means) |
Line 43: | Line 41: |
The '''MCODE Main Panel''' is the starting point for analysis. It contains two main sections: '''Scope''' and '''Advanced Options'''. The latter is intended for fine-tuning of results by experienced users who are familiar with the MCODE Paper. It will be discussed in the next secton. This section will cover some of the basic steps of running MCODE on a network. | The '''MCODE Main Panel''' is the starting point for analysis. It contains two main sections: '''Find Cluster(s)''' and '''Advanced Options'''. The latter option is intended for fine-tuning of results by experienced users who are familiar with the MCODE paper (discussed in the next section). This section covers some of the basic steps of running MCODE on a network. |
Line 45: | Line 43: |
1. '''LOAD YOUR NETWORK.''' To begin, make sure the network to be analyzed is loaded into Cytoscape. You can load as many networks as your computer system can handle, large or small. MCODE will recognize which network you wish to analyze either by noting which network view is on top or by your selection of the network in the Network Tab on the left-hand panel. | || attachment:screenshot_main_panle.tiff || |
Line 47: | Line 45: |
1. '''CHOOSE THE SCOPE.''' Cluster results can be reported in two fundamental ways with MCODE. This is referred to as the '''Scope''' of the process. | 1. '''LOAD YOUR NETWORK.''' To begin, load the network to be analyzed into Cytoscape. You can load as many networks as your computer system can handle, large or small. MCODE will analyze either the network view that is on top or one selected in the Network Tab on the left-hand panel. 1. '''CHOOSE THE SCOPE.''' Clusters can be found in the entire network or from a selection of nodes. Choose an option in the '''Find Cluster(s)''' section of the main MCODE panel. |
Line 51: | Line 51: |
* Only those clusters including the selected node(s) as one of their members will be reported. * Selections can be made either in the view directly or Cytoscape's handy search tool. . The choice of scope is dependent on the your familiarity with the network in question and the desired outcome. Having a particular protein of interest within a network, for example, it may be appropriate to search for only those clusters involving such a protein. On the other hand, uniformed, exploratory work will most benefit from a whole network scope. |
* Only those clusters which include the selected node(s) within them will be reported. * Selections can be made either in the view, the Node Attribute Browser or Cytoscape's handy search tools. . The choice of scope is dependent on your familiarity with the network in question and the desired outcome. If you have a particular protein of interest within a network, for example, it may be appropriate to search for only those clusters involving such a protein. On the other hand, exploratory work will benefit most from analyzing the whole network. |
Line 55: | Line 55: |
1. '''ANALYZE YOUR NETWORK.''' Next, press '''Analyze'''. This will display a task monitor reporing the progress of the scoring, finding, and drawing steps, provided that the task is not too quick. | 1. '''ANALYZE YOUR NETWORK.''' Next, press '''Analyze'''. This will display a progress meter of the scoring, finding, and drawing steps, provided that the task is not too quick. |
Line 58: | Line 58: |
* This means that MCODE failed to detect a loaded network for analysis. You must load a network, and make sure it is selected, before you can anaylze it. | * This means that MCODE failed to detect a loaded network for analysis. You must load a network, and make sure it is selected, before you can analyze it. |
Line 60: | Line 60: |
* This message appears when the Selection Scope is used without an actual selection. You must select the desired node(s) before you MCODE can attempt to find clusters. | * This message appears when the Selection scope is used without an actual selection. You must select the desired node(s) before MCODE can attempt to find clusters. |
Line 62: | Line 62: |
* Parameters are discussed in detail in the following section. For now, you should know that if you attempt to analyze a network twice without changing any of the settings, such as scope, MCODE will let you know that this analysis was previously conducted and will consequently display the previously attained results. | * Parameters are discussed in detail in the following section. For now, you should know that if you attempt to analyze a network twice without changing any of the settings, such as the scope of the analysis, MCODE will let you know that this analysis was previously conducted and will just display the previously obtained results. |
Line 64: | Line 64: |
1. '''BROWSE YOUR RESULTS.''' If everything goes to plan, a new tab will appear in the right-hand panel displaying the results as "Result 1" | 1. '''BROWSE YOUR RESULTS.''' If analysis finds clusters, a new tab will appear in the right-hand panel displaying the result as "Result 1" |
Line 66: | Line 66: |
* On the left side is a graphical representation of the cluster. | * On the left side, in the '''Network''' column, is a graphical representation of the cluster. |
Line 68: | Line 68: |
* The highest scoring node in the cluster is called the '''Seed'''. It is the node from which the cluster was derived and is represented by a square shape. * Other cluster members are circular. |
* The highest scoring node in the cluster is called the '''Seed'''. It is the node from which the cluster was derived and is represented as a square. * Other cluster members are circles. |
Line 72: | Line 72: |
* On the right side is a statisical summary of the cluster. * '''Rank''' is based on the cluster's computed '''Score''' and is used to ID the clusters within each result set. * For example, Cluster 1 is the highest ranked cluster in a given result set, and thus, at the top of the list. * '''Nodes''' and '''Edges''' is a simple enumeration of the clusters members and their interconnections. * These results can be discarded at any time by pressing the '''Close''' button at the bottom of the results panel. |
* On the right side, in the '''Details''' column, is a statistical summary of the cluster. * '''Rank''' is based on the cluster's computed '''Score''' and is used to identify the clusters within each result. * For example, Cluster 1 is the highest ranked cluster in a given result, and thus, at the top of the list. * '''Nodes''' and '''Edges''' is a simple enumeration of the cluster's members and their interconnections. * Results can be discarded at any time by pressing the '''Discard Result''' button at the bottom of the panel. |
Line 78: | Line 78: |
* If the network being analyzed has a view, MCODE will apply a custom visual style utilizing two MCODE generated node attributes. | * If the network being analyzed has a view, MCODE will apply a custom visual style utilizing two MCODE generated node attributes as soon as the network is analyzed. |
Line 80: | Line 80: |
* Square: seed (highest score in the cluster) | * Square: seed (highest scoring node in the cluster) |
Line 86: | Line 86: |
* '''MCODE_Cluster:''' This is an additional list type attribute that indicates which cluster the node belongs to. The MCODE visual style does not make use of it, but it is there in case it may be of some use. Note that if the '''Fluff''' parameter (discussed in the following section) is turned on, some nodes may belong to more than one cluster. | * '''MCODE_Cluster:''' This is an additional list type attribute that indicates which cluster the node belongs to. The MCODE visual style does not use this attribute, but it exists should you need it. Note that if the '''Fluff''' parameter (discussed in the following section) is turned on, some nodes may belong to more than one cluster. |
Line 88: | Line 88: |
* The clusters in the cluster browser table are selectable and will automatically select the corresponding nodes in the network view (if it exists). If no network view is available, the selected nodes can be reviewed in the Cytoscape native node attribute browser. * Secondly, a new '''Cluster Exploration Panel''' will appear below the Cluster Browser. This panel can be collapsed for now -- it's use will be discussed in the Exploring Results section of this Manual. |
* The clusters in the cluster browser table are selectable and will automatically select the corresponding nodes in the network view (if it exists). If no network view is available, the selected nodes can be reviewed in the Cytoscape Node Attribute Browser. * Secondly, a new '''Cluster Exploration Panel''' will appear below the Cluster Browser titled "'''Explore: Cluster [Rank]'''". This panel can be collapsed for now -- its use will be discussed in the '''Exploring Results''' section of this Manual. |
Line 91: | Line 91: |
This is a screeen shot of the MCODE Main Panel and Cluster Browser | The MCODE Main Panel and MCODE Result Panel containing the Cluster Browser |
Line 98: | Line 98: |
By default, MCODE analyzes networks using scoring and finding parameters that have been optimized to produce the best results for the average user and network. However, you may benefit greatly by familiarizing yourself with these parameters. Sometimes even slight customizations can produce considerable differences, reduce unwanted or false results, and increase relevance to your network. This is only an overview -- for a detailed insight into how these parameters function, it is best to consult the MCODE paper. | By default, MCODE analyzes networks using scoring and finding parameters that have been optimized to produce the best results for the average user and network. However, you may achieve better results for your network by familiarizing yourself with these parameters and changing them appropriately. Sometimes even slight customizations can produce considerable differences, reduce unwanted or false results, and increase relevance of results. This is only an overview -- for detailed parameter information, consult the MCODE paper. |
Line 106: | Line 106: |
* This value controls the minimum degree (number of connections) necessary in order for a node to be scored. For example, nodes that share only one connection with one other node have a degree of 1. Valid values are 2 or higher to prevent singly connected nodes from getting an artificially high node score. | * This value controls the minimum degree (number of connections) necessary in order for a node to be scored. For example, nodes that share only one connection with one other node have a degree of 1. Valid values are 2 or higher to prevent singly connected nodes from getting an artificially high node score. |
Line 110: | Line 110: |
1. '''Node Score Cutoff''' * This is the most influential parameter for cluster size and is the basis for the '''Size Threshold Slider''' in the '''Exploring Results''' section. During cluster expansion, new members are added only if their node score deviates from the cluster's seed node's score by less than the set cutoff. This is a percentage, where a value of 0.2 allows for new members' node scores to be no more than 20% less than that of the seed node. Thus, smaller values create smaller clusters and vice versa. |
|
Line 111: | Line 114: |
* Once a cluster has been found, some nodes which may have satisfied the Degree Cutoff parameter during scoring may only be connected to the cluster by one edge. When haircut is turned on, MCODE removes all such singly-connected nodes from clusters. | * Once a cluster has been found, some nodes which may have satisfied the Degree Cutoff parameter during scoring may only be connected to the cluster by one edge. When haircut is turned on, MCODE removes all such singly-connected nodes from clusters. |
Line 114: | Line 117: |
* When turned on, MCODE expands cluster cores by one neighbour shell outwards. This is applied after the optional haircut step and within the confines of the Node Density Cutoff parameter. | * When turned on, MCODE expands cluster cores by one neighbour shell outwards, according to the fluff Node Density Cutoff parameter and after the optional haircut step. |
Line 117: | Line 120: |
* Node density is calculated by dividing the node's connections by the maximum number of connections possible for that node. If Fluff is turned on, this parameter controls the neighbour inclusion criteria. Fluff expansion occurs after the cluster has already been defined by the algorithm and thus allows clusters to overlap at their edges. A higher value will expand clusters more. 1. '''Node Score Cutoff''' * This is the most influential parameter for cluster size and is the basis for the '''Size Slider''' in the '''Exploring Results''' section. During cluster expansion, this cutoff prevents new members from being added if their node score deviates from the cluster's seed node's score by more than the parameter allows. It is taken as a percentage where a value of 0.2 allows for new members' node scores to be no more than 20% less than that of the seed node. Thus, smaller values create smaller clusters and vice versa. |
* Node density is calculated by dividing the node's connections by the maximum number of connections possible for that node. If Fluff is turned on, this parameter controls the neighbour inclusion criteria during 'fluffing'. Fluff expansion occurs after the cluster has already been defined by the algorithm and thus allows clusters to overlap at their edges. A higher value will expand clusters more. |
Line 126: | Line 126: |
* Maximum depth limits the distance from the seed node whithin which MCODE can search for cluster members. By default this is set to an arbitrarily large number so that clusters are virtually unlimited. To limit cluster size, set this parameter to a small number. | * Maximum depth limits the distance from the seed node within which MCODE can search for cluster members. By default this is set to an arbitrarily large number so that clusters are virtually unlimited. To limit cluster size, set this parameter to a small number. |
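As an illustration only, the Node Score Cutoff rule described above can be sketched in Python. This is not MCODE's source code: the function name, the dictionary-based graph representation, and the inclusive comparison at the threshold are assumptions made for the sketch. It only shows how a cutoff of 0.2 keeps candidates whose score is no more than 20% below the seed's score, and why smaller cutoffs give smaller clusters.

```python
from collections import deque


def expand_cluster(seed, scores, neighbours, cutoff=0.2):
    """Grow a cluster outwards from `seed` (hypothetical helper).

    scores:     dict mapping node -> node score
    neighbours: dict mapping node -> iterable of adjacent nodes
    cutoff:     fraction by which a member's score may fall below the seed's
    """
    # A candidate must score at least (1 - cutoff) times the seed's score.
    # Treating the boundary as inclusive is an assumption of this sketch.
    threshold = scores[seed] * (1.0 - cutoff)
    cluster = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for candidate in neighbours.get(current, ()):
            if candidate not in cluster and scores[candidate] >= threshold:
                cluster.add(candidate)
                queue.append(candidate)
    return cluster


# Toy network: a - b - c - d, with "a" as the seed.
scores = {"a": 10.0, "b": 9.0, "c": 7.0, "d": 8.5}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

small = expand_cluster("a", scores, neighbours, cutoff=0.2)
large = expand_cluster("a", scores, neighbours, cutoff=0.5)
```

With the toy network, node "d" scores above the 0.2 threshold but is only reachable through "c", which does not, so it is left out until the cutoff is relaxed.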
Line 129: | Line 129: |
''New to Version 1.2: The user can now analyze a network as many times as desired by modifying the parameters. Each result set is stored sequentially for reference and comparison. Viewing different result sets will automatically rewrite the MCODE node attributes and revisualize the network. Note that MCODE can independently determine which portion of the algorithm needs to be conducted based on the user's parameter modifications. If the scoring parameters are altered, the given network will be rescored. If only the cluster finding parameters are altered, only the cluster finding portion will be conducted.'' | ''New to Version 1.2: The user can now analyze a network as many times as desired by modifying the parameters. Each result is stored sequentially for reference and comparison. Viewing different results will automatically rewrite the MCODE node attributes and revisualize the network appropriately. Note that MCODE automatically determines which portion of the algorithm needs to be run based on the user's parameter modifications. For instance, if the scoring parameters are altered, the network will be rescored, but if only the cluster finding parameters are altered, only the cluster finding portion will be run.'' |
Line 133: | Line 133: |
== Exploring Results in Real-Time == | == Exploring Results Interactively == |
Line 135: | Line 135: |
In addition to fine-tuning a multitude of parameters to enhance the analysis process, MCODE provides a real-time cluster exploration feature. This can be divided into two components: exploring cluster boundaries and exploring cluster content. The first exploration allows you to expand or reduce the cluster based on the node score using the '''Size Slider'''. The second is the '''Node Attribute Enumerator''' which provides a summary of the cluster's node attributes and their occurances in the cluster. Together they can inform the user about the cluster's "natural" boundaries in the context of the network and ensure functional consistancy. These are both explained in greater detail bellow. | In addition to fine-tuning a multitude of parameters to enhance the analysis process, MCODE provides a real-time cluster exploration feature. This can be divided into two components: exploring cluster boundaries and exploring cluster content. The first exploration allows you to expand or reduce the cluster based on the node score using the '''Size Threshold Slider'''. The second is the '''Node Attribute Enumerator''' which provides a summary of the cluster's node attributes and their frequency in the cluster. Together they can inform the user about the cluster's "natural" boundaries in the context of the network and ensure functional consistency. These are both explained in greater detail below. |
Line 137: | Line 137: |
=== Size Slider === | === Size Threshold Slider === |
Line 139: | Line 139: |
The slider scale ranges from '''Min''' to '''Max''' and has a marker ('''^''') for the initial position. The main parameter being manipulated is the '''Node Score Cutoff''' which, as previously mentioned, is the most influential cluster size modifier. As such, the initial position marker indicates the Node Score Cutoff value you have entered in the Finding Parameters. In moving the slider, the Node Score Cutoff is set to 0 at Min and 100 at Max, however there are several notable differences between the functions of the Size slider and the Node Score Cutoff Finding Parameter. | The slider scale ranges from '''Min''' to '''Max''' and has an 'origin' marker ('''^''') for its starting position. The slider controls the '''Node Score Cutoff''', which is the most influential determinant of cluster size. As such, the initial position marker indicates the Node Score Cutoff value originally set in the Finding Parameters section. When moving the slider, the Node Score Cutoff is set to 0 at Min and 100 at Max. However, there are several notable differences between the functions of the Size Threshold Slider and the Node Score Cutoff Finding Parameter. 1. During exploration, the cluster is reevaluated without the requirement of satisfying the K-Core parameter. Thus, moving the slider leftwards of the initial position allows the cluster to be reduced to only the seed node. 1. During exploration in the Max direction, the cluster is 'unaware' of other clusters. Unlike in the analysis, where every subsequent attempt at finding a cluster is only allowed to expand around previously found clusters, the slider expands the cluster despite adjacent cluster borders. Thus, moving the slider rightwards of the initial position allows the cluster to be expanded to as much as the whole network. * However, the 'awareness' of other clusters is intact in the range between the 'origin' marker and Min to allow the cluster to return to its original size. |
Line 141: | Line 144: |
Firstly, during exploration, the cluster is reevaluated without the requirements of satisfying the K-Core parameter. Thus, moving the slider leftwards of the initial position allows the cluster to be reduced to only the seed node. The second difference is that during exploration in the Max direction, the cluster is "unaware" of other clusters. Unlike in the alrogirthm where every subsequent attempt at finding a cluster is only allowed to grow around previously found clusters, the slider expands the cluster despite adjacent cluster borders. Thus, moving the slider rightwards of the initial position allows the cluster to be expanded to as much as the whole network. However, the "awareness" of other clusters is intact in the range between the marker and Min to allow the cluster to return to its original content. Haircut and Fluff are applied afterwards if they were turned on in the production of the given result set. | Haircut and Fluff are applied after slider movement if they were turned on in the original production of the result that is being explored. |
Line 143: | Line 146: |
In response the the slider, the Cluster Browser will be updated with the new cluster's graph and details (number of nodes and edges and new cluster score). The node selection in the main network view will also be updated. Since clusters can expand to large and sometimes unreasonable sizes, the layouter may need extra time to complete its task. When this occurs, a loader and progress bar will appear in the Cluster Browser. There is no need to wait for the graph to be drawn, the cluster details and node selections will remain responsive to the slider's movements. If the new cluster exceeds 100 nodes a place holder will be drawn instead since the graph representation will take too long to compute and will be of little value. | In response to the slider, the Cluster Browser is updated with the new cluster's network graphic and details (number of nodes and edges and new cluster score). The node selection in the main network view will also be updated. Since clusters can expand to large and sometimes unreasonable sizes, the layouter may need extra time to complete its task. When this occurs, a loader and progress bar will appear in the Cluster Browser. There is no need to wait for the cluster to be drawn; the details and node selections will remain responsive to the slider's movements. If the new cluster exceeds 300 nodes, a placeholder ("'''Too big to show'''") will be drawn instead, since the graphic representation will take too long to compute and will likely be too crowded to be of any real value. |
Line 146: | Line 149: |
* When exploring a lower ranked cluster (further down the list) it is likely that the cluster's content is quite dependent on the content of higher ranked clusters. This is merely a probability and not a rule since the finding process starts at the highest scoring nodes while clusters are ranked afterwards based on their size and connectivity -- higher scoring seed nodes may not produce higher scoring clusters. Given that, when expanding a cluster, there may be an unexpected initial discontinuity in size since the size slider will ignore the presence of other clusters. If the cluster was produced around a low-scoring seed node then more nodes are likely to satisfy the Node Score Cutoff parameter. Such situations can indicate that the cluster in question may be part of a larger cluster'''???????what else??????????'''. * Sometimes, on the other hand, moving the Size Slider a long distance may produce no changes in cluster size. In such cases, the seed node's score is so high compared to its proximal neighbourhood that the Node Score Cutoff must be increased greatly to include much lower scoring members. This indicates that the cluster is more or less well separated from the surrounding network by a local peak in node scores and as such, it is likely a well defined cluster. * Lastly, if no changes occur during size exploration, the cluster in question must not be connected to the larger network. |
1. '''Cluster Size Explosion''' * When exploring a lower ranked cluster (further down the list) the cluster's size may depend heavily on nearby higher ranked clusters. This may not always occur since the finding process starts at the highest scoring nodes while clusters are ranked afterwards based on their size and connectivity -- higher scoring seed nodes may not produce higher scoring clusters. Given that, when expanding a cluster, there may be an unexpected initial discontinuity in size since the Size Threshold Slider will ignore the presence of other clusters. If the cluster was produced around a low-scoring seed node then more nodes are likely to satisfy the Node Score Cutoff parameter. Such situations can indicate that the cluster in question may be part of a larger cluster. 1. '''Slider Dead-Zone''' * Sometimes, on the other hand, moving the Size Threshold Slider a long distance may produce no changes in cluster size. In such cases, the seed node's score is so high compared to its immediate neighbourhood that the Node Score Cutoff must be increased greatly to include much lower scoring members. This indicates that the cluster is more or less well separated from the surrounding network by a local peak in node scores and as such, it is likely a well defined cluster. 1. '''No Change''' * Lastly, if no changes occur during size exploration, the cluster in question is likely not connected to the larger network and as such cannot expand. |
Line 151: | Line 157: |
The Enumerator provides a numerical summary of node attribute values possessed by the currently explored cluster's members. It is meant to inform the user of the cluster's contents and aid in determining the cluster's functional relevance. All node attributes that are available for the loaded network are listed in the select box. When an attribute selection is made in one exploration, it persists for all cluster explorations within the given result. The table below the select box has two columns, '''Value''' and '''Occurrence'''. The Value column lists all node attribute values that are possessed by the cluster being explored. With a simple string type attribute, such as MCODE_Node_Status, this list will never exceed the number of cluster members since every member can have only one value and some values may be shared by several members. However, the values of list type attributes such as Gene Ontology (GO) terms may outnumber the cluster members since each member can have numerous values. The Occurrence column simply displays the number of nodes possessing the particular attribute value listed in each row. The Enumerator table is sorted by frequency in descending order, so the most commonly occurring attribute value is listed on top. The Occurrence numbers are best interpreted when compared with the number of nodes in the cluster. For example, when enumerating Biological Process GO Terms, it may be a good indicator that the given cluster is biologically relevant if 9 of the 10 cluster members share some specific value. In combination with the Size Threshold Slider, the Enumerator can be used to optimize clusters based on functional relevance. As the slider is being manipulated, the Enumerator will automatically report changes in cluster content for the selected attribute. As such, one can home in on a size that, for example, maximizes some particular attribute value of interest while reducing the number of nodes with unrelated values. 
At this stage of MCODE development, the Node Attribute Enumerator is a precursor to more automated methods of accomplishing similar attribute-enhanced clustering and statistical reporting. |
|
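The Value and Occurrence columns described above amount to a frequency count over the cluster's members. The following Python sketch shows that idea under stated assumptions: the function name and the dictionary holding attribute values are hypothetical, and the choice to count each node at most once per value is an interpretation of "the number of nodes possessing the particular attribute value", not something taken from the MCODE code.

```python
from collections import Counter


def enumerate_attribute(cluster_nodes, attribute_values):
    """Return (value, occurrence) pairs, most frequent first (hypothetical helper).

    cluster_nodes:    iterable of node IDs in the cluster being explored
    attribute_values: dict mapping node ID -> a single value or a list of values
    """
    counts = Counter()
    for node in cluster_nodes:
        value = attribute_values.get(node)
        if value is None:
            continue  # node has no value for this attribute
        # Simple attributes hold one value; list type attributes hold several.
        values = value if isinstance(value, (list, tuple, set)) else [value]
        # Count each node at most once per value (assumption of this sketch).
        counts.update(set(values))
    return counts.most_common()


# A list type attribute: three nodes, two of them with several terms.
go_terms = {
    "n1": ["GO:A", "GO:B"],
    "n2": ["GO:A"],
    "n3": ["GO:A", "GO:C", "GO:D"],
}
table = enumerate_attribute(["n1", "n2", "n3"], go_terms)
```

Here the table lists four values for three nodes, which is the situation the text describes for list type attributes, and the top row shows that all three members share "GO:A".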
Line 153: | Line 170: |
=== Create Sub-Network === Clusters can be output as sub or child-networks of the original network by clicking the '''Create Sub-Network''' button located on the cluster exploration panel which is opened when a cluster is selected in the Cluster Browser. ''New to Version 1.2: Since exploration allows for a cluster size to change, the user can now create as many sub-networks of the same cluster as desired. New networks are named by their result set, cluster rank and score, for example: '''Result 1: Cluster 1 (Score 4.3)'''.'' === Export as Text === Clusters can be summarized in a time-stamped tab-delimited text file consisting of: * Cluster rank * Cluster score (density multiplied by the number of members) * Number of nodes * Number of edges * Cluster member IDs (comma-delimited) The parameters used in scoring and finding the exported result are included in the file as well for future reference. The default parameter settings appear as: * Network Scoring: Include Loops: false Degree Cutoff: 2 K-Core: 2 * Cluster Finding: Node Score Cutoff: 0.2 Haircut: true Fluff: false Max. Depth from Seed: 100 |
|
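The exported cluster score is described above as density multiplied by the number of members. A minimal Python sketch of that arithmetic follows. The manual does not define density at this point, so the sketch assumes it is the number of edges divided by the maximum possible number of edges between the members without self-loops (in line with the default "Include Loops: false"); the function names are illustrative.

```python
def cluster_density(num_nodes, num_edges):
    """Edges present divided by edges possible (assumed definition, no loops)."""
    if num_nodes < 2:
        return 0.0  # a single node has no possible edges
    max_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / max_edges


def cluster_score(num_nodes, num_edges):
    """Score as stated in the manual: density multiplied by number of members."""
    return cluster_density(num_nodes, num_edges) * num_nodes


# A fully connected 4-node cluster versus a sparser 5-node cluster.
clique_score = cluster_score(4, 6)
sparse_score = cluster_score(5, 5)
```

Under this assumed definition a fully connected cluster scores exactly its number of members, so larger and denser clusters rank higher.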
Line 157: | Line 193: |
TODO: create a simple worked example, based on an example network that is shipped with Cytoscape - see the NetMatch user manual. Also, link to the MCODE tutorial on the main Cytoscape page - http://www.cytoscape.org/tut/modules.complexes.php |
MCODE Documentation
Installation
To use the MCODE plugin, you must first obtain and install Cytoscape. The compatible MCODE and Cytoscape versions are outlined in the downloads section on the [:Software/MCODE: MCODE website]. The latest MCODE, version 1.2, requires Cytoscape 2.3.2 or later.
You can download a copy of Cytoscape from: http://www.cytoscape.org.
Once you have downloaded and installed Cytoscape and verified that it works:
- Copy the MCODE.jar file to your [Cytoscape_Home]/plugins directory.
- Start Cytoscape. This can be done by double-clicking the newly created Cytoscape icon or via the command line:
- On Unix/Linux or MacOS X, run: cytoscape.sh
- On Windows, run: cytoscape.bat
- Check that MCODE appears in the Plugins menu of Cytoscape
- If it does not, then you likely placed the MCODE.jar file in the wrong directory. Verify that you have completed step one. You will have to restart Cytoscape to reload the plugin.
Running MCODE
MCODE is an extension for Cytoscape and can only be accessed through Cytoscape.
- Start Cytoscape
- Go to the Plugins Menu and select the MCODE sub-menu
- Click Start MCODE
- The main MCODE interface will appear as a tab in the left-hand panel of Cytoscape (Note: If MCODE does not appear in the Plugins Menu, then revisit the installation section above.)
The MCODE sub-menu provides two additional options:
About MCODE
- Shows the MCODE credits, citation information, and a link to the MCODE paper.
Help
- Links to the MCODE website at [:Home: www.baderlab.org] in your default web browser for quick access to downloads, contact information, and the User's Manual with help and tutorials.
New to Version 1.2: MCODE now acts as a console for the underlying algorithm and can be started independently of network loading, that is, before any network has been loaded into Cytoscape. This makes analyses easier to access, repeat, and modify.
Getting and Interpreting Results
The MCODE Main Panel is the starting point for analysis. It contains two main sections: Find Cluster(s) and Advanced Options. The latter option is intended for fine-tuning of results by experienced users who are familiar with the MCODE paper (discussed in the next section). This section covers some of the basic steps of running MCODE on a network.
attachment:screenshot_main_panle.tiff
LOAD YOUR NETWORK. To begin, load the network to be analyzed into Cytoscape. You can load as many networks as your computer system can handle, large or small. MCODE will analyze either the network view that is on top or one selected in the Network Tab on the left-hand panel.
CHOOSE THE SCOPE. Clusters can be found in the entire network or from a selection of nodes. Choose an option in the Find Cluster(s) section of the main MCODE panel.
- Find Clusters in Whole Network
- MCODE will find and report all clusters in the entire network.
- Find Clusters from Selection
- Only those clusters which include the selected node(s) within them will be reported.
- Selections can be made either in the view, the Node Attribute Browser or Cytoscape's handy search tools.
- The choice of scope is dependent on your familiarity with the network in question and the desired outcome. If you have a particular protein of interest within a network, for example, it may be appropriate to search for only those clusters involving such a protein. On the other hand, exploratory work will benefit most from analyzing the whole network.
ANALYZE YOUR NETWORK. Next, press Analyze. This will display a progress meter of the scoring, finding, and drawing steps, provided that the task is not too quick.
- You may see several different messages at this step:
- The "No network" message
- This means that MCODE failed to detect a loaded network for analysis. You must load a network, and make sure it is selected, before you can analyze it.
- The "No selection" message
- This message appears when the Selection scope is used without an actual selection. You must select the desired node(s) before MCODE can attempt to find clusters.
- The "Parameters unchanged" message
- Parameters are discussed in detail in the following section. For now, you should know that if you attempt to analyze a network twice without changing any of the settings, such as the scope of the analysis, MCODE will let you know that this analysis was previously conducted and will just display the previously obtained results.
BROWSE YOUR RESULTS. If analysis finds clusters, a new tab will appear in the right-hand panel displaying the result as "Result 1".
Cluster Browser:
On the left side, in the Network column, is a graphical representation of the cluster.
- Cluster members are coloured red.
The highest scoring node in the cluster is called the Seed. It is the node from which the cluster was derived and is represented as a square.
- Other cluster members are circles.
- Edges, representing interactions for example, are blue.
- Edge directionality is represented by cyan arrows.
On the right side, in the Details column, is a statistical summary of the cluster.
Rank is based on the cluster's computed Score and is used to identify the clusters within each result.
- For example, Cluster 1 is the highest ranked cluster in a given result, and thus, at the top of the list.
Nodes and Edges is a simple enumeration of the cluster's members and their interconnections.
Results can be discarded at any time by pressing the Discard Result button at the bottom of the panel.
Network View:
- If the network being analyzed has a view, MCODE will apply a custom visual style utilizing two MCODE generated node attributes as soon as the network is analyzed.
MCODE_Node_Status: Node shapes indicate the cluster status of the nodes.
- Square: seed (highest scoring node in the cluster)
- Circle: clustered
- Diamond: unclustered
MCODE_Score: Node colors represent the node score.
- A range from black to red indicates the MCODE computed node score (lowest to highest, respectively).
- White indicates a score of zero.
MCODE_Cluster: This is an additional list type attribute that indicates which cluster the node belongs to. The MCODE visual style does not use this attribute, but it exists should you need it. Note that if the Fluff parameter (discussed in the following section) is turned on, some nodes may belong to more than one cluster.
Cluster Selection:
- The clusters in the cluster browser table are selectable and will automatically select the corresponding nodes in the network view (if it exists). If no network view is available, the selected nodes can be reviewed in the Cytoscape Node Attribute Browser.
Secondly, a new Cluster Exploration Panel will appear below the Cluster Browser titled "Explore: Cluster [Rank]". This panel can be collapsed for now -- its use will be discussed in the Exploring Results section of this Manual.
The MCODE Main Panel and MCODE Result Panel containing the Cluster Browser
- attachment:main_panel.gif attachment:cluster_browser.gif
Fine-Tuning Your Analysis
By default, MCODE analyzes networks using scoring and finding parameters that have been optimized to produce the best results for the average user and network. However, you may achieve better results for your network by familiarizing yourself with these parameters and changing them appropriately. Sometimes even slight customizations can produce considerable differences, reduce unwanted or false results, and increase relevance of results. This is only an overview -- for detailed parameter information, consult the MCODE paper.
Scoring Parameters
Include Loops
- When turned on, MCODE will include loops (self-edges) in the neighbourhood density calculation. This is expected to make a small difference in the results.
Degree Cutoff
- This value controls the minimum degree (number of connections) necessary in order for a node to be scored. For example, nodes that share only one connection with one other node have a degree of 1. Valid values are 2 or higher to prevent singly connected nodes from getting an artificially high node score.
Finding Parameters
Node Score Cutoff
This is the most influential parameter for cluster size and is the basis for the Size Threshold Slider in the Exploring Results section. During cluster expansion, new members are added only if their node score deviates from the cluster's seed node's score by less than the set cutoff. The cutoff is expressed as a fraction of the seed node's score: a value of 0.2 allows new members' node scores to be no more than 20% below that of the seed node. Thus, smaller values create smaller clusters and larger values create larger ones.
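The cutoff rule described above can be sketched in a few lines of Python. This is only an illustration, not MCODE's own code: the function names are hypothetical, the strict comparison at the boundary is an assumption, and a real cluster expansion also follows network edges outward from the seed rather than scanning every node.

```python
def passes_node_score_cutoff(candidate_score, seed_score, cutoff=0.2):
    """Return True if the candidate's score is within `cutoff` of the seed's score.

    Assumption: the threshold is seed_score * (1 - cutoff) and the comparison
    is strict; how MCODE treats a score exactly on the boundary is not stated here.
    """
    threshold = seed_score * (1.0 - cutoff)
    return candidate_score > threshold


def eligible_nodes(node_scores, seed, cutoff=0.2):
    """List the nodes (other than the seed) whose scores pass the cutoff."""
    seed_score = node_scores[seed]
    return sorted(
        node
        for node, score in node_scores.items()
        if node != seed and passes_node_score_cutoff(score, seed_score, cutoff)
    )


# Hypothetical node scores; "A" plays the role of the seed.
scores = {"A": 5.0, "B": 4.5, "C": 3.5}
print(eligible_nodes(scores, "A", cutoff=0.2))  # only B is within 20% of the seed
print(eligible_nodes(scores, "A", cutoff=0.5))  # a larger cutoff also admits C
```

Raising the cutoff lowers the threshold, which is why larger values tend to produce larger clusters.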
Haircut
- Once a cluster has been found, some nodes which may have satisfied the Degree Cutoff parameter during scoring may only be connected to the cluster by one edge. When haircut is turned on, MCODE removes all such singly-connected nodes from clusters.
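As a rough illustration of the haircut step, the sketch below drops every cluster member that is attached to the rest of the cluster by exactly one edge. The function name and the edge-list representation are assumptions made for this example, and the sketch makes a single pass; whether MCODE repeats the step until no such nodes remain is not stated here.

```python
def haircut(cluster_nodes, edges):
    """Remove cluster members connected to the cluster by exactly one edge.

    `edges` is an iterable of (u, v) pairs from the whole network; only edges
    with both ends inside the cluster are counted. Self-loops are ignored.
    """
    members = set(cluster_nodes)
    degree = {node: 0 for node in members}
    for u, v in edges:
        if u != v and u in members and v in members:
            degree[u] += 1
            degree[v] += 1
    return {node for node in members if degree[node] != 1}


# A triangle A-B-C with one extra node D hanging off C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(sorted(haircut({"A", "B", "C", "D"}, edges)))  # D is trimmed
```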
Fluff
- When turned on, MCODE expands cluster cores by one neighbour shell outwards, according to the fluff Node Density Cutoff parameter and after the optional haircut step.
Node Density Cutoff
- Node density is calculated by dividing the node's connections by the maximum number of connections possible for that node. If Fluff is turned on, this parameter controls the neighbour inclusion criteria during 'fluffing'. Fluff expansion occurs after the cluster has already been defined by the algorithm and thus allows clusters to overlap at their edges. A higher value will expand clusters more.
K-Core
- This parameter filters out clusters that do not contain a maximally inter-connected sub-cluster of at least k degrees. For example, a triangle (3 nodes, 3 edges) is a 2-core (2 connections per node). Two nodes with 2 edges between them satisfy the 2-core rule as well. Since the default value is 2, this ensures that clusters must at the very least contain one of these two sub-clusters. Increasing this value will exclude smaller clusters.
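The k-core idea can be made concrete with the standard peeling procedure: repeatedly remove nodes that have fewer than k connections until none are left to remove. The sketch below is a generic textbook version, not MCODE's implementation; the function name and the adjacency-dictionary input are assumptions. Because neighbours are stored as sets, it covers simple graphs only and cannot represent the "two nodes with 2 edges" case mentioned above.

```python
def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected simple graph.

    `adjacency` maps each node to the set of its neighbours.
    """
    remaining = {node: set(neigh) for node, neigh in adjacency.items()}
    changed = True
    while changed:
        changed = False
        too_small = [n for n, neigh in remaining.items() if len(neigh) < k]
        for node in too_small:
            # Drop the node and detach it from neighbours that are still present.
            for neigh in remaining.pop(node):
                if neigh in remaining:
                    remaining[neigh].discard(node)
            changed = True
    return set(remaining)


# A triangle A-B-C with a pendant node D attached to C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(sorted(k_core(graph, 2)))  # the triangle survives as a 2-core
print(sorted(k_core(graph, 3)))  # nothing satisfies k = 3
```

A cluster would pass the K-Core filter in this sketch when `k_core` returns a non-empty set for the chosen k.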
Max. Depth
- Maximum depth limits the distance from the seed node within which MCODE can search for cluster members. By default this is set to an arbitrarily large number so that clusters are virtually unlimited. To limit cluster size, set this parameter to a small number.
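The effect of a depth limit can be illustrated with a breadth-first search that stops expanding once it is `max_depth` steps away from the seed. This is a generic sketch with hypothetical names; it shows only the depth limit and leaves out the Node Score Cutoff test that MCODE applies during the same search.

```python
from collections import deque


def nodes_within_depth(adjacency, seed, max_depth):
    """Return a dict mapping each node reachable within `max_depth` steps to its depth."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not search beyond the limit
        for neigh in adjacency.get(node, ()):
            if neigh not in depth:
                depth[neigh] = depth[node] + 1
                queue.append(neigh)
    return depth


# A simple chain A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(nodes_within_depth(chain, "A", 1))    # only A and B are reachable
print(nodes_within_depth(chain, "A", 100))  # a large limit reaches the whole chain
```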
New to Version 1.2: The user can now analyze a network as many times as desired by modifying the parameters. Each result is stored sequentially for reference and comparison. Viewing different results will automatically rewrite the MCODE node attributes and revisualize the network appropriately. Note that MCODE automatically determines which portion of the algorithm needs to be run based on the user's parameter modifications. For instance, if the scoring parameters are altered, the network will be rescored, but if only the cluster finding parameters are altered, only the cluster finding portion will be run.
Exploring Results Interactively
In addition to fine-tuning a multitude of parameters to enhance the analysis process, MCODE provides a real-time cluster exploration feature. This can be divided into two components: exploring cluster boundaries and exploring cluster content. The first exploration allows you to expand or reduce the cluster based on the node score using the Size Threshold Slider. The second is the Node Attribute Enumerator which provides a summary of the cluster's node attributes and their frequency in the cluster. Together they can inform the user about the cluster's "natural" boundaries in the context of the network and ensure functional consistency. These are both explained in greater detail below.
Size Threshold Slider
The slider scale ranges from Min to Max and has an 'origin' marker (^) for its starting position. The slider controls the Node Score Cutoff, which is the most influential determinant of cluster size. As such, the initial position marker indicates the Node Score Cutoff value originally set in the Finding Parameters section. When moving the slider, the Node Score Cutoff is set to 0 at Min and 100 at Max. However, there are several notable differences between the functions of the Size Threshold Slider and the Node Score Cutoff Finding Parameter.
- During exploration, the cluster is reevaluated without the requirements of satisfying the K-Core parameter. Thus, moving the slider leftwards of the initial position allows the cluster to be reduced to only the seed node.
- During exploration in the Max direction, the cluster is 'unaware' of other clusters. Unlike in the analysis where every subsequent attempt at finding a cluster is only allowed to expand around previously found clusters, the slider expands the cluster despite adjacent cluster borders. Thus, moving the slider rightwards of the initial position allows the cluster to be expanded to as much as the whole network.
- However, the 'awareness' of other clusters is intact in the range between the 'origin' marker and Min to allow the cluster to return to its original size.
Haircut and Fluff are applied after slider movement if they were turned on in the original production of the result that is being explored.
In response to the slider, the Cluster Browser is updated with the new cluster's network graphic and details (number of nodes and edges and new cluster score). The node selection in the main network view will also be updated. Since clusters can expand to large and sometimes unreasonable sizes, the layouter may need extra time to complete its task. When this occurs, a loader and progress bar will appear in the Cluster Browser. There is no need to wait for the cluster to be drawn; the details and node selections will remain responsive to the slider's movements. If the new cluster exceeds 300 nodes, a placeholder ("Too big to show") will be drawn instead, since the graphic representation would take too long to compute and would likely be too crowded to be of any real value.
Several peculiarities may arise during size exploration:
Cluster Size Explosion
- When exploring a lower ranked cluster (further down the list) the cluster's size may depend heavily on nearby higher ranked clusters. This may not always occur since the finding process starts at the highest scoring nodes while clusters are ranked afterwards based on their size and connectivity -- higher scoring seed nodes may not produce higher scoring clusters. Given that, when expanding a cluster, there may be an unexpected initial discontinuity in size since the Size Threshold Slider will ignore the presence of other clusters. If the cluster was produced around a low-scoring seed node then more nodes are likely to satisfy the Node Score Cutoff parameter. Such situations can indicate that the cluster in question may be part of a larger cluster.
Slider Dead-Zone
- Sometimes, on the other hand, moving the Size Threshold Slider a long distance may produce no changes in cluster size. In such cases, the seed node's score is so high compared to its immediate neighbourhood that the Node Score Cutoff must be increased greatly to include much lower scoring members. This indicates that the cluster is more or less well separated from the surrounding network by a local peak in node scores and as such, it is likely a well defined cluster.
No Change
- Lastly, if no changes occur during size exploration, the cluster in question is likely not connected to the larger network and as such cannot expand.
Node Attribute Enumerator
The Enumerator provides a numerical summary of node attribute values possessed by the currently explored cluster's members. It is meant to inform the user of the cluster's contents and aid in determining the cluster's functional relevance. All node attributes that are available for the loaded network are listed in the select box. When an attribute selection is made in one exploration, it persists for all cluster explorations within the given result.
The table below the select box has two columns, Value and Occurrence. The Value column lists all node attribute values that are possessed by the cluster being explored. With a simple string type attribute, such as MCODE_Node_Status, this list will never exceed the number of cluster members since every member can have only one value and some values may be shared by several members. However, the values of list type attributes such as Gene Ontology (GO) terms may outnumber the cluster members since each member can have numerous values. The Occurrence column simply displays the number of nodes possessing the particular attribute value listed in each row. The Enumerator table is sorted by frequency in descending order, so the most commonly occurring attribute value is listed on top.
The Occurrence numbers are best interpreted when compared with the number of nodes in the cluster. For example, when enumerating Biological Process GO Terms, it may be a good indicator that the given cluster is biologically relevant if 9 of the 10 cluster members share some specific value.
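The counting performed by the Enumerator can be approximated as follows. This is a minimal sketch with assumed names and data structures (a dictionary from node ID to attribute value), not the plugin's code. It treats list type attributes as several values per node and counts each value at most once per node.

```python
from collections import Counter


def enumerate_attribute(cluster_nodes, attribute_values):
    """Count how many cluster members possess each attribute value.

    `attribute_values` maps a node ID to a single value or to a list of values.
    Returns (value, occurrence) pairs, most frequent first.
    """
    counts = Counter()
    for node in cluster_nodes:
        value = attribute_values.get(node)
        if value is None:
            continue  # node has no value for this attribute
        values = value if isinstance(value, (list, tuple, set)) else [value]
        counts.update(set(values))  # count each value once per node
    return counts.most_common()


# Hypothetical GO-style annotations for a three-node cluster.
annotations = {
    "n1": ["transport", "binding"],
    "n2": ["transport"],
    "n3": "transport",
}
for value, occurrence in enumerate_attribute(["n1", "n2", "n3"], annotations):
    print(value, occurrence)
```

Dividing each occurrence by the number of nodes in the cluster gives the share of members that carry the value, which is the comparison suggested above.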
In combination with the Size Threshold Slider, the Enumerator can be used to optimize clusters based on functional relevance. As the slider is being manipulated, the Enumerator will automatically report changes in cluster content for the selected attribute. As such, one can home in on a cluster size that, for example, minimizes the number of nodes whose attribute values are unrelated to a particular value of interest while maximizing the number of nodes that have that value.
At this stage of MCODE development, the Node Attribute Enumerator is a precursor to more automated methods of accomplishing similar attribute-enhanced clustering and statistical reporting.
Outputting Results
Create Sub-Network
Clusters can be output as sub-networks (child networks) of the original network by clicking the Create Sub-Network button located on the cluster exploration panel, which is opened when a cluster is selected in the Cluster Browser.
New to Version 1.2: Since exploration allows for a cluster size to change, the user can now create as many sub-networks of the same cluster as desired. New networks are named by their result set, cluster rank and score, for example: Result 1: Cluster 1 (Score 4.3).
Export as Text
Clusters can be summarized in a time-stamped tab-delimited text file consisting of:
- Cluster rank
- Cluster score (density multiplied by the number of members)
- Number of nodes
- Number of edges
- Cluster member IDs (comma-delimited)
The parameters used in scoring and finding the exported result are included in the file as well for future reference. The default parameter settings appear as:
- Network Scoring: Include Loops: false; Degree Cutoff: 2; K-Core: 2
- Cluster Finding: Node Score Cutoff: 0.2; Haircut: true; Fluff: false; Max. Depth from Seed: 100
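The cluster score listed among the exported fields is described as density multiplied by the number of members. The sketch below shows that arithmetic under one explicit assumption: density is taken to be the number of edges divided by the maximum possible number of edges in an undirected graph without self-loops. MCODE's exact density definition (for example, when Include Loops is turned on) may differ, so treat the numbers as illustrative only.

```python
def cluster_score(num_nodes, num_edges):
    """Return density * number of nodes for a cluster.

    Assumption: density = edges / (n * (n - 1) / 2), i.e. an undirected
    graph with no self-loops. This is not confirmed as MCODE's definition.
    """
    if num_nodes < 2:
        return 0.0
    max_edges = num_nodes * (num_nodes - 1) / 2
    density = num_edges / max_edges
    return density * num_nodes


print(cluster_score(3, 3))  # a fully connected triangle
print(cluster_score(5, 6))  # a sparser five-node cluster
```

Under this assumption a fully connected cluster scores its own node count, and sparser clusters of the same size score proportionally lower.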
Tutorials (Coming Soon)
BiNGO Validation
TODO: create a simple worked example, based on an example network that is shipped with Cytoscape - see the NetMatch user manual. Also, link to the MCODE tutorial on the main Cytoscape page - http://www.cytoscape.org/tut/modules.complexes.php