Google Summer of Code 2010: Semantic Network Summary
Short Description
Goal 
  Develop a visual summary of a set of node attributes 
Description 
  When biological networks are investigated, it is common to look for clusters, i.e. sets of nodes that are highly inter-connected. To figure out the "biological meaning" of a cluster, the user has to sift through the long textual annotations associated with biological entities. We are interested in producing a graphical summary of such annotations. Word frequency in annotations is a good starting point, and can be visualized as a "tag cloud". In addition, the word layout can reflect similarity relations among words (e.g. co-occurrence in the same annotations).
This functionality can be applied to several networks outside biology, whenever nodes are associated with verbose textual annotations. One example is professional social networks (e.g. LinkedIn), where individuals are "annotated" by a short CV.
Language and Skills 
  Java, basic statistics  
Longer Description
Problem Description
Biological networks can be visualized and analyzed using Cytoscape. Since biological networks have a large number of nodes (a whole-cell protein network has up to 20k nodes), it is common to summarize them by clustering, i.e. identifying groups of highly inter-connected nodes. Clusters can be identified algorithmically, or hand-picked by experts (with the help of the network layout).
Once clusters have been identified, however, it is not trivial to summarize their meaning. Bio-entities typically have rich semantics, which are encoded in long string attributes.
The purpose of the Semantic Network Summary module will be to generate concise summaries of these attributes.
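"Highly inter-connected" can be quantified in several ways. One simple measure is edge density: the fraction of all possible node pairs in the cluster that are actually linked. The Java sketch below is only an illustration of that idea, assuming an undirected network without duplicate edges and with edges given as hypothetical String pairs; it is not a clustering algorithm and not part of any Cytoscape API.

```java
import java.util.List;
import java.util.Set;

final class ClusterDensitySketch {
    // Edge density of a node set S in an undirected simple graph: the number of
    // edges with both endpoints in S, divided by the maximum possible number of
    // such edges, |S|(|S|-1)/2. Values close to 1 mean S is highly inter-connected.
    static double density(Set<String> cluster, List<String[]> edges) {
        int n = cluster.size();
        if (n < 2) {
            return 0.0; // no node pairs, so we report 0 by convention
        }
        int internal = 0;
        for (String[] edge : edges) {
            boolean selfLoop = edge[0].equals(edge[1]);
            if (!selfLoop && cluster.contains(edge[0]) && cluster.contains(edge[1])) {
                internal++;
            }
        }
        return internal / ((double) n * (n - 1) / 2.0);
    }
}
```

A cluster whose density is close to 1 is almost a clique; hand-picked clusters can be checked the same way.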
Input/Output Description
The module will receive as input (S, A): a set of nodes S = {s1, s2, ..., sn} together with their string attributes A = {a1, a2, ..., an}, where ai is the attribute of node si.
For every input (S, A), the module must generate a graphical summary of the string attributes.
Attributes A are typically stored in biological databases. They can be free-text descriptions or controlled-vocabulary terms (e.g. Gene Ontology).
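As a minimal sketch of the input, assuming attributes are available as a hypothetical lookup table from node identifier to attribute text (the real plugin would read them through the Cytoscape API, which is not shown here), A can be built from S while keeping ai aligned with si. The same helper can build the attributes of all nodes, which the normalization described below needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SummaryInputSketch {
    // Builds A = {a1, ..., an} for a node set S = {s1, ..., sn}, so that a_i is the
    // string attribute of s_i. attributeByNode is a hypothetical lookup table
    // (node identifier -> attribute text); a missing attribute becomes "" so the
    // one-to-one correspondence between S and A is preserved.
    static List<String> attributesFor(List<String> nodeSet, Map<String, String> attributeByNode) {
        List<String> attributes = new ArrayList<>(nodeSet.size());
        for (String node : nodeSet) {
            attributes.add(attributeByNode.getOrDefault(node, ""));
        }
        return attributes;
    }
}
```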
Available Solutions: Word Frequency
A first, simple solution, which we have already implemented:
- break down {a1, a2, ..., an} into single words
- count word frequencies
- score each word using coefficients based on information theory, or a statistical test p-value
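The three steps above could look roughly like this in Java. This is a minimal sketch, not our existing implementation: the tokenizer simply splits on non-alphanumeric ASCII characters, and for the scoring step we assume a one-sided hypergeometric test (one common choice of "statistical test p-value", here our assumption). Note that the test counts nodes whose attribute mentions a word, not raw word occurrences.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class WordFrequencySketch {
    // Step 1: break an attribute into lower-case words, splitting on anything
    // that is not an ASCII letter or digit.
    static List<String> tokenize(String attribute) {
        List<String> words = new ArrayList<>();
        for (String token : attribute.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Step 2: count how often each word occurs across a1, ..., an.
    static Map<String, Integer> countWords(List<String> attributes) {
        Map<String, Integer> counts = new HashMap<>();
        for (String attribute : attributes) {
            for (String word : tokenize(attribute)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // ln(m!) as a sum of logarithms; slow but adequate for an illustration.
    private static double logFactorial(int m) {
        double sum = 0.0;
        for (int i = 2; i <= m; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    private static double logChoose(int a, int b) {
        return logFactorial(a) - logFactorial(b) - logFactorial(a - b);
    }

    // Step 3 (one possible test): upper-tail hypergeometric p-value P(X >= k).
    // networkSize nodes in total, nodesWithWord of them mention the word,
    // clusterSize nodes are in the cluster, k cluster nodes mention the word.
    static double hypergeometricPValue(int networkSize, int nodesWithWord, int clusterSize, int k) {
        double logTotal = logChoose(networkSize, clusterSize);
        double p = 0.0;
        for (int x = k; x <= Math.min(clusterSize, nodesWithWord); x++) {
            int rest = clusterSize - x; // cluster nodes that do not mention the word
            if (rest > networkSize - nodesWithWord) {
                continue; // not enough such nodes in the network
            }
            p += Math.exp(logChoose(nodesWithWord, x)
                    + logChoose(networkSize - nodesWithWord, rest) - logTotal);
        }
        return Math.min(1.0, p);
    }
}
```

Words can then be ranked by p-value: the smaller it is, the more specific the word is to the cluster.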
This simple idea can be improved by:
- removing common-place words, also called stop words (e.g. "of", "by")
- dividing the word frequencies in A by the word frequencies in the full network (i.e. all nodes)
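Both improvements can be sketched in a few lines of Java. The stop-word list below is a deliberately tiny, hypothetical one. For the division we assume relative frequencies (count divided by total words), so that a ratio above 1 means a word is over-represented in the cluster; dividing raw counts instead would only rescale all ratios by the same constant and give the same ranking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class NormalizedFrequencySketch {
    // A deliberately tiny, hypothetical stop-word list; a real plugin would need a fuller one.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "by", "in", "of", "the", "to");

    // Improvement 1: drop common-place words from a word-count map.
    static Map<String, Integer> removeStopWords(Map<String, Integer> counts) {
        Map<String, Integer> filtered = new HashMap<>();
        counts.forEach((word, count) -> {
            if (!STOP_WORDS.contains(word)) {
                filtered.put(word, count);
            }
        });
        return filtered;
    }

    // Improvement 2: divide each word's relative frequency in the cluster
    // attributes A by its relative frequency in the attributes of all nodes.
    // Ratios above 1 mark words that are over-represented in the cluster.
    static Map<String, Double> frequencyRatio(Map<String, Integer> clusterCounts,
                                              Map<String, Integer> networkCounts) {
        double clusterTotal = clusterCounts.values().stream().mapToInt(Integer::intValue).sum();
        double networkTotal = networkCounts.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> ratios = new HashMap<>();
        for (Map.Entry<String, Integer> entry : clusterCounts.entrySet()) {
            int inNetwork = networkCounts.getOrDefault(entry.getKey(), 0);
            if (inNetwork == 0) {
                continue; // cannot happen if the cluster's nodes are part of the network
            }
            double clusterFrequency = entry.getValue() / clusterTotal;
            double networkFrequency = inNetwork / networkTotal;
            ratios.put(entry.getKey(), clusterFrequency / networkFrequency);
        }
        return ratios;
    }
}
```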
Wordle (www.wordle.net) is a nice graphical representation based on word frequency.
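A tag cloud needs, at minimum, a mapping from word scores to font sizes. The sketch below does not reproduce Wordle's layout; it only shows one simple assumption, linear scaling of a score (raw frequency, frequency ratio, etc.) between a minimum and a maximum point size. Logarithmic scaling is another common choice when a few words dominate.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class TagCloudSketch {
    // Maps each word's score linearly onto a font size between minPt and maxPt;
    // the lowest-scoring word gets minPt and the top-scoring word gets maxPt.
    // Assumes a non-empty score map (Collections.min/max throw on empty input).
    static Map<String, Double> fontSizes(Map<String, Double> scores, double minPt, double maxPt) {
        double lowest = Collections.min(scores.values());
        double highest = Collections.max(scores.values());
        Map<String, Double> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            // All scores equal: draw every word at the maximum size.
            double t = (highest == lowest) ? 1.0 : (entry.getValue() - lowest) / (highest - lowest);
            sizes.put(entry.getKey(), minPt + t * (maxPt - minPt));
        }
        return sizes;
    }
}
```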
Going Beyond Simple Solutions
We would like applicants to be creative and come up with good ideas on how to improve the frequency-based semantic summary.
We think taking into account relations between words would make the semantic summary richer and more informative: breaking descriptions down into single words can make it harder to grasp the original meaning of the string attributes.
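One relation between words, mentioned in the short description, is co-occurrence in the same annotations. As a minimal sketch, assuming attributes have already been split into words, the Java code below indexes which annotations each word occurs in and scores a word pair with the Jaccard similarity of those annotation sets (Jaccard is our choice here, one of several possible measures). A layout could then place words with high similarity close to each other in the cloud.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CooccurrenceSketch {
    // Index from each word to the set of annotations (by position i) it occurs in.
    static Map<String, Set<Integer>> occurrences(List<List<String>> tokenizedAttributes) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int i = 0; i < tokenizedAttributes.size(); i++) {
            for (String word : tokenizedAttributes.get(i)) {
                index.computeIfAbsent(word, w -> new HashSet<>()).add(i);
            }
        }
        return index;
    }

    // Jaccard similarity of two words: annotations containing both words divided
    // by annotations containing at least one of them (0 if neither occurs).
    static double similarity(Map<String, Set<Integer>> index, String w1, String w2) {
        Set<Integer> first = index.getOrDefault(w1, Set.of());
        Set<Integer> second = index.getOrDefault(w2, Set.of());
        Set<Integer> union = new HashSet<>(first);
        union.addAll(second);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<Integer> both = new HashSet<>(first);
        both.retainAll(second);
        return (double) both.size() / union.size();
    }
}
```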
Environment: Cytoscape
- We want the semantic network summary to be implemented as a Cytoscape plugin
- Cytoscape plug-ins are coded in Java using the Cytoscape API
About
This project was started by Ruth Isserlin and Daniele Merico.
We are part of Gary Bader's lab at the University of Toronto - CCBR (Toronto, ON, Canada). Our lab is strongly engaged in biological network research. Feel free to have a look at the lab home page for more details on the lab's research areas, and at our personal home pages for our own research interests. We have also recently developed another Cytoscape plugin, Enrichment Map.
