An improved method for scoring protein-protein interactions using semantic similarity within the Gene Ontology
Shobhit Jain and Gary Bader
Background
Semantic similarity measures are useful for assessing the physiological relevance of protein-protein interactions (PPIs). They quantify the similarity between proteins based on their function, using annotation systems such as the Gene Ontology (GO). Proteins that interact in the cell are more likely to be in similar locations or involved in similar biological processes than proteins that do not interact. Thus, the more semantically similar the gene function annotations of the interacting proteins are, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in the different classes of the cellular location, molecular function, and biological process ontologies of GO, and thus may over- or under-estimate similarity.
Results
We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers the unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and to score a PPI higher if the participating proteins belong to the same sub-graph than if they belong to different sub-graphs.
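The central idea above can be illustrated with a toy sketch. This is not the TCSS implementation: the term names, sub-graph labels, protein annotations, the fixed scores (1.0 and 0.2), and the choice of taking the best-matching term pair are all assumptions made up for illustration.

```python
from itertools import product

# Hypothetical mapping of GO-like terms to sub-graph labels (invented data).
TERM_SUBGRAPH = {
    "t_nucleus": "A",
    "t_nucleolus": "A",
    "t_membrane": "B",
}

# Hypothetical protein annotations: protein -> set of annotated terms.
ANNOTATIONS = {
    "P1": {"t_nucleus"},
    "P2": {"t_nucleolus"},
    "P3": {"t_membrane"},
}


def term_score(t1, t2, same=1.0, different=0.2):
    """Return a higher score when both terms fall in the same sub-graph.

    The two constants are placeholders, not values from the paper.
    """
    return same if TERM_SUBGRAPH[t1] == TERM_SUBGRAPH[t2] else different


def ppi_score(p1, p2):
    """Score a protein pair by its best-matching pair of annotated terms.

    Using the maximum over term pairs is an assumption of this sketch.
    """
    pairs = product(ANNOTATIONS[p1], ANNOTATIONS[p2])
    return max(term_score(a, b) for a, b in pairs)
```

In this toy setup, `ppi_score("P1", "P2")` is higher than `ppi_score("P1", "P3")` because P1 and P2 are annotated to terms in the same sub-graph.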
Conclusions
The TCSS algorithm performs better than the other semantic similarity measures that we evaluated, both in distinguishing true from false protein interactions and in correlation with gene expression and protein families. We show an average 4.6-fold improvement in F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and a 2-fold improvement on our Homo sapiens PPI dataset, using cellular component, biological process and molecular function GO annotations.
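For reference, the F1 score used in the comparison above is the harmonic mean of precision and recall. A minimal sketch of the standard definition follows; the function name and the count-based arguments are illustrative choices, not taken from the TCSS code.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Compute F1 = 2 * precision * recall / (precision + recall).

    Returns 0.0 when there are no true positives, which avoids
    division by zero.
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Here a "true positive" would be a true interaction scored above the chosen similarity cutoff; how the cutoff is chosen is not specified on this page.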
Downloads
Latest Release
TCSS: (September 22, 2010)
Source: TCSS.tar.gz
Datasets
Datasets used in the analysis of TCSS (each dataset is described in the Methods section):
- Yeast PPI dataset
- Human PPI dataset
- Yeast expression test dataset
Datasets: datasets.tar.gz