GreedyPlus: An Algorithm for the Alignment of Interface Interaction Networks
Brian Law and Gary Bader
Abstract
The increasing ease and accuracy of protein-protein interaction detection have made it possible to map the interactomes of multiple species. We now have an opportunity to compare species to better understand how interactomes evolve. Just as DNA and protein sequence alignment algorithms were required for comparative genomics, network alignment algorithms are required for comparative interactomics. A number of network alignment methods have been developed for protein-protein interaction networks, where proteins are represented as vertices linked by edges if they interact. Recently, protein interactions have been mapped at the level of amino acid positions, which can be represented as an interface-interaction network (IIN), where vertices represent binding sites, such as protein domains and short sequence motifs. However, current algorithms are not designed to align these networks and generally fail to do so in practice. We present a greedy algorithm, GreedyPlus, for IIN alignment, combining data from diverse sources, including network, protein and binding site properties, to identify putative orthologous relationships between interfaces in available worm and yeast data. GreedyPlus is fast and simple, allowing for easy customization of behaviour, yet is still capable of generating biologically meaningful network alignments.
Downloads
Latest Release
GreedyPlus: (Feb 13, 2015)
Source: GreedyPlus_v0.1.zip
Note: Initial release.
Usage
Open a terminal and type: java -jar <file location>/GreedyPlus_v0.1.jar <parameters file (optional)>
By default, with no parameters file specified, this implementation of GreedyPlus uses <file location>/worm_yeast/best.reduced_max.params as the input parameters file. This file contains the trained, minimal set of parameters described in the paper. The output is written to <file location>/GreedyPlus.worm_yeast.reduced_max.xgmml, which can then be imported directly into Cytoscape.
- Description of parameters file: The parameters file contains 6 sections. Lines are read in order from the top. Comment lines begin with a leading !
- Network Adjacency Files Each line in an adjacency file represents two nodes in the network and an edge between them. Each node should be specified in the format <protein name>, <start position>, <end position>. The ordering of the nodes/edges is not relevant. These files do not need to contain a complete list of all nodes to be aligned; the algorithm uses the scoring matrix files (described below) as the definitive source for nodes, as there may be disconnected nodes within the input networks.
- Orthologous Proteins File Each line in this file should be two orthologous proteins, in tab-delimited format, with the first protein coming from the first species and the second protein coming from the second species (order as specified in the first two lines of the parameters file).
- Orthologous Nodes File Each line in this file should be two orthologous nodes, in tab-delimited format, with the first node coming from the first species and the second node coming from the second species (order as specified in the first two lines of the parameters file).
- Scoring Matrix Files Each scoring matrix file should be tab-delimited and list the similarity scores between every pair of proteins/domains/binding sites in the input networks. Those from the first species should be listed in the first row, while those from the second species should be listed in the first column.
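The scoring matrix layout described above can be illustrated with a short Python sketch. It is not part of GreedyPlus. The function names, the identifiers used with it, the empty top-left cell and the default score of 0.0 for missing pairs are all assumptions made for illustration; check them against the example files shipped in the release before relying on them.

```python
import csv
import io


def write_score_matrix(handle, first_species_ids, second_species_ids, scores):
    """Write a tab-delimited similarity matrix to an open text handle.

    First row: identifiers from the first species.
    First column: identifiers from the second species.
    `scores` maps (first_id, second_id) to a similarity value.
    Assumptions: the top-left cell is left empty, and pairs missing
    from `scores` are written as 0.0.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(first_species_ids))
    for second_id in second_species_ids:
        row = [second_id]
        for first_id in first_species_ids:
            row.append(scores.get((first_id, second_id), 0.0))
        writer.writerow(row)


def read_score_matrix(handle):
    """Read a matrix in the same layout back into a dictionary
    keyed by (first_id, second_id)."""
    reader = csv.reader(handle, delimiter="\t")
    first_species_ids = next(reader)[1:]  # skip the assumed empty corner cell
    scores = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        second_id, values = row[0], row[1:]
        for first_id, value in zip(first_species_ids, values):
            scores[(first_id, second_id)] = float(value)
    return scores
```

Writing to an in-memory `io.StringIO` buffer, or to a file opened with `open(path, "w", newline="")`, produces one row per second-species entry. Reading the result back with `read_score_matrix` is a quick check that the rows and columns are oriented as intended.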