GreedyPlus: An Algorithm for the Alignment of Interface Interaction Networks
Brian Law and Gary Bader
Abstract
The increasing ease and accuracy of protein-protein interaction detection has resulted in the ability to map the interactomes of multiple species. We now have an opportunity to compare species to better understand how interactomes evolve. As DNA and protein sequence alignment algorithms were required for comparative genomics, network alignment algorithms are required for comparative interactomics. A number of network alignment methods have been developed for protein-protein interaction networks, where proteins are represented as vertices linked by edges if they interact. Recently, protein interactions have been mapped at the level of amino acid positions, which can be represented as an interface-interaction network (IIN), where vertices represent binding sites, such as protein domains and short sequence motifs. However, current algorithms are not designed to align these networks and generally fail to do so in practice. We present a greedy algorithm, GreedyPlus, for IIN alignment, combining data from diverse sources, including network, protein and binding site properties, to identify putative orthologous relationships between interfaces in available worm and yeast data. GreedyPlus is fast and simple, allowing for easy customization of behaviour, yet still capable of generating biologically meaningful network alignments.
Downloads
Latest Release
GreedyPlus: (May 02, 2015) 
 Source: GreedyPlus_v0.2.zip 
 Note: Minor updates. 
Usage
Open a terminal and run: java -jar <file location>/GreedyPlus_v0.1.jar <parameters file (optional)>
If no parameters file is specified, this implementation of GreedyPlus uses <file location>/worm_yeast/best.reduced_max.params as the input parameters file. This file contains the trained, minimal set of parameters described in the paper. The output is written to <file location>/GreedyPlus.worm_yeast.reduced_max.xgmml, which can then be imported directly into Cytoscape.
Description of parameters file:
The parameters file contains 6 sections. Lines are read in order from the top. Comment lines begin with !
The first two lines are the locations of two adjacency lists, each representing one of the two input networks. The third line is the location of a file listing all pairs of orthologous proteins between the two input networks. The fourth line is the location of a file listing all pairs of orthologous nodes between the two input networks. The fifth line specifies where the output should be written; the .xgmml extension is appended automatically. The remaining lines specify the input scoring matrix files for the algorithm. They should be listed one per line, in tab-delimited format: file location, type (p for protein, d for domain, b for binding site/ligand), weight, and threshold (- for no threshold).
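The line order described above can be sketched as a small reader. This is only an illustration, not part of GreedyPlus: the names parse_params and MatrixSpec are hypothetical, blank lines are assumed to be ignorable, and all five leading lines are assumed to be present (how an omitted ortholog file is represented is not covered here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatrixSpec:
    path: str
    kind: str                   # 'p' (protein), 'd' (domain) or 'b' (binding site/ligand)
    weight: float
    threshold: Optional[float]  # None when the file gives '-'


def parse_params(text):
    """Parse the text of a parameters file into a dictionary (hypothetical helper)."""
    # Drop comment lines (leading '!'). Skipping blank lines is an assumption.
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("!")]
    if len(lines) < 5:
        raise ValueError("expected at least 5 non-comment lines")

    matrices = []
    for ln in lines[5:]:
        fields = [f.strip() for f in ln.split("\t")]
        if len(fields) != 4:
            raise ValueError(f"expected 4 tab-delimited fields: {ln!r}")
        path, kind, weight, threshold = fields
        if kind not in ("p", "d", "b"):
            raise ValueError(f"unknown matrix type: {kind!r}")
        matrices.append(MatrixSpec(
            path, kind, float(weight),
            None if threshold == "-" else float(threshold)))

    return {
        "networks": (lines[0], lines[1]),
        "orthologous_proteins": lines[2],
        "orthologous_nodes": lines[3],
        "output": lines[4],   # the .xgmml extension is appended by the tool
        "matrices": matrices,
    }
```

Reading a parameters file this way before launching the aligner is one way to catch a misplaced line or an unknown matrix type early.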
Network Adjacency Files
Each line in the adjacency file represents two nodes in the network and an edge between them. Each node should be specified in the format <protein name>, <start position>, <end position>. The ordering of the nodes/edges is not relevant. These files do not need to contain a complete list of all nodes to be aligned; the algorithm uses the scoring matrix files (described below) as the authoritative source for nodes, as there may be disconnected nodes in the input networks.
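As a rough illustration of the adjacency format, the sketch below reads such a file into a set of nodes and a set of undirected edges. It assumes that the two nodes on a line are separated by a tab and that positions are integers; neither detail is stated above. Node, parse_node and parse_adjacency are hypothetical names, not part of GreedyPlus.

```python
from typing import NamedTuple


class Node(NamedTuple):
    protein: str
    start: int
    end: int


def parse_node(field):
    """Parse '<protein name>, <start position>, <end position>'."""
    # rsplit keeps any comma inside the protein name with the name.
    protein, start, end = (part.strip() for part in field.rsplit(",", 2))
    return Node(protein, int(start), int(end))


def parse_adjacency(text):
    """Return (nodes, edges) from the text of an adjacency file."""
    nodes = set()
    edges = set()
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: the two nodes on a line are tab-separated.
        left, right = line.split("\t")
        a, b = parse_node(left), parse_node(right)
        nodes.update((a, b))
        # frozenset makes the edge independent of node order.
        edges.add(frozenset((a, b)))
    return nodes, edges
```

Storing each edge as a frozenset reflects the statement that node and edge ordering is not relevant: the same pair listed in either order yields a single edge.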
Orthologous Proteins File
Each line in this file should be two orthologous proteins, in tab-delimited format, with the first protein coming from the first species and the second protein coming from the second species (order as specified in the first two lines of the parameters file).
This file may be omitted, but the RPO statistic will not be calculated.
Orthologous Nodes File
Each line in this file should be two orthologous nodes, in tab-delimited format, with the first node coming from the first species and the second node coming from the second species (order as specified in the first two lines of the parameters file).
This file may be omitted, but the OVP statistic will not be calculated.
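Both ortholog files share the same two-column layout, so a single reader can serve for either. The sketch below is a hypothetical helper (read_pairs and unexpected_pairs are illustrative names): it loads the pairs and flags any pair whose columns do not match the species order given in the parameters file. It does not compute the RPO or OVP statistics.

```python
def read_pairs(text):
    """Return a list of (first_species_item, second_species_item) tuples."""
    pairs = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {number}: expected 2 tab-delimited fields, "
                f"got {len(fields)}")
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def unexpected_pairs(pairs, first_species, second_species):
    """Pairs whose first item is not known in the first species,
    or whose second item is not known in the second species."""
    return [(a, b) for a, b in pairs
            if a not in first_species or b not in second_species]
```

For example, with the names of the first species in one set and those of the second in another, a non-empty result from unexpected_pairs suggests that the columns may be swapped or that a name is misspelled.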
Scoring Matrix Files
Each scoring matrix file should be a tab-delimited file listing the similarity scores between every pair of proteins/domains/binding sites in the input networks. Those from the first species should be listed in the first row, while those from the second species should be listed in the first column.
These files need to correspond with the input network files: every node listed in the network files must also be present in either every domain scoring matrix file or every binding site scoring matrix file, and the protein name for each node must be present in every protein scoring matrix file. The reverse is not required.
These files are used as the authoritative source for what nodes are to be aligned. Consequently, there must be at least one of each matrix type; the weight may be set to 0 to force the aligner to ignore the score values in a given matrix file.
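The matrix layout and the correspondence requirement can be sketched as follows. This is a minimal illustration under stated assumptions: the top-left cell of the matrix is assumed to be empty or ignorable, scores are assumed to be plain numbers, and read_matrix and missing_labels are hypothetical names rather than GreedyPlus functions.

```python
def read_matrix(text):
    """Read a tab-delimited scoring matrix.

    Returns (first_species_labels, second_species_labels, scores), where
    scores maps (first_species_label, second_species_label) to a float.
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    # First row: labels from the first species (top-left cell is skipped).
    columns = [label.strip() for label in rows[0][1:]]
    row_labels = []
    scores = {}
    for row in rows[1:]:
        # First column: label from the second species.
        label, values = row[0].strip(), row[1:]
        if len(values) != len(columns):
            raise ValueError(f"row {label!r} has {len(values)} scores, "
                             f"expected {len(columns)}")
        row_labels.append(label)
        for column, value in zip(columns, values):
            scores[(column, label)] = float(value)
    return columns, row_labels, scores


def missing_labels(required, present):
    """Labels used by a network file but absent from a matrix file."""
    return sorted(set(required) - set(present))
```

Running missing_labels with the node or protein names from a network file against the labels of each matrix file gives a quick check of the correspondence requirement before the aligner is run.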
