#acl GeneManiaGroup:read,write,revert
#acl All:read

= GeneMANIA news and blog =

== What is this? ==

This is a temporary home for some informal news and blog-style writing about GeneMANIA. In the news posts, we will talk about upcoming GeneMANIA releases and changes. In the blog posts, we will walk you through some of the types of analyses that you can do with GeneMANIA. There's lots to talk about; we need a lot of space to show you just how powerful a research tool GeneMANIA can be. What we talk about here is meant to supplement our [[http://www.openhelix.com|OpenHelix]] video tutorial (which will be available shortly) and our Nucleic Acids Research webserver submission (which has just been accepted).

== July 16, 2010 ==

Here's another protocol for using GeneMANIA that I think is a lot of fun and that highlights some new features that will be available with the new release.

Protocol:

1) Go to http://qa.genemania.org
2) Choose “yeast” from the species box
3) Wait for the default list to load
4) Open the advanced options panel by clicking on “Show advanced options”
5) Enable all of the networks by clicking on “all” beside “Enable”
6) Choose “50” genes from the “Number of gene results” box
7) Press “GO” at the upper right corner of the interface
8) Wait. While you are waiting, GeneMANIA assigns a percentage weight to each of the networks according to how much more strongly the genes in your input are connected to each other than to genes in the rest of the network. It then builds a new, list-specific composite network that is a weighted average of the selected networks. Next, the GeneMANIA engine runs label propagation on the composite network to score all the other genes in the networks according to how strongly they are associated with the query genes. Once this process is done, GeneMANIA takes the top N most highly associated genes and displays them, along with the query genes, in a browseable network.
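If it helps to see the idea in step 8 written down, here is a minimal Python sketch of the same three stages: weight the networks, average them into a composite, and propagate scores out from the query genes. It is a toy illustration under stated assumptions, not GeneMANIA's actual code: the function names, the weighting heuristic (query-to-query links versus query-to-other links), the iterative propagation rule with its `alpha` parameter, and the five-gene example data are all made up for this post.

```python
def mean(values):
    """Average of an iterable, or 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def make_network(n, edges):
    """Build a symmetric n x n weight matrix from (i, j, weight) tuples."""
    net = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        net[i][j] = net[j][i] = w
    return net


def network_weights(networks, query):
    """Toy weighting (an assumption, not GeneMANIA's method): reward networks
    whose query-to-query links are stronger than their query-to-other links,
    then normalise the weights so they sum to 1."""
    n = len(networks[0])
    others = [g for g in range(n) if g not in query]
    raw = []
    for net in networks:
        within = mean(net[i][j] for i in query for j in query if i < j)
        outside = mean(net[i][j] for i in query for j in others)
        raw.append(max(within - outside, 0.0))
    total = sum(raw)
    if total == 0:
        return [1.0 / len(networks)] * len(networks)
    return [r / total for r in raw]


def composite_network(networks, weights):
    """Weighted average of the selected networks."""
    n = len(networks[0])
    return [[sum(w * net[i][j] for w, net in zip(weights, networks))
             for j in range(n)] for i in range(n)]


def propagate(comp, query, alpha=0.8, iterations=50):
    """Simple iterative label propagation: each gene keeps part of its initial
    label (1 for query genes, 0 otherwise) and takes the rest from the
    weighted average of its neighbours' current scores."""
    n = len(comp)
    labels = [1.0 if g in query else 0.0 for g in range(n)]
    row_sums = [sum(row) or 1.0 for row in comp]  # avoid dividing by zero
    scores = labels[:]
    for _ in range(iterations):
        scores = [
            (1 - alpha) * labels[i]
            + alpha * sum(comp[i][j] * scores[j] for j in range(n)) / row_sums[i]
            for i in range(n)
        ]
    return scores


def top_related(scores, query, n_results):
    """Return the n_results highest-scoring genes that are not in the query."""
    candidates = [g for g in range(len(scores)) if g not in query]
    return sorted(candidates, key=lambda g: scores[g], reverse=True)[:n_results]


# Toy data only: five genes (0..4), two hypothetical networks, query = {0, 1}.
n_genes = 5
query = {0, 1}
networks = [
    make_network(n_genes, [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0)]),
    make_network(n_genes, [(0, 1, 0.5), (2, 3, 1.0), (3, 4, 1.0)]),
]
weights = network_weights(networks, query)
scores = propagate(composite_network(networks, weights), query)
print("network weights:", [round(w, 2) for w in weights])
print("top related genes:", top_related(scores, query, 2))
```

In this toy example the first network gets the larger weight because it links the two query genes more strongly, and gene 2 ranks first because it is directly connected to both of them.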

Once the network returns, you’ll see that it’s a big ball of string. That’s because many of the genes in your list are co-expressed with each other and co-localized in the nucleus, so we’re going to hide the networks derived from these sources and lay the network out again.

9) Click on the Networks tab
10) In the Networks tab, click the boxes beside “Co-expression”, “Co-localization”, and “Predicted” to hide those networks
11) Choose “Reset layout” in the Actions menu. You should see two prominent clusters of nodes.

Now we are going to colour the graph according to the function of the nodes.

12) Open the function tab
13) Hover over “M Phase” in the function tab and watch the genes annotated with M Phase change colour. Click on the “plus” sign to colour the nodes
14) Now go down the list of annotations, find “DNA repair”, and click on the “plus” sign beside it to colour the nodes that are annotated with DNA repair
15) Continue down the list until you find “anaphase-promoting complex”, then click on the “plus” sign beside it to colour those nodes.

You can now save the result of your analysis either as a publication-ready figure or as a spreadsheet listing all of your interactions.

16) First add a few legends: click on the “Functions legend” to get the legend for the node colourings and click on the “Networks legend” to get the legend for the link colours.
17) Click on the “Save” menu. If you choose “Generate report”, you’ll get a PDF report of your analysis; if you choose “Export network as text”, you’ll get a tab-delimited text file with all of your interactions.

Cool, eh? GeneMANIA found the two different groups.

== May 30, 2010 ==

Today I am going to work through an example analysis featured in a recent PLoS Computational Biology education article by Curtis Huttenhower and Oliver Hofmann entitled [[http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000779|"A Quick Guide to Large-Scale Genomic Data Mining"]].
It's a nice introduction to some of the databases and tools that you can use to do the types of analysis that GeneMANIA was designed to make easy. In their paper, Curtis and Oliver work through an example of finding potential yeast cell cycle kinase targets. Paraphrasing their workflow, the steps involved are the following:

 1. Download a list of yeast genes assigned the Gene Ontology annotations "cell cycle" and "protein kinase activity" (there are 51 such genes).
 1. Download interaction databases to find other genes whose protein products have physical interactions with those of the genes in our list of cell cycle kinases. These are potential targets.
 1. Download yeast expression data and filter the list of potential kinase targets for those that are significantly co-expressed with one of our cell cycle kinases.

Sounds easy enough, but it's not. See their detailed workflow ([[http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000779#pcbi.1000779.s001|Supplementary Text S1]]) for yourself. After you download the initial list of yeast cell cycle kinases, there are 25 more steps before you get your final list. Some of those steps involve installing specialized software and some of them involve downloading all of the yeast gene expression data in Gene Expression Omnibus. It would probably take you all day long, if you were lucky.

==== How would you do this with GeneMANIA? ====

GeneMANIA is designed to make this type of analysis easy. We've already downloaded all the data you need and put it into a convenient format. All you need to do is click to select the relevant datasets and paste in your gene list. The attached file [[attachment:cell_cycle_kinases.txt |cell_cycle_kinases.txt]] is a text file containing their list of 51 yeast cell cycle kinases.

This is GeneMANIA's workflow (I've added a few pictures to show the results of some of the steps):

 1. Open the text file, select your gene list and copy it to your clipboard.
 For me, on a Mac, Command-A followed by Command-C does the trick.
 1. Go to [[http://www.genemania.org|GeneMANIA]].
 1. Choose "yeast" from the list of species. Please wait a few seconds for the yeast networks to load (sorry about the wait, we're working on fixing that).
 1. Click on the gene box. It will open up with the default gene list already highlighted, so if you paste (Command-V or Control-V as usual) your own list into the box, your list will overwrite the default list.
 1. Open up the advanced options panel. At the top left, beside "Enable", click on "none". This will deselect all of the default networks. Now click on the checkboxes beside "Co-expression" and "Physical interactions".
 1. In the "Number of gene results" part of the advanced options panel (it's at the bottom), select "50" genes. This should be a good place to start from.
 1. Click "GO".

In case you get lost, here are a few pictures that show what the screen should look like after some of the steps:

----
'''Step 3:''' {{attachment:choose_yeast.png | Choose yeast |align="middle" width=700}}
----
'''Step 4:''' {{attachment:add_genelist.png | Add genelist |align="middle" width=700}}
----
'''Step 5:''' {{attachment:open_advanced.png | Open advanced options |align="middle" width=700}}
----

That's it, you are done. Now you have to wait about thirty seconds (maybe a minute) for GeneMANIA to run your analysis and load the network display in your browser. GeneMANIA will find the co-expression and physical interaction networks most relevant to your query (i.e. those related to cell cycle) and then find the list of 50 genes that are most highly connected to the cell cycle kinases in your query list. Once GeneMANIA comes back, you can examine the list by clicking on the Genes tab (it's on the right, near the top). You can save your result by choosing the "Save" menu item (it's on the left, near the top).
It should look something like this:

{{attachment:after_go.png | GeneMANIA after your query |align="middle" width=700}}

You can also navigate through the network to see how the genes are connected to one another. Right now, the network is a big ball of purple. That's because there are a lot of co-expression relationships among the genes in your query list (and the result list). Turn off the display of co-expression networks by clicking on the Networks tab (beside the Genes tab) and clicking the check box beside "Co-expression". I like to reset the network layout after I do this (choose the "Actions" menu and choose "Reset layout"). After you do this, the window should look more like this:

{{attachment:reorganized.png | Network re-organized |align="middle" width=700}}

The gray nodes are the cell cycle kinases and the white ones are the predicted targets. You can find out more about the genes by clicking on the nodes; this opens up the Genes tab with a quick summary of the gene function, and from there you can link out to SGD or Entrez-Gene to find out more.

==== So, how does this compare with Curtis and Oliver's workflow? ====

Well, first of all, it's much faster. It probably took you no more than 10 minutes to get through this. Maybe 30 minutes if you were new to GeneMANIA. Second, the list of potential cell cycle kinase targets is much better. Out of the 174 candidates that Curtis and Oliver produce, only 45 have a cell cycle annotation. In contrast, if we ask GeneMANIA for 100 candidates, we get at least 42 genes with cell cycle annotations. That's roughly a 60% improvement in hit rate (from about 26% to at least 42%). Third, GeneMANIA has a lot more added value: you have a browseable network display and you can bring in other data (like genetic interactions) to help refine the list even more.
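
If you want to check the hit-rate comparison above for yourself, here is a minimal Python sketch of the arithmetic. It uses only the counts quoted in this post (45 of 174 candidates and 42 of 100 candidates); the function and variable names are just for illustration, and because the post says "at least 42", the GeneMANIA figure should be read as a lower bound.

```python
def hit_rate(annotated, candidates):
    """Fraction of candidate genes that carry a cell cycle annotation."""
    return annotated / candidates


# Counts quoted in the post above.
scripted_rate = hit_rate(45, 174)    # Curtis and Oliver's workflow
genemania_rate = hit_rate(42, 100)   # GeneMANIA, asking for 100 candidates

# Relative improvement of GeneMANIA's hit rate over the scripted workflow.
relative_improvement = genemania_rate / scripted_rate - 1

print(f"scripted workflow: {scripted_rate:.1%}")
print(f"GeneMANIA:         {genemania_rate:.1%}")
print(f"improvement:       {relative_improvement:.0%}")
```

The two lists are different sizes (174 versus 100 candidates), which is why the comparison is made on rates and not on the raw counts.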
On the downside, there's no guarantee that every gene in our result list will be both co-expressed with and physically interacting with one of the cell cycle kinases in the query, but most of them are, and you can use the network display to filter for those genes if you want.

Now, we understand that Curtis and Oliver were trying to show how to use simple scripting to solve a complex data analysis problem. Their workflow provides a nice template that you can use to do more of this scripting. But we designed GeneMANIA because we believed that you shouldn't need to learn shell scripting to do these types of analyses. You also shouldn't need to install specialized software to transform data into a usable network form. We do all of that for you, and we keep up to date with the latest interaction data from all sources. And we do it faster and better.

To learn more about GeneMANIA, go to our [[http://www.genemania.org | GeneMANIA]] webpage and click on the "about" link. Or you can read our Genome Biology paper: [[http://genomebiology.com/2008/9/S1/S4 | GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function]]. Or wait for the NAR webserver paper to come out.

If you have a longer gene list you want analyzed, consider using our [[http://www.cytoscape.org/ | Cytoscape]] plugin. Our [[http://www.genemania.org/plugin/ | plugin webpage]] gives details on how to install it.