GeneMANIA news and blog
What is this?
This is a temporary home for some informal news and blog-style writing about GeneMANIA. In terms of news, we will use this space to talk about upcoming GeneMANIA releases and changes. In terms of blogs, we will walk you through some of the types of analyses that you can do with GeneMANIA. There's lots to talk about; we need a lot of space to show you just how powerful a research tool GeneMANIA can be. What we talk about here is meant to supplement our OpenHelix video tutorial (which will be available shortly) and our Nucleic Acids Research webserver submission (which has just been accepted).
May 30, 2010
Today I am going to work through an example analysis featured in a recent PLoS Computational Biology education article by Curtis Huttenhower and Oliver Hofmann entitled "A Quick Guide to Large-Scale Genomic Data Mining". It's a nice introduction to some of the databases and tools that you can use to do the types of analysis that GeneMANIA was designed to make easy.
In their paper, Curtis and Oliver have worked through an example of finding potential yeast cell cycle kinase targets. Paraphrasing their workflow, the steps involved are the following:
- Download a list of yeast genes assigned the Gene Ontology annotations "cell cycle" and "protein kinase activity" (there are 51 such genes)
- Download interaction databases to find other genes whose protein products have physical interactions with those of the genes in our list of cell cycle kinases. These are potential targets.
- Download yeast expression data and filter the list of the potential kinase targets for those that are significantly co-expressed with one of our cell cycle kinases.
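The logic of those three steps can be sketched in a few lines of Python. This is only a toy illustration, not Curtis and Oliver's actual scripts: the interaction pairs, the correlation values and the 0.7 cutoff are all invented for the example, and `find_candidate_targets` is a hypothetical helper.

```python
def find_candidate_targets(kinases, physical_pairs, coexpression, min_corr=0.7):
    """Return genes that physically interact with a kinase in the list
    and are also co-expressed with at least one kinase in the list.

    kinases        -- set of query gene names
    physical_pairs -- iterable of (gene_a, gene_b) physical interactions
    coexpression   -- dict mapping frozenset({gene_a, gene_b}) to a correlation
    min_corr       -- illustrative cutoff (an assumption, not from the paper)
    """
    # Step 2: collect non-kinase partners from the physical interactions
    partners = set()
    for a, b in physical_pairs:
        if a in kinases and b not in kinases:
            partners.add(b)
        elif b in kinases and a not in kinases:
            partners.add(a)

    # Step 3: keep partners co-expressed with at least one kinase
    targets = set()
    for gene in partners:
        for kinase in kinases:
            if coexpression.get(frozenset((gene, kinase)), 0.0) >= min_corr:
                targets.add(gene)
                break
    return sorted(targets)


# Toy inputs: every pair and value here is made up for illustration
kinases = {"CDC28", "CDC5"}
physical_pairs = [
    ("CDC28", "CLN2"),
    ("SWI5", "CDC5"),
    ("CDC28", "ACT1"),
    ("CDC28", "CDC5"),  # kinase-kinase pair, ignored as a target
]
coexpression = {
    frozenset(("CLN2", "CDC28")): 0.85,
    frozenset(("SWI5", "CDC5")): 0.75,
    frozenset(("ACT1", "CDC28")): 0.10,
}
print(find_candidate_targets(kinases, physical_pairs, coexpression))
```

In the real workflow, most of the effort goes into downloading and reformatting the data that feeds a filter like this one, which is where the 25 extra steps come from.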
Sounds easy enough, but it's not. See their detailed workflow (Supplementary Text S1, http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000779#pcbi.1000779.s001) for yourself. After you download the initial list of yeast cell cycle kinases, there are 25 more steps before you get your final list. Some of those steps involve installing specialized software and some of them involve downloading all of the yeast gene expression data in Gene Expression Omnibus. It would probably take you all day long, if you were lucky.
How would you do this with GeneMANIA?
GeneMANIA is designed to make this type of analysis easy. We've already downloaded all the data you need and put it into a convenient format. All you need to do is click to select the relevant datasets and paste in your gene list. The attached file cell_cycle_kinases.txt is a text file containing their list of 51 yeast cell cycle kinases.
This is GeneMANIA's workflow:
- Open the text file, select your gene list and copy it to your clipboard. For me, on a Mac, Command-A followed by Command-C does the trick.
- Go to GeneMANIA
- Choose "yeast" from the list of species. Please wait a few seconds for the yeast networks to load -- sorry about the wait, we're working on fixing that.
- Click on the gene box. It will open up with the default gene list already highlighted, so if you paste (Command-V or Control-V as usual) your own list into the box, your list will overwrite the default list.
- Open up the advanced options panel. At the top left, beside "Enable", click on "none". This will deselect all of the default networks. Now click on the checkboxes beside "Co-expression" and "Physical interactions".
- In the "Number of gene results" part of the advanced options panel (it's at the bottom), select "50" genes. This should be a good place to start from.
- Click "GO"
That's it, you are done. Now you just have to wait about thirty seconds (maybe a minute) for GeneMANIA to run your analysis and send the network display to your computer. GeneMANIA will find the co-expression and physical interaction networks most relevant to your query (i.e. those related to cell cycle) and then find the list of 50 genes that are most highly connected to the cell cycle kinases in your query list. Once GeneMANIA comes back, you can examine the list by clicking on the Genes tab (it's on the right, near the top). You can save your result by choosing the "Save" menu item (it's on the left, near the top).
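To give a feel for what "most highly connected" means, here is a deliberately simplified sketch that ranks genes by their total edge weight to the query list. It is not GeneMANIA's actual algorithm, which is more involved; the function name `rank_by_connectivity`, the placeholder gene names and the weights are all assumptions made up for illustration.

```python
from collections import defaultdict


def rank_by_connectivity(query, weighted_edges, top_n=50):
    """Rank non-query genes by their summed edge weight to query genes.

    query          -- set of query gene names
    weighted_edges -- iterable of (gene_a, gene_b, weight) tuples
    top_n          -- how many result genes to return
    """
    scores = defaultdict(float)
    for a, b, weight in weighted_edges:
        # Only edges linking a query gene to a non-query gene count
        if a in query and b not in query:
            scores[b] += weight
        elif b in query and a not in query:
            scores[a] += weight
    # Highest score first; ties broken alphabetically for stable output
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_n]]


# Placeholder network: K* are query genes, G* are candidate genes
query = {"K1", "K2"}
edges = [
    ("K1", "G1", 0.5),
    ("K2", "G1", 0.25),
    ("K1", "G2", 0.5),
    ("G3", "K2", 1.0),
    ("G1", "G2", 0.9),  # no query gene involved, so it is ignored
]
print(rank_by_connectivity(query, edges, top_n=2))
```

The `top_n` argument plays the same role as the "Number of gene results" setting in the advanced options panel.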
You can also navigate through the network to see who's connected to whom. Right now, the network is a big ball of purple. That's because there are a lot of co-expression relationships among the genes in your query list (and the result list). Turn off the display of co-expression networks by clicking on the Networks tab (beside the Genes tab) and clicking the checkbox beside "Co-expression". I like to reset the network layout after I do this (choose the "Actions" menu and choose "Reset layout"). The gray nodes are the cell cycle kinases and the white ones are the predicted targets. You can find out more about the genes by clicking on the nodes -- this opens up the Genes tab with a quick summary of the gene function, and from there you can link out to SGD or Entrez-Gene to find out more.
So, how does this compare with Curtis and Oliver's workflow?
Well, first of all, it's much faster. It probably took you no more than 10 minutes to get through this. Maybe 30 minutes if you were new to GeneMANIA. Second, the list of potential cell cycle kinase targets is much better. Out of 174 candidates that Curtis and Oliver produce, only 45 have a cell cycle annotation. In contrast, if we ask for 100 candidates, we get at least 42 genes with cell cycle annotations. That's roughly a 60% improvement in hit rate (from about 25% to just over 40%). Third, GeneMANIA has a lot more added value: you have a browseable network display and you can bring in other data (like genetic interactions) to help refine the list even more. On the downside, there's no guarantee that every gene in our result list will be both co-expressed and physically interacting with one of the cell cycle genes, but most of them are, and you can use the network display to filter for those genes if you want.
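For anyone who wants to check the hit-rate arithmetic, here is a quick calculation using only the counts quoted above (the variable and function names are just for illustration):

```python
def hit_rate(annotated, total):
    """Fraction of candidate genes that carry a cell cycle annotation."""
    return annotated / total


paper = hit_rate(45, 174)      # Curtis and Oliver's candidate list
genemania = hit_rate(42, 100)  # GeneMANIA, asking for 100 candidates

# Relative improvement of GeneMANIA's hit rate over the paper's
relative_gain = genemania / paper - 1

print(f"{paper:.1%} {genemania:.1%} {relative_gain:.0%}")
# prints "25.9% 42.0% 62%"
```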
Now, we understand that Curtis and Oliver were trying to show how to use simple scripting to solve a complex data analysis problem. Their workflow provides a nice template that you can use to do more of this scripting. But the whole point of GeneMANIA is that you shouldn't need shell scripts to do these types of analyses. You also shouldn't need to install specialized software to transform data into a useable network form. We do that all for you, and keep up-to-date on the latest interaction data from all sources. And we do it faster and better.