Diff for "CSCPathwayAnalysisService/SOP" - Bader Lab @ The University of Toronto

Differences between revisions 7 and 13 (spanning 6 versions)

Service SOP

Consulting Meeting

Goal: to help plan an experiment before it is run. We can recommend case studies that users can learn from.
- Genomics technologies are very sensitive: they can detect small amounts of variation. A good experimental design takes into account all possible variables (or factors) and ensures better quality data. You are encouraged to come and talk with us (and/or with the biostatistician Shaheena Bashir) about your planned experiment. Will your design enable you to answer your questions? Are there variables you did not think of? Do you have enough replicates? What is your control?
- These consulting meetings can also generate a follow-up plan, where additional meetings are scheduled during and after the experiment to answer questions and check that the experimental design is sound. It is up to you to decide whether you need it. We are always available to speak with you about your data.

Initial Meeting (when a dataset is ready to be analyzed)

Goal: We will discuss your project, the biological question(s) you want to answer, the experimental design, the enrichment analysis, the statistical data and the data input formats, and create a project name.
Once correct input data are received and the quality controls are good, we will issue an initial pathway analysis plan (see below).
Time estimate: 30 min to 1 hour
Initial Meeting and Data Input Requirement: please have these data and information ready:
- During the initial meeting, we are going to discuss:
  - the biological question(s) you want to answer
  - the experimental design
  - the platform you used to generate your data (e.g., Affymetrix or Illumina, the chip model, ...)
  - the quality controls and the input data format
- Your data should have been statistically analyzed (you should provide us with one file containing this information):
  - The data should have been normalized.
  - Some quality control plots should have been produced:
    - Box-plot of intensity (before and after normalization)
    - Principal Component Analysis (PCA)
    - Hierarchical clustering of samples (performed on all the data)
    - Please provide a PowerPoint presentation with a figure for each analysis
  - An appropriate statistical test testing your hypothesis (your biological question) should have been performed
    - for example: moderated t-test, paired t-test, ANOVA, ...
  - If you need support for your statistical analyses, please contact Shaheena Bashir (Ph.D. in Statistics) at sbashir@uhnres.utoronto.ca. Located at MaRS TMDT 15th floor, she offers free consultation on statistical analyses for the Cancer Stem Cell program. She will analyze your data and output the results in the right format for subsequent enrichment analyses. You are encouraged to contact Shaheena as soon as you plan your experiment: these genomics technologies are very sensitive to noise, and a well-designed experiment is very important for best results. Statistical consultation at the design stage is crucial for improved data quality.
- You need to provide us with 1 file (.txt) for the enrichment analysis:
  - Name your file as follows: yourname_date.txt (example: veronique_March21.txt)
  - Please rename your file with a new date if you resubmit your file
  - Please follow the format description:
    - the first column corresponds to Entrez ID.
      - An Entrez ID is a numerical value that uniquely identifies genes.
      - For example the Entrez ID for Myc (myelocytomatosis oncogene [ Mus musculus ]) is 17869: http://www.ncbi.nlm.nih.gov/gene/17869.
    - the second column corresponds to a unique array identifier (ProbesetID for Affymetrix and sampleID for Illumina).
    - the third column corresponds to gene name (official gene symbol).
    - the fourth column corresponds to the gene description (full gene name).
    - the fifth and sixth columns contain the statistical values:
      - the statistical values are the ones that enable you to tell whether a gene is significantly differentially expressed. They could be, for example, the t value and the p-value if you applied a t-test.
      - the whole table is ranked on the basis of adjusted p-value.
    - the additional columns contain the transformed (log2 for example) and normalized (RMA or quantile normalization for example) values for each sample (= each chip if gene expression data).
  - Example:

Entrez ID	Probeset ID	Gene Name	Gene Description	t value	p-value	sample1	sample2	sample3
17218	10572906	Mcm5	minichromosome maintenance deficient 5, cell division cycle	44.0079	0.001	9.13084	9.7166	8.76638
27279	10448307	Tnfrsf12a	tumor necrosis factor receptor superfamily, member 12a	-41.815	0.001	8.58977	9.29698	8.80844
13215	10582809	Tk1	thymidine kinase 1	39.9456	0.001	8.94519	9.56513	8.38612
12937	10384145	H2afv	H2A histone family, member V	-33.6475	0.001	10.574	10.7741	10.5401
207277	10526848	A430033K04Rik	A430033K04Rik	33.3352	0.001	8.25088	8.4121	8.2783
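
The format description above can be checked before a file is submitted. The following is a minimal sketch in Python, assuming the file is tab-separated with one header row; the function name `check_enrichment_table` and the `min_samples` parameter are hypothetical and are not part of the service.

```python
import csv
import io


def check_enrichment_table(handle, min_samples=1):
    """Return a list of problems found in a tab-separated input table.

    Assumed layout (from the format description): Entrez ID, array
    identifier, gene name, gene description, two statistical columns,
    then one column per sample.
    """
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    # 4 annotation columns + 2 statistical columns + the sample columns
    expected = 6 + min_samples
    if len(header) < expected:
        problems.append(
            f"expected at least {expected} columns, found {len(header)}"
        )
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: {len(row)} fields, header has {len(header)}"
            )
            continue
        entrez = row[0].strip()
        if not entrez.isdigit():
            problems.append(f"line {line_no}: Entrez ID '{entrez}' is not numeric")
        elif entrez in seen:
            problems.append(f"line {line_no}: duplicate Entrez ID {entrez}")
        seen.add(entrez)
        # Statistical and sample columns must all parse as numbers.
        for value in row[4:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: '{value}' is not a number")
    return problems


# Usage with one row taken from the example table
example = (
    "Entrez ID\tProbeset ID\tGene Name\tGene Description\t"
    "t value\tp-value\tsample1\tsample2\tsample3\n"
    "13215\t10582809\tTk1\tthymidine kinase 1\t"
    "39.9456\t0.001\t8.94519\t9.56513\t8.38612\n"
)
print(check_enrichment_table(io.StringIO(example)))
```

An empty list means no problem was detected by these particular checks; it does not replace the quality controls discussed at the initial meeting.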

Note:
- Each row of the table should correspond to a different gene. If several rows correspond to the same gene (same Entrez ID), there are 2 ways to remove the redundancy:
  - for a given gene, only the row with the best t-value is kept
  - for a given gene, the average of the different normalized values is calculated before the t-test is applied
  - the choice has to be made before the statistical analyses are performed. We can discuss it during the initial meeting.
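
The two options in the note can be sketched as follows. This is an illustration only, written in Python under two assumptions that the text does not state: "best t-value" is taken to mean the largest absolute t value, and the t value sits in the fifth column. The function names `keep_best_t` and `average_per_gene` are hypothetical.

```python
from collections import defaultdict


def keep_best_t(rows, t_index=4):
    """Option 1: for each Entrez ID, keep the row with the largest |t|.

    rows: lists of strings laid out as in the example table
    (Entrez ID first, t value at position t_index).
    """
    best = {}
    for row in rows:
        entrez = row[0]
        if entrez not in best or abs(float(row[t_index])) > abs(
            float(best[entrez][t_index])
        ):
            best[entrez] = row
    return list(best.values())


def average_per_gene(probe_values):
    """Option 2: average normalized values per gene, before any t-test.

    probe_values: iterable of (entrez_id, [value per sample]) pairs.
    Returns {entrez_id: [mean value per sample]}.
    """
    grouped = defaultdict(list)
    for entrez, values in probe_values:
        grouped[entrez].append(values)
    return {
        entrez: [sum(column) / len(column) for column in zip(*value_lists)]
        for entrez, value_lists in grouped.items()
    }


# Usage: two probes for gene "1", one probe for gene "2" (made-up values)
rows = [
    ["1", "probeA", "GeneX", "description", "2.0", "0.05"],
    ["1", "probeB", "GeneX", "description", "-5.0", "0.001"],
    ["2", "probeC", "GeneY", "description", "1.0", "0.3"],
]
print(keep_best_t(rows))
print(average_per_gene([("1", [1.0, 3.0]), ("1", [3.0, 5.0]), ("2", [2.0, 2.0])]))
```

Option 1 operates on the finished statistics table, whereas option 2 changes the values that enter the test, which is why the choice has to be made before the statistics are run.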

Analysis

[Figure: flowchart2b]

Pathway Analysis Plan
- Goal: A pathway analysis plan is a document that states the different analyses that are going to be performed and gives a time estimate. We write the pathway analysis plan once correct input data are received. It needs to be signed off by the researchers and the P.I. We send it to you as a Google document.
- A meeting can be scheduled if requested to explain the Pathway Analysis Plan.
Run analysis, interpret the map and produce a report
- Status: the analysis status will be visible on the website page. We will communicate with you regularly during the process to ensure effective interpretation of the results.
- Analysis Report: A report including a global figure of the map and a detailed focus analysis of several pathways as examples will be written at the end of the analysis.
- Result Meeting
  - Goal: discuss the analysis and report.
    - Examples of questions we can discuss: Do the results meet your expectations?
      - Is there anything unexpected in the results? If you had the resources, which experiments would you conduct based on the results of this analysis?
  - Time estimate: 30 min to 1 hour
  - Two options are available after this meeting:
    - We need to perform additional bioinformatics analyses: customized analyses
    - You are satisfied with the map, and we let you explore the data and perform some validation experiments before a follow-up meeting
Training session
- Goal: You can book a training session if you wish to do your enrichment analysis on your own or if you want to explore the map once we have performed the analysis for you. We will explain how to install Cytoscape and the different plugins (Enrichment Map, WordCloud and GeneMANIA) on your computer and how to explore your data.
- Time estimate: 30 min to 1 hour
- link to tutorial page
Customized analyses
- Meeting with Researcher to explain the results of the customized analyses
Follow-up
- Goal: you may have performed validation experiments or generated new research hypotheses based on your genomics study. You may need to go back and focus on a different aspect of your data. We can help you re-analyse your data, provide additional bioinformatics tools, or help plan the next genomics experiment.

List of projects

This section summarizes the current projects. The analysis status of each project is updated regularly, so you can follow the progress of your project and see the priority assigned to each project.

project	lab	data received	data checked; OK for analysis	GSEA	First Map	Analysis report	additional analysis	status	priority
EZ01	Zacksenhaus	Feb 22	Feb 23	Feb 24	Feb 25	-	-	writing the report	1
JD02-map1	Dick	-	-	-	-	-	-	-	?
JD02-map2	Dick	-	-	-	-	-	-	-	?
JD03	Dick	-	-	-	-	-	-	-	?
JD04	Dick	-	-	-	-	-	-	-	?
JD05	Guidos	-	-	-	-	-	-	-	?

-  ⇤ ← Revision 7 as of 2011-03-28 14:22:47 → 
  Size: 9306
  Editor: VeroniqueVoisin
  Comment:
+   ← Revision 13 as of 2011-03-28 20:53:34 → ⇥
  Size: 9455
  Editor: VeroniqueVoisin
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 7:
-  * ~+Goal+~: to help plan an experiment before it is run. We can recommend case studies that user can learn from.
+  * ~+'''Goal'''+~: to help plan an experiment before it is run. We can recommend case studies that user can learn from.
 Line 13:
-  * ~+Goal+~:  We will discuss your project, the biological question(s) you want to answer, the experimental design, the enrichment analysis, statistical data  data input formats, and create a project name.
  * Once correct input data are received and the quality controls are good, we will issue an initial pathway analysis plan (see below).
  * ~+time estimate+~: 30min to 1 hour   
  * Initial Meeting and  Data Input Requirement: please have these data and information ready:
     * During the first initial meeting,  we are going to discuss :
+  * ~+'''Goal'''+~:  We will discuss your project, the biological question(s) you want to answer, the experimental design, the enrichment analysis, statistical data  data input formats, and create a project name.
  * Once correct input data are received and the quality controls are good, we will issue an '''initial pathway analysis plan''' (see below).
  * ~+'''time estimate'''+~: 30min to 1 hour   
  * '''Initial Meeting and  Data Input Requirement''': please have these data and information ready:
     * '''During the first initial meeting,  we are going to discuss :'''
 Line 23:
-     * Your data should have been statistically analyzed (you should provide us with one file containing this information):
+     * '''Your data should have been statistically analyzed''' (you should provide us with one file containing this information):
 Line 31:
-     * An appropriate statistical test testing your hypothesis (your biological question) should have been performed
         * for example : moderated t-test, paired t-test, ANOVA,...
     * If you need support for your statistical analyses, please contact Shaheena Bashir (Ph.D. in Statistics) at sbashir@uhnres.utoronto.ca. Located at MaRS TMDT 15th floor, she offers free consultation for statistical analyses for Cancer Stem Cell program. She will analyze your data and output the results in the right format for subsequent enrichment analyses. You are encouraged to contact Shaheena as soon as you plan your experiment: these genomics technologies are very sensitive to noise and a well designed experiment is very important for best results.  It will ensure better quality data.
+      * An appropriate statistical test testing your hypothesis (your biological question) should have been performed
          * for example : moderated t-test, paired t-test, ANOVA,...
      * If you need support for your statistical analyses, please contact Shaheena Bashir (Ph.D. in Statistics) at sbashir@uhnres.utoronto.ca. Located at MaRS TMDT 15th floor, she offers free consultation for statistical analyses for Cancer Stem Cell program. She will analyze your data and output the results in the right format for subsequent enrichment analyses. You are encouraged to contact Shaheena as soon as you plan your experiment: these genomics technologies are very sensitive to noise and a well designed experiment is very important for best results.  Statistical consultation at the design stage is crucial for improved data quality.
 Line 36:
-  * You need to provide us with 1 file (.txt) for the enrichment analysis : 
      * Name your file as follow: yourname_date.txt (example: veronique_March21.txt)
      * Please rename your file with a new date if you resubmit your file
      * Please follow the format description:
           * the first column corresponds to Entrez ID.
              * An Entrez ID is a numerical value that uniquely identifies genes.
              * For example the Entrez ID for Myc (myelocytomatosis oncogene [ Mus musculus ]) is 17869: http://www.ncbi.nlm.nih.gov/gene/17869.
           * the second column corresponds to a unique array identifier (ProbesetID for Affymetrix and sampleID for Illumina).
           * the third column corresponds to gene name (official gene symbol).
           * the fourth column corresponds to the gene description (full gene name).
           * the fifth and sixth columns contain the statistical values : 
               * the statistical values are the ones that enable you to tell if a gene is significantly differentially expressed or not, it could be for example the t value and the p-value if you applied a t-test.
               * the whole table is ordered by the '''absolute value of the fifth column''' ( t value in this example) in a decreasing order.
           * the additional columns contain the transformed (log2 for example) and normalized (RMA or quantile normalization for example) values for each sample (= each chip if gene expression data).
+   * '''You need to provide us with 1 file (.txt) for the enrichment analysis''' : 
       * Name your file as follow: yourname_date.txt (example: veronique_March21.txt)
       * Please rename your file with a new date if you resubmit your file
       * Please follow the format description:
            * the first column corresponds to Entrez ID.
               * An Entrez ID is a numerical value that uniquely identifies genes.
               * For example the Entrez ID for Myc (myelocytomatosis oncogene [ Mus musculus ]) is 17869: http://www.ncbi.nlm.nih.gov/gene/17869.
            * the second column corresponds to a unique array identifier (ProbesetID for Affymetrix and sampleID for Illumina).
            * the third column corresponds to gene name (official gene symbol).
            * the fourth column corresponds to the gene description (full gene name).
            * the fifth and sixth columns contain the statistical values : 
                * the statistical values are the ones that enable you to tell if a gene is significantly differentially expressed or not, it could be for example the t value and the p-value if you applied a t-test.
                * the whole table is ranked  on the basis of adjusted p-value.
            * the additional columns contain the transformed (log2 for example) and normalized (RMA or quantile normalization for example) values for each sample (= each chip if gene expression data).
 Line 51:
-      * Example:
+       * '''Example''':
 Line 59:
-      * Note:
+      * '''Note''':
 Line 70:
- * ~+ '''Initial Pathway Analysis Plan''' +~
+ * ~+ '''Pathway Analysis Plan''' +~
 Line 77:
-  * ~+ '''Analysis Report Meeting'''+~
+  * ~+ '''Result Meeting'''+~
 Line 91:
+  * [[CancerStemCellProject/VeroniqueVoisin/PathwayAnalysisService/Tutorials | link to tutorial page ]]
