Diff for "DanieleMerico/HowtoDirectory" - Bader Lab @ The University of Toronto

Differences between revisions 8 and 14 (spanning 6 versions)

Daniele Merico - HowTo Directory

Affymetrix Microarray Analysis

Importing raw data and generating standard gene expression metrics (signals, calls, etc...)

[:DanieleMerico/HowtoDirectory/AffyCelCalSig: Importing Affymetrix CEL files, calculating MAS5 calls and signals]
CEL files are the almost-raw files generated by Affymetrix software after chip image processing; the "fun" usually starts from the CEL files onwards. Here are the simplest things you can do with CEL files.
[:DanieleMerico/HowtoDirectory/ExprSet: Importing Affymetrix CEL files, bothering about the R exprSet object, calculating MAS5 calls and signals]
If the experimental design is quite complex, or you are using a function that requires an expression set (exprSet), then, sorry, but you probably need to read this part instead of the previous one.

Computing Differential Expression

2-class methods
These methods require a dichotomous classification of the samples (e.g. case vs control) and reproducibility among samples belonging to the same class.
- [:DanieleMerico/HowtoDirectory/PLGEM: PLGEM]
  Features:
  - statistic used: corrected signal-to-noise, with every gene treated as an independent entity; the signal-to-noise is corrected according to an error model for the global estimation of variability;
    - error model requires: linear relation between signal mean and standard deviation
  - significance: estimated by randomly permuting the data (by column) and recomputing the statistic;
  - recommended when: the number of replicates is uneven between case and control, with one of the two having very few replicates, or just one;
  - proteomics: successfully applied to tandem mass-spec proteomics data, where the signal was generated as abundance-normalized peptide counts (NSAF)
  References:
  - Pubmed.ID: 15606915 (main)
  - Pubmed.ID: 18029349 (proteomic application)
- SAM
  Features:
  - statistic used: corrected signal-to-noise, with every gene treated as an independent entity;
  - significance: estimated by randomly permuting the data (by column) and recomputing the statistic;
  - recommended when: the number of replicates is 3 or more, and even between case and control;
  - proteomics: unknown
  References:
  - Pubmed.ID: 11309499 (main)
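
Both methods above estimate significance by permuting the data by column and recomputing a signal-to-noise statistic. The following Python sketch illustrates that general idea only. It is not the PLGEM or SAM implementation: the statistic is a plain signal-to-noise ratio, and the function names, the `s0` stabilizing constant, and the data layout (rows are genes, columns are samples) are assumptions made for illustration.

```python
import random
import statistics


def signal_to_noise(case, control, s0=0.1):
    """Difference of class means over the summed class standard deviations.

    s0 is a hypothetical positive constant that keeps the denominator
    away from zero; it is not a parameter taken from PLGEM or SAM.
    """
    def sd(values):
        # A single replicate has no sample standard deviation; use 0.
        return statistics.stdev(values) if len(values) > 1 else 0.0

    noise = sd(case) + sd(control) + s0
    return (statistics.mean(case) - statistics.mean(control)) / noise


def permutation_p_values(matrix, labels, n_perm=1000, s0=0.1, seed=0):
    """Per-gene two-sided permutation p-values.

    matrix: list of rows (genes), each a list of sample values.
    labels: list of booleans, True for case columns, False for control.
    """
    rng = random.Random(seed)

    def statistics_for(current_labels):
        result = []
        for row in matrix:
            case = [x for x, is_case in zip(row, current_labels) if is_case]
            control = [x for x, is_case in zip(row, current_labels) if not is_case]
            result.append(signal_to_noise(case, control, s0))
        return result

    observed = statistics_for(labels)
    exceed = [0] * len(matrix)
    shuffled = list(labels)
    for _ in range(n_perm):
        # Shuffling the class labels is equivalent to permuting the columns.
        rng.shuffle(shuffled)
        permuted = statistics_for(shuffled)
        for i, (obs, perm) in enumerate(zip(observed, permuted)):
            if abs(perm) >= abs(obs):
                exceed[i] += 1
    # Add one to numerator and denominator so no p-value is exactly zero.
    p_values = [(count + 1) / (n_perm + 1) for count in exceed]
    return observed, p_values


if __name__ == "__main__":
    toy = [
        [10.0, 10.5, 9.8, 1.0, 1.2, 0.9],  # clearly different between classes
        [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],    # essentially unchanged
    ]
    is_case = [True, True, True, False, False, False]
    stats, pvals = permutation_p_values(toy, is_case, n_perm=200, seed=1)
    for stat, p in zip(stats, pvals):
        print(round(stat, 2), round(p, 3))
```

With only three replicates per class there are very few distinct column permutations, so the smallest attainable p-value is coarse; this is one reason the number of replicates matters when choosing between the methods.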

General Computational Techniques

Computational Techniques for multi-dimensional data:

[:DanieleMerico/HowtoDirectory/Distances: A few tips on distances] (especially for binary strings)
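
The linked page is not reproduced here. As a generic illustration of distances on binary strings, the sketch below computes the Hamming and Jaccard distances in Python. The choice of these two measures and the function names are assumptions, not necessarily what the linked tips cover.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """One minus the ratio of shared 1s to positions where either string has a 1."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Two all-zero strings are treated as identical (distance 0).
    return 0.0 if either == 0 else 1.0 - both / either


if __name__ == "__main__":
    print(hamming_distance("1100", "1010"))
    print(jaccard_distance("1100", "1010"))
```

Unlike the Hamming distance, the Jaccard distance ignores positions where both strings are 0, which can matter when the 1s are sparse.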

Tuning Visualization in R

[:DanieleMerico/HowtoDirectory/Boxplots: Hacks for boxplots tuning]
[http://research.stowers-institute.org/efg/R/Graphics/Basics/mar-oma/index.htm A graphical description of the main graphical parameters for R graphs]
and [http://research.stowers-institute.org/efg/R/ a broader how-to for R graphics]

- Revision 8 as of 2007-12-04 16:40:53
  Size: 2006
  Editor: DanieleMerico
  Comment:
+ Revision 14 as of 2007-12-04 23:43:03
  Size: 3234
  Editor: DanieleMerico
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 11:
-Importing raw data and generating standard gene expression metrics (signals, calls, etc...)
+__Importing raw data and generating standard gene expression metrics__ (signals, calls, etc...)
 Line 20:
-Computing Differential Expression
 * __2-class methods__ [[BR]]
+__Computing Differential Expression__
 * '''2-class methods''' [[BR]]
 Line 25:
-   PLGEM is an error-model based method; it is recommended when the number of replicates is uneven between case and control; it has been successfully applied to tandem mass-spec proteomics data, where the signal was generated as abundancy-normalized peptide counts (NSAF)
     * PubMed.ID: 15606915 (main)
     * PubMed.ID: 18029349 (proteomic application)
   * SAM
+   Features:
     * statistic used: corrected signal-to-noise, every gene treated as an independent entity; signal-to-noise is corrected according to an error model for the global estimation of variability;
       * error model requires: linear relation between signal mean and standard deviation
     * significance: estimated by randomly permuting the data (by column), and computing the statistic;
     * recommended when: the number of replicates is uneven between case and control, with one of the two having very few, or just one replicate; 
     * proteomics: successfully applied to tandem mass-spec proteomics data, where the signal was generated as abundance-normalized peptide counts (NSAF)[[BR]]
   References
     * Pubmed.ID: 15606915 (main)
     * Pubmed.ID: 18029349 (proteomic application)
   * SAM[[BR]]
   Features:
     * statistic used: corrected signal-to-noise, every gene treated as an independent entity;
     * significance: estimated by randomly permuting the data (by column), and computing the statistic;
     * recommended when: the number of replicates is 3 or more, and even between case and control; 
     * proteomics: unknown[[BR]]
   References:
     * Pubmed.ID: 11309499 (main)
-Line 32:
+Line 44:
-Computational Techniques for multi-dimensional data:
+__Computational Techniques for multi-dimensional data__:
-Line 35:
+Line 47:
+=== Tuning Visualization in R ===
 * [:DanieleMerico/HowtoDirectory/Boxplots: Hacks for boxplots tuning]
 * [http://research.stowers-institute.org/efg/R/Graphics/Basics/mar-oma/index.htm A graphical description of the main graphical parameters for R graphs][[BR]]
 and [http://research.stowers-institute.org/efg/R/ a broader how-to for R graphics]