
Summary of Affymetrix Microarray Data Analysis

1. Affymetrix Microarray Data

CEL files: contain the intensity value measured for each cell on the array; higher intensity indicates greater transcript abundance, i.e. a more actively expressed gene
CDF files (chip description files): specify the probe and probe set to which each cell belongs
Terms:
- Probe: a 25-base oligonucleotide used to probe RNA targets
- Probe pair: a unit composed of a perfect match (PM) probe and its mismatch (MM) probe; the MM is identical to the PM except for the middle (13th) base
- Probe pair set (probe set): the PMs and MMs associated with a common affyID. A probe set corresponds to a particular gene or a fraction of a gene; some genes are represented by more than one probe set.
- affyID: the identifier of a probe set (a gene or a fraction of a gene) represented on the array
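To make the relationship between cells, probe pairs, probe sets, and affyIDs concrete, here is a minimal Python sketch under a toy in-memory layout: a CEL-like dictionary mapping (x, y) cell coordinates to intensities, and a CDF-like dictionary mapping each affyID to its (PM cell, MM cell) pairs. The names `cel`, `cdf`, `probe_set_intensities`, the affyID `"example_at"`, and all intensity numbers are hypothetical; this is not the real CEL/CDF file format or any Bioconductor API.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (x, y) position of a cell on the array

# Hypothetical CEL-like data: measured intensity of each cell (made-up numbers)
cel: Dict[Cell, float] = {
    (0, 0): 820.0, (0, 1): 210.0,  # probe pair 1: PM cell, MM cell
    (1, 0): 640.0, (1, 1): 300.0,  # probe pair 2
    (2, 0): 910.0, (2, 1): 150.0,  # probe pair 3
}

# Hypothetical CDF-like layout: affyID -> list of (PM cell, MM cell) probe pairs
cdf: Dict[str, List[Tuple[Cell, Cell]]] = {
    "example_at": [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((2, 0), (2, 1))],
}


def probe_set_intensities(affy_id: str) -> Tuple[List[float], List[float]]:
    """Return the PM and MM intensities of every probe pair in one probe set."""
    pairs = cdf[affy_id]
    pm = [cel[pm_cell] for pm_cell, _ in pairs]
    mm = [cel[mm_cell] for _, mm_cell in pairs]
    return pm, mm


pm, mm = probe_set_intensities("example_at")
print(pm)  # [820.0, 640.0, 910.0]
print(mm)  # [210.0, 300.0, 150.0]
```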

2. Microarray Experimental Designs

Biological and technical replicates
Pooling (biological averaging), blocking, randomization
Sample size determination

3. Data Exploration

MA plots
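- M values are log fold changes: M = log2(T/C) = log2(T) - log2(C), where T and C are the intensities of the same probe or probe set on the two arrays being compared (e.g. treatment and control)
- A values are average log intensities over the two arrays: A = (log2(T) + log2(C))/2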
Images, residual images
Histograms, boxplots
RNA degradation plots
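The M and A definitions given for MA plots can be computed directly. A minimal Python sketch, assuming T and C are matching lists of positive intensities from two arrays; the function name `ma_values` and the example numbers are hypothetical. Plotting M against A gives the MA plot; if most genes are not differentially expressed, the points of well-normalized arrays should scatter around M = 0.

```python
import math
from typing import List, Tuple


def ma_values(t: List[float], c: List[float]) -> Tuple[List[float], List[float]]:
    """Compute MA-plot coordinates for two arrays of positive intensities.

    M = log2(T) - log2(C)        (log fold change)
    A = (log2(T) + log2(C)) / 2  (average log intensity)
    """
    if len(t) != len(c):
        raise ValueError("both arrays must have the same number of probes")
    m = [math.log2(ti) - math.log2(ci) for ti, ci in zip(t, c)]
    a = [(math.log2(ti) + math.log2(ci)) / 2 for ti, ci in zip(t, c)]
    return m, a


m, a = ma_values([256.0, 64.0, 1024.0], [64.0, 64.0, 4096.0])
print(m)  # [2.0, 0.0, -2.0]
print(a)  # [7.0, 6.0, 11.0]
```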

4. Data Preprocessing

Approaches: background correction, normalization, PM correction, and summarization
- Background correction methods:
  - rma: robust multiarray average method (Irizarry et al. 2003)
  - mas: Affymetrix Microarray Suite background correction method (2002)
  - GCRMA: a modification of RMA that uses probe sequence information (GC content) to estimate nonspecific binding (Wu et al. 2004)
- Normalization methods:
  - quantile, contrast and loess: discussed and compared by Bolstad et al. (2003)
  - constant (scaling): used by Affymetrix (MAS 5.0), usually applied after summarization
  - invariantset: used in the dChip software (Li and Wong 2001)
  - qspline: normalizes by fitting splines to the quantiles (Workman et al. 2002)
- PM correction methods:
  - mas: an idealized mismatch (IM) is subtracted from PM (Affymetrix 2002)
  - pmonly: no adjustment to the PM values.
  - subtractmm: subtract MM from PM (Affymetrix MAS 4.0 1999)
- Summarization methods:
  - avgdiff: the average of the PM-corrected probe values, i.e. the average PM-MM difference when combined with subtractmm (Affymetrix MAS 4.0 1999)
  - mas: Tukey biweight on log2(PM-IM) (Affymetrix MAS 5.0 2002)
  - liwong: model-based expression index (MBEI) (Li and Wong 2001), fitting the following multi-chip model to each probeset:
    - y_ij = theta_i * phi_j + epsilon_ij, where y_ij = PM_ij - MM_ij
    - y_ij = mu_i + theta_i * phi_j + epsilon_ij, where y_ij = PM_ij
  - medianpolish: used for the RMA expression summary (Irizarry et al. 2003); a multichip linear model is fit to the data from each probe set by Tukey's median polish:
    - y_ij = alpha_i + beta_j + epsilon_ij, where y_ij are the background-adjusted, normalized, and log-transformed PM intensities, alpha_i is the expression value on array i, and beta_j is the effect of probe j
  - playerout: Lazaridis et al. (2002)
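Of the normalization methods listed, quantile normalization is the simplest to state: every array is forced to share the same intensity distribution by replacing each value with the mean, across arrays, of the values with the same rank. A minimal Python sketch, assuming background-corrected intensities stored as one list per array; the function name is hypothetical, and ties are broken by position rather than averaged, a simplification that production implementations may handle differently.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize a list of arrays (one list of probe intensities per array).

    Each value is replaced by the mean, across all arrays, of the values that
    have the same rank, so every array ends up with the same distribution.
    Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    if any(len(x) != n for x in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(x) for x in arrays]
    # Mean of the k-th smallest value across all arrays, for each rank k
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for x in arrays:
        order = sorted(range(n), key=lambda i: x[i])  # probe indices by increasing value
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        result.append(normalized)
    return result


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the values 1.5, 3.5, and 5.5, while each probe keeps its rank within its own array.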
Popular methods:

Method | Background correction | Normalization                           | PM correction | Summarization
RMA    | rma                   | quantile                                | pmonly        | medianpolish (log2 scale)
MAS5   | mas                   | constant (scaling, after summarization) | mas           | mas (log2 scale)
MBEI   | none                  | invariantset                            | subtractmm    | liwong
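The RMA method summarizes each probe set with median polish of the log2 PM values. A minimal Python sketch of that summarization step for a single probe set, assuming the PM intensities are already background-corrected, normalized, and log2-transformed, and stored with one row per array (i indexes arrays and j indexes probes, so overall + row_i plays the role of alpha_i in y_ij = alpha_i + beta_j + epsilon_ij). The function name and the `max_iter` and `eps` defaults are illustrative choices, not values taken from the RMA implementation.

```python
from statistics import median
from typing import List


def median_polish_summary(y: List[List[float]], max_iter: int = 10,
                          eps: float = 0.01) -> List[float]:
    """Summarize one probe set by Tukey's median polish.

    y[i][j] is the log2 PM intensity of probe j on array i. The additive fit
    y_ij = overall + row_i + col_j + residual_ij is found by alternately
    sweeping row and column medians out of the residuals; the expression value
    returned for array i is overall + row_i.
    """
    n_rows, n_cols = len(y), len(y[0])
    z = [[float(v) for v in r] for r in y]  # residuals
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (array) medians out of the residuals into the row effects
        r_delta = [median(z[i]) for i in range(n_rows)]
        for i in range(n_rows):
            z[i] = [v - r_delta[i] for v in z[i]]
            row[i] += r_delta[i]
        delta = median(col)
        col = [c - delta for c in col]
        overall += delta
        # Sweep column (probe) medians out of the residuals into the column effects
        c_delta = [median([z[i][j] for i in range(n_rows)]) for j in range(n_cols)]
        for i in range(n_rows):
            z[i] = [v - c_delta[j] for j, v in enumerate(z[i])]
        col = [c + d for c, d in zip(col, c_delta)]
        delta = median(row)
        row = [r - delta for r in row]
        overall += delta
        # Stop once the total absolute residual no longer changes much
        new_sum = sum(abs(v) for r in z for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + r for r in row]


print(median_polish_summary([[8, 9, 7], [10, 11, 9]]))   # [8.0, 10.0]
print(median_polish_summary([[8, 9, 7], [10, 11, 30]]))  # [9.0, 11.0]
```

In the second call one probe on the second array is an outlier (30). The estimated difference between the two arrays stays at 2, whereas plain row means would give 8.0 and 17.0; this robustness to outlying probes is one motivation for using medians rather than means.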

Revision 5 as of 2014-07-07 23:05:27, edited by ChangjiangXu.