
Summary of Affymetrix Microarray Data Analysis

1. Affymetrix Microarray Data

CEL files: contain the measured intensity value of each cell on the array; a higher intensity reflects greater transcript abundance, i.e. a more actively expressed gene
CDF files (chip description files): specify the probe and probe set to which each cell belongs
Terms:
- Probe: a 25-base oligonucleotide used to probe RNA targets
- Probe pair: a unit composed of a perfect match (PM) probe and its mismatch (MM) probe; the MM is identical to the PM except that its middle (13th) base is replaced by the complementary base
- Probe pair set: the PMs and MMs associated with a common affyID. A probe set corresponds to a particular gene or a fraction of a gene; some genes are represented by more than one probe set.
- affyID: an identification for a probe set (which can be a gene or a fraction of a gene) represented on the array
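The relationship between cells, probe pairs and probe sets can be pictured as a small data structure. A minimal sketch, assuming a hypothetical flat list of (affyID, PM intensity, MM intensity) tuples stands in for what a CDF file (cell-to-probe-set mapping) and a CEL file (intensities) provide together; the class and function names, affyIDs and intensities are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ProbePair:
    """One perfect match (PM) / mismatch (MM) pair of cell intensities."""
    pm: float
    mm: float


@dataclass
class ProbeSet:
    """All probe pairs that share one affyID."""
    affy_id: str
    pairs: list = field(default_factory=list)


def group_probe_pairs(records):
    """Group hypothetical (affy_id, pm, mm) tuples into ProbeSets keyed by affyID."""
    probe_sets = {}
    for affy_id, pm, mm in records:
        if affy_id not in probe_sets:
            probe_sets[affy_id] = ProbeSet(affy_id)
        probe_sets[affy_id].pairs.append(ProbePair(pm, mm))
    return probe_sets


# Illustrative affyIDs and intensities, not real data.
records = [
    ("1000_at", 812.0, 230.0),
    ("1000_at", 640.0, 410.0),
    ("1001_at", 95.0, 120.0),
]
probe_sets = group_probe_pairs(records)
print({k: len(v.pairs) for k, v in probe_sets.items()})  # {'1000_at': 2, '1001_at': 1}
```

The last pair has MM > PM, which does occur in real data and is one reason the PM correction methods below differ.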

2. Microarray Experimental Designs

Biological and technical replicates
Pooling (biological averaging), blocking, randomization
Sample size determination
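Sample size is often chosen with a power calculation. A minimal sketch, assuming a two-group comparison of one gene's log2 expression, a known common within-group standard deviation, and the textbook normal-approximation formula n = 2(z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2 per group; this is a generic approximation, not a method prescribed on this page, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate arrays per group for a two-sided, two-sample comparison.

    delta: difference in mean log2 expression to detect (1.0 = two-fold change)
    sigma: assumed common standard deviation of log2 expression within a group
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole arrays


print(samples_per_group(delta=1.0, sigma=1.0))               # 16
print(samples_per_group(delta=1.0, sigma=0.5, alpha=0.001))  # 9
```

A per-gene alpha far below 0.05, as in the second call, is one crude way to allow for testing thousands of probe sets at once; the normal approximation also slightly underestimates n for small groups compared with a t-based calculation.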

3. Data Exploration

MA plots
- M values are log fold changes between two arrays (e.g. treatment T and control C), M = log2(T/C) = log2(T) - log2(C)
- A values are average log intensities across the two arrays, A = (log2(T) + log2(C))/2
Images, residual images
Histograms, boxplots
RNA degradation plots
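The M and A formulas translate directly into code. A minimal sketch, assuming T and C are lists of positive intensities for the same probes on two arrays; the function name is hypothetical.

```python
import math


def ma_values(t, c):
    """Return (M, A) lists for paired intensities from arrays T and C."""
    if len(t) != len(c):
        raise ValueError("arrays must have the same number of probes")
    m, a = [], []
    for ti, ci in zip(t, c):
        lt, lc = math.log2(ti), math.log2(ci)
        m.append(lt - lc)        # M = log2(T) - log2(C)
        a.append((lt + lc) / 2)  # A = (log2(T) + log2(C)) / 2
    return m, a


m, a = ma_values([1024.0, 256.0, 64.0], [256.0, 256.0, 128.0])
print(m)  # [2.0, 0.0, -1.0]
print(a)  # [9.0, 8.0, 6.5]
```

In an MA plot, M is drawn against A; for two comparable arrays most points should scatter around M = 0, so a trend in M along A suggests intensity-dependent bias that normalization should remove.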

4. Data Preprocessing

Approaches: background correction, normalization, PM correction, and summarization
- Background correction methods:
  - rma: robust multiarray average method (Irizarry et al. 2003)
  - mas: Affymetrix Microarray Suite background correction method (Affymetrix 2002)
  - GCRMA: a modification of RMA that uses probe sequence information to estimate nonspecific binding (Wu et al. 2004)
- Normalization methods:
  - quantile, contrast and loess: discussed and compared by Bolstad et al. (2003)
  - constant (scaling): used by Affymetrix (MAS 5.0), usually applied after summarization
  - invariantset: used in the dChip software (Li and Wong 2001)
  - qspline: normalizes by fitting splines to the quantiles (Workman et al. 2002).
- PM correction methods:
  - mas: an ideal mismatch (IM) is subtracted from PM (Affymetrix 2002)
  - pmonly: no adjustment to the PM values.
  - subtractmm: subtract MM from PM (Affymetrix MAS 4.0 1999)
- Summarization methods:
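  - avgdiff: the average of the (PM-corrected) probe values in a probe set, as in the MAS 4.0 average difference (Affymetrix MAS 4.0 1999)
  - mas: Tukey biweight on log2(PM - IM), where IM is the ideal mismatch (Affymetrix MAS 5.0 2002)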
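  - liwong: model-based expression index (MBEI) (Li and Wong 2001), fitting one of the following multi-chip models to each probe set, where i indexes arrays, j indexes probes, theta_i is the expression index on array i, and phi_j is the sensitivity of probe j: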
    - y_ij = theta_i * phi_j + epsilon_ij, where y_ij = PM_ij - MM_ij
    - y_ij = mu_i + theta_i * phi_j + epsilon_ij, where y_ij = PM_ij
  - medianpolish: used in the RMA expression summary (Irizarry et al. 2003). A multi-chip linear model is fit by Tukey's median polish to the data from each probe set:
    - y_ij = alpha_i + beta_j + epsilon_ij, where y_ij are the background-adjusted, normalized, and log2-transformed PM intensities, alpha_i is the expression on array i, and beta_j is the effect of probe j
  - playerout: Lazaridis et al. (2002)
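The medianpolish summary can be sketched in a few lines. A minimal sketch of Tukey's median polish, assuming the input is a small matrix of already background-corrected, normalized, log2 PM values for one probe set (rows = arrays i, columns = probes j). It fits an overall effect mu plus row and column effects, so the model's alpha_i corresponds to mu + alpha[i] here; the iteration cap and tolerance are illustrative choices, not the values used by any particular package.

```python
from statistics import median


def median_polish(y, max_iter=10, eps=0.01):
    """Fit y[i][j] = mu + alpha[i] + beta[j] + resid[i][j] by Tukey's median polish.

    Returns (mu, alpha, beta, resid).
    """
    resid = [list(map(float, row)) for row in y]
    n_arrays, n_probes = len(resid), len(resid[0])
    mu = 0.0
    alpha = [0.0] * n_arrays
    beta = [0.0] * n_probes
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (array) medians into alpha, then re-center beta into mu.
        for i in range(n_arrays):
            row_med = median(resid[i])
            resid[i] = [r - row_med for r in resid[i]]
            alpha[i] += row_med
        shift = median(beta)
        beta = [b - shift for b in beta]
        mu += shift
        # Sweep column (probe) medians into beta, then re-center alpha into mu.
        for j in range(n_probes):
            col_med = median(resid[r][j] for r in range(n_arrays))
            for r in range(n_arrays):
                resid[r][j] -= col_med
            beta[j] += col_med
        shift = median(alpha)
        alpha = [a - shift for a in alpha]
        mu += shift
        # Stop once the sum of absolute residuals stops changing.
        new_sum = sum(abs(v) for row in resid for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return mu, alpha, beta, resid


# Log2 PM values for one probe set: three arrays (rows) by three probes (columns).
# The 20.0 is an artificial outlier; without it that cell would be 11.0.
y = [[9.0, 10.0, 20.0], [7.0, 8.0, 9.0], [8.0, 9.0, 10.0]]
mu, alpha, beta, resid = median_polish(y)
expression = [mu + a for a in alpha]  # per-array log2 expression summary
print(expression)  # [10.0, 8.0, 9.0]
```

Because every sweep uses medians, the single outlier ends up in the residuals and the per-array summaries equal those of the outlier-free matrix; a mean-based fit would have pulled the first array upward.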

Popular methods

Method  Background correction  Normalization  PM correction         Summarization
RMA     rma                    quantile       pmonly                medianpolish (log2 scale)
MAS5    mas                    constant       mas                   mas (log2 scale)
MBEI    PM only                invariantset   pmonly or subtractmm  liwong
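The RMA row starts with rma background correction followed by quantile normalization of the PM values. A minimal sketch of those two steps, under stated assumptions: the background step is written as the conditional expectation E[S | PM] under a model where PM = S + N with exponential signal S and normal noise N, the parameters mu, sigma and alpha are assumed to be estimated already (their estimation is omitted, and the values below are arbitrary), and ties in the quantile step are broken by position. The function names are hypothetical and this is not the affy package's implementation.

```python
import math


def _norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _norm_cdf(x):
    """Standard normal CDF; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def rma_background(pm, mu, sigma, alpha):
    """Return E[S | PM = o] for each PM value o.

    Assumed model: PM = S + N, S ~ Exponential(rate alpha), N ~ Normal(mu, sigma^2).
    Given o, S is Normal(a, sigma^2) truncated to [0, o], a = o - mu - sigma^2 * alpha.
    """
    adjusted = []
    for o in pm:
        a = o - mu - sigma ** 2 * alpha
        num = _norm_pdf(a / sigma) - _norm_pdf((a - o) / sigma)
        den = _norm_cdf(a / sigma) - _norm_cdf((a - o) / sigma)
        adjusted.append(a + sigma * num / den)  # mean of the truncated normal
    return adjusted


def quantile_normalize(arrays):
    """Give all arrays the same empirical distribution.

    Each value is replaced by the mean, across arrays, of the values with the
    same rank; ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    if any(len(x) != n for x in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(x) for x in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for x in arrays:
        order = sorted(range(n), key=x.__getitem__)  # indices, smallest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Arbitrary illustrative PM intensities (rows = arrays) and background parameters.
raw = [[120.0, 480.0, 2600.0], [150.0, 300.0, 1900.0]]
adjusted = [rma_background(x, mu=100.0, sigma=10.0, alpha=0.01) for x in raw]
normalized = quantile_normalize(adjusted)
log_pm = [[math.log2(v) for v in row] for row in normalized]  # log2 scale for summarization
```

Each adjusted value lies strictly between 0 and the observed PM, so the log2 transform is always defined; after quantile normalization every array holds exactly the same set of values, only in a different probe order.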

Revision 6 as of 2014-07-07 23:13:01, editor: ChangjiangXu