Differences between revisions 2 and 13 (spanning 11 versions)

Illumina Expression Bead``Chip Data Analysis Using R

Experimental design and data
- Platform: Illumina BeadChips
- Design: patients, groups (markers), and chips
- Files (txt files)
  - raw data: each gene corresponds to one row.
  - sample names and array barcodes
  - annotation file
Data preprocessing using lumi package
1. Data input: using function lumiR or lumiR.batch
2. Preprocessing
  - using encapsulating function lumiExpresso
  - Functions lumiB (background correction), lumiT (variance stabilizing transform), lumiN (normalization) and lumiQ (quality control), designed for preprocessing and quality control
3. Filtering
  - remove the undetectable (unexpressed) genes based on detection pvalue threshold given by
    1. quantile of all p-values, e.g., 50% quantile if the half of total probes are not detectable
    2. false positive rate, e.g., threshold = 0.10 (p-values follow an uniform distribution under null hypothesis)
  - remove technical replicates and/or irrelevant patients
4. Visualizing
  - using function plot, including density, boxplot, MAplot, pair, and sampleRelation. See the details using help("plot-methods").
  - boxplot and density plot of both raw and normalized intensities on log2 scale
5. Clustering
  - Using function plotSampleRelation: estimate the sample relations based on selected probes (based on large coefficient of variance (mean/standard variance)). Two methods can be used: MDS (Multi-Dimensional Scaling) or hierarchical clustering methods. Example: plot(lumi.data.object, what='sampleRelation', cv.Th = 0.10)
  - Detect the outlier: The current outlier detection is based on the distance from the sample to the center (average of all samples after removing 10 percent samples farthest away from the center).
    Example: temp <- detectOutlier(lumi.data.object, ifPlot=TRUE); any(temp) #if FALSE, there does not exist an outlier.
  - Using function hclust (cluster samples using Euclidean distance)
    Exampe: X <- exprs(lumi.data.object); temp <- hclust(dist(t(X)), method="average"); plot(temp)
  - Using principal component analysis (PCA)
    Example: X <- exprs(lumi.data.object); temp <- prcomp(t(X), scale=TRUE); groupColors <- palette(rainbow(length(levels(group))))
    1. Clusters using two components: plot(temp$x[, 1:2], col=groupColors[group], pch=19, main="PCA"); legend("topright", levels(group), col=groupColors, pch=19)
    2. Clusters using three components: scatterplot3d(temp$x[, 1:3], color=groupColors[group], pch=19, main="PCA"); legend("topleft", levels(group)), col=groupColors, pch=19)
Statistical analysis of differential expressions using limma package
1. Model design matrix generated using function model.matrix
  - define three factor variables: patient, marker (or group), and chip
  - unpaired design: design <- model.matrix(~ 0 + marker + chip)
  - paired design: the patient or sample effects may be different when measured twice or more.
    - design <- model.matrix(~ 0 + marker + chip + patient)
2. Fitting linear models
  - fit <- lmFit(X, design)
  - X: a matrix of gene expressions, each row consists of expressions of one gene.
  - For gene i, fitting a linear model: x_i= design * b_i + e_i
3. Fitting contrasts (e.g., 3 contrasts)
  - contrasts <- c("marker3-marker1", "marker3-marker2", "marker2-marker1")
  - contrast.matrix <- makeContrasts(contrasts = contrasts, levels=design)
  - fit1 <- contrasts.fit(fit, contrast.matrix)
4. Empirical Bayes
  - fit2 <- eBayes(fit1)
5. Generating a top table with adjusted p-values and combining with annotations of interest
  - topfit based on F-statistic
    - topfit <- topTable(fit2, number=nrow(X), adjust="BH")
  - topfit based on t-statistic for each contrast (e.g., contrast k)
    - topfit <- topTable(fit2, number=nrow(X), adjust="BH", coef=k)
  - combining with annotations and mean expressions
    - cbind(annotations, mean.expressions, topfit)

References

Du (2008) lumi: a pipeline for processing Illumina microarray, Bioinformatics.
Du et al. (2014) Using lumi, a package processing Illumina microarray
Du et al. (2014) Evaluation of VST algorithm in lumi package
Lin at al. (2008) Model-based variance-stabilizing transformation for Illumina microarray data, Nucleic Acids Res.
Smyth et al. (2014) limma: linear models for microarray data user’s guide
Smyth (2004) Linear models and empirical bayes methods for assessing differential expression in microarray experiments, Stat Appl Genet Mol Biol.

-  ⇤ ← Revision 2 as of 2014-08-11 13:52:45 → 
  Size: 5542
  Editor: ChangjiangXu
  Comment:
+   ← Revision 13 as of 2014-08-12 15:27:09 → ⇥
  Size: 5381
  Editor: ChangjiangXu
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
+## page was renamed from CSCBiostatService/IlluminChipDataAnalysis
## page was renamed from CSCBiostatService/IlluminBeadchipDataAnalysis
-Line 3:
+Line 5:
-= Summary of Illumina Expression BeadChip Data Analysis =
+= Illumina Expression Bead``Chip Data Analysis Using R =
-Line 5:
+Line 7:
-. '''Data and experimental design'''
   * Platform: Illumina BeadChips
+. '''Experimental design and data'''
   * Platform: Illumina Bead``Chips
-Line 14:
+Line 16:
-   * Data input: using function lumiR or lumiR.batch
   * Preprocessing
+   i. Data input: using function lumiR or lumiR.batch
   i. Preprocessing
-Line 17:
+Line 19:
-     * Functions lumiB, lumiT, lumiN and lumiQ, designed for preprocessing and quality control
       1. Quality control: using functions lumiQ and plot
       1. Background correction: using function lumiB. The data has been background corrected. Therefore, no background correction was used
       1. Variance stabilizing transform: using function lumiT
       1. Data normalization: using function lumiN
   * Filtering
+     * Functions lumiB (background correction), lumiT (variance stabilizing transform), lumiN (normalization) and lumiQ (quality control), designed for preprocessing and quality control 
   i. Filtering
-Line 24:
+Line 22:
-       * quantile of all p-values, e.g., 50% quantile if the half of total probes are not detectable
       * false positive rate, e.g., threshold = 0.10 (p-values follow an uniform distribution under null hypothesis)
+       a. quantile of all p-values, e.g., 50% quantile if the half of total probes are not detectable
       a. false positive rate, e.g., threshold = 0.10 (p-values follow an uniform distribution under null hypothesis)
-Line 27:
+Line 25:
-   * Visualizing
+   i. Visualizing
-Line 30:
+Line 28:
-   * Clustering
+   i. Clustering
-Line 32:
+Line 30:
-       * Example: plot(lumi.data.object, what='sampleRelation', cv.Th = 0.10)
+     Example: plot(lumi.data.object, what='sampleRelation', cv.Th = 0.10)
-Line 34:
+Line 32:
-       * Example: temp <- detectOutlier(lumi.data.object, ifPlot=TRUE); any(temp) #if FALSE, there does not exist an outlier.
+     Example: temp <- detectOutlier(lumi.data.object, ifPlot=TRUE); any(temp) #if FALSE, there does not exist an outlier.
-Line 36:
+Line 34:
-       * Exampe: X <- exprs(lumi.data.object); temp <- hclust(dist(t(X)), method="average"); plot(temp)
     * Using PCA
       * Example: X <- exprs(lumi.data.object); temp <- prcomp(t(X), scale=TRUE); groupColors <- palette(rainbow(length(levels(group))))
       * clsuters using two components:
         * plot(temp$x[, 1:2], col=groupColors[group], pch=19, main="PCA"); legend("topright", levels(group), col=groupColors, pch=19)
       * clsuters using three components:
         * scatterplot3d(temp$x[, 1:3], color=groupColors[group], pch=19, main="PCA"); legend("topleft", levels(group)), col=groupColors, pch=19)
+     Exampe: X <- exprs(lumi.data.object); temp <- hclust(dist(t(X)), method="average"); plot(temp)
     * Using principal component analysis (PCA)
     Example: X <- exprs(lumi.data.object); temp <- prcomp(t(X), scale=TRUE); groupColors <- palette(rainbow(length(levels(group))))
       a. Clusters using two components:
       plot(temp$x[, 1:2], col=groupColors[group], pch=19, main="PCA"); legend("topright", levels(group), col=groupColors, pch=19)
       a. Clusters using three components:
       scatterplot3d(temp$x[, 1:3], color=groupColors[group], pch=19, main="PCA"); legend("topleft", levels(group)), col=groupColors, pch=19)
-Line 44:
+Line 42:
-. '''Statistical analysis of gene differential expressions using limma package'''
+. '''Statistical analysis of differential expressions using limma package'''
-Line 49:
+Line 47:
-        * design <- model.matrix(~ 0 + marker + chip + patient)
+        design <- model.matrix(~ 0 + marker + chip + patient)
-Line 60:
+Line 58:
-   i. Generating a top table and combining with annotations of interest
+   i. Generating a top table with adjusted p-values and combining with annotations of interest
-Line 62:
+Line 60:
-        * topfit <- topTable(fit2, number=nrow(X), adjust="BH")
        * sort.by = "logFC" to sort by the (absolute) coefficient representing the log2-fold-change;
+        topfit <- topTable(fit2, number=nrow(X), adjust="BH")
-Line 65:
+Line 62:
-        * topfit <- topTable(fit2, number=nrow(X), adjust="BH", coef=k)
+        topfit <- topTable(fit2, number=nrow(X), adjust="BH", coef=k)
-Line 67:
+Line 64:
-        * cbind(annotations, mean.expressions, topfit)
+        cbind(annotations, mean.expressions, topfit)
-Line 75:
+Line 72:
-[[http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf | Smyth et al. (2014) limma: linear models for microarray data user’s guide
]] <<BR>>
+[[http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf | Smyth et al. (2014) limma: linear models for microarray data user’s guide]] <<BR>>