Size: 5542
Comment:
|
Size: 5381
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
## page was renamed from CSCBiostatService/IlluminChipDataAnalysis ## page was renamed from CSCBiostatService/IlluminBeadchipDataAnalysis |
|
Line 3: | Line 5: |
= Summary of Illumina Expression BeadChip Data Analysis = | = Illumina Expression Bead``Chip Data Analysis Using R = |
Line 5: | Line 7: |
1. '''Data and experimental design''' * Platform: Illumina BeadChips |
1. '''Experimental design and data''' * Platform: Illumina Bead``Chips |
Line 14: | Line 16: |
* Data input: using function lumiR or lumiR.batch * Preprocessing |
i. Data input: using function lumiR or lumiR.batch i. Preprocessing |
Line 17: | Line 19: |
* Functions lumiB, lumiT, lumiN and lumiQ, designed for preprocessing and quality control 1. Quality control: using functions lumiQ and plot 1. Background correction: using function lumiB. The data has been background corrected. Therefore, no background correction was used 1. Variance stabilizing transform: using function lumiT 1. Data normalization: using function lumiN * Filtering |
* Functions lumiB (background correction), lumiT (variance stabilizing transform), lumiN (normalization) and lumiQ (quality control), designed for preprocessing and quality control i. Filtering |
Line 24: | Line 22: |
* quantile of all p-values, e.g., 50% quantile if the half of total probes are not detectable * false positive rate, e.g., threshold = 0.10 (p-values follow an uniform distribution under null hypothesis) |
a. quantile of all p-values, e.g., 50% quantile if the half of total probes are not detectable a. false positive rate, e.g., threshold = 0.10 (p-values follow an uniform distribution under null hypothesis) |
Line 27: | Line 25: |
* Visualizing | i. Visualizing |
Line 30: | Line 28: |
* Clustering | i. Clustering |
Line 32: | Line 30: |
* Example: plot(lumi.data.object, what='sampleRelation', cv.Th = 0.10) | Example: plot(lumi.data.object, what='sampleRelation', cv.Th = 0.10) |
Line 34: | Line 32: |
* Example: temp <- detectOutlier(lumi.data.object, ifPlot=TRUE); any(temp) #if FALSE, there does not exist an outlier. | Example: temp <- detectOutlier(lumi.data.object, ifPlot=TRUE); any(temp) #if FALSE, there does not exist an outlier. |
Line 36: | Line 34: |
* Exampe: X <- exprs(lumi.data.object); temp <- hclust(dist(t(X)), method="average"); plot(temp) * Using PCA * Example: X <- exprs(lumi.data.object); temp <- prcomp(t(X), scale=TRUE); groupColors <- palette(rainbow(length(levels(group)))) * clsuters using two components: * plot(temp$x[, 1:2], col=groupColors[group], pch=19, main="PCA"); legend("topright", levels(group), col=groupColors, pch=19) * clsuters using three components: * scatterplot3d(temp$x[, 1:3], color=groupColors[group], pch=19, main="PCA"); legend("topleft", levels(group)), col=groupColors, pch=19) |
Exampe: X <- exprs(lumi.data.object); temp <- hclust(dist(t(X)), method="average"); plot(temp) * Using principal component analysis (PCA) Example: X <- exprs(lumi.data.object); temp <- prcomp(t(X), scale=TRUE); groupColors <- palette(rainbow(length(levels(group)))) a. Clusters using two components: plot(temp$x[, 1:2], col=groupColors[group], pch=19, main="PCA"); legend("topright", levels(group), col=groupColors, pch=19) a. Clusters using three components: scatterplot3d(temp$x[, 1:3], color=groupColors[group], pch=19, main="PCA"); legend("topleft", levels(group)), col=groupColors, pch=19) |
Line 44: | Line 42: |
1. '''Statistical analysis of gene differential expressions using limma package''' | 1. '''Statistical analysis of differential expressions using limma package''' |
Line 49: | Line 47: |
* design <- model.matrix(~ 0 + marker + chip + patient) | design <- model.matrix(~ 0 + marker + chip + patient) |
Line 60: | Line 58: |
i. Generating a top table and combining with annotations of interest | i. Generating a top table with adjusted p-values and combining with annotations of interest |
Line 62: | Line 60: |
* topfit <- topTable(fit2, number=nrow(X), adjust="BH") * sort.by = "logFC" to sort by the (absolute) coefficient representing the log2-fold-change; |
topfit <- topTable(fit2, number=nrow(X), adjust="BH") |
Line 65: | Line 62: |
* topfit <- topTable(fit2, number=nrow(X), adjust="BH", coef=k) | topfit <- topTable(fit2, number=nrow(X), adjust="BH", coef=k) |
Line 67: | Line 64: |
* cbind(annotations, mean.expressions, topfit) | cbind(annotations, mean.expressions, topfit) |
Line 75: | Line 72: |
[[http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf | Smyth et al. (2014) limma: linear models for microarray data user’s guide ]] <<BR>> |
[[http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf | Smyth et al. (2014) limma: linear models for microarray data user’s guide]] <<BR>> |
Illumina Expression Bead``Chip Data Analysis Using R
Experimental design and data
Platform: Illumina BeadChips
- Design: patients, groups (markers), and chips
- Files (txt files)
- raw data: each gene corresponds to one row.
- sample names and array barcodes
- annotation file
Data preprocessing using lumi package
- Data input: using function lumiR or lumiR.batch
- Preprocessing
- using encapsulating function lumiExpresso
- Functions lumiB (background correction), lumiT (variance stabilizing transform), lumiN (normalization) and lumiQ (quality control), designed for preprocessing and quality control
- Filtering
- remove the undetectable (unexpressed) genes based on detection pvalue threshold given by
- quantile of all p-values, e.g., 50% quantile if the half of total probes are not detectable
- false positive rate, e.g., threshold = 0.10 (p-values follow an uniform distribution under null hypothesis)
- remove technical replicates and/or irrelevant patients
- remove the undetectable (unexpressed) genes based on detection pvalue threshold given by
- Visualizing
- using function plot, including density, boxplot, MAplot, pair, and sampleRelation. See the details using help("plot-methods").
- boxplot and density plot of both raw and normalized intensities on log2 scale
- Clustering
- Using function plotSampleRelation: estimate the sample relations based on selected probes (based on large coefficient of variance (mean/standard variance)). Two methods can be used: MDS (Multi-Dimensional Scaling) or hierarchical clustering methods. Example: plot(lumi.data.object, what='sampleRelation', cv.Th = 0.10)
- Detect the outlier: The current outlier detection is based on the distance from the sample to the center (average of all samples after removing 10 percent samples farthest away from the center).
Example: temp <- detectOutlier(lumi.data.object, ifPlot=TRUE); any(temp) #if FALSE, there does not exist an outlier.
- Using function hclust (cluster samples using Euclidean distance)
Exampe: X <- exprs(lumi.data.object); temp <- hclust(dist(t(X)), method="average"); plot(temp)
- Using principal component analysis (PCA)
Example: X <- exprs(lumi.data.object); temp <- prcomp(t(X), scale=TRUE); groupColors <- palette(rainbow(length(levels(group))))
- Clusters using two components: plot(temp$x[, 1:2], col=groupColors[group], pch=19, main="PCA"); legend("topright", levels(group), col=groupColors, pch=19)
- Clusters using three components: scatterplot3d(temp$x[, 1:3], color=groupColors[group], pch=19, main="PCA"); legend("topleft", levels(group)), col=groupColors, pch=19)
Statistical analysis of differential expressions using limma package
- Model design matrix generated using function model.matrix
- define three factor variables: patient, marker (or group), and chip
unpaired design: design <- model.matrix(~ 0 + marker + chip)
- paired design: the patient or sample effects may be different when measured twice or more.
design <- model.matrix(~ 0 + marker + chip + patient)
- Fitting linear models
fit <- lmFit(X, design)
- X: a matrix of gene expressions, each row consists of expressions of one gene.
- For gene i, fitting a linear model: x_i= design * b_i + e_i
- Fitting contrasts (e.g., 3 contrasts)
contrasts <- c("marker3-marker1", "marker3-marker2", "marker2-marker1")
contrast.matrix <- makeContrasts(contrasts = contrasts, levels=design)
fit1 <- contrasts.fit(fit, contrast.matrix)
- Empirical Bayes
fit2 <- eBayes(fit1)
- Generating a top table with adjusted p-values and combining with annotations of interest
- topfit based on F-statistic
topfit <- topTable(fit2, number=nrow(X), adjust="BH")
- topfit based on t-statistic for each contrast (e.g., contrast k)
topfit <- topTable(fit2, number=nrow(X), adjust="BH", coef=k)
- combining with annotations and mean expressions
- cbind(annotations, mean.expressions, topfit)
- topfit based on F-statistic
- Model design matrix generated using function model.matrix
References
Du (2008) lumi: a pipeline for processing Illumina microarray, Bioinformatics.
Du et al. (2014) Using lumi, a package processing Illumina microarray
Du et al. (2014) Evaluation of VST algorithm in lumi package
Lin at al. (2008) Model-based variance-stabilizing transformation for Illumina microarray data, Nucleic Acids Res.
Smyth et al. (2014) limma: linear models for microarray data user’s guide
Smyth (2004) Linear models and empirical bayes methods for assessing differential expression in microarray experiments, Stat Appl Genet Mol Biol.