Summary of Affymetrix Microarray Data Analysis
Affymetrix Microarray Data
- CEL files: contain probe-level intensity values; a higher intensity indicates greater transcript abundance, that is, a more active gene
- CDF (chip description file) files: specify the probe and probe set to which each cell belongs
- Terms:
  - Probe: a 25-base oligonucleotide used to probe RNA targets
  - Probe pair: a unit composed of a perfect match (PM) probe and its mismatch (MM) probe
  - Probe pair set: the PMs and MMs related to a common affyID. A group of probe pairs corresponds to a particular gene or a fraction of a gene, and some genes are represented by more than one probe set.
  - affyID: an identifier for a probe set (which can be a gene or a fraction of a gene) represented on the array
Microarray Experimental Designs
- Biological and technical replicates
- Pooling (biological averaging), blocking, randomization
- Sample size determination
Data Exploration
- MA plots
  - M values are log fold changes: M = log2(T/C) = log2(T) - log2(C)
  - A values are average log intensities across the two arrays: A = (log2(T) + log2(C))/2, where T and C are the intensities on the two arrays being compared
- Images, residual images
- Histograms, boxplots
- RNA degradation plots
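The M and A definitions above can be illustrated with a minimal sketch. The function name `ma_values` and the intensities are hypothetical, chosen as powers of two so the results are easy to check by hand; a real analysis would take the values from two arrays.

```python
import math

def ma_values(t, c):
    """Return (M, A) lists for paired positive intensities t and c."""
    # M is the log2 fold change between the two arrays.
    m = [math.log2(ti) - math.log2(ci) for ti, ci in zip(t, c)]
    # A is the average log2 intensity across the two arrays.
    a = [(math.log2(ti) + math.log2(ci)) / 2 for ti, ci in zip(t, c)]
    return m, a

# Hypothetical intensities for three probes on two arrays.
treatment = [8.0, 64.0, 32.0]
control = [2.0, 64.0, 128.0]

m, a = ma_values(treatment, control)
print(m)  # [2.0, 0.0, -2.0]
print(a)  # [2.0, 6.0, 6.0]
```

Plotting M against A then shows whether the fold changes depend on overall intensity.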
Data Preprocessing
- Approaches: background correction, normalization, PM correction, and summarization
- Background correction methods:
  - rma: robust multi-array average method (Irizarry et al. 2003)
  - mas: Affymetrix Microarray Suite background correction method (Affymetrix 2002)
  - GCRMA: a modification of RMA that estimates nonspecific binding (Wu et al. 2004)
- Normalization methods:
  - quantile, contrast and loess: discussed and compared by Bolstad et al. (2003)
  - constant (scaling): the approach taken by Affymetrix, usually done after summarization
  - invariantset: used in the dChip software (Li and Wong 2001)
  - qspline: normalizes by fitting splines to the quantiles (Workman et al. 2002)
- PM correction methods:
  - mas: subtracts an ideal mismatch from PM (Affymetrix 2002)
  - pmonly: makes no adjustment to the PM values
  - subtractmm: subtracts MM from PM (Affymetrix MAS 4.0 1999)
- Summarization methods:
  - avgdiff: the average (Affymetrix MAS 4.0 1999)
  - mas: Tukey biweight on log2(PM-CM) (Affymetrix MAS 5.0 2002)
  - liwong: model-based expression index (MBEI) (Li and Wong 2001), which fits one of the following multi-chip models to each probe set:
    - y_ij = theta_i * phi_j + epsilon_ij, where y_ij = PM_ij - MM_ij
    - y_ij = mu_i + theta_i * phi_j + epsilon_ij, where y_ij = PM_ij
  - medianpolish: used in the RMA expression summary (Irizarry et al. 2003). A multi-chip linear model is fit to the data from each probe set:
    - y_ij = alpha_i + beta_j + epsilon_ij, where y_ij are the background-adjusted, normalized, and log-transformed PM intensities
  - playerout: Lazaridis et al. (2002)
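Two of the steps listed above, quantile normalization and the median polish fit of y_ij = alpha_i + beta_j + epsilon_ij, can be sketched in a few lines. This is a simplified illustration and not the affy implementation: the function names are hypothetical, ties are broken by position, and the additive fit is returned as an overall level plus centered row and column effects.

```python
from statistics import mean, median

def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    arrays: list of equal-length lists, one list of intensities per array.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(arr) for arr in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]
    normalized = []
    for arr in arrays:
        order = sorted(range(n), key=lambda idx: arr[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by reference quantile
        normalized.append(out)
    return normalized

def median_polish(y, n_iter=10):
    """Fit y[i][j] = overall + row[i] + col[j] + resid[i][j] using medians."""
    nrow, ncol = len(y), len(y[0])
    resid = [list(r) for r in y]
    overall, row, col = 0.0, [0.0] * nrow, [0.0] * ncol
    for _ in range(n_iter):
        # Sweep row medians out of the residuals.
        for i in range(nrow):
            d = median(resid[i])
            row[i] += d
            resid[i] = [v - d for v in resid[i]]
        d = median(col)
        overall += d
        col = [c - d for c in col]
        # Sweep column medians out of the residuals.
        for j in range(ncol):
            d = median(resid[i][j] for i in range(nrow))
            col[j] += d
            for i in range(nrow):
                resid[i][j] -= d
        d = median(row)
        overall += d
        row = [r - d for r in row]
    return overall, row, col, resid

# Hypothetical data: two arrays with three probes each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]

# A perfectly additive table is recovered with zero residuals.
overall, row, col, resid = median_polish(
    [[1.0, 11.0, 21.0], [2.0, 12.0, 22.0], [3.0, 13.0, 23.0]]
)
print(overall, row, col)  # 12.0 [-1.0, 0.0, 1.0] [-10.0, 0.0, 10.0]
```

Because medians are used instead of means, a single outlying probe changes the fitted effects much less than it would in a least-squares fit.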
- Popular methods:
  || Methods || Background correction || Normalization || PM correction || Summarization ||
  || RMA || rma || quantile || pmonly || medianpolish (log2 scale) ||
  || MAS5 || mas || constant || mas || mas (log2 scale) ||
  || MBEI || PM only || invariantset || pmonly or subtractmm || liwong ||
- Comparison of methods: compare the performance (power and FDR) of the methods using data for which the truth is known.
- R function expresso: combines the preprocessing steps in a single call, but not every combination of methods is valid.
  - The rma background correction adjusts only the PM probe intensities, so it should be used only in conjunction with the pmonly PM correction.
  - The subtractmm PM correction should not be used in conjunction with the mas and medianpolish summarization methods, because the corrected values are likely to be negative and both summaries work on the log2 scale.
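The two restrictions on combining methods can be written down as a small check. This is a hypothetical helper for illustration only, not part of the affy package; it encodes just the two rules stated above and uses the method names from this summary.

```python
def check_combination(bgcorrect, pmcorrect, summary):
    """Return a list of problems with a preprocessing combination.

    An empty list means neither of the two rules is violated.
    """
    problems = []
    # Rule 1: rma background correction adjusts only PM intensities.
    if bgcorrect == "rma" and pmcorrect != "pmonly":
        problems.append("rma background correction requires pmonly PM correction")
    # Rule 2: PM - MM can be negative, which log2-scale summaries cannot use.
    if pmcorrect == "subtractmm" and summary in ("mas", "medianpolish"):
        problems.append("subtractmm should not be combined with " + summary)
    return problems

# The RMA combination from the table passes both rules.
print(check_combination("rma", "pmonly", "medianpolish"))  # []

# This combination violates both rules.
for problem in check_combination("rma", "subtractmm", "medianpolish"):
    print(problem)
```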
Analysis of Differentially Expressed Genes
- Approaches
  - Parametric test: t-test
  - Non-parametric tests: Wilcoxon signed-rank/rank-sum tests
  - Linear models for microarray data (limma package):
    - Linear models and design matrix
    - Contrasts and contrast matrix
  - ANOVA and MANOVA
- Multiple testing (p-value adjustments):
  - FWER: Bonferroni
  - FDR: Benjamini-Hochberg
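The two p-value adjustments named above can be sketched as follows. The function names and the example p-values are hypothetical; in R the same adjustments are available through `p.adjust`. Bonferroni multiplies each p-value by the number of tests, while Benjamini-Hochberg scales each p-value by m/rank and then enforces monotonicity from the largest p-value downward.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment, which controls the family-wise error rate."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjustment, which controls the false discovery rate."""
    m = len(pvalues)
    # Visit p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.5]
print(bonferroni(pvals))  # [0.04, 0.16, 0.12, 1.0]
print([round(p, 3) for p in benjamini_hochberg(pvals)])  # [0.04, 0.053, 0.053, 0.5]
```

In this example the Benjamini-Hochberg values are never larger than the Bonferroni values, which reflects the less stringent error criterion that FDR control uses.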
Clustering of Differentially Expressed Genes
- Annotation; Gene ontology
- Venn diagrams; clustering; classification
- Diagnostics
References
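B. Bolstad (2014) Built-in Processing Methods. http://www.bioconductor.org/packages/release/bioc/vignettes/affy/inst/doc/builtinMethods.pdf
Affymetrix (2002) Statistical algorithms description document. http://media.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
Smyth (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments, Statistical Applications in Genetics and Molecular Biology. http://www.ncbi.nlm.nih.gov/pubmed/16646809
Wu et al. (2004) A model-based background adjustment for oligonucleotide expression arrays, JASA. http://amstat.tandfonline.com/doi/abs/10.1198/016214504000000683#.U7yutPm-30s
Bolstad et al. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias, Bioinformatics. http://bioinformatics.oxfordjournals.org/content/19/2/185.full.pdf
Irizarry et al. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostatistics. http://biostatistics.oxfordjournals.org/content/4/2/249.full.pdf+html
Workman et al. (2002) A new non-linear normalization method for reducing variability in DNA microarray experiments, Genome Biol. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC126873/pdf/gb-2002-3-9-research0048.pdf
Lazaridis et al. (2002) A simple method to improve probe set estimates from oligonucleotide arrays, Math Biosci. http://www.ncbi.nlm.nih.gov/pubmed/11867083
Li and Wong (2001) Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection, PNAS. http://www.pnas.org/content/98/1/31.long
Li and Wong (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application, Genome Biol. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC55329/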