RNA-seq Data Analysis
RNA Sequencing
- Next-generation sequencing (NGS)
- Platforms:
- Illumina/Solexa's Genome Analyzer, HiSeq systems, MiSeq, etc.
- Applied Biosystems' SOLiD
- Roche's 454 Life Sciences
- Helicos BioSciences' HeliScope
- Terminology
- Sequencing Depth or Coverage: Total number of reads mapped to the genome/transcriptome, also known as library size.
- Transcript/gene length: Number of bases in a gene.
- Read counts: Number of reads mapping to that gene/transcript (expression measurement).
- Illumina's sequencing technology
- One flow cell: 8 lanes
- One lane is often used for the control sample.
- Multiplexing:
- a way to save money by sequencing multiple samples on a single unit (an Illumina flow cell)
- offers the flexibility to construct balanced blocked designs for the purpose of testing differential expression.
- Barcoding:
- to separate inputs, can have many barcodes in a single unit
- 12 different samples can be indexed with unique subsequences and loaded onto each lane. In total, 96 samples can be sequenced per run.
- the output can be deconvoluted to individual samples.
- Variations
- Different genes have different variances and are potentially subject to different errors and biases.
- Sources of variation that affect only a minority of genes (e.g. PCR-based GC bias, complexity of the library) should be integrated into the design as well.
- Technical variability (experimental errors and biases): Two main sources of variation that may contribute to confounding effects:
- Batch effects: errors that occur after random fragmentation of the RNA until it is input to the flow cell (PCR, reverse transcription).
- Lane effects: errors that occur from the flow cell until obtaining the data from the sequencing machine (bad sequencing cycles, base-calling)
- Biological variability
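The multiplexing and barcoding notes above can be illustrated with a small sketch. It reproduces the capacity arithmetic (12 barcodes per lane across 8 lanes) and shows how pooled reads could be deconvoluted to individual samples. The function names, the 4-base barcodes, and the toy reads are hypothetical. The sketch also assumes that the barcode sits at the start of each read and is matched exactly, which is a simplification of how real instruments report index sequences.

```python
from collections import defaultdict

LANES_PER_FLOW_CELL = 8   # from the notes: one flow cell has 8 lanes
BARCODES_PER_LANE = 12    # from the notes: 12 indexed samples per lane


def samples_per_run(lanes=LANES_PER_FLOW_CELL, barcodes=BARCODES_PER_LANE):
    """Number of indexed samples that fit on one flow cell."""
    return lanes * barcodes


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using the barcode at the start of each read.

    Assumption: in-line barcodes with exact matching. Reads whose barcode
    is not recognised are collected under the key None.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        by_sample[barcode_to_sample.get(barcode)].append(insert)
    return dict(by_sample)


# Hypothetical barcodes and toy reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGTTTAA", "TGCACCCAAATT", "ACGTTTTGGGCC", "NNNNAAAACCCC"]
demuxed = demultiplex(reads, barcodes, barcode_len=4)
```

Here `samples_per_run()` gives the 96 samples per run quoted above, and `demuxed` maps each sample name to the reads that carried its barcode.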
RNA Sequencing Pipeline
RNA Sequencing Experimental Design
- Aims
- estimate the biological variation (using biological replicates)
- avoid or reduce the technical variation (using experimental design or sequencing design)
- Sequencing design (reads, depth, variability)
- Sampling: Subject sampling, RNA sampling, and fragment sampling
- Randomization: Assigning individuals to groups at random (to reduce the effect of sample variability)
- Replication: Biological replicates allow the estimation of within-treatment-group (biological) variability and provide the information necessary for making inferences between treatment groups.
- Blocking: Experimental units are grouped into homogeneous clusters
- Modes of Sequencing
- Single-end Read: One read sequenced from one end of each cDNA insert
- Paired-end Read: Two reads sequenced from each cDNA insert (one from each end)
- The reads are typically 30-400 bp long, depending on the DNA-sequencing technology used.
- Paired-end sequencing costs more than single-end sequencing.
- Balanced Block Designs
- Barcoding: DNA fragments can be labeled (barcoded) with sample-specific sequences that allow multiple samples to be included in the same sequencing lane.
- Multiplexing
- Pooling: All the samples of RNA are pooled into the same batch and then sequenced in one lane of a flow cell.
- Balanced incomplete block designs (BIBD): used when the samples cannot all be placed in one lane (number of treatments > number of barcodes)
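As a rough illustration of the blocking and randomization ideas above, the following sketch assigns barcoded samples to lanes so that every lane (block) receives the same number of samples from each treatment. The function names and the example sample IDs are hypothetical, and the sketch covers only the complete balanced case. It does not construct a BIBD; it only flags when one would be needed.

```python
import random


def needs_incomplete_blocks(n_treatments, n_barcodes):
    """True when not every treatment can share one lane (treatments > barcodes)."""
    return n_treatments > n_barcodes


def balanced_block_design(samples_by_treatment, n_lanes, seed=0):
    """Assign samples to lanes so each lane gets equal numbers per treatment.

    samples_by_treatment maps a treatment name to a list of sample IDs.
    Each treatment's sample count must be divisible by n_lanes.
    """
    rng = random.Random(seed)
    lanes = [[] for _ in range(n_lanes)]
    for treatment, samples in samples_by_treatment.items():
        if len(samples) % n_lanes != 0:
            raise ValueError(
                f"{treatment}: cannot split {len(samples)} samples "
                f"evenly over {n_lanes} lanes"
            )
        shuffled = list(samples)
        rng.shuffle(shuffled)  # randomize which replicate goes to which lane
        for i, sample in enumerate(shuffled):
            lanes[i % n_lanes].append((treatment, sample))
    return lanes


# Hypothetical example: 2 treatments, 4 biological replicates each, 2 lanes.
design = balanced_block_design(
    {"control": ["c1", "c2", "c3", "c4"], "treated": ["t1", "t2", "t3", "t4"]},
    n_lanes=2,
)
```

With this layout each lane holds two control and two treated samples, so a lane effect touches both treatments equally instead of being confounded with the treatment comparison.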