Tempora: cell trajectory inference using time-series single-cell RNA sequencing data
This page contains supplementary data for:
Tempora: cell trajectory inference using time-series single-cell RNA sequencing data
Thinh N. Tran+, Gary D. Bader*
Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
The Donnelly Centre for Cellular and Biomolecular Research
+Current address: Gerstner Sloan Kettering Graduate School of Biomedical Sciences, New York, NY, USA
*Corresponding author
Tempora is a novel cell trajectory inference method that orders cells using time information from time-series scRNAseq data. Tempora uses biological pathway information to help identify cell type relationships and can identify important time-dependent pathways to help interpret the inferred trajectory.
Tempora source code and vignettes can be accessed at https://github.com/BaderLab/Tempora.
Sample Data for Tempora
The Tempora package was validated using two datasets: an in vitro differentiation of human skeletal muscle myoblasts and in vivo early development of murine cerebral cortex. For both datasets, cells from all time points were filtered to remove low-quality reads, normalized with scran, and corrected for batch effects using Harmony. All cells were then iteratively clustered until the number of differentially expressed genes between neighboring clusters reached zero. The datasets were exported as Seurat objects, ready to be imported into Tempora.
Human skeletal muscle myoblast data (HSMM)
The HSMM dataset contains approximately 271 cells collected at 0, 24, 48 and 72 hours after the switch of human myoblast culture from growth to differentiation media. Cells were sequenced using Fluidigm C1. Raw sequencing reads can be accessed in the Gene Expression Omnibus, accession number GSE52529.
Original reference: Trapnell, Cole, et al. "The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells." Nature biotechnology 32.4 (2014): 381. doi:https://doi.org/10.1038/nbt.2859.
Human muscle cell development time course scRNA-seq data (99.5 MB)
Murine cerebral cortex data (MouseCortex)
The MouseCortex dataset contains approximately 6,000 neural cells collected at embryonic days 11.5 (E11.5), E13.5, E15.5 and E17.5. Cells were sequenced using DropSeq. These cells cover a wide spectrum of neuronal development, from the early precursors (apical and radial precursors) to intermediate progenitors and differentiated cortical neurons. As in the original publication, the data were filtered to remove non-cortical cells. Removed cells included cells expressing Aif1 (microglia), hemoglobin genes (blood cells), collagen genes (mesenchymal cells), as well as Dlx transcription factors and/or interneuron genes (ganglionic eminence-derived cells). All retained cells were then iteratively clustered as described above. Raw sequencing reads can be accessed in the Gene Expression Omnibus, accession number GSE107122.
Original reference: Yuzwa, Scott A., et al. "Developmental emergence of adult neural stem cells as revealed by single-cell transcriptional profiling." Cell reports 21.13 (2017): 3970-3986. doi:https://doi.org/10.1016/j.celrep.2017.12.017.
Mouse brain cortex development time course scRNA-seq data (1.02 GB)
File Format
Both datasets are packaged as Seurat objects (https://satijalab.org/seurat/) with the following slots:
raw.data: a sparse matrix containing raw sequencing reads of cells from all time points. Each column represents a cell and each row represents a gene.
data*: a matrix containing processed counts of cells (after filtering, normalization and batch effect correction) from all time points.
scale.data: a matrix containing scaled expression of all genes, required for downstream principal component analysis.
var.genes: a list of genes identified as variable genes across all cells.
ident: a vector containing the cluster identity of all cells at the chosen resolution (1.5 for HSMM and 0.6 for MouseCortex data).
meta.data*: a data frame containing metadata for all cells, including the number of genes expressed, the number of UMIs detected, the collection time point of each cell, and the cluster identity of cells at various clustering resolutions.
dr: a list of key results from various dimensionality reduction techniques applied on the dataset, including principal component analysis (accessible at dr$pca), tSNE (accessible at dr$tsne) and Harmony (accessible at dr$harmony).
hvg.info: a data frame containing information about the most highly variable genes across all cells.
calc.params: a list of parameters used in each analysis step to produce the final Seurat objects.
All slots can be accessed by object_name@slot_name. Slots required by Tempora are denoted with an asterisk.
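As a rough illustration of the object_name@slot_name access pattern described above, the R sketch below builds a small toy S4 object that only mirrors three of the slot names listed (data, meta.data and dr). The class name ToySeurat, the gene and cell names, the expression values and the timepoint column name are all made up for this example; it is not the real Seurat class and does not use the sample datasets. With Seurat installed and one of the sample objects loaded, the same @ and $ syntax should apply to the real object.

```r
library(methods)

# Toy stand-in class (hypothetical): mirrors three slot names from the list above.
setClass("ToySeurat",
         representation(data = "matrix", meta.data = "data.frame", dr = "list"))

# Made-up processed expression matrix: rows are genes, columns are cells.
expr <- matrix(c(1.2, 0, 3.4, 0.5, 2.1, 0), nrow = 2,
               dimnames = list(c("GeneA", "GeneB"),
                               c("cell1", "cell2", "cell3")))

# Made-up per-cell metadata; the column name "timepoint" is an assumption.
meta <- data.frame(timepoint = c("0h", "24h", "72h"),
                   row.names = colnames(expr))

toy <- new("ToySeurat", data = expr, meta.data = meta,
           dr = list(pca = "placeholder for PCA results"))

slotNames(toy)       # names of the slots defined on the object
dim(toy@data)        # genes x cells
head(toy@meta.data)  # per-cell metadata, one row per cell
toy@dr$pca           # list elements inside a slot are reached with $
```

Calling slotNames() on a loaded object is a quick way to check which of the slots described above are present before passing the object to Tempora.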