UCell 2.0.1
In single-cell RNA-seq analysis, gene signature (or “module”) scoring constitutes a simple yet powerful approach to evaluate the strength of biological signals, typically associated to a specific cell type or biological process, in a transcriptome.
UCell is an R package for evaluating gene signatures in single-cell datasets. UCell signature scores, based on the Mann-Whitney U statistic, are robust to dataset size and heterogeneity, and their calculation demands less computing time and memory than other available methods, enabling the processing of large datasets in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and includes functions to directly interact with Seurat objects.
Install UCell through BiocManager:
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("UCell")
To test your installation, load a small sample dataset and run UCell:
library(UCell)
data(sample.matrix)
gene.sets <- list(Tcell_signature = c("CD2","CD3E","CD3D"),
Myeloid_signature = c("SPI1","FCER1G","CSF1R"))
scores <- ScoreSignatures_UCell(sample.matrix, features=gene.sets)
head(scores)
## Tcell_signature_UCell Myeloid_signature_UCell
## L5_ATTTCTGAGGTCGTGA 0.8993333 0
## L4_TCACTATTCATCTCTA 0.6904444 0
## L1_TCCTTCTTCTTTACAC 0.5935556 0
## L5_AAAGTGAAGGCGCTCT 0.6093333 0
## E2L3_CCTCAGTAGTGCAGGT 0.8508889 0
## L5_CCCTCTCGTTCTAAGC 0.6562222 0
For this demo, we will download a single-cell dataset of lung cancer (Zilionis et al. (2019) Immunity) through the scRNA-seq package. This dataset contains >170,000 single cells; for the sake of simplicity, in this demo will we focus on immune cells, according to the annotations by the authors, and downsample to 5000 cells.
library(scRNAseq)
#For this demo, limit the analysis to 5000 immune cells
lung <- ZilionisLungData()
immune <- lung$Used & lung$used_in_NSCLC_immune
lung <- lung[,immune]
lung <- lung[,1:5000]
exp.mat <- Matrix::Matrix(counts(lung),sparse = TRUE)
Here we define some simple gene sets based on the “Human Cell Landscape” signatures Han et al. (2020) Nature. You may edit existing signatures, or add new one as elements in a list.
signatures <- list(
Immune = c("PTPRC"),
Tcell = c("CD3D","CD3E","CD3G","CD2","TRAC"),
Bcell = c("MS4A1","CD79A","CD79B","CD19","BANK1"),
Myeloid = c("CD14","LYZ","CSF1R","FCER1G","SPI1","LCK-"),
NK = c("KLRD1","NCAM1","NKG7","CD3D-","CD3E-"),
Plasma_cell = c("IGKC","IGHG3","IGHG1","IGHA1","CD19-")
)
Run ScoreSignatures_UCell
and get directly signature scores for all cells
u.scores <- ScoreSignatures_UCell(exp.mat,features=signatures)
u.scores[1:10,1:3]
## Immune_UCell Tcell_UCell Bcell_UCell
## bcHTNA 0.4493333 0.0000000 0.00000000
## bcHNVA 0.5900000 0.0000000 0.00000000
## bcALZN 0.6473333 0.0000000 0.00000000
## bcFWBP 0.4156667 0.0000000 0.08446667
## bcBJYE 0.9346667 0.0000000 0.00000000
## bcGSBJ 0.9096667 0.0000000 0.00000000
## bcHQGJ 0.8860000 0.0000000 0.00000000
## bcHKKM 0.8590000 0.1282667 0.00000000
## bcIGQU 0.0000000 0.4175333 0.00000000
## bcDVGG 0.6413333 0.3861333 0.00000000
Show the distribution of predicted scores
library(reshape2)
library(ggplot2)
melted <- reshape2::melt(u.scores)
colnames(melted) <- c("Cell","Signature","UCell_score")
p <- ggplot(melted, aes(x=Signature, y=UCell_score)) +
geom_violin(aes(fill=Signature), scale = "width") +
geom_boxplot(width=0.1, outlier.size=0) +
theme_bw() + theme(axis.text.x=element_blank())
p
The time- and memory-demanding step in UCell is the calculation of gene rankings for each individual cell. If we plan to experiment with signatures, editing them or adding new cell subtypes, it is possible to pre-calculate the gene rankings once and for all and then apply new signatures over these pre-calculated ranks. Run the StoreRankings_UCell
function to pre-calculate gene rankings over a dataset:
set.seed(123)
ranks <- StoreRankings_UCell(exp.mat)
ranks[1:5,1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
## bcHTNA bcHNVA bcALZN bcFWBP bcBJYE
## 5S_rRNA . . . . .
## 5_8S_rRNA . . . . .
## 7SK . . . . .
## A1BG . . . . .
## A1BG-AS1 . . . . .
Then, we can apply our signature set, or any other new signature to the pre-calculated ranks. The calculations will be considerably faster.
set.seed(123)
u.scores.2 <- ScoreSignatures_UCell(features=signatures,
precalc.ranks = ranks)
melted <- reshape2::melt(u.scores.2)
colnames(melted) <- c("Cell","Signature","UCell_score")
p <- ggplot(melted, aes(x=Signature, y=UCell_score)) +
geom_violin(aes(fill=Signature), scale = "width") +
geom_boxplot(width=0.1, outlier.size = 0) +
theme_bw() + theme(axis.text.x=element_blank())
p
new.signatures <- list(Mast.cell = c("TPSAB1","TPSB2","CPA3","MS4A2"),
Lymphoid = c("LCK"))
u.scores.3 <- ScoreSignatures_UCell(features=new.signatures,
precalc.ranks = ranks)
melted <- reshape2::melt(u.scores.3)
colnames(melted) <- c("Cell","Signature","UCell_score")
p <- ggplot(melted, aes(x=Signature, y=UCell_score)) +
geom_violin(aes(fill=Signature), scale = "width") +
geom_boxplot(width=0.1, outlier.size=0) +
theme_bw() + theme(axis.text.x=element_blank())
p
If your machine has multi-core capabilities and enough RAM, running UCell in parallel can speed up considerably your analysis. The example below runs on a single core - you may modify this behavior by setting e.g. workers=4
to parallelize to 4 cores:
BPPARAM <- BiocParallel::MulticoreParam(workers=1)
u.scores <- ScoreSignatures_UCell(exp.mat,features=signatures,
BPPARAM=BPPARAM)
SingleCellExperiment and Seurat are popular environments for single-cell analysis. The UCell package implements functions to interact directly with these pipelines, as described below.
The function ScoreSignatures_UCell()
allows operating directly on sce
objects. UCell scores are returned in a altExp object (altExp(sce, 'UCell'
))
library(SingleCellExperiment)
sce <- SingleCellExperiment(list(counts=exp.mat))
sce <- ScoreSignatures_UCell(sce, features=signatures,
assay = 'counts', name=NULL)
altExp(sce, 'UCell')
## class: SummarizedExperiment
## dim: 6 5000
## metadata(0):
## assays(1): UCell
## rownames(6): Immune Tcell ... NK Plasma_cell
## rowData names(0):
## colnames(5000): bcHTNA bcHNVA ... bcGVZB bcHGKL
## colData names(0):
Dimensionality reduction and visualization
library(scater)
library(patchwork)
#PCA
sce <- logNormCounts(sce)
sce <- runPCA(sce, scale=TRUE, ncomponents=20)
#UMAP
set.seed(1234)
sce <- runUMAP(sce, dimred="PCA")
Visualize UCell scores on low-dimensional representation (UMAP)
pll <- lapply(names(signatures), function(x) {
plotUMAP(sce, colour_by = x, by_exprs_values = "UCell",
point_size=0.2)
})
wrap_plots(pll)
The function AddModuleScore_UCell()
allows operating directly on Seurat objects. UCell scores are returned as metadata columns in the Seurat object. To see how this function differs from Seurat’s own AddModuleScore()
(not based on per-cell ranks) see this vignette
library(Seurat)
seurat.object <- CreateSeuratObject(counts = exp.mat,
project = "Zilionis_immune")
seurat.object <- AddModuleScore_UCell(seurat.object,
features=signatures, name=NULL)
head(seurat.object[[]])
## orig.ident nCount_RNA nFeature_RNA Immune Tcell Bcell
## bcHTNA Zilionis_immune 7516 2613 0.4493333 0 0.00000000
## bcHNVA Zilionis_immune 5684 1981 0.5900000 0 0.00000000
## bcALZN Zilionis_immune 4558 1867 0.6473333 0 0.00000000
## bcFWBP Zilionis_immune 2915 1308 0.4156667 0 0.08446667
## bcBJYE Zilionis_immune 3576 1548 0.9346667 0 0.00000000
## bcGSBJ Zilionis_immune 2796 1270 0.9096667 0 0.00000000
## Myeloid NK Plasma_cell
## bcHTNA 0.5234000 0 0.00000000
## bcHNVA 0.5120000 0 0.01991667
## bcALZN 0.3593333 0 0.00000000
## bcFWBP 0.1558000 0 0.00000000
## bcBJYE 0.4639333 0 0.00000000
## bcGSBJ 0.5460000 0 0.00000000
Generate PCA and UMAP embeddings
seurat.object <- NormalizeData(seurat.object)
seurat.object <- FindVariableFeatures(seurat.object,
selection.method = "vst", nfeatures = 500)
seurat.object <- ScaleData(seurat.object)
seurat.object <- RunPCA(seurat.object, npcs = 20,
features=VariableFeatures(seurat.object))
seurat.object <- RunUMAP(seurat.object, reduction = "pca",
dims = 1:20, seed.use=123)
Visualize UCell scores on low-dimensional representation (UMAP)
FeaturePlot(seurat.object, reduction = "umap",
features = names(signatures), ncol=3, order=TRUE)
Please report any issues with UCell at its GitHub repository. More demos available at UCell demo repository
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] sp_1.5-0 SeuratObject_4.1.0
## [3] Seurat_4.1.1 patchwork_1.1.1
## [5] scater_1.24.0 scuttle_1.6.2
## [7] ggplot2_3.3.6 reshape2_1.4.4
## [9] scRNAseq_2.10.0 SingleCellExperiment_1.18.0
## [11] SummarizedExperiment_1.26.1 Biobase_2.56.0
## [13] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
## [15] IRanges_2.30.0 S4Vectors_0.34.0
## [17] BiocGenerics_0.42.0 MatrixGenerics_1.8.0
## [19] matrixStats_0.62.0 UCell_2.0.1
## [21] BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 reticulate_1.25
## [3] tidyselect_1.1.2 htmlwidgets_1.5.4
## [5] RSQLite_2.2.14 AnnotationDbi_1.58.0
## [7] grid_4.2.0 BiocParallel_1.30.3
## [9] Rtsne_0.16 munsell_0.5.0
## [11] ScaledMatrix_1.4.0 codetools_0.2-18
## [13] ica_1.0-2 miniUI_0.1.1.1
## [15] future_1.26.1 withr_2.5.0
## [17] spatstat.random_2.2-0 colorspace_2.0-3
## [19] progressr_0.10.1 filelock_1.0.2
## [21] highr_0.9 knitr_1.39
## [23] ROCR_1.0-11 tensor_1.5
## [25] listenv_0.8.0 labeling_0.4.2
## [27] GenomeInfoDbData_1.2.8 polyclip_1.10-0
## [29] bit64_4.0.5 farver_2.1.0
## [31] parallelly_1.32.0 vctrs_0.4.1
## [33] generics_0.1.2 xfun_0.31
## [35] BiocFileCache_2.4.0 R6_2.5.1
## [37] ggbeeswarm_0.6.0 rsvd_1.0.5
## [39] AnnotationFilter_1.20.0 spatstat.utils_2.3-1
## [41] bitops_1.0-7 cachem_1.0.6
## [43] DelayedArray_0.22.0 assertthat_0.2.1
## [45] promises_1.2.0.1 BiocIO_1.6.0
## [47] scales_1.2.0 rgeos_0.5-9
## [49] beeswarm_0.4.0 gtable_0.3.0
## [51] beachmat_2.12.0 globals_0.15.0
## [53] goftest_1.2-3 ensembldb_2.20.2
## [55] rlang_1.0.2 splines_4.2.0
## [57] rtracklayer_1.56.0 lazyeval_0.2.2
## [59] spatstat.geom_2.4-0 abind_1.4-5
## [61] BiocManager_1.30.18 yaml_2.3.5
## [63] GenomicFeatures_1.48.3 httpuv_1.6.5
## [65] tools_4.2.0 bookdown_0.27
## [67] spatstat.core_2.4-4 ellipsis_0.3.2
## [69] jquerylib_0.1.4 RColorBrewer_1.1-3
## [71] ggridges_0.5.3 Rcpp_1.0.8.3
## [73] plyr_1.8.7 sparseMatrixStats_1.8.0
## [75] progress_1.2.2 zlibbioc_1.42.0
## [77] purrr_0.3.4 RCurl_1.98-1.7
## [79] prettyunits_1.1.1 rpart_4.1.16
## [81] deldir_1.0-6 pbapply_1.5-0
## [83] viridis_0.6.2 cowplot_1.1.1
## [85] zoo_1.8-10 ggrepel_0.9.1
## [87] cluster_2.1.3 magrittr_2.0.3
## [89] scattermore_0.8 data.table_1.14.2
## [91] RSpectra_0.16-1 magick_2.7.3
## [93] lmtest_0.9-40 RANN_2.6.1
## [95] ProtGenerics_1.28.0 fitdistrplus_1.1-8
## [97] hms_1.1.1 mime_0.12
## [99] evaluate_0.15 xtable_1.8-4
## [101] XML_3.99-0.10 gridExtra_2.3
## [103] compiler_4.2.0 biomaRt_2.52.0
## [105] tibble_3.1.7 KernSmooth_2.23-20
## [107] crayon_1.5.1 htmltools_0.5.2
## [109] mgcv_1.8-40 later_1.3.0
## [111] tidyr_1.2.0 DBI_1.1.3
## [113] ExperimentHub_2.4.0 dbplyr_2.2.0
## [115] MASS_7.3-57 rappdirs_0.3.3
## [117] Matrix_1.4-1 cli_3.3.0
## [119] parallel_4.2.0 igraph_1.3.2
## [121] pkgconfig_2.0.3 GenomicAlignments_1.32.0
## [123] spatstat.sparse_2.1-1 plotly_4.10.0
## [125] xml2_1.3.3 vipor_0.4.5
## [127] bslib_0.3.1 XVector_0.36.0
## [129] stringr_1.4.0 digest_0.6.29
## [131] sctransform_0.3.3 RcppAnnoy_0.0.19
## [133] spatstat.data_2.2-0 Biostrings_2.64.0
## [135] rmarkdown_2.14 leiden_0.4.2
## [137] uwot_0.1.11 DelayedMatrixStats_1.18.0
## [139] restfulr_0.0.15 curl_4.3.2
## [141] shiny_1.7.1 Rsamtools_2.12.0
## [143] rjson_0.2.21 nlme_3.1-158
## [145] lifecycle_1.0.1 jsonlite_1.8.0
## [147] BiocNeighbors_1.14.0 viridisLite_0.4.0
## [149] fansi_1.0.3 pillar_1.7.0
## [151] lattice_0.20-45 KEGGREST_1.36.2
## [153] fastmap_1.1.0 httr_1.4.3
## [155] survival_3.3-1 interactiveDisplayBase_1.34.0
## [157] glue_1.6.2 png_0.1-7
## [159] BiocVersion_3.15.2 bit_4.0.4
## [161] stringi_1.7.6 sass_0.4.1
## [163] blob_1.2.3 BiocSingular_1.12.0
## [165] AnnotationHub_3.4.0 memoise_2.0.1
## [167] dplyr_1.0.9 irlba_2.3.5
## [169] future.apply_1.9.0