The uSORT package is designed to uncover the intrinsic cell progression path from single-cell RNA-seq data. It incorporates data pre-processing, preliminary PCA gene selection, preliminary cell ordering, refined gene selection, refined cell ordering, and post-analysis interpretation and visualization. The schematic overview of the uSORT workflow is shown in the figure below:
The uSORT workflow can be run either through the user-friendly GUI or by calling the main function directly.
After the uSORT package is installed, the GUI can be launched with a single command:
require(uSORT)
## Loading required package: uSORT
## Loading required package: tcltk
## No methods found in package 'BiocGenerics' for request: 'clusterEvalQ' when loading 'uSORT'
## No methods found in package 'BiocGenerics' for request: 'parCapply' when loading 'uSORT'
## No methods found in package 'BiocGenerics' for request: 'parRapply' when loading 'uSORT'
# uSORT_GUI()
On macOS, the GUI appears as shown below:
In the GUI, users can choose an input file (TPM and CPM values in a txt file are currently supported) and specify the preliminary and refined sorting methods. Clicking the parameter button opens further options for customizing each method. The parameter panel for the autoSPIN method looks like this:
In the main GUI window, give a project name and choose the result path, then click submit. The program will run and details will be printed on the R console. Once the analysis is done, results will be saved under the selected result path.
Users can also directly call the main function of the package, uSORT. Its documentation can be viewed with the command ?uSORT. The usage and parameters of the uSORT function are shown below:
args(uSORT)
## function (exprs_file, log_transform = TRUE, remove_outliers = TRUE,
## preliminary_sorting_method = c("autoSPIN", "sWanderlust",
## "monocle", "Wanderlust", "SPIN", "none"), refine_sorting_method = c("autoSPIN",
## "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"),
## project_name = "uSORT", result_directory = getwd(), nCores = 1,
## save_results = TRUE, reproduce_seed = 1234, scattering_cutoff_prob = 0.75,
## driving_force_cutoff = NULL, qval_cutoff_featureSelection = 0.05,
## pre_data_type = c("linear", "cyclical"), pre_SPIN_option = c("STS",
## "neighborhood"), pre_SPIN_sigma_width = 1, pre_autoSPIN_alpha = 0.2,
## pre_autoSPIN_randomization = 20, pre_wanderlust_start_cell = NULL,
## pre_wanderlust_dfmap_components = 4, pre_wanderlust_l = 15,
## pre_wanderlust_num_waypoints = 150, pre_wanderlust_waypoints_seed = 2711,
## pre_wanderlust_flock_waypoints = 2, ref_data_type = c("linear",
## "cyclical"), ref_SPIN_option = c("STS", "neighborhood"),
## ref_SPIN_sigma_width = 1, ref_autoSPIN_alpha = 0.2, ref_autoSPIN_randomization = 20,
## ref_wanderlust_start_cell = NULL, ref_wanderlust_dfmap_components = 4,
## ref_wanderlust_l = 15, ref_wanderlust_num_waypoints = 150,
## ref_wanderlust_flock_waypoints = 2, ref_wanderlust_waypoints_seed = 2711)
## NULL
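Several arguments in the signature above, such as preliminary_sorting_method and pre_data_type, list every allowed value as their default. In R this pattern is commonly resolved with match.arg(), which falls back to the first element when the argument is not supplied. The sketch below illustrates the convention with a hypothetical helper, pick_method(), which is not part of uSORT; that uSORT resolves its arguments in exactly this way is an assumption.

```r
# Hypothetical helper (not part of uSORT) showing how a choice-vector default
# is typically resolved with match.arg().
pick_method <- function(method = c("autoSPIN", "sWanderlust", "monocle",
                                   "Wanderlust", "SPIN", "none")) {
    match.arg(method)
}

default_method  <- pick_method()               # nothing supplied: first choice
explicit_method <- pick_method("sWanderlust")  # explicit, validated choice

default_method
explicit_method
```

Under this convention, passing a value outside the listed choices raises an error, so it is safest to copy the method names exactly as they appear in the signature.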
Running the package through the GUI is straightforward, so here we demonstrate the main function with an example:
dir <- system.file('extdata', package='uSORT')
# Escape the dot so only files ending in ".txt" match, and spell out full.names
file <- list.files(dir, pattern='\\.txt$', full.names=TRUE)
# uSORT_results <- uSORT(exprs_file = file,
# log_transform = TRUE,
# remove_outliers = TRUE,
# project_name = "uSORT_example",
# preliminary_sorting_method = "autoSPIN",
# refine_sorting_method = "sWanderlust",
# result_directory = getwd(),
# save_results = TRUE,
# reproduce_seed = 1234)
When the analysis is done, the results will be returned in a list:
#str(uSORT_results)
# List of 7
# $ exp_raw : num [1:251, 1:43280] 1.08 0 0 0.62 0 0 0 0.27 1.16 0 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:251] "RMD119" "RMD087" "RMD078" "RMD225" ...
# .. ..$ : chr [1:43280] "0610005C13Rik" "0610007P14Rik" "0610009B22Rik" "0610009E02Rik" ...
# $ trimmed_log2exp : num [1:241, 1:9918] 4.82 0 0 2.77 5.84 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:241] "RMD119" "RMD087" "RMD078" "RMD225" ...
# .. ..$ : chr [1:9918] "0610007P14Rik" "0610009B22Rik" "0610009E02Rik" "0610009O20Rik" ...
# $ preliminary_sorting_genes : chr [1:650] "1110038B12Rik" "1190002F15Rik" "2810417H13Rik" "5430435G22Rik" ...
# $ preliminary_sorting_order : chr [1:241] "RMD196" "RMD236" "RMD250" "RMD220" ...
# $ refined_sorting_genes : chr [1:320] "Mpo" "H2-Aa" "Cd74" "H2-Ab1" ...
# $ refined_sorting_order : chr [1:241] "RMD271" "RMD272" "RMD265" "RMD295" ...
# $ driverGene_refinedOrder_log2exp: num [1:241, 1:320] 13.16 10.77 12.17 9.82 9.77 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:241] "RMD271" "RMD272" "RMD265" "RMD295" ...
# .. ..$ : chr [1:320] "Mpo" "H2-Aa" "Cd74" "H2-Ab1" ...
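Because both orderings are returned as character vectors of cell names, they can be compared directly. The following is a minimal sketch, not part of uSORT, that measures how much the refined order differs from the preliminary one using a rank correlation. It uses a made-up toy list (toy_results) with the same two field names as the output above, so it runs without a prior uSORT analysis.

```r
# Toy stand-in for the list returned by uSORT(); only the two ordering fields
# from the str() output above are mimicked, and the cell names are made up.
toy_results <- list(
    preliminary_sorting_order = c("c1", "c2", "c3", "c4", "c5"),
    refined_sorting_order     = c("c5", "c4", "c2", "c3", "c1")
)

# Position of each cell in the refined order, listed in preliminary order
refined_rank <- match(toy_results$preliminary_sorting_order,
                      toy_results$refined_sorting_order)

# Spearman correlation between the two sets of positions. The direction of an
# inferred ordering may be flipped, so the absolute value is the quantity of
# interest here.
order_agreement <- cor(seq_along(refined_rank), refined_rank,
                       method = "spearman")
abs(order_agreement)
```

With real results, toy_results would be replaced by uSORT_results. A value close to 1 suggests that refinement changed the ordering only slightly.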
If save_results = TRUE, several result files will also be saved:
uSORT_example_final_driver_genes_profiles.pdf:
uSORT_example_distance_heatmap_preliminary.pdf:
uSORT_example_distance_heatmap_refined.pdf:
If the cell types and signature genes are known, the results can be validated against this information:
# sig_genes <- read.table(file.path(system.file('extdata', package='uSORT'), 'signature_genes.txt'))
# sig_genes <- as.character(sig_genes[,1])
# spl_annotat <- read.table(file.path(system.file('extdata', package='uSORT'), 'celltype.txt'),header=T)
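# The steps below assume that the commented commands above have been run, so
# that uSORT_results, sig_genes and spl_annotat exist in the workspace.
library(gplots)  # provides heatmap.2() and bluered()
project_name <- "uSORT_example"  # same project name as in the uSORT() call

# Expression matrix with cells in reversed preliminary sorting order
pre_log2ex <- uSORT_results$trimmed_log2exp[rev(uSORT_results$preliminary_sorting_order), ]

# Map each sorted cell to its annotated cell type and a display colour
m <- spl_annotat[match(rownames(pre_log2ex), spl_annotat$SampleID), ]
celltype_color <- c('blue', 'red', 'black')
celltype <- c('MDP', 'CDP', 'PreDC')
cell_color <- celltype_color[match(m$GroupID, celltype)]

# Keep signature genes only; transpose so that genes are rows and cells are columns
sigGenes_log2ex <- t(pre_log2ex[, colnames(pre_log2ex) %in% sig_genes])
# Suggested output file name (not used by the plotting call below)
fileNm <- paste0(project_name, '_signatureGenes_profiles_preliminary.pdf')

# Cluster genes (rows) but keep cells (columns) in sorted order
heatmap.2(as.matrix(sigGenes_log2ex),
          dendrogram = 'row',
          trace = 'none',
          col = bluered,
          Rowv = TRUE, Colv = FALSE,
          scale = 'row',
          cexRow = 1.8,
          ColSideColors = cell_color,
          margins = c(8, 8))
legend("topright",
       legend = celltype,
       col = celltype_color,
       pch = 20,
       horiz = TRUE,
       bty = "n",
       inset = c(0, -0.01),
       pt.cex = 1.5)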
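# Same validation for the refined order; heatmap.2() and bluered() come from
# the gplots package.
project_name <- "uSORT_example"  # same project name as in the uSORT() call

# Expression matrix with cells in refined sorting order
ref_log2ex <- uSORT_results$trimmed_log2exp[uSORT_results$refined_sorting_order, ]

# Map each sorted cell to its annotated cell type and a display colour
m <- spl_annotat[match(rownames(ref_log2ex), spl_annotat$SampleID), ]
celltype_color <- c('blue', 'red', 'black')
celltype <- c('MDP', 'CDP', 'PreDC')
cell_color <- celltype_color[match(m$GroupID, celltype)]

# Keep signature genes only; transpose so that genes are rows and cells are columns
sigGenes_log2ex <- t(ref_log2ex[, colnames(ref_log2ex) %in% sig_genes])
# Suggested output file name (not used by the plotting call below)
fileNm <- paste0(project_name, '_signatureGenes_profiles_refine.pdf')

# Cluster genes (rows) but keep cells (columns) in sorted order
heatmap.2(as.matrix(sigGenes_log2ex),
          dendrogram = 'row',
          trace = 'none',
          col = bluered,
          Rowv = TRUE, Colv = FALSE,
          scale = 'row',
          cexRow = 1.8,
          ColSideColors = cell_color,
          margins = c(8, 8))
legend("topright",
       legend = celltype,
       col = celltype_color,
       pch = 20,
       horiz = TRUE,
       bty = "n",
       inset = c(0, -0.01),
       pt.cex = 1.5)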
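The heatmaps give a visual check of whether cells of the same type sit next to each other along the ordering. As a complementary numeric check, the sketch below summarises an ordering as runs of consecutive cells of the same type. It is an illustration rather than part of uSORT: the ordering and annotation are made-up toy data, and only the column names SampleID and GroupID are taken from the code above.

```r
# Toy ordering and annotation; the values are made up for illustration.
toy_order <- c("s3", "s1", "s4", "s2", "s6", "s5")
toy_annot <- data.frame(
    SampleID = paste0("s", 1:6),
    GroupID  = c("MDP", "CDP", "MDP", "CDP", "PreDC", "PreDC"),
    stringsAsFactors = FALSE
)

# Cell type of each cell, taken in sorted order
sorted_types <- toy_annot$GroupID[match(toy_order, toy_annot$SampleID)]

# Run-length encoding of the cell types along the ordering: a single run per
# cell type means that each type forms one contiguous block
runs <- rle(sorted_types)
run_summary <- data.frame(celltype = runs$values, n_cells = runs$lengths)
run_summary
```

With real data, toy_order would be replaced by uSORT_results$refined_sorting_order and toy_annot by spl_annotat. Fewer and longer runs indicate that the ordering separates the annotated cell types more cleanly.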
sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] tcltk stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] uSORT_1.22.0 BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] matrixStats_0.62.0 bitops_1.0-7 DDRTree_0.1.5
## [4] RColorBrewer_1.1-3 prabclus_2.3-2 docopt_0.7.1
## [7] tools_4.2.0 bslib_0.3.1 utf8_1.2.2
## [10] R6_2.5.1 irlba_2.3.5 KernSmooth_2.23-20
## [13] DBI_1.1.2 BiocGenerics_0.42.0 colorspace_2.0-3
## [16] nnet_7.3-17 gridExtra_2.3 tidyselect_1.1.2
## [19] compiler_4.2.0 cli_3.3.0 Biobase_2.56.0
## [22] bookdown_0.26 slam_0.1-50 sass_0.4.1
## [25] diptest_0.76-0 caTools_1.18.2 scales_1.2.0
## [28] DEoptimR_1.0-11 robustbase_0.95-0 stringr_1.4.0
## [31] digest_0.6.29 sparsesvd_0.2 rmarkdown_2.14
## [34] pkgconfig_2.0.3 htmltools_0.5.2 limma_3.52.0
## [37] fastmap_1.1.0 rlang_1.0.2 VGAM_1.1-6
## [40] jquerylib_0.1.4 generics_0.1.2 combinat_0.0-8
## [43] jsonlite_1.8.0 mclust_5.4.9 gtools_3.9.2
## [46] dplyr_1.0.8 magrittr_2.0.3 modeltools_0.2-23
## [49] Matrix_1.4-1 Rcpp_1.0.8.3 munsell_0.5.0
## [52] fansi_1.0.3 viridis_0.6.2 lifecycle_1.0.1
## [55] stringi_1.7.6 yaml_2.3.5 MASS_7.3-57
## [58] plyr_1.8.7 flexmix_2.3-17 gplots_3.1.3
## [61] Rtsne_0.16 grid_4.2.0 parallel_4.2.0
## [64] crayon_1.5.1 lattice_0.20-45 splines_4.2.0
## [67] knitr_1.38 pillar_1.7.0 igraph_1.3.1
## [70] fpc_2.2-9 reshape2_1.4.4 stats4_4.2.0
## [73] glue_1.6.2 evaluate_0.15 leidenbase_0.1.11
## [76] BiocManager_1.30.17 vctrs_0.4.1 gtable_0.3.0
## [79] RANN_2.6.1 purrr_0.3.4 kernlab_0.9-30
## [82] assertthat_0.2.1 ggplot2_3.3.5 xfun_0.30
## [85] monocle_2.24.0 RSpectra_0.16-1 viridisLite_0.4.0
## [88] qlcMatrix_0.9.7 class_7.3-20 HSMMSingleCell_1.15.0
## [91] tibble_3.1.6 pheatmap_1.0.12 cluster_2.1.3
## [94] fastICA_1.2-3 ellipsis_0.3.2