Augmenting reference subtype classifications using gene programs
An example quantifying proliferation, response to interferons and tissue-residency gene programs across T cell subtypes
Background
Cells responding to external cues can transiently activate gene programs, or groups of genes that are collectively involved in specific gene processes, such as cell division, or response to interferons. In scRNA-seq data, transcriptional variance can be largely explained by these transient gene expression programs masking differences between different cell types, and leading to the clustering of cells of different types.
Some typical gene programs observed in T cell scRNA-seq data:
Cell cycle (eg. MCM5, TOP2A)
Interferon (IFN) response (eg. MX1, ISG15)
Heat shock protein (HSP) response (eg. HSPA1, HSPA2)
TCR Activation (eg. JUN, FOS, NR4A1)
Tissue-residency a program associated to homing and/or retention of T cells in non-lymphoid tissues (eg. ZNF683, ITGAE)
Here we will show to use reference-based projection to classify cells into robust subtypes, irrespective of the activation of transient gene programs, but still recover this important information to more granularly define cell states.
In this example we will use data from Gueguen et al. 2021.
Data setup
Loading the reference map
First, let’s download and load the CD8 TIL reference map Seurat object
# Load the reference
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
# Setup colors
mycols <- ref.cd8@misc$atlas.palette
# Plotting
DimPlot(ref.cd8, group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)We can verify the reference map only contains “stable” or “basal” cell types (cells strongly expressing transient gene programs, such as cell division, IFN-response, and HSP-response were removed from the reference by design).
Load query data
Download and load Seurat object containing the pre-annotated Gueguen
dataset. In this example, the original author annotations are available
in the seurat_clusters metadata column
#download.file("https://figshare.com/ndownloader/files/39082049", destfile = "gueguen.cd3.Rds")
gueguen.cd3 <- readRDS("gueguen.cd3.Rds")
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)
# DimPlot
DimPlot(gueguen.cd3, order = T, label = T, repel = T) ProjecTILs reference-based classification
Next we will run ProjecTILs
ProjecTILs.classifier function. This running mode will add
the predicted celltype labels as metadata
(i.e. functional.cluster), while keeping intact the
original low dimensional embeddings.
Here we will run the prediction separately for each patient/sample
contained in the dataset. To this end, we include the parameter
split.by indicating the metadata column containing the
sample information (in this example orig.ident).
# Projection
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
# Plotting
DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)table(gueguen.cd3$functional.cluster)##
## CD8.CM CD8.EM CD8.MAIT CD8.NaiveLike CD8.TEMRA
## 2132 3181 147 238 276
## CD8.TEX CD8.TPEX
## 3340 337
Augmenting cell states resolution by quantifying activation of gene programs
To further sub-classify T cell reference subtypes according to the activation of gene programs, we will use UCell. In this example we will also use SignatuR that stores useful gene signatures (e.g cell cycle).
library(SignatuR)
# Load Human signatures
signatuR.list <- GetSignature(SignatuR$Hs)
# Select cycling signature and merge the two cycling phases
signatures <- list()
signatures[["Cycling"]] <- unlist(signatuR.list[c("cellCycle.G1S","cellCycle.G2M")])
# for IFN and tissue-residecy we will simply define them manually
signatures[["IFN"]] <- c("ISG15","IFI6","IFI44L","MX1")
signatures[["Tissue_Resident"]] <- c("ITGAE")Now let’s compute the gene signature scores using
UCell
gueguen.cd3 <- AddModuleScore_UCell(gueguen.cd3, features = signatures, ncores = 8,
assay = "RNA")We can visualize these scores on the UMAP, and verify their agreement with the original author annotations (CD4/8-MCM5, CD4/8-TOP2A cell clusters with high cycling score, and CD4/8-ISG15 cluster with high IFN response score).
FeaturePlot(gueguen.cd3, features = paste0(names(signatures), "_UCell"),
order = T, pt.size = 0.3) & scale_color_paletteer_c("pals::coolwarm") ## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
We can now classify each cell into low/high for each gene program, according to their signature scores.
Signatures scores distributions
We can explore signatures scores distributions and set a threshold to classify cells into negative/positive for each gene program. Here we will use a global threshold of 0.2.
qplot(gueguen.cd3$Cycling_UCell, main = "Cycling") + qplot(gueguen.cd3$IFN_UCell, main = "IFN") + qplot(gueguen.cd3$Tissue_Resident_UCell, main = "Tissue Resident") & theme_bw()Example with the Cycling signature:
cycling.high <- subset(gueguen.cd3, subset = Cycling_UCell > 0.2)
cycling.low <- subset(gueguen.cd3, subset = Cycling_UCell <= 0.2)
cycling.list <- list(cycling.low,cycling.high)
names(cycling.list) <- c("cycling.low","cycling.high")Visualize using ProjecTILs plot.states.radar
function
plot.states.radar(ref = ref.cd8, query = cycling.list, labels.col = 'functional.cluster', min.cells = 5, genes4radar = c("SELL","LEF1","TCF7","CCR7","KLF2","GZMK", "GZMA", "FCGR3A","FGFBP2","XCL1","XCL2", "KLRB1",'MCM5',"MCM3","TOP2A")) Here Cycling high and Cycling low cells display gene expression profiles in terms of marker genes, but differ in their expression of cell cycle genes, such as MCM5, MCM3 and TOP2A.
Now we can do this for the 4 signatures that we selected at the beginning.
wrap_plots(pll, guides = "collect")As previously shown, CD8.TEX and CD8.TPEX populations contain the highest frequency of proliferating, tissue-resident and IFN-responding cells.
Bonus track
Upset plot to check program co-occurences
Let’s focus on the CD8.TEX population, and see how the different gene programs relate using a UpSet plot, using UpSetR package.
gueguen.cd3.TEX <- subset(gueguen.cd3, subset = functional.cluster == "CD8.TEX")
listInput <- list(IFN = which(gueguen.cd3.TEX$IFN.class == "Positive"),
Cycling = which(gueguen.cd3.TEX$Cycling.class == "Positive"),
Tissue_resident = which(gueguen.cd3.TEX$Tissue_Resident.class == "Positive"))
upset(fromList(listInput), order.by = "freq")
grid.text("CD8.TEX gene programs overlap",x = 0.65, y=0.95, gp=gpar(fontsize=12))We can see that roughly one third of CD8.TEX with high tissue-residency program are also high for IFN-response program.
Comparison with the original annotation
In the original study, cycling cells are defined as “CD8.LAYN” and “CD8.GZMH”
Now let’s compare this with ProjecTILs classifications. First, let’s subset cells in the 2 cycling clusters CD4/8-TOP2A and CD4/8-MCM5.
gueguen.cycling <- subset(gueguen.cd3, subset = seurat_clusters %in% c('CD4/8-TOP2A', "CD4/8-MCM5"))
# Dimplot
DimPlot(gueguen.cycling, group.by = 'functional.cluster', label = T, repel = T, cols = mycols)## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
We can have a look at clusters proportions in the original cycling clusters. It seems that there is more heterogeneity among cycling cells than previously observed, including Central Memory and Effector Memory T cells.
plot.statepred.composition(ref.cd8, gueguen.cycling, metric = "Percent") +
ggtitle("Cycling cells") + ylim(0, 75) + theme_bw() + RotatedAxis()Let’s check radar plots to confirm identities, where we included some markers genes for cell cycle: TOP2A, MCM5 and MCM3.
plot.states.radar(ref = ref.cd8, query = gueguen.cycling, labels.col = 'functional.cluster', min.cells = 5, genes4radar = c("SELL","LEF1","TCF7","CCR7","KLF2","GZMK", "GZMA", "FCGR3A","FGFBP2","XCL1","KLRB1",'MCM5',"MCM3","TOP2A","PDCD1","TOX","HAVCR2"))The marker gene profiles support a correct classification.
Session Info
sessionInfo()## R version 4.2.1 (2022-06-23)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices datasets utils methods
## [8] base
##
## other attached packages:
## [1] SignatuR_0.1.0 UpSetR_1.4.0 cowplot_1.1.1
## [4] plotly_4.10.1 pheatmap_1.0.12 UCell_2.2.0
## [7] paletteer_1.5.0 scales_1.2.1 ggrepel_0.9.2
## [10] gridExtra_2.3 ProjecTILs_3.0.2 patchwork_1.1.2
## [13] GEOquery_2.66.0 Biobase_2.58.0 BiocGenerics_0.44.0
## [16] data.table_1.14.6 STACAS_2.0.0 scGate_1.4.1
## [19] forcats_0.5.2 stringr_1.5.0 dplyr_1.0.10
## [22] purrr_1.0.1 readr_2.1.3 tidyr_1.2.1
## [25] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
## [28] SeuratObject_4.1.3 Seurat_4.3.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 spatstat.explore_3.0-5
## [3] reticulate_1.27 tidyselect_1.2.0
## [5] htmlwidgets_1.6.1 BiocParallel_1.32.5
## [7] Rtsne_0.16 munsell_0.5.0
## [9] codetools_0.2-18 ica_1.0-3
## [11] umap_0.2.9.0 future_1.30.0
## [13] miniUI_0.1.1.1 withr_2.5.0
## [15] spatstat.random_3.1-2 colorspace_2.1-0
## [17] progressr_0.13.0 highr_0.10
## [19] knitr_1.41 rstudioapi_0.14
## [21] stats4_4.2.1 SingleCellExperiment_1.20.0
## [23] ROCR_1.0-11 tensor_1.5
## [25] listenv_0.9.0 labeling_0.4.2
## [27] MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9
## [29] polyclip_1.10-4 farver_2.1.1
## [31] parallelly_1.34.0 vctrs_0.5.2
## [33] generics_0.1.3 xfun_0.36
## [35] timechange_0.2.0 R6_2.5.1
## [37] GenomeInfoDb_1.34.7 rmdformats_1.0.4
## [39] pals_1.7 bitops_1.0-7
## [41] spatstat.utils_3.0-1 cachem_1.0.6
## [43] DelayedArray_0.24.0 assertthat_0.2.1
## [45] promises_1.2.0.1 googlesheets4_1.0.1
## [47] gtable_0.3.1 globals_0.16.2
## [49] goftest_1.2-3 rlang_1.0.6
## [51] splines_4.2.1 lazyeval_0.2.2
## [53] gargle_1.2.1 dichromat_2.0-0.1
## [55] prismatic_1.1.1 spatstat.geom_3.0-5
## [57] broom_1.0.2 BiocManager_1.30.19
## [59] yaml_2.3.7 reshape2_1.4.4
## [61] abind_1.4-5 modelr_0.1.10
## [63] backports_1.4.1 httpuv_1.6.8
## [65] tools_4.2.1 bookdown_0.32
## [67] ellipsis_0.3.2 jquerylib_0.1.4
## [69] RColorBrewer_1.1-3 ggridges_0.5.4
## [71] Rcpp_1.0.10 plyr_1.8.8
## [73] zlibbioc_1.44.0 RCurl_1.98-1.9
## [75] openssl_2.0.5 deldir_1.0-6
## [77] pbapply_1.7-0 S4Vectors_0.36.1
## [79] zoo_1.8-11 SummarizedExperiment_1.28.0
## [81] haven_2.5.1 cluster_2.1.4
## [83] fs_1.6.0 magrittr_2.0.3
## [85] RSpectra_0.16-1 scattermore_0.8
## [87] lmtest_0.9-40 reprex_2.0.2
## [89] RANN_2.6.1 googledrive_2.0.0
## [91] fitdistrplus_1.1-8 matrixStats_0.63.0
## [93] hms_1.1.2 mime_0.12
## [95] evaluate_0.20 xtable_1.8-4
## [97] readxl_1.4.1 IRanges_2.32.0
## [99] compiler_4.2.1 maps_3.4.1
## [101] KernSmooth_2.23-20 crayon_1.5.2
## [103] htmltools_0.5.4 later_1.3.0
## [105] tzdb_0.3.0 lubridate_1.9.0
## [107] DBI_1.1.3 dbplyr_2.3.0
## [109] MASS_7.3-58.2 data.tree_1.0.0
## [111] Matrix_1.5-3 cli_3.6.0
## [113] parallel_4.2.1 igraph_1.3.5
## [115] GenomicRanges_1.50.2 pkgconfig_2.0.3
## [117] sp_1.6-0 spatstat.sparse_3.0-0
## [119] xml2_1.3.3 bslib_0.4.2
## [121] XVector_0.38.0 rvest_1.0.3
## [123] digest_0.6.31 pracma_2.4.2
## [125] sctransform_0.3.5 RcppAnnoy_0.0.20
## [127] spatstat.data_3.0-0 rmarkdown_2.20
## [129] cellranger_1.1.0 leiden_0.4.3
## [131] uwot_0.1.14 shiny_1.7.4
## [133] lifecycle_1.0.3 nlme_3.1-161
## [135] jsonlite_1.8.4 BiocNeighbors_1.16.0
## [137] mapproj_1.2.9 askpass_1.1
## [139] viridisLite_0.4.1 limma_3.54.0
## [141] fansi_1.0.4 pillar_1.8.1
## [143] lattice_0.20-45 fastmap_1.1.0
## [145] httr_1.4.4 survival_3.5-0
## [147] glue_1.6.2 png_0.1-8
## [149] stringi_1.7.12 sass_0.4.5
## [151] rematch2_2.1.2 renv_0.15.5
## [153] irlba_2.3.5.1 future.apply_1.10.0