Detecting CD8 TPEX with ProjecTILs
In this tutorial we highlight ProjecTILs' ability to detect CD8+ “TPEX” (progenitors of exhausted T cells), a rare population of memory CD8+ T cells that sustains CD8 T cell immunity in the context of chronic viral infection and cancer. These CD8 T cells are typically defined by co-expression of TCF1, PD-1 and TOX. They are of primary importance in cancer therapy, as they are thought to renew the pool of terminally exhausted CD8 T cells in response to PD-1/PD-L1 blockade (Siddiqui et al.).
Tumor T cell differentiation model, going through an intermediate state of progenitor exhausted (TPEX). Figure from Andreatta et al. 2021.
ProjecTILs classifies cells by projecting them onto a reference map. Here we will use our reference map of human tumor-infiltrating CD8 T cells. You can find more information about this map here.
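To make the idea of classification by projection more concrete, here is a minimal base R sketch of nearest-neighbor label transfer in a low-dimensional embedding. It is not ProjecTILs' implementation: the coordinates, the labels, the helper `knn_label()` and the choice of `k = 3` are all made up for illustration.

```r
# Toy reference embedding (made-up 2-D coordinates) with known subtype labels
ref.coords <- rbind(
  c(0.0,  0.0), c(0.2,  0.1), c(0.1, -0.1),   # cells annotated as CD8.TEX
  c(3.0,  3.0), c(3.1,  2.9), c(2.8,  3.2)    # cells annotated as CD8.TPEX
)
ref.labels <- rep(c("CD8.TEX", "CD8.TPEX"), each = 3)

# Hypothetical helper: label a projected query cell by majority vote
# among its k nearest reference cells (Euclidean distance)
knn_label <- function(query.point, ref.coords, ref.labels, k = 3) {
  diffs <- sweep(ref.coords, 2, query.point)   # subtract the query from every reference row
  dists <- sqrt(rowSums(diffs^2))
  nearest <- order(dists)[seq_len(k)]
  votes <- table(ref.labels[nearest])
  names(votes)[which.max(votes)]
}

# Two made-up query cells, already placed in the reference space
label.a <- knn_label(c(2.9, 3.0), ref.coords, ref.labels)
label.b <- knn_label(c(0.1, 0.0), ref.coords, ref.labels)
print(c(label.a, label.b))
```

Because the label depends only on where a cell lands in the fixed reference space, the same query cell receives the same annotation regardless of which other cells are in its dataset.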
Why use ProjecTILs to classify cell subtypes?
Obtain consistent cell annotations across datasets
Classify cells into reference subtypes, irrespective of activation of transient gene programs, such as cell cycle.
Avoid use of subjective, dataset-specific parameters, such as the selection of highly variable genes, number of clusters, etc.
Projection is robust to batch effects, single-cell technologies and sequencing depth (for more information, please read the ProjecTILs paper, Andreatta et al. 2021).
What happens if the input data contains cells that do not match the reference map cell type?
If the input data contain cell types not included in the reference map (e.g. CD4 T cells if the reference is for CD8 T cells), they are automatically filtered out.
By default, both Run.ProjecTILs() and ProjecTILs.classifier() have the parameter filter.cells set to TRUE. This means that cells not represented in the reference are filtered out using the built-in scGate model. This model is stored in the misc slot of the reference Seurat object: ref@misc$scGate. You can customize this filtering by amending this slot using the scGate grammar.
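As a hedged illustration of what amending the filter could look like, the sketch below builds a small gating model with scGate's `gating_model()` function. The signature names and gene lists are hypothetical choices for this example, not the model shipped with the reference. How the model is organised inside `ref@misc$scGate` may differ between ProjecTILs versions, so the slot should be inspected before it is overwritten; for that reason the last step is only described in comments.

```r
library(scGate)

# Level 1: keep T cells (hypothetical signature, for illustration only)
custom.model <- gating_model(name = "Tcell", signature = c("CD3D", "CD3E", "CD2"))

# Level 2: among T cells, require CD8 expression...
custom.model <- gating_model(model = custom.model, level = 2, name = "CD8T",
                             signature = c("CD8A", "CD8B"))

# ...and exclude cells expressing CD4
custom.model <- gating_model(model = custom.model, level = 2, name = "CD4T",
                             signature = c("CD4"), negative = TRUE)

# The model is a plain data frame, one row per signature
print(custom.model)

# With a loaded reference object `ref`, inspect the stored filter first:
# str(ref@misc$scGate, max.level = 1)
# and then store custom.model using the same structure that str() reported.
```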
Human CD8 TIL reference
First, let’s have a look at the reference map.
library(ProjecTILs)
library(Seurat)
library(SignatuR)
library(dplyr)
library(ggplot2)
library(patchwork)
# Load the reference
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")
## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
# Setup colors
mycols <- ref.cd8@misc$atlas.palette
# DimPlot
DimPlot(ref.cd8, group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)
Here are the different T cell subsets defined in the map:
CD8.NaiveLike: Antigen-naive T cells
CD8.CM: Central Memory T cells
CD8.EM: Effector Memory T cells
CD8.TEMRA: Effector Memory cells re-expressing CD45RA. Sometimes called Short-Lived Effector Cells (SLEC), or cytotoxic effectors
CD8.TPEX: Progenitor exhausted T cells
CD8.TEX: Exhausted T cells
CD8.MAIT: Mucosal-associated invariant T cells, innate-like T cells defined by their semi-invariant αβ T cell receptor
Let’s check Differentially Expressed Genes (DEGs) between subtypes and verify expected marker genes:
# Compute DEGs
DefaultAssay(ref.cd8) <- "RNA"
ref.cd8 <- NormalizeData(ref.cd8)
markers <- FindAllMarkers(object = ref.cd8, only.pos = TRUE, assay = "RNA")
# Remove TCR genes
tcr.genes <- SignatuR::GetSignature(SignatuR$Hs$Compartments$TCR)
markers <- markers %>% filter(!gene %in% unname(tcr.genes))
markers %>% group_by(cluster) %>% top_n(n = 3, wt = avg_log2FC) -> top3
# Stacked violin plot of the top markers per subtype
VlnPlot(ref.cd8, assay = "RNA", features = top3$gene, cols = mycols, stack = T, flip = T, fill.by = "ident") + NoLegend()
Here are some marker genes to help manually identify TPEX:
Positive markers: TCF7, CD200, CRTAM, GNG4, TOX, LEF1, CCR7, CXCL13, XCL1, XCL2
Negative markers: GZMB, NKG7, PRF1, HAVCR2, CCL5, GZMA
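One crude way to turn these two lists into a number per cell is to subtract the mean expression of the negative markers from the mean expression of the positive markers. The base R sketch below does this on a made-up expression matrix; the helper `tpex_score()` and the two toy cells are hypothetical, and this simple difference of means is a stand-in for a dedicated signature-scoring method, not what ProjecTILs computes.

```r
# Marker lists copied from the text above
tpex.pos <- c("TCF7", "CD200", "CRTAM", "GNG4", "TOX", "LEF1", "CCR7", "CXCL13", "XCL1", "XCL2")
tpex.neg <- c("GZMB", "NKG7", "PRF1", "HAVCR2", "CCL5", "GZMA")

# Made-up normalized expression for two hypothetical cells (genes x cells)
genes <- c(tpex.pos, tpex.neg)
toy.expr <- cbind(
  cellA = ifelse(genes %in% tpex.pos, 2.0, 0.2),  # TPEX-like profile
  cellB = ifelse(genes %in% tpex.pos, 0.3, 2.5)   # TEX-like profile
)
rownames(toy.expr) <- genes

# Hypothetical helper: mean of positive markers minus mean of negative markers,
# using only the genes that are present in the matrix
tpex_score <- function(expr, pos, neg) {
  pos <- intersect(pos, rownames(expr))
  neg <- intersect(neg, rownames(expr))
  colMeans(expr[pos, , drop = FALSE]) - colMeans(expr[neg, , drop = FALSE])
}

scores <- tpex_score(toy.expr, tpex.pos, tpex.neg)
print(scores)
```

A positive score points towards a TPEX-like profile and a negative score towards a TEX-like one; on real data the threshold would need to be checked against the marker plots.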
Let’s display 6 positive and 6 negative TPEX markers that are especially useful to distinguish TPEX from the closely related TEX.
DefaultAssay(ref.cd8) <- 'RNA'
FeaturePlot(ref.cd8, features = c('TCF7','XCL1','XCL2',"CXCL13","TOX","CRTAM","GZMB","NKG7","CCL5","HAVCR2","PRF1","GZMA"), ncol = 3, pt.size = 0.2, order = T, cols = pals::coolwarm()) & NoLegend()
Despite their great relevance, TPEX are often missed in tumor scRNA-seq studies.
Detecting TPEX in Gueguen et al. 2021
Setup data
#download.file("https://figshare.com/ndownloader/files/39082049", destfile = "gueguen.cd3.Rds")
gueguen.cd3 <- readRDS("gueguen.cd3.Rds")
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)
Projection
Thanks to automatic scGate filtering, only the CD8 clusters (upper part of the UMAP) are annotated.
# Projection
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
table(gueguen.cd3$functional.cluster)
##
##        CD8.CM        CD8.EM      CD8.MAIT CD8.NaiveLike     CD8.TEMRA
##          2132          3181           147           238           276
##       CD8.TEX      CD8.TPEX
##          3340           337
DimPlot(gueguen.cd3, order = T, label = T, repel = T)
DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)
We can check specific TPEX genes to confirm that identities match.
DefaultAssay(gueguen.cd3) <- 'RNA'
FeaturePlot(gueguen.cd3, features = c('XCL1','XCL2'), ncol = 2, pt.size = 0.5, order = T, cols = pals::coolwarm()) & NoLegend()
# Radar plots
p <- plot.states.radar(ref.cd8, query = gueguen.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T)
wrap_plots(p) + theme_bw()
We can see that the previously homogeneous cluster CD8-LAYN is actually composed of two subsets: CD8.TEX and CD8.TPEX.
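The split of an original cluster into predicted subtypes can also be quantified with a simple cross-tabulation. The sketch below uses base R only; `cluster_composition()` is a hypothetical helper, the toy vectors and the cluster name "CD4-other" are made up, and cells removed by the scGate filter are assumed to carry NA as their predicted label.

```r
# Hypothetical helper: for each original cluster, the fraction of cells
# assigned to each predicted subtype (missing predictions counted as NA)
cluster_composition <- function(original, predicted) {
  counts <- table(original = original, predicted = predicted, useNA = "ifany")
  prop.table(counts, margin = 1)
}

# Toy vectors standing in for gueguen.cd3$seurat_clusters and
# gueguen.cd3$functional.cluster
toy.original  <- c(rep("CD8-LAYN", 4), rep("CD4-other", 2))
toy.predicted <- c("CD8.TEX", "CD8.TEX", "CD8.TEX", "CD8.TPEX", NA, NA)

comp <- cluster_composition(toy.original, toy.predicted)
print(round(comp, 2))

# On the real object (assuming the projection above has been run):
# cluster_composition(gueguen.cd3$seurat_clusters, gueguen.cd3$functional.cluster)
```

Each row sums to 1, so a cluster that mixes two subtypes shows up as a row with two sizeable entries.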
How to assess quality of mapping?
The first thing we recommend is to verify the expression of expected marker genes. In addition, ProjecTILs provides multiple tools to help the researcher decide whether the projection and classification are accurate (e.g. https://carmonalab.github.io/ProjecTILs_CaseStudies/novelstate.html).
Detecting TPEX in Yost et al. 2019
Setup data
# Load data
#download.file("https://figshare.com/ndownloader/files/39109277", destfile = "Yost.cd3.Rds")
Yost.cd3 <- readRDS("Yost.cd3.Rds")
# Normalize data
Yost.cd3 <- NormalizeData(Yost.cd3)
Yost.cd3 <- ScaleData(Yost.cd3)
## Centering and scaling data matrix
# DimPlots
DimPlot(Yost.cd3, reduction = 'umap', group.by = 'cluster', label = T)
DimPlot(Yost.cd3, reduction = 'umap', group.by = 'patient', label = T, repel = T)
Projection
As this dataset is a mix of CD4 and CD8 T cells, we keep the parameter filter.cells set to TRUE to retain only CD8+ T cells.
DefaultAssay(Yost.cd3) <- "RNA"
Yost.cd3 <- ProjecTILs.classifier(Yost.cd3, ref = ref.cd8, filter.cells = T, split.by = 'patient', ncores = 6)
table(Yost.cd3$functional.cluster)
##
##        CD8.CM        CD8.EM      CD8.MAIT CD8.NaiveLike     CD8.TEMRA
##          3806          5296           316           438           965
##       CD8.TEX      CD8.TPEX
##          2370           499
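Raw counts are hard to compare between datasets of different size, so it can help to convert them to percentages. The base R sketch below does this with the counts copied from the two table() outputs in this tutorial; the object names are chosen for this example only.

```r
# Cell counts per predicted subtype, copied from the table() outputs in this tutorial
gueguen.counts <- c(CD8.CM = 2132, CD8.EM = 3181, CD8.MAIT = 147, CD8.NaiveLike = 238,
                    CD8.TEMRA = 276, CD8.TEX = 3340, CD8.TPEX = 337)
yost.counts    <- c(CD8.CM = 3806, CD8.EM = 5296, CD8.MAIT = 316, CD8.NaiveLike = 438,
                    CD8.TEMRA = 965, CD8.TEX = 2370, CD8.TPEX = 499)

# Percentage of annotated CD8 T cells in each subtype, one row per dataset
pct <- rbind(Gueguen = 100 * prop.table(gueguen.counts),
             Yost    = 100 * prop.table(yost.counts))
print(round(pct, 1))
```

In both datasets TPEX account for fewer than 5% of the annotated CD8 T cells, consistent with their description as a rare population.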
DimPlot(Yost.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)
Here again we detect TPEX. Let’s check globally how the expression profiles of marker genes look.
# Radar plots
p <- plot.states.radar(ref.cd8, query = Yost.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T)
wrap_plots(p) + theme_bw()
We can see that the predicted CD8.TPEX display consistent marker gene profiles. In the authors’ original UMAP space, however, TPEX are scattered. This is because activation signals and other confounding factors contributed to defining the UMAP space. Reference-based annotation uncovers cell type signals masked by the activation program. If you are interested in recovering cell type identities hidden by transient cell states, you can read more in the corresponding tutorial.
Session Info
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] plotly_4.10.1 EnhancedVolcano_1.14.0 pheatmap_1.0.12
## [4] UCell_2.2.0 scales_1.2.1 ggrepel_0.9.2
## [7] SignatuR_0.1.0 gridExtra_2.3 ProjecTILs_3.0.2
## [10] patchwork_1.1.2 GEOquery_2.66.0 Biobase_2.58.0
## [13] BiocGenerics_0.44.0 data.table_1.14.6 STACAS_2.0.0
## [16] scGate_1.4.1 forcats_0.5.2 stringr_1.5.0
## [19] dplyr_1.0.10 purrr_1.0.1 readr_2.1.3
## [22] tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0
## [25] tidyverse_1.3.2 SeuratObject_4.1.3 Seurat_4.3.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 spatstat.explore_3.0-5
## [3] reticulate_1.27 tidyselect_1.2.0
## [5] htmlwidgets_1.6.1 grid_4.2.1
## [7] BiocParallel_1.32.5 Rtsne_0.16
## [9] munsell_0.5.0 codetools_0.2-18
## [11] ica_1.0-3 umap_0.2.9.0
## [13] future_1.30.0 miniUI_0.1.1.1
## [15] withr_2.5.0 spatstat.random_3.1-2
## [17] colorspace_2.1-0 progressr_0.13.0
## [19] highr_0.10 knitr_1.41
## [21] rstudioapi_0.14 stats4_4.2.1
## [23] SingleCellExperiment_1.20.0 ROCR_1.0-11
## [25] tensor_1.5 listenv_0.9.0
## [27] labeling_0.4.2 MatrixGenerics_1.10.0
## [29] GenomeInfoDbData_1.2.9 polyclip_1.10-4
## [31] farver_2.1.1 parallelly_1.34.0
## [33] vctrs_0.5.2 generics_0.1.3
## [35] xfun_0.36 timechange_0.2.0
## [37] R6_2.5.1 GenomeInfoDb_1.34.7
## [39] rmdformats_1.0.4 pals_1.7
## [41] bitops_1.0-7 spatstat.utils_3.0-1
## [43] cachem_1.0.6 DelayedArray_0.24.0
## [45] assertthat_0.2.1 promises_1.2.0.1
## [47] googlesheets4_1.0.1 gtable_0.3.1
## [49] globals_0.16.2 goftest_1.2-3
## [51] rlang_1.0.6 splines_4.2.1
## [53] lazyeval_0.2.2 gargle_1.2.1
## [55] dichromat_2.0-0.1 spatstat.geom_3.0-5
## [57] broom_1.0.2 BiocManager_1.30.19
## [59] yaml_2.3.7 reshape2_1.4.4
## [61] abind_1.4-5 modelr_0.1.10
## [63] backports_1.4.1 httpuv_1.6.8
## [65] tools_4.2.1 bookdown_0.32
## [67] ellipsis_0.3.2 jquerylib_0.1.4
## [69] RColorBrewer_1.1-3 ggridges_0.5.4
## [71] Rcpp_1.0.10 plyr_1.8.8
## [73] zlibbioc_1.44.0 RCurl_1.98-1.9
## [75] openssl_2.0.5 deldir_1.0-6
## [77] pbapply_1.7-0 cowplot_1.1.1
## [79] S4Vectors_0.36.1 zoo_1.8-11
## [81] SummarizedExperiment_1.28.0 haven_2.5.1
## [83] cluster_2.1.4 fs_1.6.0
## [85] magrittr_2.0.3 RSpectra_0.16-1
## [87] scattermore_0.8 lmtest_0.9-40
## [89] reprex_2.0.2 RANN_2.6.1
## [91] googledrive_2.0.0 fitdistrplus_1.1-8
## [93] matrixStats_0.63.0 hms_1.1.2
## [95] mime_0.12 evaluate_0.20
## [97] xtable_1.8-4 readxl_1.4.1
## [99] IRanges_2.32.0 compiler_4.2.1
## [101] maps_3.4.1 KernSmooth_2.23-20
## [103] crayon_1.5.2 htmltools_0.5.4
## [105] later_1.3.0 tzdb_0.3.0
## [107] lubridate_1.9.0 DBI_1.1.3
## [109] dbplyr_2.3.0 MASS_7.3-58.2
## [111] data.tree_1.0.0 Matrix_1.5-3
## [113] cli_3.6.0 parallel_4.2.1
## [115] igraph_1.3.5 GenomicRanges_1.50.2
## [117] pkgconfig_2.0.3 sp_1.6-0
## [119] spatstat.sparse_3.0-0 xml2_1.3.3
## [121] bslib_0.4.2 XVector_0.38.0
## [123] rvest_1.0.3 digest_0.6.31
## [125] pracma_2.4.2 sctransform_0.3.5
## [127] RcppAnnoy_0.0.20 spatstat.data_3.0-0
## [129] rmarkdown_2.20 cellranger_1.1.0
## [131] leiden_0.4.3 uwot_0.1.14
## [133] shiny_1.7.4 lifecycle_1.0.3
## [135] nlme_3.1-161 jsonlite_1.8.4
## [137] BiocNeighbors_1.16.0 mapproj_1.2.9
## [139] askpass_1.1 viridisLite_0.4.1
## [141] limma_3.54.0 fansi_1.0.4
## [143] pillar_1.8.1 lattice_0.20-45
## [145] fastmap_1.1.0 httr_1.4.4
## [147] survival_3.5-0 glue_1.6.2
## [149] png_0.1-8 stringi_1.7.12
## [151] sass_0.4.5 renv_0.15.5
## [153] irlba_2.3.5.1 future.apply_1.10.0