Human CD8 TILs projection

ProjecTILs is a computational method to project scRNA-seq data into reference single-cell atlases, enabling their direct comparison in a stable, annotated system of coordinates. This tutorial outlines the main functions implemented in ProjecTILs on a small, simple dataset. For more advanced (and more biologically interesting) applications of ProjecTILs see this list of ProjecTILs case studies

R environment

First, check package dependencies and install ProjecTILs

Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
options(timeout = max(300, getOption("timeout")))

if (!requireNamespace("renv")) install.packages("renv")
library(renv)
renv::restore()
* The library is already synchronized with the lockfile.
remotes::install_github("carmonalab/scGate")
remotes::install_github("carmonalab/ProjecTILs")
rlang        (1.0.4  -> 1.0.5 ) [CRAN]
progressr    (0.10.1 -> 0.11.0) [CRAN]
future       (1.27.0 -> 1.28.0) [CRAN]
gtable       (0.3.0  -> 0.3.1 ) [CRAN]
dplyr        (1.0.9  -> 1.0.10) [CRAN]
reticulate   (1.25   -> 1.26  ) [CRAN]
SeuratObject (4.1.0  -> 4.1.1 ) [CRAN]

The downloaded binary packages are in
    /var/folders/9k/l8bhj7_90g3793p02y0c6xcc0000gn/T//RtmpLCNrr7/downloaded_packages
* checking for file ‘/private/var/folders/9k/l8bhj7_90g3793p02y0c6xcc0000gn/T/RtmpLCNrr7/remotesbdb85703da29/carmonalab-ProjecTILs-580678b/DESCRIPTION’ ... OK
* preparing ‘ProjecTILs’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘ProjecTILs_3.0.0.tar.gz’
library(ProjecTILs)
library(Seurat)
library(ggplot2)
library(gridExtra)

Load reference atlas and query data

First, load the Human CD8 TILs reference atlas. This reference was built using filtered Zheng et al. datasets and semi-supervised STACAS integration. More information about the process can be found here. Included TIL subsets are literature-based. Key markers for each populations can be found below.

download.file(url = "https://www.dropbox.com/s/phqmqlp30xgnlwn/Human_CD8_TILs_atlas.rds?dl=1",
    destfile = "Human_CD8_TILs_atlas.rds")
ref <- load.reference.map(ref = "Human_CD8_TILs_atlas.rds")
[1] "Loading Custom Reference Atlas..."
[1] "Loaded Custom Reference map custom_reference"

Let’s explore the reference atlas

refCols <- c(Naive = "#b7d2e0", CM = "#da6f6f", EM = "#72b28a", TEMRA = "#e5bfaf",
    TRM = "#f3a166", TPEX = "#aca6e0", TEX = "#f5d39f", MAIT = "#fdbfd4")
DimPlot(ref, label = T, cols = refCols)

See expression of important marker genes across reference subtypes

markers <- c("LEF1", "TCF7", "CCR7", "GZMK", "FGFBP2", "FCGR3A", "ZNF683", "ITGAE",
    "CRTAM", "CD200", "GNG4", "HAVCR2", "TOX", "ENTPD1", "SLC4A10", "TRAV1-2")
VlnPlot(ref, features = markers, stack = T, flip = T, assay = "RNA") + NoLegend()

Now let’s load a query dataset - Bassez et al. 2021 These are T cells from Breast cancer patients, treated or not with Immune checkpoint blockade, for which we know if they responded or not.

download.file(url = "https://www.dropbox.com/s/dv46t7o50ldzeo5/Bassez.example.rds?dl=1",
    destfile = "Bassez.example.rds")
query <- readRDS(file = "Bassez.example.rds")

Projection mode 1: aligning embeddings

The main function in ProjecTILs is Run.ProjecTILs, which takes as input a reference map and query dataset. The query will be batch-corrected and projected into the reference map, with low-dimensional embeddings (PCA and UMAP) compatible with those of the reference. To minimize the batch effects, we compute filtering and projection patient per patient.

query.projected <- Run.ProjecTILs(query = query, ref = ref, filter.cell = TRUE, skip.normalize = T,
    split.by = "patient_id", ncores = 4)

Visualize projection

Plot projection of new data over the reference in UMAP space. The contour lines display the density of projected query cells onto the reference map.

plot.projection(ref, query.projected, linesize = 0.5, pointsize = 0.5, cols = refCols)

We can also plot the predicted composition of the query in terms of reference T cell subtypes. By default, plot.statepred.composition will look on the functional.cluster metadata that was created by the projection.

plot.statepred.composition(ref, query.projected, metric = "Percent", cols = refCols) +
    theme_bw()

How do the gene expression levels compare between reference and query for the different cell states?

plot.states.radar(ref, query = query.projected, genes4radar = markers)

Compare cell states across conditions

If we have multiple conditions (e.g. control vs. treatment, or samples from different tissues), we can search for discriminant genes between conditions (otherwise, by default this analysis is performed using the reference as the ‘control’)

Let’s start by plotting proportions between Responders and Non-responders to ICB.

# Splitting the Seurat object by response to ICB (called expansion here)
query.list <- subset(query.projected, subset = expansion %in% c("E", "NE"))
query.list <- SplitObject(query.list, split.by = "expansion")
pll <- list()
pll[[1]] <- plot.projection(ref, cols = refCols, query.list[["E"]]) + ggtitle("Responder")
pll[[2]] <- plot.statepred.composition(ref, query.list[["E"]], cols = refCols, metric = "Percent") +
    ggtitle("Responder") + ylim(0, 60) + theme_bw() + RotatedAxis() + NoLegend()
pll[[3]] <- plot.projection(ref, cols = refCols, query.list[["NE"]]) + ggtitle("Non-responder")
pll[[4]] <- plot.statepred.composition(ref, cols = refCols, query.list[["NE"]], metric = "Percent") +
    ggtitle("Non-responder") + ylim(0, 60) + theme_bw() + RotatedAxis() + NoLegend()
grid.arrange(grobs = pll, ncol = 2, nrow = 2, widths = c(1.5, 1))

We notice a decrease of TEX in responders (ie, T cell expanded patients), which is consistent with the original annotation.

'Expansion mainly involved CD8+ T cells with pronounced expression of cytotoxic-activity (PRF1, GZMB), immune-cell homing (CXCL13) and exhaustion markers (HAVCR2, LAG3)
Conversely, undifferentiated pre-effector/memory T cells (TCF7+, GZMK+) were inversely correlated with T cell expansion.'

We can also check for gene expression changes between conditions.

plot.states.radar(ref, query = query.list, min.cells = 20, genes4radar = c("LEF1",
    "TCF7", "CCR7", "GZMK", "FGFBP2", "FCGR3A", "ZNF683", "ITGAE", "CRTAM", "CD200",
    "GNG4", "HAVCR2", "TOX", "ENTPD1", "SLC4A10", "TRAV1-2"))

Projection mode 2: classifier

If you do not wish to embed your query data into the reference space, you may also simply use ProjecTILs as a cell type classifier. This may be useful to annotate cell types in your query without altering existing embeddings. This mode is quicker than mode 1.

See the query dataset in unsupervised low-dim embeddings:

query <- query |>
    FindVariableFeatures(nfeatures = 500) |>
    ScaleData() |>
    RunPCA(npcs = 10) |>
    RunUMAP(dims = 1:10)

DimPlot(query, group.by = "cellSubType", label = T, repel = T) + NoLegend()

The ProjecTILs.classifier function applies reference-projection but does not alter the current embeddings.

query <- ProjecTILs.classifier(query = query, ref = ref, ncores = 4)
DimPlot(query, group.by = "functional.cluster", cols = refCols, label = T) + NoLegend()

As compared to mode 1, this will keep all your cells. They will just be assigned the NA label, as they cannot be confidently projected into the reference.

Further reading