ProjecTILs case study - MC38 TILs
In this case study, we will build an integrated scRNA-seq analysis workflow to interpret the transcriptional and clonal structure of tumor-infiltrating T cells in MC38 colon adenocarcinoma (data from Xiong et al 2019).
The main R packages and methods employed in this workflow are:
- Seurat - for storing, processing and visualizing scRNA-seq data
- ProjecTILs - for the projection of scRNA-seq data into a reference TIL atlas
- scRepertoire - for the analysis of TCR-seq data
Note that scRepertoire
requires R version 4.0 or higher
- you will need to have R v.4 installed to run this case study.
R Environment
Check & load R packages
scRNA-seq data preparation
Download scRNA-seq data from Array Express (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7919/) After download and unpacking (you will need curl and unzip), you should get three files: matrix.mtx, genes.tsv and barcodes.tsv
files <- c("matrix.mtx", "genes.tsv", "barcodes.tsv")
matrix_dir <- "./input/Xiong_TIL/matrix/"
dir.create(matrix_dir, recursive = T, showWarnings = F)
for (i in 1:length(files)) {
dataUrl <- sprintf("https://www.ebi.ac.uk/biostudies/files/E-MTAB-7919/%s", files[i])
download.file(dataUrl, paste0(matrix_dir, files[i]))
}
Load scRNA-seq data and store as Seurat object
projectID <- "Xiong_TIL"
libIDtoSampleID <- c("Mouse 1", "Mouse 2", "Mouse 3", "Mouse 4")
names(libIDtoSampleID) <- 4:7
exp_mat <- Read10X(matrix_dir)
querydata <- CreateSeuratObject(counts = exp_mat, project = projectID, min.cells = 3,
min.features = 50)
querydata$Sample <- substring(colnames(querydata), 18)
table(querydata$Sample)
4 5 6 7
1759 1746 1291 1386
querydata$SampleLabel <- factor(querydata$Sample, levels = c(4:7), labels = libIDtoSampleID)
table(querydata$SampleLabel)
Mouse 1 Mouse 2 Mouse 3 Mouse 4
1759 1746 1291 1386
scTCR data preparation
Download scTCR data from Array Express (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7918/)
files <- c("filtered_contig_annotations_35.csv", "filtered_contig_annotations_36.csv",
"filtered_contig_annotations_37.csv", "filtered_contig_annotations_38.csv")
tcr_dir <- "./input/Xiong_TIL/TCR/"
dir.create(tcr_dir, recursive = T, showWarnings = F)
for (i in 1:length(files)) {
dataUrl <- sprintf("https://www.ebi.ac.uk/biostudies/files/E-MTAB-7918/%s", files[i])
download.file(dataUrl, paste0(tcr_dir, files[i]))
}
Mouse 1 to 4 (sample ID 4 to 7) correspond to TCR-seq libraries 35 to 38
libIDtoSampleID_VDJ <- 4:7
names(libIDtoSampleID_VDJ) <- 35:38
vdj.list <- list()
for (i in 1:length(libIDtoSampleID_VDJ)) {
s <- names(libIDtoSampleID_VDJ)[i]
vdj.list[[i]] <- read.csv(sprintf("%s/filtered_contig_annotations_%s.csv", tcr_dir,
s), as.is = T)
# Rename barcodes to match scRNA-seq suffixes
vdj.list[[i]]$barcode <- sub("\\d$", "", vdj.list[[i]]$barcode)
vdj.list[[i]]$barcode <- paste0(vdj.list[[i]]$barcode, libIDtoSampleID_VDJ[i])
vdj.list[[i]]$raw_clonotype_id <- paste0(vdj.list[[i]]$raw_clonotype_id, "-",
libIDtoSampleID_VDJ[i])
vdj.list[[i]]$SampleLabel <- libIDtoSampleID_VDJ[i]
}
Combine alpha and beta chains using the combineTCR
function from scRepertoire
# Using parameters removeNA=T and removeMulti=T will remove cells with multiple
# a-b combinations
combined <- combineTCR(vdj.list, samples = libIDtoSampleID_VDJ, ID = names(libIDtoSampleID_VDJ),
cells = "T-AB", removeNA = T, removeMulti = T)
for (i in seq_along(combined)) {
combined[[i]] <- stripBarcode(combined[[i]], column = 1, connector = "_", num_connects = 3)
}
The function ‘combineExpression’ of scRepertoire allows incorporating clonotype information to a Seurat object, and creates the categorical variable ‘cloneTypes’ discretizing frequencies of the clonotypes
querydata <- combineExpression(combined, querydata, cloneCall = "gene", cloneTypes = c(Small = 0.01,
Medium = 0.1, Large = 1))
We have now paired expression and TCR data for the query samples, and loaded them into a unified Seurat object. We can proceed to project the data onto the reference map.
ProjecTILs
Load the mouse TIL reference map
[1] "Loading Default Reference Atlas..."
[1] "/Users/mass/Documents/Projects/Github/ProjecTILs_CaseStudies/ref_TILAtlas_mouse_v1.rds"
[1] "Loaded Custom Reference map ref_TILAtlas_mouse_v1"
Project query data (with clonotype information stored as metadata) onto the TIL reference map
querydata <- NormalizeData(querydata)
query.projected <- Run.ProjecTILs(querydata, ref = ref, split.by = "SampleLabel",
ncores = 4)
| | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
T cells are projected mostly in the Treg, CD8 Terminal exhausted (CD8_Tex) and Precursor exhausted (CD8_Tpex) areas, with some clusters in the T-helper (Th1) and CD8 Effector memory areas, and to a lesser extent in Naive-like and CD8 Early-Activated areas.
p1 <- plot.projection(ref) + theme(aspect.ratio = 1)
p2 <- plot.projection(ref, query.projected, linesize = 0.3, pointsize = 0.5) + theme(aspect.ratio = 1)
p1 | p2
Visualize the projections per sample. Broadly, the distribution across the reference map is similar for the four mice, with some variation in the frequency of Effector Memory T cells.
plots <- list()
sample_names <- unique(query.projected$SampleLabel)
for (sample_i in seq_along(sample_names)) {
sample <- sample_names[sample_i]
plots[[sample_i]] <- plot.projection(ref, query.projected[, query.projected$SampleLabel ==
sample], linesize = 0.3, pointsize = 0.5) + ggtitle(sample) + theme(aspect.ratio = 1)
}
wrap_plots(plots)
Look at distribution of T cells in terms of cell states.
We can check the gene expression profile of the cells assigned to each state (yellow), and compare them to those of the reference states (black).
For example, cells projected into Tex and Tpex region express Tox and Pdcd1, while only Tex express Gzmb and Havcr2, as expected.
Clonality analysis
Now, let’s see where the expanded clones are located within the map:
levs <- c("Large (0.1 < X <= 1)", "Medium (0.01 < X <= 0.1)", "Small (0 < X <= 0.01)")
palette <- colorRampPalette(c("#FF4B20", "#C6FDEC", "#0348A6"))
query.projected$cloneType <- factor(query.projected$cloneType, levels = levs)
cl <- DimPlot(query.projected, group.by = "cloneType") + scale_color_manual(values = c(palette(5)),
na.value = "grey") + theme(aspect.ratio = 1)
p1 | cl
Larger (most highly expanded) clones are concentrated in the Tex/Tpex area. This is largely explained by the fact that sustained antigenic stimulation drives immunodominant clones into the Tox+-driven exhausted lineage (Tex - Tpex region)
Let’s check the raw numbers
Large (0.1 < X <= 1) Medium (0.01 < X <= 0.1)
CD4_NaiveLike 0 0
CD8_EarlyActiv 0 2
CD8_EffectorMemory 1 116
CD8_NaiveLike 0 0
CD8_Tex 86 649
CD8_Tpex 30 154
Tfh 0 0
Th1 0 15
Treg 4 115
Small (0 < X <= 0.01)
CD4_NaiveLike 23
CD8_EarlyActiv 50
CD8_EffectorMemory 259
CD8_NaiveLike 179
CD8_Tex 452
CD8_Tpex 113
Tfh 1
Th1 291
Treg 590
Plot clonal size by functional cluster
df <- query.projected@meta.data[!is.na(query.projected@meta.data$Frequency), c("functional.cluster",
"cloneType")]
meta <- reshape2::melt(table(df), varnames = c("functional.cluster", "cloneType"))
ggplot(data = meta, aes(x = functional.cluster, y = value, fill = cloneType)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) + scale_fill_manual(values = c(palette(5)),
na.value = "grey")
This confirms the intial observation that Tex and Tpex present the largest clonal expansion
We can highlight specific clonotypes on the reference map (here those with at least 40 cells):
clone_call = "CTaa" #select column/variable to use as clonotypes ID, in this case CTaa, the paired TCR CDR3 aminoacid sequences
cutoff <- 40 #Min cells for clonotype
clonotypeSizes <- sort(table(query.projected[[clone_call]])[table(query.projected[[clone_call]]) >
cutoff], decreasing = T)
bigClonotypes <- names(clonotypeSizes)
plots <- list()
for (i in 1:length(bigClonotypes)) {
ctype <- bigClonotypes[i]
plots[[i]] <- plot.projection(ref, query.projected[, which(query.projected[[clone_call]] ==
ctype)], linesize = 0.3, pointsize = 0.5) + ggtitle(sprintf("%s - size %i",
ctype, clonotypeSizes[ctype]))
}
wrap_plots(plots, ncol = 2)
The majority of clones tend to span Tex and Tpex states. Indeed, Tcf7+ precursor exhausted Tpex cells self-renew and give rise to more differentiated Tex effector cells.
The clonal overlap (Morisita similarity index) implemented in scRepertoire confirms this pattern:
meta.list <- expression2List(query.projected, split.by = "functional.cluster")
clonalOverlap(meta.list, cloneCall = "gene", method = "morisita") + theme(axis.text.x = element_text(angle = 90,
hjust = 1, vjust = 0.5))
We can also visualize overlap of the top CD8 clones, and their overlap across functional clusters, using scRepertoire’s alluvial plot
Conclusions
Projection of the MC38 TILs onto the TIL reference
map showed that these T cell samples mostly consist of
exhausted (CD8_Tex), precursor-exhausted CD8 T cells (CD8_Tpex), and
Tregs, typical of highly immunogenic, “hot” tumors. Also, the majority
of expanded clones were found spanning the CD8_Tex and
CD8_Tpex states. This is expected as sustained antigen stimulation in
the tumor drives immuno-dominant tumor-reactive clones towards the
exhaustion differentiation path.
The combination of
ProjecTILs
and scRepertoire
simplifies the
joint analysis of single-cell expression data and clonotype analysis, in
the context of an annotated reference map of TIL states.
Further reading
Original publication - Xiong et al. (2019) Cancer Immunol Res
ProjecTILs case studies - INDEX - Repository
ProjecTILs method Andreatta et. al (2021) Nat. Comm. and code
For more information on performing reference-based T cell clonal analysis see also Andreatta et al. (2023) BioProtocol - and its related GitHub repository.
Session info
sessionInfo()