ProjecTILs case study - MC38 TILs

In this case study, we will build an integrated scRNA-seq analysis workflow to interpret the transcriptional and clonal structure of tumor-infiltrating T cells in MC38 colon adenocarcinoma (data from Xiong et al 2019).

The main R packages and methods employed in this workflow are:

Note that scRepertoire requires R version 4.0 or higher - you will need to have R v.4 installed to run this case study.

R Environment

Check & load R packages

options(timeout = 5000)
options(warn = 1)

library(renv)
renv::restore()

library(Seurat)
library(ggplot2)
library(patchwork)
library(ProjecTILs)
library(scRepertoire)

scRNA-seq data preparation

Download scRNA-seq data from Array Express (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7919/) After download and unpacking (you will need curl and unzip), you should get three files: matrix.mtx, genes.tsv and barcodes.tsv

files <- c("matrix.mtx", "genes.tsv", "barcodes.tsv")
matrix_dir <- "./input/Xiong_TIL/matrix/"

dir.create(matrix_dir, recursive = T, showWarnings = F)

for (i in 1:length(files)) {
    dataUrl <- sprintf("https://www.ebi.ac.uk/biostudies/files/E-MTAB-7919/%s", files[i])
    download.file(dataUrl, paste0(matrix_dir, files[i]))
}

Load scRNA-seq data and store as Seurat object

projectID <- "Xiong_TIL"
libIDtoSampleID <- c("Mouse 1", "Mouse 2", "Mouse 3", "Mouse 4")
names(libIDtoSampleID) <- 4:7

exp_mat <- Read10X(matrix_dir)
querydata <- CreateSeuratObject(counts = exp_mat, project = projectID, min.cells = 3,
    min.features = 50)
querydata$Sample <- substring(colnames(querydata), 18)
table(querydata$Sample)

   4    5    6    7 
1759 1746 1291 1386 
querydata$SampleLabel <- factor(querydata$Sample, levels = c(4:7), labels = libIDtoSampleID)
table(querydata$SampleLabel)

Mouse 1 Mouse 2 Mouse 3 Mouse 4 
   1759    1746    1291    1386 

scTCR data preparation

Download scTCR data from Array Express (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7918/)

files <- c("filtered_contig_annotations_35.csv", "filtered_contig_annotations_36.csv",
    "filtered_contig_annotations_37.csv", "filtered_contig_annotations_38.csv")
tcr_dir <- "./input/Xiong_TIL/TCR/"

dir.create(tcr_dir, recursive = T, showWarnings = F)

for (i in 1:length(files)) {
    dataUrl <- sprintf("https://www.ebi.ac.uk/biostudies/files/E-MTAB-7918/%s", files[i])
    download.file(dataUrl, paste0(tcr_dir, files[i]))
}

Mouse 1 to 4 (sample ID 4 to 7) correspond to TCR-seq libraries 35 to 38

libIDtoSampleID_VDJ <- 4:7
names(libIDtoSampleID_VDJ) <- 35:38

vdj.list <- list()
for (i in 1:length(libIDtoSampleID_VDJ)) {
    s <- names(libIDtoSampleID_VDJ)[i]
    vdj.list[[i]] <- read.csv(sprintf("%s/filtered_contig_annotations_%s.csv", tcr_dir,
        s), as.is = T)

    # Rename barcodes to match scRNA-seq suffixes
    vdj.list[[i]]$barcode <- sub("\\d$", "", vdj.list[[i]]$barcode)
    vdj.list[[i]]$barcode <- paste0(vdj.list[[i]]$barcode, libIDtoSampleID_VDJ[i])
    vdj.list[[i]]$raw_clonotype_id <- paste0(vdj.list[[i]]$raw_clonotype_id, "-",
        libIDtoSampleID_VDJ[i])
    vdj.list[[i]]$SampleLabel <- libIDtoSampleID_VDJ[i]

}

Combine alpha and beta chains using the combineTCR function from scRepertoire

# Using parameters removeNA=T and removeMulti=T will remove cells with multiple
# a-b combinations
combined <- combineTCR(vdj.list, samples = libIDtoSampleID_VDJ, ID = names(libIDtoSampleID_VDJ),
    cells = "T-AB", removeNA = T, removeMulti = T)

for (i in seq_along(combined)) {
    combined[[i]] <- stripBarcode(combined[[i]], column = 1, connector = "_", num_connects = 3)
}

The function ‘combineExpression’ of scRepertoire allows incorporating clonotype information to a Seurat object, and creates the categorical variable ‘cloneTypes’ discretizing frequencies of the clonotypes

querydata <- combineExpression(combined, querydata, cloneCall = "gene", cloneTypes = c(Small = 0.01,
    Medium = 0.1, Large = 1))

We have now paired expression and TCR data for the query samples, and loaded them into a unified Seurat object. We can proceed to project the data onto the reference map.

ProjecTILs

Load the mouse TIL reference map

ref <- load.reference.map()
[1] "Loading Default Reference Atlas..."
[1] "/Users/mass/Documents/Projects/Github/ProjecTILs_CaseStudies/ref_TILAtlas_mouse_v1.rds"
[1] "Loaded Custom Reference map ref_TILAtlas_mouse_v1"

Project query data (with clonotype information stored as metadata) onto the TIL reference map

querydata <- NormalizeData(querydata)
query.projected <- Run.ProjecTILs(querydata, ref = ref, split.by = "SampleLabel",
    ncores = 4)
  |                                                                              |                                                                      |   0%  |                                                                              |==================                                                    |  25%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================================                  |  75%  |                                                                              |======================================================================| 100%

T cells are projected mostly in the Treg, CD8 Terminal exhausted (CD8_Tex) and Precursor exhausted (CD8_Tpex) areas, with some clusters in the T-helper (Th1) and CD8 Effector memory areas, and to a lesser extent in Naive-like and CD8 Early-Activated areas.

p1 <- plot.projection(ref) + theme(aspect.ratio = 1)
p2 <- plot.projection(ref, query.projected, linesize = 0.3, pointsize = 0.5) + theme(aspect.ratio = 1)
p1 | p2

Visualize the projections per sample. Broadly, the distribution across the reference map is similar for the four mice, with some variation in the frequency of Effector Memory T cells.

plots <- list()

sample_names <- unique(query.projected$SampleLabel)
for (sample_i in seq_along(sample_names)) {
    sample <- sample_names[sample_i]
    plots[[sample_i]] <- plot.projection(ref, query.projected[, query.projected$SampleLabel ==
        sample], linesize = 0.3, pointsize = 0.5) + ggtitle(sample) + theme(aspect.ratio = 1)
}

wrap_plots(plots)

Look at distribution of T cells in terms of cell states.

plot.statepred.composition(ref, query.projected, metric = "Percent")

We can check the gene expression profile of the cells assigned to each state (yellow), and compare them to those of the reference states (black).

plot.states.radar(ref = ref, query.projected, min.cells = 30)

For example, cells projected into Tex and Tpex region express Tox and Pdcd1, while only Tex express Gzmb and Havcr2, as expected.

Clonality analysis

Now, let’s see where the expanded clones are located within the map:

levs <- c("Large (0.1 < X <= 1)", "Medium (0.01 < X <= 0.1)", "Small (0 < X <= 0.01)")
palette <- colorRampPalette(c("#FF4B20", "#C6FDEC", "#0348A6"))

query.projected$cloneType <- factor(query.projected$cloneType, levels = levs)
cl <- DimPlot(query.projected, group.by = "cloneType") + scale_color_manual(values = c(palette(5)),
    na.value = "grey") + theme(aspect.ratio = 1)
p1 | cl

Larger (most highly expanded) clones are concentrated in the Tex/Tpex area. This is largely explained by the fact that sustained antigenic stimulation drives immunodominant clones into the Tox+-driven exhausted lineage (Tex - Tpex region)

Let’s check the raw numbers

table(query.projected@meta.data$functional.cluster, query.projected@meta.data$cloneType)
                    
                     Large (0.1 < X <= 1) Medium (0.01 < X <= 0.1)
  CD4_NaiveLike                         0                        0
  CD8_EarlyActiv                        0                        2
  CD8_EffectorMemory                    1                      116
  CD8_NaiveLike                         0                        0
  CD8_Tex                              86                      649
  CD8_Tpex                             30                      154
  Tfh                                   0                        0
  Th1                                   0                       15
  Treg                                  4                      115
                    
                     Small (0 < X <= 0.01)
  CD4_NaiveLike                         23
  CD8_EarlyActiv                        50
  CD8_EffectorMemory                   259
  CD8_NaiveLike                        179
  CD8_Tex                              452
  CD8_Tpex                             113
  Tfh                                    1
  Th1                                  291
  Treg                                 590

Plot clonal size by functional cluster

df <- query.projected@meta.data[!is.na(query.projected@meta.data$Frequency), c("functional.cluster",
    "cloneType")]
meta <- reshape2::melt(table(df), varnames = c("functional.cluster", "cloneType"))

ggplot(data = meta, aes(x = functional.cluster, y = value, fill = cloneType)) + geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) + scale_fill_manual(values = c(palette(5)),
    na.value = "grey")

This confirms the intial observation that Tex and Tpex present the largest clonal expansion

We can highlight specific clonotypes on the reference map (here those with at least 40 cells):

clone_call = "CTaa"  #select column/variable to use as clonotypes ID, in this case CTaa, the paired TCR CDR3 aminoacid sequences
cutoff <- 40  #Min cells for clonotype

clonotypeSizes <- sort(table(query.projected[[clone_call]])[table(query.projected[[clone_call]]) >
    cutoff], decreasing = T)

bigClonotypes <- names(clonotypeSizes)

plots <- list()

for (i in 1:length(bigClonotypes)) {
    ctype <- bigClonotypes[i]

    plots[[i]] <- plot.projection(ref, query.projected[, which(query.projected[[clone_call]] ==
        ctype)], linesize = 0.3, pointsize = 0.5) + ggtitle(sprintf("%s - size %i",
        ctype, clonotypeSizes[ctype]))
}

wrap_plots(plots, ncol = 2)

The majority of clones tend to span Tex and Tpex states. Indeed, Tcf7+ precursor exhausted Tpex cells self-renew and give rise to more differentiated Tex effector cells.

The clonal overlap (Morisita similarity index) implemented in scRepertoire confirms this pattern:

meta.list <- expression2List(query.projected, split.by = "functional.cluster")
clonalOverlap(meta.list, cloneCall = "gene", method = "morisita") + theme(axis.text.x = element_text(angle = 90,
    hjust = 1, vjust = 0.5))

We can also visualize overlap of the top CD8 clones, and their overlap across functional clusters, using scRepertoire’s alluvial plot

compareClonotypes(meta.list, numbers = 4, samples = c("CD8_EffectorMemory", "CD8_Tpex",
    "CD8_Tex", "CD8_NaiveLike"), cloneCall = "aa", graph = "alluvial") + theme(axis.text.x = element_text(angle = 45,
    hjust = 1, vjust = 1))

Conclusions

Projection of the MC38 TILs onto the TIL reference map showed that these T cell samples mostly consist of exhausted (CD8_Tex), precursor-exhausted CD8 T cells (CD8_Tpex), and Tregs, typical of highly immunogenic, “hot” tumors. Also, the majority of expanded clones were found spanning the CD8_Tex and CD8_Tpex states. This is expected as sustained antigen stimulation in the tumor drives immuno-dominant tumor-reactive clones towards the exhaustion differentiation path.
The combination of ProjecTILs and scRepertoire simplifies the joint analysis of single-cell expression data and clonotype analysis, in the context of an annotated reference map of TIL states.

Further reading

Original publication - Xiong et al. (2019) Cancer Immunol Res

ProjecTILs case studies - INDEX - Repository

ProjecTILs method Andreatta et. al (2021) Nat. Comm. and code

For more information on performing reference-based T cell clonal analysis see also Andreatta et al. (2023) BioProtocol - and its related GitHub repository.

Session info

sessionInfo()