UCell signature enrichment - interacting with Seurat

In this demo, we will apply UCell to evaluate gene signatures in single-cell PBMC data. We will use a subset of the data from Hao and Hao et al., bioRxiv 2020, which comprises multiple immune cell types at different levels of resolution. Because these cells were characterized both in terms of transcriptomes (using scRNA-seq) and surface proteins (using a panel of antibodies), the cell type annotations should be of very high quality. To demonstrate how UCell can simply and accurately evaluate gene signatures on a query dataset, we will apply it directly to the Seurat object from Hao et al. and compare the signature scores to the original cluster annotations by the authors.

The original dataset is very large (>160K cells), so for this illustrative example we used a downsampled version (20,000 cells) and then further subset it to T cells only (9,074 cells).

Installation

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("UCell")

library(Seurat)
library(UCell)
set.seed(123)

Query single-cell data

Obtain a downsampled version of the data from Hao and Hao et al., bioRxiv 2020, either programmatically (see below) or from Figshare.

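# Large file: allow a generous download timeout (in seconds)
options(timeout = 3000)
url <- "https://figshare.com/ndownloader/files/54054143"
dir.create("data", showWarnings = FALSE)  # download.file() does not create folders
download.file(url, destfile = "data/pbmc_multimodal.Tcells_ds9k.rds", mode = "wb")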

Then load the object and visualize the clustering annotation by the authors.

pbmc.Tcell <- readRDS("data/pbmc_multimodal.Tcells_ds9k.rds")
pbmc.Tcell <- RenameAssays(pbmc.Tcell, assay.name = "SCT", new.assay.name = "RNA")

DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE,
    label.size = 3, repel = TRUE)

Score signatures using UCell

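Define some signatures for T cell subtypes. Genes with a trailing "-" are negative markers, i.e. genes expected to be absent from that cell type.

markers <- list()
markers$Tcell_CD4 <- c("CD4", "CD40LG")
markers$Tcell_CD8 <- c("CD8A", "CD8B")
markers$Tcell_Treg <- c("FOXP3", "IL2RA")
markers$Tcell_MAIT <- c("TRAV1-2", "SLC4A10")
# TCR gamma/delta genes; TRAC, TRBC1 and TRBC2 are negative markers
markers$Tcell_gd <- c("TRDC", "TRGC1", "TRGC2", "TRDV1", "TRAC-", "TRBC1-", "TRBC2-")
# NK-associated genes plus TRDC; CD3E and CD3G are negative markers
markers$Tcell_NK <- c("FGFBP2", "SPON2", "KLRF1", "FCGR3A", "KLRD1", "TRDC", "CD3E-",
    "CD3G-")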

# Scores are added as metadata columns named <signature>_UCell
pbmc.Tcell <- AddModuleScore_UCell(pbmc.Tcell, features = markers)
signature.names <- paste0(names(markers), "_UCell")
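AddModuleScore_UCell scores each cell from that cell's own gene ranking, using a statistic related to the Mann-Whitney U test. As we understand the UCell method (this summary is ours, not quoted from the package documentation), for a signature of $n$ genes with within-cell ranks $r_i$ (rank 1 = highest expression) and a rank cap $r_{\max}$ (the `maxRank` argument):

$$
r'_i = \min(r_i,\; r_{\max} + 1), \qquad
U = \sum_{i=1}^{n} r'_i - \frac{n(n+1)}{2}, \qquad
\mathrm{UCell} = 1 - \frac{U}{n \, r_{\max}}
$$

The score equals 1 when the signature genes occupy the top $n$ ranks and approaches 0 when they all fall beyond $r_{\max}$. For example, with $n = 2$, $r_{\max} = 100$ and ranks 50 and 52, $U = 102 - 3 = 99$ and the score is $1 - 99/200 = 0.505$.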

VlnPlot(pbmc.Tcell, features = signature.names, group.by = "celltype.l1", pt.size = 0)

VlnPlot(pbmc.Tcell, features = signature.names, group.by = "celltype.l2", pt.size = 0)
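The `markers` list mixes positive genes with negative markers written with a trailing "-" (for example "TRAC-" or "CD3E-"); a hyphen inside a gene name, as in "TRAV1-2", does not count. To our understanding, UCell scores the positive and negative genes separately and subtracts the negative score, weighted by the `w_neg` argument of `AddModuleScore_UCell`, from the positive one. The base-R sketch below illustrates that bookkeeping only; `split_signature` and `combine_scores` are hypothetical helpers, and the default weight of 1 and the floor at zero are assumptions rather than a description of UCell's code.

```r
# Hypothetical helper: split a signature into positive and negative genes,
# where a trailing "-" marks a negative gene. Internal hyphens ("TRAV1-2") are kept.
split_signature <- function(genes) {
  neg <- grepl("-$", genes)
  list(positive = genes[!neg], negative = sub("-$", "", genes[neg]))
}

# Assumed combination rule (illustrative): positive score minus w_neg times the
# negative score, floored at zero so the result stays in [0, 1] for w_neg >= 0.
combine_scores <- function(pos, neg, w_neg = 1) {
  pmax(0, pos - w_neg * neg)
}

split_signature(c("TRDC", "TRGC1", "TRAV1-2", "TRAC-", "TRBC1-"))
combine_scores(pos = c(0.6, 0.6), neg = c(0.2, 0.8))
```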

How do signatures compare to the original annotations?

Idents(pbmc.Tcell) <- "celltype.l2"
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label.size = 3,
    repel = TRUE, label = TRUE)

FeaturePlot(pbmc.Tcell, reduction = "wnn.umap", features = signature.names, ncol = 3)

Smoothing UCell scores

Single-cell data are sparse. It can be useful to ‘impute’ scores from neighboring cells to partially correct for this sparsity. The function SmoothKNN smooths single-cell scores by taking, for each cell, a weighted average over its k nearest neighbors in a given dimensionality reduction. It can be applied directly on Seurat objects to smooth UCell scores:

pbmc.Tcell <- SmoothKNN(pbmc.Tcell, signature.names = signature.names, reduction = "pca")
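SmoothKNN handles the neighbor search and averaging internally. As an illustration of the idea only, here is a base-R sketch that averages each cell's score over itself and its k nearest neighbors in an embedding, using Euclidean distances and weights that decay geometrically with neighbor rank. `smooth_knn_sketch`, its `decay` weighting and the toy data are our own assumptions, not SmoothKNN's implementation.

```r
# Minimal kNN smoothing sketch (not UCell's SmoothKNN code). Each cell's score is
# replaced by a weighted average over itself and its k nearest neighbors;
# weights decay geometrically with neighbor rank (an assumption).
smooth_knn_sketch <- function(scores, embedding, k = 10, decay = 0.1) {
  d <- as.matrix(dist(embedding))  # Euclidean distances between cells
  vapply(seq_along(scores), function(i) {
    # the cell itself (distance 0) comes first, followed by its k nearest neighbors
    nn <- order(d[i, ])[seq_len(min(k + 1, length(scores)))]
    w <- (1 - decay)^(seq_along(nn) - 1)  # self gets weight 1
    sum(w * scores[nn]) / sum(w)
  }, numeric(1))
}

# Toy embedding: two well-separated groups of 11 cells each.
# In group A, one cell has a "dropout" score of 0 while its neighbors score 1.
set.seed(42)
emb <- rbind(matrix(rnorm(22, 0, 0.1), ncol = 2),
             matrix(rnorm(22, 10, 0.1), ncol = 2))
scores <- c(0, rep(1, 10), rep(0, 11))
smoothed <- smooth_knn_sketch(scores, emb, k = 10)
round(smoothed[1], 3)
```

The isolated zero in the first group is pulled up towards its neighbors' scores, while the uniformly zero second group is left unchanged. On real data the embedding would be, for example, the PCA coordinates passed to SmoothKNN.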

FeaturePlot(pbmc.Tcell, reduction = "wnn.umap", features = paste0(signature.names,
    "_kNN"), ncol = 3)

Compare to AddModuleScore from Seurat

Seurat comes with its own method for signature enrichment analysis, AddModuleScore. This method is very fast, but its scores depend strongly on the composition of the dataset. Here we apply AddModuleScore with a simple CD8 T cell signature to two datasets: a set composed of different T cell types (pbmc.Tcell) and a subset of this dataset comprising only the CD8 T cells (pbmc.Tcell.CD8).

First, generate a subset comprising only CD8 T cells (pbmc.Tcell.CD8):

Idents(pbmc.Tcell) <- "celltype.l1"
pbmc.Tcell.CD8 <- subset(pbmc.Tcell, idents = c("CD8 T"))
DimPlot(object = pbmc.Tcell.CD8, reduction = "wnn.umap", group.by = "celltype.l2",
    label = TRUE, label.size = 3, repel = TRUE) + NoLegend()

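Note that applying the same signature to the complete set or to the CD8 T subset gives very different results. When other cell types are present, the score distribution for CD8 T cells has a median of about 0.7, but the same CD8 T cells evaluated alone give a roughly zero-centered distribution of scores (median about -0.1). This is because AddModuleScore subtracts the average expression of control genes drawn from the same expression bins as the signature genes, and these bins are computed across the whole dataset: when all cells express CD8A and CD8B, the matched control genes are also highly expressed, and the scores shrink towards zero. It may be undesirable to have a score that changes so dramatically for the same cells depending on the composition of the dataset.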

markers.cd8 <- list(Tcell_CD8 = c("CD8A", "CD8B"))

pbmc.Tcell <- AddModuleScore(pbmc.Tcell, features = markers.cd8, name = "Tcell_CD8_Seurat")
a <- VlnPlot(pbmc.Tcell, features = "Tcell_CD8_Seurat1", pt.size = 0)

pbmc.Tcell.CD8 <- AddModuleScore(pbmc.Tcell.CD8, features = markers.cd8, name = "Tcell_CD8_Seurat")
b <- VlnPlot(pbmc.Tcell.CD8, features = "Tcell_CD8_Seurat1", pt.size = 0)

a | b

summary(subset(pbmc.Tcell, subset = celltype.l1 == "CD8 T")$Tcell_CD8_Seurat1)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.5548  0.3933  0.7152  0.6977  1.0206  2.0078

summary(pbmc.Tcell.CD8$Tcell_CD8_Seurat1)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.5472 -0.4332 -0.1008 -0.1290  0.1988  1.2619
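The composition effect can be reproduced on toy data. The sketch below re-implements, in simplified form, the control-gene background subtraction that AddModuleScore is based on: genes are binned by their average expression across the whole dataset, and each signature gene is compared with random control genes from its own bin. This is not Seurat's code; `module_score_sketch` and the simulated matrix are hypothetical, and although the argument names mirror AddModuleScore's `nbin` and `ctrl`, the values are chosen for the toy.

```r
# Simplified, hypothetical version of the control-gene idea behind
# Seurat::AddModuleScore (not Seurat's code).
module_score_sketch <- function(expr, features, nbin = 24, ctrl = 5) {
  avg <- rowMeans(expr)
  # equal-size bins of genes, ordered by dataset-wide average expression
  bins <- cut(rank(avg, ties.method = "first"), breaks = nbin, labels = FALSE)
  names(bins) <- rownames(expr)
  # for each signature gene, sample control genes from the same bin
  ctrl_genes <- unique(unlist(lapply(features, function(g) {
    pool <- setdiff(names(bins)[bins == bins[[g]]], features)
    sample(pool, min(ctrl, length(pool)))
  })))
  colMeans(expr[features, , drop = FALSE]) -
    colMeans(expr[ctrl_genes, , drop = FALSE])
}

# Toy data: 200 background genes at fixed levels plus CD8A/CD8B, which are
# high (3) in "CD8" cells and absent (0) in "CD4" cells, with Gaussian noise.
set.seed(1)
n_bg <- 200
bg_level <- seq(0, 4, length.out = n_bg)
make_cells <- function(n, sig_level) {
  m <- rbind(matrix(sig_level, 2, n), matrix(bg_level, n_bg, n))
  m + matrix(rnorm(length(m), sd = 0.3), nrow(m))
}
mixed <- cbind(make_cells(50, 3), make_cells(50, 0))  # 50 CD8 + 50 CD4 cells
rownames(mixed) <- c("CD8A", "CD8B", paste0("G", seq_len(n_bg)))
cd8_only <- mixed[, 1:50]                             # the same 50 CD8 cells

sig <- c("CD8A", "CD8B")
s_mixed <- module_score_sketch(mixed, sig)[1:50]  # CD8 cells, scored in the mix
s_cd8 <- module_score_sketch(cd8_only, sig)       # same cells, scored alone
summary(s_mixed)
summary(s_cd8)
```

In the mixed dataset the signature genes have an intermediate average expression, so their controls are moderately expressed and the CD8 cells score clearly positive; scored alone, the same cells are matched with highly expressed controls and their scores fall close to zero.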

The UCell score is based on gene rankings computed within each cell and is therefore not affected by the composition of the query dataset. Note that the score distribution is nearly identical for the same cell population in different datasets (small differences are due to random resolution of rank ties).

pbmc.Tcell <- AddModuleScore_UCell(pbmc.Tcell, features = markers.cd8)
a <- VlnPlot(pbmc.Tcell, features = "Tcell_CD8_UCell", pt.size = 0)

pbmc.Tcell.CD8 <- AddModuleScore_UCell(pbmc.Tcell.CD8, features = markers.cd8)
b <- VlnPlot(pbmc.Tcell.CD8, features = "Tcell_CD8_UCell", pt.size = 0)

a | b

summary(subset(pbmc.Tcell, subset = celltype.l1 == "CD8 T")$Tcell_CD8_UCell)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.3795  0.5000  0.5235  0.7679  0.9371

summary(pbmc.Tcell.CD8$Tcell_CD8_UCell)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.3795  0.5000  0.5235  0.7679  0.9371
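The ranking argument can be checked on a toy example. The sketch below implements a UCell-style score in base R (our reading of the method: a Mann-Whitney U statistic on within-cell ranks capped at `max_rank`; not UCell's code) and scores the same 50 simulated CD8 cells inside a mixed CD8/CD4 dataset and on their own. `ucell_score_sketch`, the simulated data and `max_rank = 100` are hypothetical choices for illustration.

```r
# UCell-style score sketch: for each cell, rank genes within that cell only,
# cap ranks at max_rank + 1, and convert the Mann-Whitney U statistic to [0, 1].
ucell_score_sketch <- function(expr, features, max_rank = 1500) {
  n <- length(features)
  apply(expr, 2, function(x) {
    r <- rank(-x, ties.method = "average")[features]  # rank 1 = highest expression
    if (all(r > max_rank)) return(0)  # no signature gene among the top ranks
    r <- pmin(r, max_rank + 1)
    u <- sum(r) - n * (n + 1) / 2
    1 - u / (n * max_rank)
  })
}

# Toy data: 200 background genes at fixed levels plus CD8A/CD8B, which are
# high (3) in "CD8" cells and absent (0) in "CD4" cells, with Gaussian noise.
set.seed(1)
n_bg <- 200
bg_level <- seq(0, 4, length.out = n_bg)
make_cells <- function(n, sig_level) {
  m <- rbind(matrix(sig_level, 2, n), matrix(bg_level, n_bg, n))
  m + matrix(rnorm(length(m), sd = 0.3), nrow(m))
}
mixed <- cbind(make_cells(50, 3), make_cells(50, 0))  # 50 CD8 + 50 CD4 cells
rownames(mixed) <- c("CD8A", "CD8B", paste0("G", seq_len(n_bg)))
cd8_only <- mixed[, 1:50]                             # the same 50 CD8 cells

sig <- c("CD8A", "CD8B")
u_mixed <- ucell_score_sketch(mixed, sig, max_rank = 100)
u_cd8 <- ucell_score_sketch(cd8_only, sig, max_rank = 100)
identical(u_mixed[1:50], u_cd8)
```

Because each cell's score depends only on that cell's own expression values, removing the CD4 cells leaves the CD8 scores untouched; with deterministic tie handling the toy scores match exactly.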

We can have a look at the distribution of the scores for all T cells:

Idents(pbmc.Tcell) <- "celltype.l1"
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE,
    label.size = 3, repel = TRUE)

FeaturePlot(pbmc.Tcell, reduction = "wnn.umap", features = c("Tcell_CD8_UCell", "Tcell_CD8_Seurat1"),
    ncol = 2, order = TRUE)

…and on the CD8 T cell subset only:

Idents(pbmc.Tcell.CD8) <- "celltype.l2"
DimPlot(object = pbmc.Tcell.CD8, reduction = "wnn.umap", group.by = "celltype.l2",
    label = TRUE, label.size = 3, repel = TRUE) + NoLegend()

FeaturePlot(pbmc.Tcell.CD8, reduction = "wnn.umap", features = c("Tcell_CD8_UCell",
    "Tcell_CD8_Seurat1"), ncol = 2, order = TRUE)
