UCell signature enrichment - interacting with Seurat
In this demo, we will apply UCell to evaluate gene signatures in single-cell PBMC data. We will use a subset of the data from Hao and Hao et al., bioRxiv 2020, which comprises multiple immune cell types at different levels of resolution. Because these cells were characterized both in terms of transcriptomes (using scRNA-seq) and surface proteins (using a panel of antibodies), the cell type annotations should be of very high quality. To demonstrate how UCell can simply and accurately evaluate gene signatures on a query dataset, we will apply it directly to the Seurat object from Hao et al. and compare the signature scores to the original cluster annotations by the authors.
The original dataset is very large (>160K cells). For this illustrative example we used a downsampled version (20,000 cells), further subset to T cells only (9,074 cells).
Installation
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("UCell")
library(Seurat)
library(UCell)
set.seed(123)
Query single-cell data
Obtain a downsampled version of the data from Hao and Hao et al., bioRxiv 2020 at the following link: pbmc_multimodal.downsampled20k.Tcell.seurat.RNA.rds
Then load the object and visualize the clustering annotation by the authors.
pbmc.Tcell <- readRDS("pbmc_multimodal.downsampled20k.Tcell.seurat.RNA.rds")
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE,
    label.size = 3, repel = TRUE)
Score signatures using UCell
Define some signatures for T cell subtypes
markers <- list()
markers$Tcell_CD4 <- c("CD4", "CD40LG")
markers$Tcell_CD8 <- c("CD8A", "CD8B")
markers$Tcell_Treg <- c("FOXP3", "IL2RA")
markers$Tcell_MAIT <- c("KLRB1", "SLC4A10", "NCR3")
markers$Tcell_gd <- c("TRDC", "TRGC1", "TRGC2", "TRDV1", "TRAC-", "TRBC1-", "TRBC2-")
markers$Tcell_NK <- c("FGFBP2", "SPON2", "KLRF1", "FCGR3A", "KLRD1", "TRDC", "CD3E-",
    "CD3G-")
pbmc.Tcell <- AddModuleScore_UCell(pbmc.Tcell, features = markers)
signature.names <- paste0(names(markers), "_UCell")
VlnPlot(pbmc.Tcell, features = signature.names, group.by = "celltype.l1")
VlnPlot(pbmc.Tcell, features = signature.names, group.by = "celltype.l2")
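In the signatures above, a trailing `-` (as in `TRAC-` or `CD3E-`) marks a negative marker, that is, a gene expected to be absent from the target population; unsuffixed genes are treated as positive. As a rough illustration of how such a signature separates into its two parts, here is a small base-R sketch. The helper name `split_signature` is hypothetical and is not part of UCell.

```r
# Hypothetical helper (not a UCell function): split a signature into
# positive and negative gene sets based on a trailing "+" or "-".
split_signature <- function(genes) {
  is_negative <- grepl("-$", genes)      # genes flagged as negative markers
  clean <- sub("[+-]$", "", genes)       # strip the trailing sign, if any
  list(positive = clean[!is_negative], negative = clean[is_negative])
}

gd_signature <- c("TRDC", "TRGC1", "TRGC2", "TRDV1", "TRAC-", "TRBC1-", "TRBC2-")
parts <- split_signature(gd_signature)
print(parts$positive)
print(parts$negative)
```

Only the final character is inspected, so gene symbols with internal hyphens are left intact.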
How do signatures compare to original annotations
Idents(pbmc.Tcell) <- "celltype.l2"
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label.size = 3,
    repel = TRUE, label = TRUE)
FeaturePlot(pbmc.Tcell, reduction = "wnn.umap", features = signature.names, ncol = 3,
    order = TRUE)
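Besides visual inspection, the agreement between signature scores and annotations can be summarized numerically. The sketch below is one possible way to do this, assuming the scores live in metadata columns ending in `_UCell`. The data frame `meta` is a toy stand-in for `pbmc.Tcell@meta.data`, and its values are made up purely for illustration.

```r
# Toy stand-in for pbmc.Tcell@meta.data; all values are made up for illustration.
meta <- data.frame(
  celltype.l2      = c("CD4 Naive", "CD4 Naive", "CD8 Naive", "CD8 Naive", "Treg", "Treg"),
  Tcell_CD4_UCell  = c(0.45, 0.50, 0.03, 0.01, 0.40, 0.42),
  Tcell_CD8_UCell  = c(0.02, 0.05, 0.61, 0.70, 0.03, 0.00),
  Tcell_Treg_UCell = c(0.01, 0.04, 0.00, 0.02, 0.55, 0.48)
)

# Median score of each signature within each annotated cell type
score_cols <- grep("_UCell$", colnames(meta), value = TRUE)
median_scores <- aggregate(meta[score_cols],
                           by = list(celltype = meta$celltype.l2),
                           FUN = median)
print(median_scores)

# For each annotated cell type, the signature with the highest median score
best_signature <- score_cols[apply(median_scores[score_cols], 1, which.max)]
names(best_signature) <- median_scores$celltype
print(best_signature)
```

On real data, a signature that peaks in the expected annotated population is a quick sanity check that the marker list and the annotation agree.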
Compare to AddModuleScore from Seurat
Seurat comes with a method for signature enrichment analysis, AddModuleScore. This method is very fast, but the score is highly dependent on the composition of the dataset. Here we will apply AddModuleScore with a simple CD8 T cell signature to two datasets: a set composed of different T cell types (pbmc.Tcell) and a subset of this dataset only comprising the CD8 T cells (pbmc.Tcell.CD8).
First, generate a subset only comprising CD8 T cells (pbmc.Tcell.CD8)
Idents(pbmc.Tcell) <- "celltype.l1"
pbmc.Tcell.CD8 <- subset(pbmc.Tcell, idents = c("CD8 T"))
DimPlot(object = pbmc.Tcell.CD8, reduction = "wnn.umap", group.by = "celltype.l2",
    label = TRUE, label.size = 3, repel = TRUE) + NoLegend()
Note that applying the same signature to the complete set or to the CD8 T subset gives very different results. When other cell types are present, the score distribution for CD8 T cells has a median close to 1, but the same CD8 T cells evaluated alone give a zero-centered distribution of scores. It may be undesirable to have a score that changes so dramatically for the same cells depending on the composition of the dataset.
markers.cd8 <- list(Tcell_CD8 = c("CD8A", "CD8B"))
pbmc.Tcell <- AddModuleScore(pbmc.Tcell, features = markers.cd8, name = "Tcell_CD8_Seurat")
a <- VlnPlot(pbmc.Tcell, features = "Tcell_CD8_Seurat1")
pbmc.Tcell.CD8 <- AddModuleScore(pbmc.Tcell.CD8, features = markers.cd8, name = "Tcell_CD8_Seurat")
b <- VlnPlot(pbmc.Tcell.CD8, features = "Tcell_CD8_Seurat1")
a | b
summary(subset(pbmc.Tcell, subset = celltype.l1 == "CD8 T")$Tcell_CD8_Seurat1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.6057 0.5149 0.9236 0.8756 1.2673 2.3228
summary(pbmc.Tcell.CD8$Tcell_CD8_Seurat1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.65105 -0.44921 -0.03485 -0.09280 0.30758 1.39551
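The shift in the summaries above can be mimicked with a toy example. The sketch below is not Seurat's algorithm (AddModuleScore compares signature genes against expression-matched control genes); it is a deliberately simplified stand-in in which the score is the expression of one signature gene centered on the dataset-wide mean. The expression values are simulated, and `centered_score` is a hypothetical helper.

```r
set.seed(1)

# Simulated expression of one signature gene: 50 "CD8-like" cells (high)
# followed by 50 other cells (low). Values are simulated for illustration only.
expr <- c(rnorm(50, mean = 3, sd = 0.5), rnorm(50, mean = 0, sd = 0.5))
is_cd8 <- rep(c(TRUE, FALSE), each = 50)

# Simplified, dataset-relative score: expression minus the dataset-wide mean.
centered_score <- function(x) x - mean(x)

score_full   <- centered_score(expr)[is_cd8]   # CD8-like cells scored within the mixed dataset
score_subset <- centered_score(expr[is_cd8])   # the same cells scored on their own

print(summary(score_full))
print(summary(score_subset))
```

The same cells receive clearly positive scores in the mixed dataset and zero-centered scores when evaluated alone, because the reference they are compared against changes with the dataset.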
The UCell score is based on the ranking of genes within each individual cell and is therefore not affected by the composition of the query dataset. Note that the score distribution is nearly identical for the same cell population in different datasets (small differences are due to random resolution of rank ties).
pbmc.Tcell <- AddModuleScore_UCell(pbmc.Tcell, features = markers.cd8)
a <- VlnPlot(pbmc.Tcell, features = "Tcell_CD8_UCell")
pbmc.Tcell.CD8 <- AddModuleScore_UCell(pbmc.Tcell.CD8, features = markers.cd8)
b <- VlnPlot(pbmc.Tcell.CD8, features = "Tcell_CD8_UCell")
a | b
summary(subset(pbmc.Tcell, subset = celltype.l1 == "CD8 T")$Tcell_CD8_UCell)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.3803 0.5193 0.5294 0.7733 0.9372
summary(pbmc.Tcell.CD8$Tcell_CD8_UCell)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.3803 0.5193 0.5294 0.7733 0.9372
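To see why a per-cell, rank-based score cannot depend on which other cells are present, consider the following base-R sketch. It implements a simplified score in the spirit of the Mann-Whitney U statistic. It is not UCell's actual implementation: the function `rank_score`, the `max_rank` cap of 50, and the simulated matrix are assumptions made for illustration.

```r
set.seed(42)
n_genes <- 200
n_cells <- 30

# Simulated continuous expression matrix (genes x cells), so there are no rank ties.
expr <- matrix(rexp(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("cell", seq_len(n_cells))))
signature <- c("gene1", "gene2", "gene3")

# Simplified rank-based score: each cell is scored from its own gene ranks only.
rank_score <- function(mat, genes, max_rank = 50) {
  idx <- match(genes, rownames(mat))
  n <- length(genes)
  apply(mat, 2, function(cell) {
    r <- rank(-cell)[idx]              # rank 1 = most highly expressed gene in this cell
    r[r > max_rank] <- max_rank + 1    # cap the ranks of lowly expressed genes
    u <- sum(r) - n * (n + 1) / 2      # U statistic of the signature genes
    1 - u / (n * max_rank)             # rescale so that higher means more enriched
  })
}

score_full <- rank_score(expr, signature)

# Score a subset of the cells on their own and compare with the full-dataset scores
subset_cells <- colnames(expr)[1:10]
score_subset <- rank_score(expr[, subset_cells], signature)
print(all.equal(score_full[subset_cells], score_subset))
```

Because the ranks are computed within each column independently, removing other cells from the matrix leaves every remaining cell's score unchanged.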
We can have a look at the distribution of the scores for all T cells:
Idents(pbmc.Tcell) <- "celltype.l1"
DimPlot(object = pbmc.Tcell, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE,
label.size = 3, repel = TRUE)
FeaturePlot(pbmc.Tcell, reduction = "wnn.umap", features = c("Tcell_CD8_UCell", "Tcell_CD8_Seurat1"),
    ncol = 2, order = TRUE)
…and on the CD8 T cell subset only:
Idents(pbmc.Tcell.CD8) <- "celltype.l2"
DimPlot(object = pbmc.Tcell.CD8, reduction = "wnn.umap", group.by = "celltype.l2",
label = TRUE, label.size = 3, repel = TRUE) + NoLegend()
FeaturePlot(pbmc.Tcell.CD8, reduction = "wnn.umap", features = c("Tcell_CD8_UCell",
    "Tcell_CD8_Seurat1"), ncol = 2, order = TRUE)
Further reading
For more examples of UCell functionalities see THIS DEMO
The code and the package are available at the UCell GitHub repository; more demos available at UCell demo repository