Isolating target cell types from CITE-seq data using scGate

Introduction

The CITE-seq dataset by Hao.et al. 2021 characterizes human PBMCs with scRNA-seq paired with antibody-derived tags (ADT) for a panel of >200 antibodies. This demo illustrates how the scGate package can be applied to isolate target cell type populations from the different modalities in CITE-seq data.

renv::restore()
library(ggplot2)
library(dplyr)
library(Seurat)
library(UCell)
library(scGate)
library(viridis)
library(patchwork)

Load CITE-seq dataset

For simplicity, in this demo we use a reduced version of the dataset by Hao.et al. 2021, downsampled to 2k cells. The same approach can be applied to the complete dataset.

testing.datasets <- scGate::get_testing_data(version = "hsa.latest")
hao <- testing.datasets$Satija

palette <- c(list("Impure" = "gray", "Pure" = "green"))

Let’s have a look at the cell type annotations provided by the authors

DefaultAssay(hao) <- "ADT"
DimPlot(hao, group.by = "celltype.l1",label = T,repel = T,reduction = "umap") + theme(aspect.ratio=1)

Are ADT signals in the same scale, how are they distributed within each cell?

ADT signals appear to be fairly comparable between cells and antibody tags. However, unlike scRNA-seq these data are not sparse (many zeros), so we will have to adapt the parameters of scGate to this different distribution of signals.

scGate on CITE-seq data

Because CITE-seq contains two modalities (scRNA-seq and ADT counts), we can apply scGate on one modality and verify gating purity by measuring key markers in the second modality.

Purify B cells

First, we build a simple gating model for Bcells using the gene MS4A1 (encoding CD20)

bcell.sct <- scGate::gating_model(name = "Bcell.sct", signature = c("MS4A1"))

And we can apply it on the SCT assay (which stores normalized RNA expression counts) of the Seurat object

hao <- scGate(data = hao, model = bcell.sct, assay = "SCT",output.col.name = "Bcell.SCT")

DimPlot(hao, group.by = "Bcell.SCT", cols = palette) + 
  theme(aspect.ratio = 1) + ggtitle("scGate Bcells (SCT assay)")

The surface protein measurements for key B cell markers CD19 and CD20 (stored in the ADT assay) confirm that scGate could isolate pure B cells from trascriptomics data

VlnPlot(hao,features = c("CD20","CD19"), assay = "ADT",
        cols = palette, pt.size = 0)

We can now perform the complementary experiment: detecting B cells from ADT readouts (and confirming gating purity by RNA-seq expression)

Set up a simple gating model based on marker CD20

cd20.adt <-  gating_model(name = "Bcell.adt", signature = c("CD20"))

Then apply this scGate model on the ADT assay. Note that, because ADT data are not sparse, and the dimensionality is much lower than scRNA-seq (up to a few hundred tags, compared to tens of thousands of genes in scRNA-seq), we will need to adjust the maxRank parameter. This parameter tells UCell how many variables to consider for ranking. For ADT data, the total number of tags divided by two is normally a reasonable choice.

DefaultAssay(hao) <- "ADT"
maxrank <- nrow(hao@assays$ADT)/2

hao <- scGate(data = hao, model = cd20.adt, assay = "ADT",output.col.name = "Bcell.ADT",maxRank = maxrank)  

b1 <- DimPlot(hao, group.by = "Bcell.ADT", cols = palette) + 
  theme(aspect.ratio = 1) + ggtitle("scGate Bcells (ADT assay)")

#Verify RNA expression of MS4A1 (encoding CD20) in SCT assay
b2 <- VlnPlot(hao,features = "MS4A1", assay = "SCT",cols = palette, pt.size = 0)
b1 + b2

We can always examine the strength of the signatures in scGate models by plotting the UCell scores returned as part of the output. For example, for the Bcell.sct and Bcell.adt signatures, scores for all cells are available in the metadata slots Bcell.sct_UCell and Bcell.adt_UCell, respectively.

DefaultAssay(hao) <- "SCT"
a <- FeaturePlot(hao, features=c("Bcell.sct_UCell"), order = T) + 
       scale_color_viridis(option="D") + theme(aspect.ratio=1)


DefaultAssay(hao) <- "ADT"
b <- FeaturePlot(hao, features=c("Bcell.adt_UCell"), order = T) + 
       scale_color_viridis(option="D") + theme(aspect.ratio=1)

a | b

Purify T cells

As described in another demo, scGate allows us defining hierarchical models, mimicking a flow cytometry gating process.

# First level - filter on lymphoid cells
t.sct <- gating_model(name = "lymphoid", signature = c("LCK"))
# Second level - filter on T cells markers (integrin alpha 2b expressed in endothelial cells, platelets, etc. but not T cells)
t.sct <- gating_model(model = t.sct, name = "t.sct", signature = c("CD3E","CD3D","ITGA2B-"),level = 2)

# Run scGate model on SCT assay
hao <- scGate(data = hao, model = t.sct, assay = "SCT")

a <- DimPlot(hao, cols = palette) + 
  theme(aspect.ratio = 1) + ggtitle("scGate Tcells (SCT assay)")

# and check surface protein markers in ADT assay
b <- VlnPlot(hao,features = c(grep("CD3-[1-2]",row.names(hao@assays$ADT),value = T),"CD41"), 
             assay = "ADT",cols = palette, pt.size = 0, ncol=2) 

a | b

And conversely, generate a model to filter T cells based on ADT readouts.

#model definition
t.adt <- gating_model(name = "t.adt", signature = c("CD3-1","CD3-2","CD41-"))

# filtering 
hao <- scGate(data = hao, model = t.adt, assay = "ADT",maxRank = maxrank)

# UMAP plot
DimPlot(hao, cols = palette)  + 
  ggtitle("scGate Tcells (ADT assay)") + theme(aspect.ratio=1)

Almost all cells were filtered out by the method. When using ADT data, one needs to check if default positive/negative threshold (0.2) are appropriate:

plot_UCell_scores(obj = hao, model = t.adt)
## Picking joint bandwidth of 0.0143

The distribution is bimodal, but the default threshold pos.thr=0.2 is too strict. Run again with pos.thr=0.05.

# model definition
t.adt <- gating_model(name = "t.adt", signature = c("CD3-1","CD3-2","CD41-"))
# filtering
hao <- scGate(data = hao, model = t.adt, assay = "ADT", maxRank = maxrank, pos.thr = 0.05)

a <- DimPlot(hao, cols = palette) + 
  theme(aspect.ratio = 1) + ggtitle("scGate Tcells (ADT assay)")

# Control markers on SCT assay
b <- VlnPlot(hao,features = c("CD3E","CD3D","ITGA2B"), assay = "SCT", ncol=2,
             cols = palette, pt.size = 0) 
a | b

Purify NK cells

Build a model for scRNA-seq

nk.sct <- gating_model(name = "NK.sct", signature = c("NCAM1","KLRD1","CD3D-"))

Filter on SCT data

hao <- scGate(data = hao, model = nk.sct, assay = "SCT",output.col.name = "NK.sct")
a <- DimPlot(hao, cols = palette) + theme(aspect.ratio = 1) + 
  ggtitle("scGate NKs (SCT assay)")

b <- VlnPlot(hao,features = c("CD56-1", "CD56-2","CD3-1","CD3-2"), assay = "ADT",
             cols = palette, pt.size = 0,ncol = 2)

a | b

Build and apply NK model for ADT assay

# ADT: build model
# since KLRD1/CD94 is not available in Hao data, we will use the CD56 marker instead

nk.adt <- gating_model(name = "NK.adt", signature = c("CD56-1", "CD56-2"))
nk.adt <- gating_model(model = nk.adt, name = "tcell.adt",signature = c("CD3-1","CD3-2"), negative = T)

hao <- scGate(data = hao, model = nk.adt, assay = "ADT",maxRank = maxrank,
              pos.thr = 0.4, neg.thr = 0.7, output.col.name = "NK.adt")
plot_UCell_scores(obj = hao, model = nk.adt, pos.thr = 0.4, neg.thr = 0.7)

a <- DimPlot(hao, cols = palette, group.by = "NK.adt") +
  theme(aspect.ratio = 1) + ggtitle("scGate NKs (ADT assay)")
b <- VlnPlot(hao,features = c("NCAM1","KLRD1","CD3D","CD3G"), assay = "SCT",
             cols = palette, pt.size = 0,ncol = 2)

a | b

Purify Platelets

We start by filtering platelets on the SCT assay

# model definition
platelet.sct <- gating_model(name = "platelet.sct", signature = c("ITGA2B"))
#filtering
hao <- scGate(data = hao, model = platelet.sct, assay = "SCT", output.col.name = "platelet.sct")
# UMAP plot
a <- DimPlot(hao, cols = palette) + theme(aspect.ratio = 1) + ggtitle("scGate Platelets (SCT assay)")
# Control markers in ADT assay
b <- VlnPlot(hao,features = c("CD41"), assay = "ADT",
             cols = palette, pt.size = 0) + theme(aspect.ratio = 1)

(a + theme(aspect.ratio=1)) | (b + theme(aspect.ratio=1))

Finally, we try to detect Platelets using ADT counts.

Since CD41 could be expressed in monocytes and dendritic cells, we need to add a negative signature. Positive and negative signatures could present different UCell distributions when dealing with ADT data; again we need to adjust the thresholds for positive and negative UCell scores (the distribution plots can help you set a threshold).

platelet.adt <- gating_model(name = "platelet", signature = c("CD41"))
platelet.adt <- gating_model(model = platelet.adt,name = "myeloid", signature = c("HLA-DR"),negative = T)
platelet.adt <- gating_model(model = platelet.adt,name = "monocyte", signature = c("CD14","CD11c","CD11b-1","CD11b-2"),negative = T)
hao <- scGate(data = hao, model = platelet.adt, assay = "ADT",maxRank = maxrank,pos.thr = 0.95,neg.thr = 0.6)
a <- DimPlot(hao, cols = palette)  + ggtitle("scGate Platelets (ADT assay)") + theme(aspect.ratio=1)
b <- VlnPlot(hao,features = c("ITGA2B"), assay = "SCT",
             cols = palette, pt.size = 0) +theme(aspect.ratio=1)

a | b

Further reading

The scGate package and installation instructions are available at: scGate package

The code for this demo (and additional demos) can be found on GitHub

References

  • Hao, Y., Hao, S., Andersen-Nissen, E., Mauck III, W. M., Zheng, S., Butler, A., … & Satija, R. (2021). Integrated analysis of multimodal single-cell data. Cell

  • Andreatta, M., Carmona, S. J. (2021) UCell: Robust and scalable single-cell gene signature scoring Computational and Structural Biotechnology Journal