Isolating target cell types from CITE-seq data using scGate
Introduction
The CITE-seq dataset by Hao.et al. 2021 characterizes human PBMCs with scRNA-seq paired with antibody-derived tags (ADT) for a panel of >200 antibodies. This demo illustrates how the scGate package can be applied to isolate target cell type populations from the different modalities in CITE-seq data.
Load CITE-seq dataset
For simplicity, in this demo we use a reduced version of the dataset by Hao.et al. 2021, downsampled to 2k cells. The same approach can be applied to the complete dataset.
testing.datasets <- scGate::get_testing_data(version = "hsa.latest")
hao <- testing.datasets$Satija
palette <- c(list("Impure" = "gray", "Pure" = "green"))
Let’s have a look at the cell type annotations provided by the authors
DefaultAssay(hao) <- "ADT"
DimPlot(hao, group.by = "celltype.l1",label = T,repel = T,reduction = "umap") + theme(aspect.ratio=1)
Are ADT signals in the same scale, how are they distributed within each cell?
ADT signals appear to be fairly comparable between cells and antibody tags. However, unlike scRNA-seq these data are not sparse (many zeros), so we will have to adapt the parameters of scGate to this different distribution of signals.
scGate on CITE-seq data
Because CITE-seq contains two modalities (scRNA-seq and ADT counts), we can apply scGate on one modality and verify gating purity by measuring key markers in the second modality.
Purify B cells
First, we build a simple gating model for Bcells using the gene MS4A1 (encoding CD20)
And we can apply it on the SCT assay (which stores normalized RNA expression counts) of the Seurat object
hao <- scGate(data = hao, model = bcell.sct, assay = "SCT",output.col.name = "Bcell.SCT")
DimPlot(hao, group.by = "Bcell.SCT", cols = palette) +
theme(aspect.ratio = 1) + ggtitle("scGate Bcells (SCT assay)")
The surface protein measurements for key B cell markers CD19 and CD20 (stored in the ADT assay) confirm that scGate could isolate pure B cells from trascriptomics data
We can now perform the complementary experiment: detecting B cells from ADT readouts (and confirming gating purity by RNA-seq expression)
Set up a simple gating model based on marker CD20
Then apply this scGate model on the ADT assay. Note that, because ADT
data are not sparse, and the dimensionality is much lower than scRNA-seq
(up to a few hundred tags, compared to tens of thousands of genes in
scRNA-seq), we will need to adjust the maxRank
parameter.
This parameter tells UCell how many variables
to consider for ranking. For ADT data, the total number of tags divided
by two is normally a reasonable choice.
DefaultAssay(hao) <- "ADT"
maxrank <- nrow(hao@assays$ADT)/2
hao <- scGate(data = hao, model = cd20.adt, assay = "ADT",output.col.name = "Bcell.ADT",maxRank = maxrank)
b1 <- DimPlot(hao, group.by = "Bcell.ADT", cols = palette) +
theme(aspect.ratio = 1) + ggtitle("scGate Bcells (ADT assay)")
#Verify RNA expression of MS4A1 (encoding CD20) in SCT assay
b2 <- VlnPlot(hao,features = "MS4A1", assay = "SCT",cols = palette, pt.size = 0)
b1 + b2
We can always examine the strength of the signatures in scGate models
by plotting the UCell scores returned as part of the output. For
example, for the Bcell.sct and Bcell.adt signatures, scores for all
cells are available in the metadata slots Bcell.sct_UCell
and Bcell.adt_UCell
, respectively.
DefaultAssay(hao) <- "SCT"
a <- FeaturePlot(hao, features=c("Bcell.sct_UCell"), order = T) +
scale_color_viridis(option="D") + theme(aspect.ratio=1)
DefaultAssay(hao) <- "ADT"
b <- FeaturePlot(hao, features=c("Bcell.adt_UCell"), order = T) +
scale_color_viridis(option="D") + theme(aspect.ratio=1)
a | b
Purify T cells
As described in another demo, scGate allows us defining hierarchical models, mimicking a flow cytometry gating process.
# First level - filter on lymphoid cells
t.sct <- gating_model(name = "lymphoid", signature = c("LCK"))
# Second level - filter on T cells markers (integrin alpha 2b expressed in endothelial cells, platelets, etc. but not T cells)
t.sct <- gating_model(model = t.sct, name = "t.sct", signature = c("CD3E","CD3D","ITGA2B-"),level = 2)
# Run scGate model on SCT assay
hao <- scGate(data = hao, model = t.sct, assay = "SCT")
a <- DimPlot(hao, cols = palette) +
theme(aspect.ratio = 1) + ggtitle("scGate Tcells (SCT assay)")
# and check surface protein markers in ADT assay
b <- VlnPlot(hao,features = c(grep("CD3-[1-2]",row.names(hao@assays$ADT),value = T),"CD41"),
assay = "ADT",cols = palette, pt.size = 0, ncol=2)
a | b
And conversely, generate a model to filter T cells based on ADT readouts.
#model definition
t.adt <- gating_model(name = "t.adt", signature = c("CD3-1","CD3-2","CD41-"))
# filtering
hao <- scGate(data = hao, model = t.adt, assay = "ADT",maxRank = maxrank)
# UMAP plot
DimPlot(hao, cols = palette) +
ggtitle("scGate Tcells (ADT assay)") + theme(aspect.ratio=1)
Almost all cells were filtered out by the method. When using ADT data, one needs to check if default positive/negative threshold (0.2) are appropriate:
The distribution is bimodal, but the default threshold pos.thr=0.2 is too strict. Run again with pos.thr=0.05.
# model definition
t.adt <- gating_model(name = "t.adt", signature = c("CD3-1","CD3-2","CD41-"))
# filtering
hao <- scGate(data = hao, model = t.adt, assay = "ADT", maxRank = maxrank, pos.thr = 0.05)
a <- DimPlot(hao, cols = palette) +
theme(aspect.ratio = 1) + ggtitle("scGate Tcells (ADT assay)")
# Control markers on SCT assay
b <- VlnPlot(hao,features = c("CD3E","CD3D","ITGA2B"), assay = "SCT", ncol=2,
cols = palette, pt.size = 0)
a | b
Purify NK cells
Build a model for scRNA-seq
Filter on SCT data
hao <- scGate(data = hao, model = nk.sct, assay = "SCT",output.col.name = "NK.sct")
a <- DimPlot(hao, cols = palette) + theme(aspect.ratio = 1) +
ggtitle("scGate NKs (SCT assay)")
b <- VlnPlot(hao,features = c("CD56-1", "CD56-2","CD3-1","CD3-2"), assay = "ADT",
cols = palette, pt.size = 0,ncol = 2)
a | b
Build and apply NK model for ADT assay
# ADT: build model
# since KLRD1/CD94 is not available in Hao data, we will use the CD56 marker instead
nk.adt <- gating_model(name = "NK.adt", signature = c("CD56-1", "CD56-2"))
nk.adt <- gating_model(model = nk.adt, name = "tcell.adt",signature = c("CD3-1","CD3-2"), negative = T)
hao <- scGate(data = hao, model = nk.adt, assay = "ADT",maxRank = maxrank,
pos.thr = 0.4, neg.thr = 0.7, output.col.name = "NK.adt")
plot_UCell_scores(obj = hao, model = nk.adt, pos.thr = 0.4, neg.thr = 0.7)
Purify Platelets
We start by filtering platelets on the SCT assay
# model definition
platelet.sct <- gating_model(name = "platelet.sct", signature = c("ITGA2B"))
#filtering
hao <- scGate(data = hao, model = platelet.sct, assay = "SCT", output.col.name = "platelet.sct")
# UMAP plot
a <- DimPlot(hao, cols = palette) + theme(aspect.ratio = 1) + ggtitle("scGate Platelets (SCT assay)")
# Control markers in ADT assay
b <- VlnPlot(hao,features = c("CD41"), assay = "ADT",
cols = palette, pt.size = 0) + theme(aspect.ratio = 1)
(a + theme(aspect.ratio=1)) | (b + theme(aspect.ratio=1))
Finally, we try to detect Platelets using ADT counts.
Since CD41 could be expressed in monocytes and dendritic cells, we need to add a negative signature. Positive and negative signatures could present different UCell distributions when dealing with ADT data; again we need to adjust the thresholds for positive and negative UCell scores (the distribution plots can help you set a threshold).
platelet.adt <- gating_model(name = "platelet", signature = c("CD41"))
platelet.adt <- gating_model(model = platelet.adt,name = "myeloid", signature = c("HLA-DR"),negative = T)
platelet.adt <- gating_model(model = platelet.adt,name = "monocyte", signature = c("CD14","CD11c","CD11b-1","CD11b-2"),negative = T)
hao <- scGate(data = hao, model = platelet.adt, assay = "ADT",maxRank = maxrank,pos.thr = 0.95,neg.thr = 0.6)
Further reading
The scGate package and installation instructions are available at: scGate package
The code for this demo (and additional demos) can be found on GitHub
References
Hao, Y., Hao, S., Andersen-Nissen, E., Mauck III, W. M., Zheng, S., Butler, A., … & Satija, R. (2021). Integrated analysis of multimodal single-cell data. Cell
Andreatta, M., Carmona, S. J. (2021) UCell: Robust and scalable single-cell gene signature scoring Computational and Structural Biotechnology Journal