This notebook reproduces results from pyUCell with the R version of UCell. We will use a test object from scanpy, also used in pyUCell’s demo.
library(UCell)
library(reticulate)
library(zellkonverter)
library(Seurat)
library(ggplot2)
Download the AnnData object from scanpy. This requires “reticulate” to
run Python from within R. Alternatively, you may download and save
pbmc3k.h5ad from a Python session.
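do_download <- FALSE  # set to TRUE to fetch the dataset via Python
if (do_download) {
  # Adjust the path to match your local Python installation
  use_python("/usr/bin/python3", required = TRUE)
  sc <- import("scanpy")
  adata <- sc$datasets$pbmc3k()
  adata$write("pbmc3k.h5ad")
}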
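Because the download step is skipped by default, it can help to confirm that pbmc3k.h5ad is present before trying to read it. The following is a minimal base R sketch; the helper name check_h5ad is hypothetical and is not part of UCell, zellkonverter or Seurat.

```r
# Hypothetical helper (not from any package used in this notebook):
# returns TRUE if the file exists, otherwise prints a hint and returns FALSE.
check_h5ad <- function(path) {
  found <- file.exists(path)
  if (!found) {
    message("File not found: ", path,
            " (set do_download to TRUE, or export it from a Python session)")
  }
  found
}

h5ad_available <- check_h5ad("pbmc3k.h5ad")
```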
Read the object into R. For conversion, we will use
zellkonverter, which first reads the “.h5ad” object into “sce”
(SingleCellExperiment) format; we can then convert from “sce” to “Seurat”.
Conversions between R and Python objects are known to be clunky.
sce <- readH5AD("pbmc3k.h5ad")
pbmc <- as.Seurat(sce, counts="X", data=NULL)
## Warning: `PackageCheck()` was deprecated in SeuratObject 5.0.0.
## ℹ Please use `rlang::check_installed()` instead.
## ℹ The deprecated feature was likely used in the Seurat package.
## Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
pbmc
## An object of class Seurat
## 32738 features across 2700 samples within 1 assay
## Active assay: originalexp (32738 features, 0 variable features)
## 2 layers present: counts, data
Define two simple signatures.
signatures <- list()
signatures[['T_cell']] <- c('CD3D', 'CD3E', 'CD2')
signatures[['B_cell']] <- c('MS4A1', 'CD79A', 'CD79B')
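The conversion warnings above show that some feature names were altered (underscores replaced with dashes), so it may be worth checking that every signature gene is present among the object's features before scoring. Below is a minimal sketch in base R; find_missing_genes is a hypothetical helper, and toy_features is a made-up stand-in for rownames(pbmc) so that the example runs on its own.

```r
# Hypothetical helper: for each signature, list the genes that are absent
# from a vector of feature names.
find_missing_genes <- function(signatures, features) {
  lapply(signatures, function(genes) setdiff(genes, features))
}

signatures <- list(
  T_cell = c("CD3D", "CD3E", "CD2"),
  B_cell = c("MS4A1", "CD79A", "CD79B")
)

# Toy stand-in for rownames(pbmc); CD79B is left out on purpose
# to show what a missing gene looks like in the result.
toy_features <- c("CD3D", "CD3E", "CD2", "MS4A1", "CD79A")

missing_genes <- find_missing_genes(signatures, toy_features)
print(missing_genes)
```

With the real object, passing rownames(pbmc) as the second argument should return an empty character vector for each signature if all genes were found.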
Run UCell on the PBMC object to evaluate the T cell and B cell signatures.
pbmc <- AddModuleScore_UCell(pbmc, features = signatures)
Have a look at the results in metadata.
head(pbmc[[]])
##                     orig.ident nCount_originalexp nFeature_originalexp
## AAACATACAACCAC-1 SeuratProject               2421                  781
## AAACATTGAGCTAC-1 SeuratProject               4903                 1352
## AAACATTGATCAGC-1 SeuratProject               3149                 1131
## AAACCGTGCTTCCG-1 SeuratProject               2639                  960
## AAACCGTGTATGCG-1 SeuratProject                981                  522
## AAACGCACTGGTAC-1 SeuratProject               2164                  782
##                  T_cell_UCell B_cell_UCell
## AAACATACAACCAC-1    0.5996885    0.0000000
## AAACATTGAGCTAC-1    0.0000000    0.8558077
## AAACATTGATCAGC-1    0.9027592    0.0000000
## AAACCGTGCTTCCG-1    0.1913663    0.0000000
## AAACCGTGTATGCG-1    0.0000000    0.0000000
## AAACGCACTGGTAC-1    0.5132399    0.2923899
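Since the aim of this notebook is to reproduce pyUCell results, one way to compare the two sets of scores is to align them by cell barcode and look at the largest absolute difference per signature. The sketch below uses base R only. The helper max_abs_diff is hypothetical, and py_scores is a placeholder (a row-shuffled copy of the R scores), not real pyUCell output; it only demonstrates that the alignment by barcode works.

```r
# Hypothetical helper: align two score tables by cell barcode (row names)
# and return the largest absolute difference for each score column.
max_abs_diff <- function(r_scores, py_scores, columns) {
  shared <- intersect(rownames(r_scores), rownames(py_scores))
  sapply(columns, function(col) {
    max(abs(r_scores[shared, col] - py_scores[shared, col]))
  })
}

# Toy table built from three of the R scores printed above.
r_scores <- data.frame(
  T_cell_UCell = c(0.5996885, 0.0000000, 0.9027592),
  B_cell_UCell = c(0.0000000, 0.8558077, 0.0000000),
  row.names = c("AAACATACAACCAC-1", "AAACATTGAGCTAC-1", "AAACATTGATCAGC-1")
)

# Placeholder only: a row-shuffled copy of the R scores, NOT real pyUCell output.
py_scores <- r_scores[c(3, 1, 2), ]

diffs <- max_abs_diff(r_scores, py_scores, c("T_cell_UCell", "B_cell_UCell"))
print(diffs)
```

With real data, r_scores would be pbmc[[]] and py_scores a table of pyUCell scores exported from Python and read into R with barcodes as row names. Whether the pyUCell score columns carry the same names is an assumption to verify.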
Calculate low-dimensional embeddings to visualize signature activity in UMAP space.
pbmc <- pbmc |> NormalizeData() |> FindVariableFeatures(nfeatures = 500) |>
  ScaleData() |> RunPCA(npcs = 10) |> RunUMAP(dims = 1:10)
## Warning: The `slot` argument of `SetAssayData()` is deprecated as of SeuratObject 5.0.0.
## ℹ Please use the `layer` argument instead.
## ℹ The deprecated feature was likely used in the Seurat package.
## Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
## ℹ Please use the `layer` argument instead.
## ℹ The deprecated feature was likely used in the Seurat package.
## Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Centering and scaling data matrix
## PC_ 1
## Positive: CST3, TYROBP, LST1, AIF1, FTL, FCN1, LYZ, S100A9, FTH1, FCER1G
## CFD, LGALS1, S100A8, LGALS2, CTSS, IFITM3, SAT1, IFI30, NPC2, COTL1
## GSTP1, NCF2, CDA, PYCARD, MS4A6A, APOBEC3A, TNFSF13B, S100A4, HLA-DRA, CEBPB
## Negative: LTB, CD2, STK17A, GIMAP5, CCL5, AQP3, CST7, MAL, GZMK, NKG7
## KLRG1, LYAR, PRF1, RIC3, FAM107B, TIGIT, GPR183, GZMH, FGFBP2, CD79A
## PTPN7, GZMB, TCL1A, GNLY, XCL2, PPA1, NEMF, STMN1, UXS1, C14orf1
## PC_ 2
## Positive: HLA-DPA1, HLA-DPB1, HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DRA, CD79A, CD74, HLA-DQA1, CD79B
## TCL1A, HLA-DMA, LTB, IRF8, FCRLA, CTSS, FCN1, C1orf162, LGALS2, NCF2
## PYCARD, LYZ, IFI30, ALDH2, LST1, PSMA7, CD1C, S100A9, IGLL5, MS4A6A
## Negative: PPBP, PF4, SDPR, SPARC, GNG11, NRGN, HIST1H2AC, GP9, RGS18, CLU
## TUBB1, CD9, ITGA2B, PTCRA, CA2, TMEM40, ACRBP, TREML1, MMD, MYL9
## RUFY1, SEPT5, MPP1, TSC22D1, CMTM5, GP1BA, LY6G6F, CLEC1B, MAP3K7CL, AC147651.3
## PC_ 3
## Positive: CD79A, HLA-DQA1, HLA-DQB1, TCL1A, CD79B, HLA-DRA, CD74, HLA-DRB1, HLA-DPB1, HLA-DMA
## HLA-DPA1, HLA-DRB5, FCRLA, LTB, IRF8, IGLL5, PPP1R14A, MZB1, RP5-887A10.1, RP11-428G5.5
## CD1C, IGJ, PRKCB, SCPEP1, HIST1H2AC, ODC1, FAM212A, CD9, AL928768.3, RIC3
## Negative: NKG7, CST7, PRF1, GZMB, FGFBP2, GNLY, GZMH, CCL4, CCL5, S100A4
## FCGR3A, XCL2, ANXA1, IGFBP7, ACTB, GAPDH, KLRG1, LYAR, CD160, ABI3
## FCER1G, TIGIT, LGALS1, TYROBP, CCL3, CD2, GIMAP5, GZMK, PPIB, S100B
## PC_ 4
## Positive: S100A8, MAL, FYB, AQP3, S100A9, S100A12, CD2, LTB, LGALS2, FOLR3
## LYZ, S100A4, FCN1, MS4A6A, IL8, GIMAP5, AIF1, CORO1B, LGALS3BP, CDA
## IL23A, ATP5H, G0S2, PPA1, FPR1, COTL1, FTH1, CFD, ANXA1, LST1
## Negative: HLA-DQA1, GZMB, HLA-DPB1, CD74, CD79B, HLA-DQB1, HLA-DPA1, FGFBP2, NKG7, CD79A
## HLA-DRB1, PRF1, CST7, HLA-DRB5, GNLY, GZMH, HLA-DRA, TCL1A, CCL4, FCGR3A
## HLA-DMA, XCL2, CCL5, IGFBP7, FCRLA, IRF8, ABI3, IGLL5, MZB1, CD160
## PC_ 5
## Positive: FGFBP2, CCL4, S100A8, GZMB, NKG7, CST7, GNLY, PRF1, LGALS2, S100A9
## S100A12, CCL3, GZMH, MS4A6A, XCL2, FOLR3, CCL5, CD160, TCL1A, TYROBP
## IGFBP7, IL8, FCN1, GSTP1, LYZ, HBA1, KLRG1, CD79A, ALDH2, FTL
## Negative: KIAA0101, TYMS, LTB, TUBA1B, ZWINT, HN1, RRM2, AQP3, PPA1, TK1
## BIRC5, GINS2, MKI67, GDI2, COTL1, PTTG1, ABRACL, CORO1B, PRDX1, KIFC1
## CD2, MS4A7, NDUFA12, RAD51, CENPM, MAL, CTD-2006K23.1, STMN1, WARS, AURKB
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 15:46:40 UMAP embedding parameters a = 0.9922 b = 1.112
## 15:46:40 Read 2700 rows and found 10 numeric columns
## 15:46:40 Using Annoy for neighbor search, n_neighbors = 30
## 15:46:40 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 15:46:40 Writing NN index file to temp file /var/folders/7b/0f8_3rj93qncp2yfx4hgr41r0000gq/T//RtmpB1RS4Z/file4ffb54fb0b4e
## 15:46:40 Searching Annoy index using 1 thread, search_k = 3000
## 15:46:40 Annoy recall = 100%
## 15:46:41 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 15:46:41 Initializing from normalized Laplacian + noise (using RSpectra)
## 15:46:41 Commencing optimization for 500 epochs, with 110142 positive edges
## 15:46:41 Using rng type: pcg
## 15:46:43 Optimization finished
Plot the UCell scores in UMAP space.
FeaturePlot(pbmc, features=c("T_cell_UCell","B_cell_UCell")) & theme(aspect.ratio = 1)
## Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
## ℹ Please use the `layer` argument instead.
## ℹ The deprecated feature was likely used in the Seurat package.
## Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.