vignettes/cdQuPath.Rmd
cdQuPath.Rmd
This package presents a fundamental workflow for spatial phenotyping, cell type annotation, multi-nucleus detection, quality control, and calculating cell neighborhoods from whole slide images of ultra-high multiplexed tissue captured by ‘Akoya Biosciences’ assays. The workflow leverages QuPath, StarDist, Cellpose, Seurat, and hoodscanR.
The upstream code are available in the folder of
extdata
. The detailed documentation will be provide in
following sections.
The cdQuPath aims to provide a easy to use tool for
downstream analysis with Seurat
and hoodscanR
for quality control and neighborhoods analysis.
Here is an example using cdQuPath.
First, please download and install QuPath (v0.4.0 or newer). The cell
segmentation requires the cellpose
and stardist
extension for QuPath
. Don’t forget to set the Python path
for the Cellpose
extension.
Then, download a copy of the Groovy script to your local machine.
There are two scripts available:
cellpose_stardist_multiNucleis_underGUI.groovy
and
ExportCellDetectionMeasurement.groovy
.
Then, install cdQuPath and other packages required to run the examples.
library(BiocManager)
BiocManager::install("jianhong/cdQuPath")
This step involves merging the results from Cellpose and Stardist analyses.
Navigate to line 25 of the
cellpose_stardist_multiNucleis_underGUI.groovy
script and
update the path of the stardistModel
variable to match the
file path of your cell_seg_model.pb
. Adjust the parameters
from line
26 to line 38 accordingly. You may want to change the
cytoChannels
and measurementPercentileCutoff
according to your observations. The distanceCutoff
is the
maximal cutoff distance of nucleus for multiple nucleus cells.
Then, select a region in the opened TIFF view window in QuPath. Start with a small region when testing the script, and expand it once you achieve the desired results. Please note that if you observe regional biases, you may want to exclude the biases regions. If it is un-avoidable, you may want to normalize the signals. We will discuss this later.
Finally, run the
cellpose_stardist_multiNucleis_underGUI.groovy
script by
clicking the Run button in the Script Editor.
The script ExportCellDetectionMeasurement.groovy
is
designed to export cell segmentation data. Upon export, two files are
generated and saved in a folder prefixed with
measurementsExport
.
.tsv
file is compatible with
createSeuratObj.R
..geojson
file is intended for reloading into
QuPath
.These exported files contain comprehensive information, including cell area, locations, marker signal statistics, and nucleus classification. The cell locations are particularly useful for neighborhood analysis.
tsv
file.
tsv <- system.file("extdata", "sample.qptiff.tsv.gz",
package = "cdQuPath",
mustWork = TRUE)
seu <- createSeuratObj(tsv, useValue = 'Median',
markerLocations = CodexPredefined$markerLocations)
seu
## An object of class Seurat
## 600 features across 1093 samples within 25 assays
## Active assay: RNA (24 features, 0 variable features)
## 1 layer present: counts
## 24 other assays present: Nucleus_Mean, Nucleus_Median, Nucleus_Min, Nucleus_Max, Nucleus_Std.Dev., Nucleus_Variance, Cytoplasm_Mean, Cytoplasm_Median, Cytoplasm_Min, Cytoplasm_Max, Cytoplasm_Std.Dev., Cytoplasm_Variance, Membrane_Mean, Membrane_Median, Membrane_Min, Membrane_Max, Membrane_Std.Dev., Membrane_Variance, Cell_Mean, Cell_Median, Cell_Min, Cell_Max, Cell_Std.Dev., Cell_Variance
## 1 dimensional reduction calculated: pos
The conventional method for evaluating dynamic range involves computing a signal-to-background (SNR) ratio. This is done by dividing the average intensity of the top 20 brightest cells by the average intensity of the weakest 10% of cells. An SNR value of 10 or higher is indicative of reliable image analysis. It is recommended that the SNR falls within the range of (10, Inf], typically exceeding 100. An SNR below 3 suggests that the antibodies are performing poorly.
## please make sure the default assay is 'RNA'
DefaultAssay(seu)
## [1] "RNA"
dr <- dynamicRanges(seu)
dr
## $brightest
## DAPI.01 CD163 CD127 sox9 CD14 CD4 mpo cd68
## 109.625 13.300 11.600 24.300 26.875 52.150 10.000 238.750
## cd45 foxp3 lcp1 pd1 cd8 hif1a ki67 cd19
## 19.700 8.750 18.350 7.400 10.525 30.575 80.625 17.650
## cd206 znf90 asma cd3 cd86 hist3h2a cd34 CD11c
## 12.550 34.850 20.875 2.150 11.925 105.600 50.000 42.000
##
## $weakest
## DAPI.01 CD163 CD127 sox9 CD14 CD4 mpo cd68
## 47.686992 2.148882 2.790650 0.000000 2.899390 5.625508 3.911585 69.067073
## cd45 foxp3 lcp1 pd1 cd8 hif1a ki67 cd19
## 4.411077 3.632622 6.510163 2.408537 3.322663 2.966463 3.584858 10.713415
## cd206 znf90 asma cd3 cd86 hist3h2a cd34 CD11c
## 6.490854 17.210874 0.000000 0.000000 9.333841 23.114329 10.100610 23.494411
##
## $dynamic_range
## DAPI.01 CD163 CD127 sox9 CD14 CD4 mpo cd68
## 2.298845 6.189265 4.156737 Inf 9.269190 9.270274 2.556508 3.456785
## cd45 foxp3 lcp1 pd1 cd8 hif1a ki67 cd19
## 4.466029 2.408728 2.818670 3.072405 3.167640 10.306886 22.490432 1.647467
## cd206 znf90 asma cd3 cd86 hist3h2a cd34 CD11c
## 1.933490 2.024883 Inf Inf 1.277609 4.568595 4.950196 1.787659
## markers may not work;
## Please note that DAPI.01 is used for cell detection.
avoid_markers <-
names(dr$dynamic_range)[dr$dynamic_range<3 | is.na(dr$dynamic_range)]
avoid_markers
## [1] "DAPI.01" "mpo" "foxp3" "lcp1" "cd19" "cd206" "znf90"
## [8] "cd86" "CD11c"
Verify the cell area and nucleus area for multi-nuclei.
# Visualize QC metrics as a violin plot
colnames(seu[[]])
## [1] "orig.ident" "nCount_RNA"
## [3] "nFeature_RNA" "Cell.classification"
## [5] "Cell.X" "Cell.Y"
## [7] "Nucleus.Area.um.2" "Nucleus.Length.um"
## [9] "Nucleus.Circularity" "Nucleus.Solidity"
## [11] "Nucleus.Max.diameter.um" "Nucleus.Min.diameter.um"
## [13] "Cell.Area.um.2" "Cell.Length.um"
## [15] "Cell.Circularity" "Cell.Solidity"
## [17] "Cell.Max.diameter.um" "Cell.Min.diameter.um"
## [19] "Nucleus.Cell.area.ratio" "Cell.isMultiNucleis"
VlnPlot(seu,
features = c("Cell.Area.um.2",
"Nucleus.Area.um.2",
'Nucleus.Cell.area.ratio'),
split.by = 'Cell.isMultiNucleis',
pt.size = ifelse(ncol(seu)>500, 0, 1),
ncol = 3)
## if you want to save the plot, try
## ggsave(filename)
VlnPlot(seu, features = c('cd68', 'CD11c', 'cd34'),
pt.size = ifelse(ncol(seu)>500, 0, 1),
ncol = 3)
Now, we can proceed with filtering the cells. Upon observation, we notice that the average area of single-nucleus cells is below 500. Therefore, we establish the cutoff for cell area at approximately three times the size of single-nucleus cells. Additionally, if the nucleus-to-cell area ratio is excessively high, it suggests potential issues with the cell area detection process. We will filter out those cells.
keep <- seu$Cell.Area.um.2<1500 & seu$Nucleus.Cell.area.ratio < .9
seu <- subset(seu,
cells = colnames(seu)[keep],
features = names(dr$dynamic_range)[!is.na(dr$dynamic_range)])
seu
## An object of class Seurat
## 600 features across 1088 samples within 25 assays
## Active assay: RNA (24 features, 0 variable features)
## 1 layer present: counts
## 24 other assays present: Nucleus_Mean, Nucleus_Median, Nucleus_Min, Nucleus_Max, Nucleus_Std.Dev., Nucleus_Variance, Cytoplasm_Mean, Cytoplasm_Median, Cytoplasm_Min, Cytoplasm_Max, Cytoplasm_Std.Dev., Cytoplasm_Variance, Membrane_Mean, Membrane_Median, Membrane_Min, Membrane_Max, Membrane_Std.Dev., Membrane_Variance, Cell_Mean, Cell_Median, Cell_Min, Cell_Max, Cell_Std.Dev., Cell_Variance
## 1 dimensional reduction calculated: pos
The initial query is whether we are observing abnormally high or low intensity regions in specific sub-regions. If these variations are deemed unrelated to tissue-specific characteristics, it might be advisable to normalize each marker channel accordingly. The positional data of cells is contained within the reduction matrix labeled ‘pos’.
nrow(seu)
## [1] 24
suppressMessages(
FeaturePlot(seu, slot='count',
features = rownames(seu)[1],
reduction = 'pos') + scale_y_reverse()
)
You may attempt to normalize the irregular sub-regions using LOESS
(Local Polynomial Regression Fitting) with the loessSmooth
function.
The next step involves cell-wise normalization. The available
normalization methods are ‘LogNormalize’, ‘CLR’ (centered log ratio),
‘RC’ (CPM), and ‘zscore’, which can be applied using the
normData
function. The normalization method except for
‘zscore’ is borrowed from Seurat
package.
seu <- normData(seu, normalizationMethod='zscore')
In the background, this step will execute the
FindVariableFeatures
function and exclude the
‘avoidMarkers’ from the list of variable markers. Please ensure that
‘nFeatures’ is smaller than the total number of available markers.
seu <- findVarMarkers(
seu,
avoidMarkers=c('DAPI.01', 'hist3h2a'),
nFeatures = 10L)
VariableFeatures(seu)
## [1] "ki67" "CD4" "cd34" "hif1a" "mpo" "znf90" "cd19" "CD127" "CD14"
plot1 <- VariableFeaturePlot(seu)
LabelPoints(plot = plot1, points = VariableFeatures(seu), repel = TRUE)
The following steps outline the standard process of single-cell
analysis in the Seurat
package. You may adjust the process
accordingly to suit your needs.
all.markers <- rownames(seu)
seu <- ScaleData(seu, features = all.markers)
seu <- RunPCA(seu, features = VariableFeatures(seu))
dim <- seq.int(ncol(Embeddings(seu$pca)))
VizDimLoadings(seu, dims = dim,
nfeatures=length(VariableFeatures(seu)),
reduction = "pca")
seu <- FindNeighbors(seu, dims = dim)
seu <- FindClusters(seu, resolution = 0.9)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 1088
## Number of edges: 32836
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7546
## Number of communities: 10
## Elapsed time: 0 seconds
seu <- RunUMAP(seu, reduction = "pca", dims = dim)
pt.size <- ifelse(ncol(seu)>500, 0, 2)
DimPlot(seu,
reduction = 'umap', label = TRUE,
pt.size = pt.size) + scale_y_reverse()
## check the distribution of the clusters in spatial position.
ncol <- ceiling(sqrt(length(unique(seu$seurat_clusters))))
DimPlot(seu,
reduction = 'pos', label = FALSE,
split.by = 'seurat_clusters',
ncol = ncol, pt.size = pt.size) + scale_y_reverse()
## check the nucleis distribution
dittoBarPlot(seu,
var = 'Cell.isMultiNucleis',
group.by='seurat_clusters')
DimPlot(seu, reduction = 'pos', label = TRUE,
split.by = 'Cell.isMultiNucleis',
pt.size = pt.size) + scale_y_reverse()
VlnPlot(seu,
features = c('Nucleus.Area.um.2',
'Cell.Area.um.2',
'Nucleus.Cell.area.ratio'),
pt.size = pt.size)
## Dot plot
markers <- unique(FindAllMarkers(seu)$gene)
DotPlot(seu, features = markers, scale = FALSE) + coord_flip()
In the downstream analysis, cell types are detected by fitting a multiple-class Gaussian Mixture Model (GMM) to the re-scaled data. The raw intensities are converted to probabilities, and a cutoff of 0.5 is set for cell type assignment.
Please run fitGMM
for each sub-region of the image if
the background of the sub-regions are not identical.
library(future.apply)
plan(multisession) # for multiple process.
set.seed(123) ## for model fit
seu <- fitGMM(seu)
DotPlot(seu, features = markers, assay = 'GMM', scale = FALSE) + coord_flip()
The correlation heatmap will display the signal similarity among
different markers. The distance will be calculated using the
stats::dist
function. Users can experiment with different
inputs such as z-scales, GMM data, and so on.
DefaultAssay(seu)
## [1] "RNA"
p <- markerCorrelation(seu)
names(p)
## [1] "counts" "data" "scale.data"
## plot the correlations for data, here it is Z-scale data
p[['data']]
## use GMM rescaled data to plot the correlation
DefaultAssay(seu) <- 'GMM'
p_GMM <- markerCorrelation(seu)
p_GMM[['data']]
##
DefaultAssay(seu) <- 'RNA'
The phenotype will utilize the re-scaled data obtained through GMM. Two methods are available for assigning cell types: ‘Rank’ and ‘Boolean’. With the ‘Rank’ option, cell types are assigned based on the rank of probabilities for the positive markers. On the other hand, the ‘Boolean’ option assigns cell types according to the order of the classifier for the positive markers.
The metadata ‘celltype’ stores the assigned cell types for each Seurat cluster. Additionally, the metadata ‘celltype_all’ contains all possible cell types for each cell, determined by a cutoff of 0.5. Finally, the metadata ‘celltype_first’ represents the initial assignment.
seu <- detectCellType(seu)
table(seu$celltype)
##
## B CELLS Cytotoxic T Cells Endothelial
## 178 310 87
## M1 Macrophage MONOCYTE Proliferation Marker
## 318 47 148
## check the cell type annotation by clusters
DimPlot(seu, group.by = 'celltype')
DimPlot(seu, group.by = 'celltype', reduction = 'pos') + scale_y_reverse()
dittoBarPlot(seu,
var = 'celltype',
group.by='Cell.isMultiNucleis')
## check the first annotation of each cell, not restricted by seurat_cluster
DimPlot(seu, group.by = 'celltype_first', reduction = 'pos') + scale_y_reverse()
dittoBarPlot(seu,
var = 'celltype_first',
group.by='Cell.isMultiNucleis')
## check the signals for each celltype
Idents(seu) <- 'celltype'
DotPlot(seu, features = rownames(seu), scale = FALSE) + coord_flip() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
## export the celltype
ct <- data.frame(Object.ID=colnames(seu),
x=seu$Cell.X,
y=seu$Cell.Y,
CellType=seu$celltype)
head(ct, n=2)
## Object.ID
## 8d5ccdac-31da-4930-81b6-dd9f272c10d7 8d5ccdac-31da-4930-81b6-dd9f272c10d7
## b21deea4-2d9c-4e0b-9c83-591622b33a9a b21deea4-2d9c-4e0b-9c83-591622b33a9a
## x y CellType
## 8d5ccdac-31da-4930-81b6-dd9f272c10d7 13142.86 8012.715 M1 Macrophage
## b21deea4-2d9c-4e0b-9c83-591622b33a9a 13142.86 8012.715 M1 Macrophage
## if you see postfix like _1, then run
ct$Object.ID <- sub('_\\d+$', '', ct$Object.ID)
## output to same folder: celltype_csv <- sub('qptiff.tsv', 'celltype.csv', tsv)
celltype_csv <- tempfile(fileext = '.csv')
write.csv(ct, file = celltype_csv, row.names = FALSE)
## after exporting, you can load the celltypes back to QuPath by running the script load_celltype.groovy in the scripts folder.
If you have datasets from multiple sub-regions, you can process each one individually as shown above and then merge them. Occasionally, some images may not have the same background across different regions. To address this, you can detect the cells in each sub-region with a similar background or baseline and then perform GMM rescaling for each sub-region. Once this is done, you can combine the analyzed results to create a merged dataset.
tsv2 <- system.file("extdata", "sample2.qptiff.tsv.gz",
package = "cdQuPath",
mustWork = TRUE)
seu2 <- createSeuratObj(tsv2, useValue = 'Median',
markerLocations = CodexPredefined$markerLocations)
seu2
## An object of class Seurat
## 600 features across 282 samples within 25 assays
## Active assay: RNA (24 features, 0 variable features)
## 1 layer present: counts
## 24 other assays present: Nucleus_Mean, Nucleus_Median, Nucleus_Min, Nucleus_Max, Nucleus_Std.Dev., Nucleus_Variance, Cytoplasm_Mean, Cytoplasm_Median, Cytoplasm_Min, Cytoplasm_Max, Cytoplasm_Std.Dev., Cytoplasm_Variance, Membrane_Mean, Membrane_Median, Membrane_Min, Membrane_Max, Membrane_Std.Dev., Membrane_Variance, Cell_Mean, Cell_Median, Cell_Min, Cell_Max, Cell_Std.Dev., Cell_Variance
## 1 dimensional reduction calculated: pos
keep <- seu2$Cell.Area.um.2<1500 & seu2$Nucleus.Cell.area.ratio < .9
table(keep)
## keep
## FALSE TRUE
## 1 281
seu2 <- subset(seu2,
cells = colnames(seu2)[keep])
seu2 <- normData(seu2, normalizationMethod='zscore')
seu2 <- findVarMarkers(
seu2,
avoidMarkers=c('DAPI.01', 'hist3h2a'),
nFeatures = 10L)
seu2 <- ScaleData(seu2, features = rownames(seu2))
seu2 <- RunPCA(seu2, features = VariableFeatures( seu2))
dim <- seq.int(ncol(Embeddings(seu2$pca)))
seu2 <- FindNeighbors(seu2, dims = dim)
seu2 <- FindClusters(seu2, resolution = 0.9)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 281
## Number of edges: 8053
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.6137
## Number of communities: 5
## Elapsed time: 0 seconds
seu2 <- fitGMM(seu2)
seu2 <- detectCellType(seu2)
seu2$pca <- NULL # to advoid different pca columns error.
seu$orig.ident <- 'region1'
seu2$orig.ident <- 'region2'
mergedSeu <- merge(x=seu, y=list(seu2), merge.dr = TRUE)
mergedSeu
## An object of class Seurat
## 624 features across 1369 samples within 26 assays
## Active assay: RNA (24 features, 8 variable features)
## 6 layers present: counts.1, counts.2, data.1, scale.data.1, data.2, scale.data.2
## 25 other assays present: Nucleus_Mean, Nucleus_Median, Nucleus_Min, Nucleus_Max, Nucleus_Std.Dev., Nucleus_Variance, Cytoplasm_Mean, Cytoplasm_Median, Cytoplasm_Min, Cytoplasm_Max, Cytoplasm_Std.Dev., Cytoplasm_Variance, Membrane_Mean, Membrane_Median, Membrane_Min, Membrane_Max, Membrane_Std.Dev., Membrane_Variance, Cell_Mean, Cell_Median, Cell_Min, Cell_Max, Cell_Std.Dev., Cell_Variance, GMM
## 3 dimensional reductions calculated: pos, pca, umap
Assays(mergedSeu)
## An object of class "SimpleAssays"
## Slot "data":
## List of length 1
p <- dittoBarPlot(mergedSeu,
var = 'celltype_first',
group.by='orig.ident')
p
## pie chart
plotData <- p$data %>%
dplyr::arrange(desc(label)) %>%
dplyr::group_by(grouping) %>%
dplyr::mutate(y_pos = cumsum(percent) - 0.5*percent)
ggplot(plotData, aes(x="", y = percent, fill = label)) +
geom_bar(stat="identity", width = 1) +
geom_text(aes(x=1.7, y=y_pos,
label=paste0(round(percent*100, digits = 1), '%')),
color = 'black') +
coord_polar(theta = 'y') +
facet_wrap(~grouping) +
theme_void() +
scale_fill_manual(values = dittoColors())
FeaturePlot(mergedSeu, slot='count',
features = rownames(mergedSeu)[1],
reduction = 'pos',
ncol = 2) + scale_y_reverse()
DimPlot(mergedSeu, group.by = 'celltype', reduction = 'pos') + scale_y_reverse()
## use GMM rescaled data to plot the correlation
DefaultAssay(mergedSeu) <- 'GMM'
p_GMM <- markerCorrelation(mergedSeu, layers = 'data')
p_GMM[['data']]
##
DefaultAssay(mergedSeu) <- 'RNA'
The simplest classifier is a list of markers.
CodexPredefined$classifier
## Helper T cells M1 Macrophage Cartilage tumor cell
## "CD4" "cd68" "sox9"
## Proliferation Marker Tumor/Myeloid MONOCYTE
## "ki67" "lcp1" "CD14"
## M2 MACROPHAGE M2 MACROPHAGE HSC
## "cd206" "CD163" "cd45"
## Cytotoxic T Cells mDC/cDC - Dendritic cells B CELLS
## "cd8" "CD11c" "cd19"
## Endothelial Pericyte/Fibroblast
## "cd34" "asma"
You can also try to detect the cell type by multiple markers with weight for each celltype.
CodexPredefined$classifier2
## $positive
## $positive$`B cell`
## $positive$`B cell`$symbol
## [1] "cd45" "cd19"
##
## $positive$`B cell`$weight
## [1] 0.5 0.5
##
##
## $positive$`T cell`
## $positive$`T cell`$symbol
## [1] "CD4"
##
## $positive$`T cell`$weight
## [1] 1
##
##
## $positive$CD4_TFH
## $positive$CD4_TFH$symbol
## [1] "CD4" "pd1"
##
## $positive$CD4_TFH$weight
## [1] 0.5 0.5
##
##
## $positive$`T-reg`
## $positive$`T-reg`$symbol
## [1] "CD4" "foxp3" "pd1"
##
## $positive$`T-reg`$weight
## [1] 0.3333333 0.3333333 0.3333333
##
##
## $positive$`CD8 T-cell`
## $positive$`CD8 T-cell`$symbol
## [1] "cd8"
##
## $positive$`CD8 T-cell`$weight
## [1] 1
##
##
## $positive$`Dendritic cell`
## $positive$`Dendritic cell`$symbol
## [1] "CD11c"
##
## $positive$`Dendritic cell`$weight
## [1] 1
##
##
## $positive$NK
## $positive$NK$symbol
## [1] "cd57"
##
## $positive$NK$weight
## [1] 1
##
##
## $positive$Macrophage
## $positive$Macrophage$symbol
## [1] "cd68"
##
## $positive$Macrophage$weight
## [1] 1
##
##
## $positive$`M2 macrophage`
## $positive$`M2 macrophage`$symbol
## [1] "cd68" "CD163"
##
## $positive$`M2 macrophage`$weight
## [1] 0.5 0.5
##
##
## $positive$HSC
## $positive$HSC$symbol
## [1] "cd45"
##
## $positive$HSC$weight
## [1] 1
##
##
## $positive$`cd45+lcp1`
## $positive$`cd45+lcp1`$symbol
## [1] "cd45" "lcp1"
##
## $positive$`cd45+lcp1`$weight
## [1] 0.5 0.5
##
##
## $positive$Monocyte
## $positive$Monocyte$symbol
## [1] "CD14"
##
## $positive$Monocyte$weight
## [1] 1
##
##
## $positive$Stromal
## $positive$Stromal$symbol
## [1] "cd34" "asma"
##
## $positive$Stromal$weight
## [1] 0.5 0.5
##
##
## $positive$Tumour
## $positive$Tumour$symbol
## [1] "sox9" "ki67"
##
## $positive$Tumour$weight
## [1] 0.5 0.5
##
##
##
## $negative
## $negative$Tumour
## $negative$Tumour$symbol
## [1] "cd45"
##
## $negative$Tumour$weight
## [1] 1
mergedSeu <- detectCellType(mergedSeu,
classifier = CodexPredefined$classifier2,
celltypeColumnName = 'celltype2')
dittoBarPlot(mergedSeu,
var = 'celltype2',
group.by='Cell.isMultiNucleis')
You can also try to detect the cell type by a generalized k-Nearest Neighbors Classifier.
classifier <- readRDS(system.file('extdata', 'classifier.CSA.res0.3.rds',
package='cdQuPath'))
## this request lots of memory.
mergedSeu <- detectCellType(mergedSeu,
classifier=classifier,
celltypeColumnName='celltypeByKNN')
table(mergedSeu$celltypeByKNN)
##
## Chondroblastic Endothelial Fibroblastic Mast cells Myeloid
## 0 226 5 661 30
## Osteoblastic Osteoclastic Pericytes Proliferating T/NK cells
## 37 221 1 27 161
dittoBarPlot(mergedSeu,
var = 'celltypeByKNN',
group.by='orig.ident')
Below is sample code to create a classifier for your data and align single-cell RNA-seq data with Codex data.
Codex data has a limited number of cell markers compared to the 10,000+ features detected by single-cell RNA-seq. To align the scRNA-seq matrix with the Codex matrix, it is crucial to minimize the effects of sparsity, noise, and collinearity among individual cells. This is achieved by performing distance calculations using pseudobulk analysis, which aggregates gene expression from individual cells into group-level values. By treating each group as a single cell, this approach reduces heterogeneity within cell clusters.
The resulting alignment matrix enables the prediction of transcriptomes using Codex spatial information.
## read the protein-gene map from the Codex (protein level) to
## transcriptome (gene level)
markers <- readRDS(system.file('extdata', 'markers.name.map.rds',
package='cdQuPath'))
tail(markers)
## $asma
## [1] "ACTA2"
##
## $ki67
## [1] "MKI67"
##
## $cd19
## [1] "CD19"
##
## $cd3
## [1] "CD3E" "CD3D" "CD3G"
##
## $cd34
## [1] "CD34"
##
## $CD11c
## [1] "ITGAX"
## load the scRNA-seq data
# scRNAseqSeu <- readRDS('int.rds')
## do cluster, set higher resolution to get more class
# scRNAseqSeu <- FindNeighbors(scRNAseqSeu, dims = 1:20)
## set a big resolution to split the cells into a big number of clusters.
# scRNAseqSeu <- FindClusters(scRNAseqSeu, resolution = 100)
## create a pseudobulking for each cluster
# pseudo <- AggregateExpression(scRNAseqSeu, assays = "RNA",
# return.seurat = TRUE,
# group.by = c("seurat_clusters"))
pseudo <- readRDS(system.file('extdata', 'pseudobulk.scRNAseq.rds',
package='cdQuPath'))
library(future.apply)
plan(multisession)
## will cost several mins
classifier <- buildClassifierFromSeurat(pseudo,
markerList = markers)
alignedCodex <- alignScRNAseqToCodex(codexSeu = mergedSeu,
scSeu = pseudo,
classifier = classifier)
## plot any gene expression with DarkTheme()
FeaturePlot(alignedCodex, slot='data',
features = rownames(pseudo)[1],
reduction = 'pos',
ncol = 2) + scale_y_reverse() + DarkTheme()
Here we will show the sample code to detect the cell types by CELESTA.
if (!require("CELESTA", quietly = TRUE)){
BiocManager::install("plevritis-lab/CELESTA")
}
library(CELESTA)
library(zeallot)
library(spdep)
prior_marker_info <- matrix(0,
nrow = length(CodexPredefined$classifier2$positive),
ncol = nrow(mergedSeu),
dimnames = list(
names(CodexPredefined$classifier2$positive),
rownames(mergedSeu)
))
for(i in seq_along(CodexPredefined$classifier2$positive)){
prior_marker_info[names(CodexPredefined$classifier2$positive)[i],
CodexPredefined$classifier2$positive[[i]]$symbol[
CodexPredefined$classifier2$positive[[i]]$symbol %in%
colnames(prior_marker_info)
]] <- 1
}
## reorder the celltype by lineage level
prior_marker_info <- prior_marker_info[
c('Tumour', 'Stromal', 'HSC',
'NK', 'B cell', 'T cell', 'Monocyte',
'CD4_TFH', 'CD8 T-cell', 'T-reg',
'Macrophage', 'M2 macrophage', 'cd45+lcp1', 'Dendritic cell'), ]
prior_marker_info <- cbind(
celltype = rownames(prior_marker_info),
Lineage_level=paste(c(1, 1, 1,
2, 2, 2, 2,
3, 3, 3,
3, 3, 3, 3), ## lineage level
c(0, 0, 0,
3, 3, 3, 3,
6, 6, 6,
7, 7, 7, 7), ## lineage ancestor
c(1, 2, 3,
4, 5, 6, 7,
8, 9, 10,
11, 12, 13, 14), ## index
sep='_'),
data.frame(prior_marker_info))
mergedSeu <- JoinLayers(mergedSeu)
imageData <- GetAssayData(mergedSeu, assay = 'RNA', layer = 'data')
imageData <- cbind(mergedSeu[[]], t(imageData))
colnames(imageData)[colnames(imageData)=='Cell.X'] <- 'X'
colnames(imageData)[colnames(imageData)=='Cell.Y'] <- 'Y'
CelestaObj <- CreateCelestaObject(
project_title = "test",
prior_marker_info = prior_marker_info,
imaging_data = imageData)
CelestaObj <- FilterCells(
CelestaObj,
high_marker_threshold=0.9,
low_marker_threshold=0.1)
CelestaObj <- AssignCells(
CelestaObj,
max_iteration=10,
cell_change_threshold=0.01)
mergedSeu$celltypeCELESTA <-
CelestaObj@final_cell_type_assignment[, 'Final cell type']
dittoBarPlot(mergedSeu,
var = 'celltypeCELESTA',
group.by='orig.ident')
After annotating cell types, we can assess whether these annotations
exhibit spatial enrichment and conduct analyses of cellular
neighborhoods throughout the tissue. Here we borrow the power of
hoodscanR
to identifying interactions between spatial
communities.
clusterK <- 12
spe <- neighborhoodsAna(mergedSeu, ## if no merging step, use `seu`.
clusterK = clusterK,
k = 10, ## maximal 10 neightbors
anno_col = 'celltype_first')
# Plot the metrics for probability matrix
plotTissue(spe, color = entropy) +
scale_color_gradientn(colors=CodexPredefined$heatColor)
plotTissue(spe, color = perplexity) +
scale_color_gradientn(colors=CodexPredefined$heatColor)
# Plot the clusters for the probability matrix
plotTissue(spe, color = clusters)
# Plot heatmap for neighbourhood analysis
plotColocal(spe, pm_cols = unique(spe$celltype_first))
# Plot probability distribution
plotProbDist(spe, pm_cols = unique(spe$celltype_first),
by_cluster = TRUE, plot_all = TRUE,
show_clusters = as.character(seq(clusterK)))
The Jaccard Index is a statistic used for gauging the similarity and diversity of sample sets. The Jaccard Index is defined as: .
This method are based on simple yet intuitive measures, such as counting the number of nearest neighbors of every cell and inspecting the proportion of different cell categories in a given Euclidean (2 norm aka ) distance (, the shortest distance).
## here we set two times of cell length as maximal gaps to check
## the cell co-location
maxgap <- max(mergedSeu$Cell.Length.um) * 2
ji <- JaccardIndex(mergedSeu, maxRadius = maxgap, anno_col='celltype_first')
plotJaccardIndex(ji)
The co-occurrence score gives us an indication on whether given two factors such as cell cluster, cell type, and etc, co-occur with each other at specific distances across the tissue. The co-occurrence score is defined as: .
maxgap <- max(mergedSeu$Cell.Length.um) * 50
## precalculate cell distance to save time because we will use same maxgap for
## both coOccurrence and spatialScore
celldist <- findNearCellsByRadius(
seu=mergedSeu,
maxRadius=maxgap)
cooc <- coOccurrence(mergedSeu, maxRadius=maxgap, bin=50,
anno_col='celltype',
celldist=celldist)
ggplot(data.frame(x=seq.int(nrow(cooc))*20-10, y=cooc[, 'cell proportion']),
aes(x, y)) + geom_line() + theme_classic() +
labs(x='distance', y='proportion')
plotData <- reshape2::melt(cooc[, -1])
ggplot(plotData, aes(x=Var1, y=Var2, fill = value))+
geom_tile() + labs(x='bin', y='') +
scale_fill_gradient(low = "white", high = "red")
The p-value of co-occurence can be calculated based on permutation test or a more rigorous framework, the homogeneous multitype Poisson point process (PPP) or complete spatial randomness and independence (CSRI) by picking a specific value of the radius (r). To perform the analysis, you can refer to R-package SpicyR or SpaceANOVA.
The spatial score define the distance ratio among three factors. The spatial score is defined as: . The high value indicates the left celltype is closer than the right celltype.
## need to increase the neighbors number
ss <- spatialScore(mergedSeu,
celltypeCenter = "Macrophage",
celltypeLeft = "T cell",
celltypeRight = "Monocyte",
anno_col = 'celltype_first',
maxRadius=maxgap,
celldist=celldist)
head(ss)
## cell dist.L dist.R spatialScore
## 1 8d5ccdac-31da-4930-81b6-dd9f272c10d7 68.92733 474.2253 6.880076
## 2 b21deea4-2d9c-4e0b-9c83-591622b33a9a 68.92733 474.2253 6.880076
## 3 64279019-198a-4a87-a829-3f67c19e3e97 68.92733 474.2253 6.880076
## 4 f61663dd-f8a0-4da1-8bae-dcf4146032b3 68.92733 474.2253 6.880076
## 5 78a932ae-bb7c-428c-8329-eef8c1f29c18 68.92733 474.2253 6.880076
## 6 a5b179c6-c0fc-4952-89c6-cfc942a135a3 78.08983 352.7442 4.517159
boxplot(ss$spatialScore)
R version 4.4.1 (2024-06-14) Platform: x86_64-pc-linux-gnu Running under: Ubuntu 22.04.4 LTS
Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC tzcode source: system (glibc)
attached base packages: [1] stats4 stats graphics grDevices utils
datasets methods
[8] base
other attached packages: [1] future.apply_1.11.2 future_1.34.0
[3] dplyr_1.1.4 SpatialExperiment_1.15.1
[5] SingleCellExperiment_1.27.2 SummarizedExperiment_1.35.1 [7]
Biobase_2.65.1 GenomicRanges_1.57.1
[9] GenomeInfoDb_1.41.1 IRanges_2.39.2
[11] S4Vectors_0.43.2 BiocGenerics_0.51.1
[13] MatrixGenerics_1.17.0 matrixStats_1.4.1
[15] hoodscanR_1.3.3 dittoSeq_1.17.0
[17] ggplot2_3.5.1 Seurat_5.1.0
[19] SeuratObject_5.0.2 sp_2.1-4
[21] cdQuPath_0.99.0.8 BiocStyle_2.33.1
loaded via a namespace (and not attached): [1] RcppAnnoy_0.0.22
splines_4.4.1 later_1.3.2
[4] tibble_3.2.1 polyclip_1.10-7 fastDummies_1.7.4
[7] lifecycle_1.0.4 doParallel_1.0.17 globals_0.16.3
[10] lattice_0.22-6 MASS_7.3-61 magrittr_2.0.3
[13] plotly_4.10.4 sass_0.4.9 rmarkdown_2.28
[16] jquerylib_0.1.4 yaml_2.3.10 httpuv_1.6.15
[19] sctransform_0.4.1 spam_2.10-0 spatstat.sparse_3.1-0
[22] reticulate_1.39.0 cowplot_1.1.3 pbapply_1.7-2
[25] RColorBrewer_1.1-3 abind_1.4-5 zlibbioc_1.51.1
[28] Rtsne_0.17 purrr_1.0.2 circlize_0.4.16
[31] GenomeInfoDbData_1.2.12 ggrepel_0.9.6 irlba_2.3.5.1
[34] listenv_0.9.1 spatstat.utils_3.1-0 pheatmap_1.0.12
[37] goftest_1.2-3 RSpectra_0.16-2 spatstat.random_3.3-1
[40] fitdistrplus_1.2-1 parallelly_1.38.0 pkgdown_2.1.0
[43] leiden_0.4.3.1 codetools_0.2-20 DelayedArray_0.31.11
[46] shape_1.4.6.1 tidyselect_1.2.1 UCSC.utils_1.1.0
[49] farver_2.1.2 spatstat.explore_3.3-2 jsonlite_1.8.8
[52] GetoptLong_1.0.5 e1071_1.7-14 progressr_0.14.0
[55] iterators_1.0.14 ggridges_0.5.6 survival_3.7-0
[58] systemfonts_1.1.0 foreach_1.5.2 Rmixmod_2.1.10
[61] tools_4.4.1 ragg_1.3.3 ica_1.0-3
[64] Rcpp_1.0.13 glue_1.7.0 gridExtra_2.3
[67] SparseArray_1.5.34 xfun_0.47 withr_3.0.1
[70] BiocManager_1.30.25 fastmap_1.2.0 fansi_1.0.6
[73] digest_0.6.37 R6_2.5.1 mime_0.12
[76] textshaping_0.4.0 colorspace_2.1-1 scattermore_1.2
[79] tensor_1.5 spatstat.data_3.1-2 utf8_1.2.4
[82] tidyr_1.3.1 generics_0.1.3 data.table_1.16.0
[85] class_7.3-22 httr_1.4.7 htmlwidgets_1.6.4
[88] S4Arrays_1.5.7 uwot_0.2.2 pkgconfig_2.0.3
[91] scico_1.5.0 gtable_0.3.5 ComplexHeatmap_2.21.0
[94] lmtest_0.9-40 XVector_0.45.0 htmltools_0.5.8.1
[97] dotCall64_1.1-1 bookdown_0.40 clue_0.3-65
[100] scales_1.3.0 png_0.1-8 spatstat.univar_3.0-1
[103] knitr_1.48 reshape2_1.4.4 rjson_0.2.22
[106] nlme_3.1-166 GlobalOptions_0.1.2 proxy_0.4-27
[109] cachem_1.1.0 zoo_1.8-12 stringr_1.5.1
[112] KernSmooth_2.23-24 parallel_4.4.1 miniUI_0.1.1.1
[115] desc_1.4.3 pillar_1.9.0 grid_4.4.1
[118] vctrs_0.6.5 RANN_2.6.2 promises_1.3.0
[121] xtable_1.8-4 cluster_2.1.6 evaluate_0.24.0
[124] magick_2.8.4 cli_3.6.3 compiler_4.4.1
[127] rlang_1.1.4 crayon_1.5.3 labeling_0.4.3
[130] mclust_6.1.1 plyr_1.8.9 fs_1.6.4
[133] stringi_1.8.4 viridisLite_0.4.2 deldir_2.0-4
[136] munsell_0.5.1 lazyeval_0.2.2 spatstat.geom_3.3-2
[139] Matrix_1.7-0 RcppHNSW_0.6.0 patchwork_1.2.0
[142] shiny_1.9.1 highr_0.11 ROCR_1.0-11
[145] igraph_2.0.3 bslib_0.8.0