Perform hierarchical clustering of the given 3D structures.

Calculate the distance between each pair of cells after alignment.

cellClusters(
  xyzs,
  TADs,
  distance_method = "NID",
  cluster_method = "ward.D2",
  rescale = TRUE,
  quite = FALSE,
  parallel = FALSE,
  ...
)

cellDistance(
  xyzs,
  TADs,
  distance_method = c("NID", "RMSD", "SRD", "DSDC", "NMI", "ARI", "AMI"),
  eps,
  k,
  rescale = TRUE,
  quite = FALSE,
  parallel = FALSE,
  ...
)

Arguments

xyzs

A list of data.frames (or matrices, as in the examples) with x, y, z coordinates, or, for cellClusters, the output of cellDistance.

TADs

A list of index vectors, where each vector represents a TAD. For example, if the first TAD spans the 2nd to 4th coordinates and the second spans the 8th to 10th coordinates, the list would be: list(c(2, 3, 4), c(8, 9, 10)).
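
If TAD boundaries are available as start and end positions instead, the list format described above can be built with base R alone. `boundaries_to_TADs` is a hypothetical helper for illustration, not part of the package, and it assumes inclusive, 1-based start and end indices.

```r
# Hypothetical helper: turn inclusive start/end indices into a list of index vectors,
# one vector per TAD, matching the format expected by the TADs argument.
boundaries_to_TADs <- function(starts, ends) {
  stopifnot(length(starts) == length(ends), all(starts <= ends))
  mapply(seq.int, starts, ends, SIMPLIFY = FALSE)
}

# TADs spanning coordinates 2-4 and 8-10, as in the description above
TADs <- boundaries_to_TADs(c(2, 8), c(4, 10))
```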

distance_method

One of 'SRD', 'DSDC', 'RMSD', 'NMI', 'ARI', 'NID', or 'AMI'. The SRD method first performs clustering and then calculates the Sequence Relabeling Distance (SRD). The DSDC method calculates the Euclidean distance of SDC. The RMSD method first aligns the x, y, z coordinates of each cell and then calculates the Root Mean Square Deviation (RMSD), the square root of the mean squared Euclidean distance between corresponding points. The ARI, NID, NMI, and AMI methods first perform clustering and then calculate the Adjusted Rand Index (ARI), Normalized Information Distance (NID), Normalized Mutual Information (NMI), or Adjusted Mutual Information (AMI), respectively.
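
The description of the RMSD method does not say which alignment is used. As a minimal sketch, assuming an optimal rigid-body superposition (the Kabsch algorithm: center both point sets, then find the rotation from an SVD), the aligned RMSD of two cells with the same number of corresponding points could be computed in base R as below. `kabsch_rmsd` is a hypothetical name; cellDistance may align the structures differently.

```r
# Sketch (assumption: Kabsch superposition): RMSD between two n x 3 point sets
# after removing translation and rotation.
kabsch_rmsd <- function(A, B) {
  A <- as.matrix(A); B <- as.matrix(B)
  stopifnot(identical(dim(A), dim(B)), ncol(A) == 3)
  # remove translation by centering both sets at the origin
  A <- sweep(A, 2, colMeans(A))
  B <- sweep(B, 2, colMeans(B))
  # SVD of the 3 x 3 cross-covariance matrix t(A) %*% B
  s <- svd(crossprod(A, B))
  # flip the last axis if needed so the result is a proper rotation (det = +1)
  d <- sign(det(s$v %*% t(s$u)))
  R <- s$v %*% diag(c(1, 1, d)) %*% t(s$u)
  # rotate A onto B (points are rows, so multiply by t(R))
  A_rot <- A %*% t(R)
  # square root of the mean squared Euclidean distance between corresponding points
  sqrt(mean(rowSums((A_rot - B)^2)))
}
```

Because translation and rotation are removed first, a structure and a rotated, shifted copy of it have an RMSD of (numerically) zero.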

cluster_method

The agglomeration method to be used for hclust. Default is 'ward.D2'.

rescale

Logical. Whether to rescale the structures to a similar size before computing distances.
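
What "similar size" means is not specified here. One common normalization, shown purely as an assumption, centers each structure at the origin and divides it by its radius of gyration, so every structure ends up with unit root-mean-square distance from its centroid. `rescale_xyz` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical size normalization: center at the centroid, then divide by the
# radius of gyration (root-mean-square distance of the points from the centroid).
rescale_xyz <- function(xyz) {
  xyz <- as.matrix(xyz)
  centered <- sweep(xyz, 2, colMeans(xyz))
  rg <- sqrt(mean(rowSums(centered^2)))
  centered / rg
}
```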

quite

Logical. If TRUE, suppress messages.

parallel

Logical. Whether to run in parallel using the future framework.

...

Not used.

eps

Numeric or 'auto'. The radius of the epsilon neighborhood. If eps is set, DBSCAN is used to cluster the points of each cell.
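
To illustrate what DBSCAN does with `eps`, here is a minimal base-R sketch. Points with at least `minPts` neighbors within distance `eps` (counting themselves) are core points, clusters grow outward from core points, and points that no cluster reaches are labeled 0 (noise). `dbscan_sketch` and its `minPts` default are assumptions for illustration; the text above does not say which DBSCAN implementation or `minPts` value cellDistance uses.

```r
# Minimal DBSCAN sketch (hypothetical helper). Returns one integer label per point;
# 0 means noise.
dbscan_sketch <- function(X, eps, minPts = 5) {
  X <- as.matrix(X)
  n <- nrow(X)
  D <- as.matrix(dist(X))
  # eps-neighborhood of every point (includes the point itself)
  neighbors <- lapply(seq_len(n), function(i) which(D[i, ] <= eps))
  labels <- integer(n)    # 0 = not (yet) assigned to a cluster
  visited <- logical(n)
  cluster <- 0L
  for (i in seq_len(n)) {
    if (visited[i]) next
    visited[i] <- TRUE
    # not a core point: stays noise unless a later cluster reaches it as a border point
    if (length(neighbors[[i]]) < minPts) next
    cluster <- cluster + 1L
    labels[i] <- cluster
    queue <- neighbors[[i]]
    # expand the cluster through density-reachable points
    while (length(queue) > 0) {
      j <- queue[1]; queue <- queue[-1]
      if (labels[j] == 0L) labels[j] <- cluster
      if (!visited[j]) {
        visited[j] <- TRUE
        if (length(neighbors[[j]]) >= minPts) queue <- c(queue, neighbors[[j]])
      }
    }
  }
  labels
}
```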

k

Numeric or 'auto'. The number of groups. If k is set, hclust is used to cluster the points of each cell.
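
When k is set, the points of each cell are grouped with hclust, and the clustering-based methods then compare the resulting labelings. The sketch below assumes Ward linkage ('ward.D2') for the within-cell clustering and uses the standard formulas for the Adjusted Rand Index and for NID = 1 - I(A;B) / max(H(A), H(B)). The helper names are hypothetical, and comparing labelings point by point assumes both cells have the same number of points in corresponding order. ARI is a similarity (1 for identical partitions), so how cellDistance turns it into a distance is not shown here.

```r
# Cluster the points of one cell into k groups (assumed Ward linkage).
cluster_cell <- function(xyz, k, method = "ward.D2") {
  cutree(hclust(dist(as.matrix(xyz)), method = method), k = k)
}

# Adjusted Rand Index between two labelings of the same points.
# Undefined (NaN) when both labelings put all points in a single group.
ari <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  sum_ij <- sum(choose(tab, 2))              # pairs together in both labelings
  sum_a <- sum(choose(rowSums(tab), 2))      # pairs together in labeling a
  sum_b <- sum(choose(colSums(tab), 2))      # pairs together in labeling b
  expected <- sum_a * sum_b / choose(n, 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Normalized Information Distance: 1 - I(A;B) / max(H(A), H(B)).
# Undefined when both labelings have a single group (zero entropy).
nid <- function(a, b) {
  p <- table(a, b) / length(a)               # joint distribution of labels
  pa <- rowSums(p); pb <- colSums(p)         # marginals
  H <- function(q) { q <- q[q > 0]; -sum(q * log(q)) }
  nz <- p > 0
  mi <- sum(p[nz] * log(p[nz] / outer(pa, pb)[nz]))
  1 - mi / max(H(pa), H(pb))
}
```

Both measures ignore how the clusters are numbered, which is why they can compare clusterings computed independently for two cells.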

Value

cellClusters returns an object of class hclust.

cellDistance returns the distance matrix as an object of class 'dist'.
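
A 'dist' object can be passed directly to hclust, and cluster_method above names the hclust agglomeration method, so cellClusters presumably follows a similar route. As a hedged sketch, a pairwise distance over a list of cells can be assembled from any per-pair function `fun` (hypothetical here) and converted with as.dist before clustering.

```r
# Hypothetical helper: fill a symmetric matrix with fun(cell_i, cell_j) and
# return it as a 'dist' object.
pairwise_dist <- function(cells, fun) {
  n <- length(cells)
  m <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      m[i, j] <- m[j, i] <- fun(cells[[i]], cells[[j]])
    }
  }
  as.dist(m)
}

# Toy usage: "cells" are single numbers and the distance is their absolute difference.
d <- pairwise_dist(list(1, 3, 6), function(a, b) abs(a - b))
hc <- hclust(d, method = "ward.D2")
groups <- cutree(hc, k = 2)
```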

Examples

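set.seed(1)
# simulate 20 cells, each with 20 points in 3D (x, y, z)
xyzs <- lapply(seq.int(20), function(i){
  matrix(sample.int(100, 60, replace = TRUE),
   nrow=20, dimnames=list(NULL, c('x', 'y', 'z')))
})
# pairwise RMSD between cells, then hierarchical clustering of the cells
cd <- cellDistance(xyzs, distance_method='RMSD')
cc <- cellClusters(cd)
# plot(cc)
cutree(cc, k=3)
#>  [1] 1 1 1 1 2 1 2 1 3 1 2 1 2 3 2 2 2 3 2 3
# SRD distance, clustering the points of each cell with DBSCAN (eps = 40)
cd2 <- cellDistance(xyzs, distance_method='SRD', eps=40)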