Annotate Peaks with DNA Interaction Networks Using Graph Clustering

Fast annotation of genomic peaks using DNA interaction data by building interaction networks with igraph. Peaks overlapping any node in a connected subgraph are annotated with all genes in that subgraph.

annoLinker(
  peaks,
  annoData,
  interactions,
  bindingType = c("startSite", "body", "endSite"),
  bindingRegion = c(-5000, 5000),
  cluster_method = c("components", "louvain", "walktrap", "infomap"),
  maxgap = 0,
  interactionDistanceRange = c(10000, 1e+07),
  addEvidence = FALSE,
  parallel = FALSE,
  verbose = FALSE,
  ...
)

Arguments

peaks: GRanges object containing peak regions
annoData: annoGR or GRanges object with gene annotations
interactions: GInteractions, or Pairs object with interaction data (e.g., Hi-C, ChIA-PET)
bindingType: Character, one of "startSite", "body", or "endSite"
bindingRegion: Numeric vector of length 2 defining promoter window (e.g., c(-5000, 5000))
cluster_method: Character, clustering method: "components" (connected components), "louvain", "walktrap", or "infomap"
maxgap: Integer, bp to extend interaction anchors for overlap detection (default: 0)
interactionDistanceRange: Numeric vector of length 2 defining the minimal and maximal distance of interactions. This is used to make sure the annotations are not supper far away.
addEvidence: Logical, add evidence to the metadata or not.
parallel: Logical, use future_lapply to do parallel computing or not.
verbose: Logical, print the message or not
...: Parameters for cluster. see cluster_louvain, cluster_walktrap, and cluster_infomap.

Value

An annoLinkerResult object or NULL if no annotations found

Examples

## read the peaks and interactions
library(rtracklayer)
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:annoLinker':
#> 
#>     as.data.frame
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: Seqinfo
extPath <- system.file('extdata', package='annoLinker')
peaks <- rtracklayer::import(file.path(extPath, 'peaks.bed'))
interactions <- rtracklayer::import(file.path(extPath, 'interaction.bedpe'))
library(TxDb.Drerio.UCSC.danRer10.refGene)
#> Loading required package: GenomicFeatures
#> Loading required package: AnnotationDbi
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
annoData <- genes(TxDb.Drerio.UCSC.danRer10.refGene)
#>   32 genes were dropped because they have exons located on both strands of the
#>   same reference sequence or on more than one reference sequence, so cannot be
#>   represented by a single genomic range.
#>   Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList
#>   object, or use suppressMessages() to suppress this message.
anno <- annoLinker(peaks, annoData, interactions, verbose=TRUE)
#> Step 1/4: Building interaction network graph...
#> Graph has 3753 nodes and 2500 edges in 1264 clusters, on average 2.969146 nodes in each cluster
#> Step 2/4: Finding interaction anchors overlapping genes and peaks...
#> Step 3/4: Finding genes and peaks in the same cluster...
#> Step 4/4: Annotating peaks with gene clusters...
#> Annotated 4336 peaks with gene clusters