Fast annotation of genomic peaks using DNA interaction data by building interaction networks with igraph. Peaks overlapping any node in a connected subgraph are annotated with all genes in that subgraph.

annoLinker(
  peaks,
  annoData,
  interactions,
  bindingType = c("startSite", "body", "endSite"),
  bindingRegion = c(-5000, 5000),
  cluster_method = c("components", "louvain", "walktrap", "infomap"),
  maxgap = 0,
  interactionDistanceRange = c(10000, 1e+07),
  addEvidence = FALSE,
  parallel = FALSE,
  verbose = FALSE,
  ...
)

Arguments

peaks

GRanges object containing peak regions

annoData

annoGR or GRanges object with gene annotations

interactions

GInteractions, or Pairs object with interaction data (e.g., Hi-C, ChIA-PET)

bindingType

Character, one of "startSite", "body", or "endSite"

bindingRegion

Numeric vector of length 2 defining promoter window (e.g., c(-5000, 5000))

cluster_method

Character, clustering method: "components" (connected components), "louvain", "walktrap", or "infomap"

maxgap

Integer, bp to extend interaction anchors for overlap detection (default: 0)

interactionDistanceRange

Numeric vector of length 2 defining the minimal and maximal distance of interactions. This is used to make sure the annotations are not supper far away.

addEvidence

Logical, add evidence to the metadata or not.

parallel

Logical, use future_lapply to do parallel computing or not.

verbose

Logical, print the message or not

...

Parameters for cluster. see cluster_louvain, cluster_walktrap, and cluster_infomap.

Value

An annoLinkerResult object or NULL if no annotations found

Examples

## read the peaks and interactions
library(rtracklayer)
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:annoLinker':
#> 
#>     as.data.frame
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: Seqinfo
extPath <- system.file('extdata', package='annoLinker')
peaks <- rtracklayer::import(file.path(extPath, 'peaks.bed'))
interactions <- rtracklayer::import(file.path(extPath, 'interaction.bedpe'))
library(TxDb.Drerio.UCSC.danRer10.refGene)
#> Loading required package: GenomicFeatures
#> Loading required package: AnnotationDbi
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
annoData <- genes(TxDb.Drerio.UCSC.danRer10.refGene)
#>   32 genes were dropped because they have exons located on both strands of the
#>   same reference sequence or on more than one reference sequence, so cannot be
#>   represented by a single genomic range.
#>   Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList
#>   object, or use suppressMessages() to suppress this message.
anno <- annoLinker(peaks, annoData, interactions, verbose=TRUE)
#> Step 1/4: Building interaction network graph...
#> Graph has 3753 nodes and 2500 edges in 1264 clusters, on average 2.969146 nodes in each cluster
#> Step 2/4: Finding interaction anchors overlapping genes and peaks...
#> Step 3/4: Finding genes and peaks in the same cluster...
#> Step 4/4: Annotating peaks with gene clusters...
#> Annotated 4336 peaks with gene clusters