Introduction

The HMMtBroadPeak package is designed to call very broad peaks for data such as lamina-associated domains (LADs), nucleolus-associated domains (NADs), or other topologically associating domains.

The method follows the description of Leemans et al. [1]. Reads are counted in each bin. Only bins with at least a given number of reads (set by the background parameter), pooled across all samples, are subsequently normalized. These bins are first normalized to CPM (counts per million reads) and then log2-transformed as the ratio over control, with a pseudocount. Peaks are defined by running a hidden Markov model over the normalized values (using the R package HMMt).
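The filtering and normalization described above can be sketched in base R on toy counts. This is only an illustration under stated assumptions, not the package's internal code: the count values, the cutoff `background <- 5`, the `pseudocount <- 1`, and the choice to compute library size from all bins are all made up here.

```r
# Toy per-bin read counts; all values are made up for illustration.
treatment_counts <- c(0, 12, 30, 45, 3)
control_counts   <- c(1, 10, 10, 15, 2)

background  <- 5  # assumed minimum pooled reads per bin
pseudocount <- 1  # assumed pseudocount; the package may use another value

# Keep bins whose reads, pooled over all samples, reach the background cutoff.
keep <- (treatment_counts + control_counts) >= background

# Counts per million, using each sample's total read count as library size
# (an assumption; the package may compute library size differently).
cpm <- function(x) x / sum(x) * 1e6
treatment_cpm <- cpm(treatment_counts)[keep]
control_cpm   <- cpm(control_counts)[keep]

# log2 ratio of treatment over control, stabilized by the pseudocount.
log2_ratio <- log2((treatment_cpm + pseudocount) / (control_cpm + pseudocount))
```

Positive values of `log2_ratio` mark bins enriched in the treatment; the hidden Markov model is then run over a vector of this kind.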

Quick start

There are three steps for calling peaks:

Step 1: prepare the BAM files.

The BAM files should be clean: reads should have passed quality control and, if applicable, be properly paired. Each BAM index file should be stored in the same folder as its BAM file and share the same prefix.

treatment <- system.file("extdata", "LB1.KD.chr1_1_5000000.bam",
                         package = "HMMtBroadPeak",
                         mustWork = TRUE)
control   <- system.file("extdata", "LB1.WT.chr1_1_5000000.bam",
                         package = "HMMtBroadPeak",
                         mustWork = TRUE)
## For local files, set the paths directly, for example:
# treatment <- "path/to/treatment/bam/files"
# control <- "path/to/control/bam/files"
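Because the index must sit next to each BAM file, it can help to check for it before calling peaks. The helper below is a minimal sketch; `hasBamIndex` is a hypothetical name, not part of HMMtBroadPeak, and it assumes the index follows one of the two common naming schemes.

```r
# Hypothetical helper: TRUE when an index file sits next to the BAM file.
# Both common naming schemes are checked: "x.bam.bai" and "x.bai".
hasBamIndex <- function(bam) {
  candidates <- c(paste0(bam, ".bai"),
                  sub("\\.bam$", ".bai", bam))
  any(file.exists(candidates))
}

# Usage with a placeholder path; replace it with a real file.
hasBamIndex("path/to/treatment.bam")
```

If the index is missing, `Rsamtools::indexBam()` is one way to create it for a coordinate-sorted BAM file.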

Step 2: call peaks.

The read counts are pooled within each group (treatment and control). In other words, replicates are not treated separately when peaks are called.
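As a rough illustration of what pooling means, the toy example below sums per-bin counts across replicates within each group. The matrices and their values are made up, and the package's actual counting code may differ.

```r
# Toy count matrices: rows are bins, columns are replicate BAM files.
treatment_reps <- cbind(rep1 = c(4, 10, 25), rep2 = c(6, 14, 20))
control_reps   <- cbind(rep1 = c(5, 6, 7),   rep2 = c(5, 4, 8))

# Pooling sums the replicates, leaving one count vector per group.
treatment_pooled <- rowSums(treatment_reps)
control_pooled   <- rowSums(control_reps)
```

After pooling, variation between replicates is no longer visible to the peak caller, which is why replicate consistency should be checked beforehand.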

library(HMMtBroadPeak)
called <- HMMtBroadPeak(treatment, control)
## iteration: 1
## iteration: 2
## ...
## iteration: 53
called$peaks
## GRanges object with 3 ranges and 0 metadata columns:
##       seqnames          ranges strand
##          <Rle>       <IRanges>  <Rle>
##   [1]     chr1  774227-1698303      *
##   [2]     chr1 1713289-2657344      *
##   [3]     chr1 2777225-5000001      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome

Step 3: validate the calls and export the peaks.

library(ggplot2)
plotPeaks(called, seqname="chr1") + theme_bw()

library(rtracklayer)
export(called$peaks, "called.broad.peak.bed")
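When inspecting the exported file, keep in mind that BED uses 0-based starts with exclusive ends, whereas GRanges coordinates are 1-based and inclusive, so the start values in the BED file are expected to be one less than those printed above. The base R sketch below reproduces that conversion by hand for the three example peaks; the data frame is only an illustration and is not needed when `export()` is used.

```r
# Peak coordinates copied from the called$peaks output above (1-based, inclusive).
peaks <- data.frame(
  chrom = "chr1",
  start = c(774227, 1713289, 2777225),
  end   = c(1698303, 2657344, 5000001)
)

# BED uses 0-based starts and exclusive ends, so only the start shifts by one.
bed <- transform(peaks, start = start - 1)

# Peak width is the same in both conventions.
bed$width <- bed$end - bed$start
```

The resulting widths range from roughly 0.9 Mb to 2.2 Mb, which is the scale expected for broad domains such as LADs.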

Session Info

## R Under development (unstable) (2021-03-18 r80099)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggplot2_3.3.3               rtracklayer_1.51.5         
##  [3] HMMtBroadPeak_0.0.4         GenomicAlignments_1.27.2   
##  [5] Rsamtools_2.7.1             Biostrings_2.59.2          
##  [7] XVector_0.31.1              SummarizedExperiment_1.21.2
##  [9] Biobase_2.51.0              MatrixGenerics_1.3.1       
## [11] matrixStats_0.58.0          GenomicRanges_1.43.4       
## [13] GenomeInfoDb_1.27.8         IRanges_2.25.6             
## [15] S4Vectors_0.29.12           BiocGenerics_0.37.1        
## 
## loaded via a namespace (and not attached):
##  [1] lattice_0.20-41        rprojroot_2.0.2        digest_0.6.27         
##  [4] utf8_1.2.1             R6_2.5.0               evaluate_0.14         
##  [7] highr_0.8              pillar_1.5.1           zlibbioc_1.37.0       
## [10] rlang_0.4.10           Matrix_1.3-2           rmarkdown_2.7         
## [13] pkgdown_1.6.1          labeling_0.4.2         textshaping_0.3.3     
## [16] desc_1.3.0             BiocParallel_1.25.5    stringr_1.4.0         
## [19] RCurl_1.98-1.3         munsell_0.5.0          DelayedArray_0.17.10  
## [22] compiler_4.1.0         xfun_0.22              pkgconfig_2.0.3       
## [25] systemfonts_1.0.1      htmltools_0.5.1.1      tibble_3.1.0          
## [28] GenomeInfoDbData_1.2.4 XML_3.99-0.6           fansi_0.4.2           
## [31] withr_2.4.1            crayon_1.4.1           bitops_1.0-6          
## [34] grid_4.1.0             gtable_0.3.0           lifecycle_1.0.0       
## [37] magrittr_2.0.1         scales_1.1.1           HMMt_0.1              
## [40] stringi_1.5.3          debugme_1.1.0          cachem_1.0.4          
## [43] farver_2.1.0           fs_1.5.0               ellipsis_0.3.1        
## [46] ragg_1.1.2             vctrs_0.3.7            rjson_0.2.20          
## [49] restfulr_0.0.13        tools_4.1.0            glue_1.4.2            
## [52] fastmap_1.1.0          yaml_2.2.1             colorspace_2.0-0      
## [55] memoise_2.0.0          knitr_1.31             BiocIO_1.1.2

References

1. Leemans, C. et al. Promoter-intrinsic and local chromatin features determine gene repression in LADs. Cell 177, 852–864.e14 (2019).