Abstract

The motifStack package is designed for the graphic representation of multiple motifs with different similarity scores. It works with DNA/RNA sequence motifs, affinity logos, and amino acid sequence motifs. In addition, it lets users customize graphic parameters such as the font type and symbol colors.

Introduction

A sequence logo, based on information theory, is widely used as a graphical representation of sequence conservation (a motif) in multiple amino acid or nucleic acid sequences. Sequence motifs represent conserved characteristics such as DNA binding sites, where transcription factors bind, and catalytic sites in enzymes. Although many tools, such as seqLogo [1], have been developed to create sequence motifs and to represent each one as an individual sequence logo, software tools for depicting the relationship among multiple sequence motifs are still lacking. We developed a flexible and powerful open-source R/Bioconductor package, motifStack, for visualizing the alignment of multiple sequence motifs.
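To make the information-theory basis concrete, the following base-R sketch computes the per-position information content of a small nucleotide count matrix and the letter heights an information-scaled logo would use. The counts and the helper name `info_content` are invented for illustration, no small-sample correction is applied, and this is not necessarily how motifStack computes its values internally.

```r
# Toy position count matrix: rows are bases, columns are motif positions.
# The counts are made up for illustration.
counts <- matrix(c(10,  0,  5,
                    0, 10,  5,
                    0,  0,  5,
                    0,  0,  5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("A", "C", "G", "T"), NULL))

# Convert counts to per-position frequencies (each column sums to 1).
freq <- sweep(counts, 2, colSums(counts), "/")

# Information content per position in bits: log2(4) minus the Shannon entropy.
# Zero frequencies are dropped, so 0 * log2(0) is treated as 0.
info_content <- function(p) {
  nz <- p[p > 0]
  log2(length(p)) + sum(nz * log2(nz))
}
ic <- apply(freq, 2, info_content)

# Letter heights in an information-scaled logo: frequency times column IC.
heights <- sweep(freq, 2, ic, "*")
print(ic)
```

A fully conserved position reaches the maximum of 2 bits for a four-letter alphabet, while a position where all four bases are equally frequent carries 0 bits and would be drawn with no height.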

Examples of using motifStack

plot a DNA sequence logo with different fonts and colors

Users can select different fonts and colors to draw the sequence logo.

library(motifStack)
pcm <- read.table(file.path(find.package("motifStack"), 
                            "extdata", "bin_SOLEXA.pcm"))
pcm <- pcm[,3:ncol(pcm)]
rownames(pcm) <- c("A","C","G","T")
motif <- new("pcm", mat=as.matrix(pcm), name="bin_SOLEXA")
##pfm object
#motif <- pcm2pfm(pcm)
#motif <- new("pfm", mat=motif, name="bin_SOLEXA")
plot(motif)

Plot a DNA sequence logo with different fonts and colors

#plot the logo with same height
plot(motif, ic.scale=FALSE, ylab="probability")

Plot a DNA sequence logo with different fonts and colors

#try a different font
plot(motif, font="mono,Courier", fontface="plain") # fontface can be 1=plain, 2=bold, 3=italic, 4=bold italic

Plot a DNA sequence logo with different fonts and colors

#try a different font and a different color group
motif@color <- colorset(colorScheme='basepairing')
plot(motif,font="Times")

Plot a DNA sequence logo with different fonts and colors

plot sequence logo with markers

If you assign a list of marker objects to the markers slot, the markers are drawn in the figure. There are three types of markers: “rect”, “line” and “text”.

markerRect <- new("marker", type="rect", start=6, stop=7, gp=gpar(lty=2, fill=NA, col="orange"))
markerLine <- new("marker", type="line", start=2, stop=7, gp=gpar(lwd=2, col="red"))
markerText <- new("marker", type="text", start=c(1, 5), 
                  label=c("*", "core"), gp=gpar(cex=2, col="red"))
motif <- new("pcm", mat=as.matrix(pcm), name="bin_SOLEXA", 
             markers=c(markerRect, markerLine, markerText))
plot(motif)

Plot a DNA sequence logo with markers

plot sequence logo stack

To show multiple motifs on the same canvas as a sequence logo stack, the distances between motifs need to be calculated first. Previously, MotIV::motifDistances [3] (an R implementation of STAMP [4]) was used to calculate the distances. However, the MotIV package was dropped from Bioconductor 3.12. Currently, an R implementation of matalign is used by default. After alignment, users can use plotMotifLogoStack, plotMotifLogoStackWithTree or plotMotifStackWithRadialPhylog to draw sequence logos in different layouts. For ease of use, we integrated these functions into one workflow function named motifStack.
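The distance-then-cluster idea can be sketched in base R. The example below is only an illustration under simplifying assumptions: the three matrices are invented, they all have the same width, and a plain Euclidean distance stands in for an alignment-based score such as matalign. The helper `make_pfm` is hypothetical.

```r
# Three toy frequency matrices of equal width (4 bases x 3 positions).
# Values are invented for illustration; matrix() fills column by column.
make_pfm <- function(v) {
  m <- matrix(v, nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
  sweep(m, 2, colSums(m), "/")
}
toy <- list(
  m1 = make_pfm(c(8, 1, 1, 0,  0, 9, 1, 0,  1, 1, 8, 0)),
  m2 = make_pfm(c(7, 2, 1, 0,  1, 8, 1, 0,  1, 1, 7, 1)),
  m3 = make_pfm(c(0, 1, 1, 8,  8, 1, 1, 0,  0, 0, 1, 9))
)

# Pairwise Euclidean distance between the flattened matrices.
# This stands in for a real alignment-based motif distance.
d <- dist(do.call(rbind, lapply(toy, as.vector)))

# Hierarchical clustering of the distances; the leaf order is the order
# in which the logos would be stacked.
hc <- hclust(d, method = "average")
print(hc$labels[hc$order])
```

Because m1 and m2 differ only slightly, they are merged first and end up next to each other in the stack. Real motifs differ in width and orientation, which is why an alignment-aware distance is needed in practice.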

library(motifStack)
#####Input#####
motifs<-importMatrix(dir(file.path(find.package("motifStack"),
                                   "extdata"),"pcm$", 
                         full.names = TRUE))

## plot stacks
motifStack(motifs, layout="stack", ncex=1.0)

Plot motifs with sequence logo stack style

rnaMotifs <- DNAmotifToRNAmotif(motifs)
names(rnaMotifs)
## [1] "bin_SOLEXA"   "fd64A_SOLEXA" "fkh_NAR"      "foxo_SOLEXA"  "FoxP_SOLEXA" 
## [6] "slp1_SOLEXA"  "slp2_SOLEXA"
motifStack(rnaMotifs, layout = "stack", 
           reorder=FALSE) ## we can also use reorder=FALSE to keep the order of input. 

Plot RNA motifs with sequence logo stack style
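As a rough illustration of what a DNA-to-RNA conversion amounts to for a motif matrix, the base-R sketch below relabels the thymine row as uracil and leaves the numbers untouched. It assumes that this relabeling is the essence of the conversion; the internals of DNAmotifToRNAmotif are not shown here and may differ. The matrix values are invented.

```r
# Toy DNA frequency matrix (values invented for illustration).
dna <- matrix(c(0.7, 0.1,
                0.1, 0.1,
                0.1, 0.1,
                0.1, 0.7),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Relabel thymine as uracil; the frequencies themselves are unchanged.
rna <- dna
rownames(rna) <- chartr("T", "U", rownames(rna))
print(rownames(rna))
```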

motif2 <- motif
motif2$mat <- motif$mat[, 5:12]
motif2$name <- "logo2"
psamMotifs <- list(motif, motif2)
motifStack(psamMotifs)

Plot affinity logos with sequence logo stack style

## plot stacks with hierarchical tree
motifStack(motifs, layout="tree")

Sequence logo stack with hierarchical cluster tree

## When there are too many motifs to show in a vertical stack, 
## motifStack can draw them in a radial style.
## random sample from MotifDb
library("MotifDb")
matrix.fly <- query(MotifDb, "Dmelanogaster")
motifs2 <- as.list(matrix.fly)
## use data from FlyFactorSurvey
motifs2 <- motifs2[grepl("Dmelanogaster\\-FlyFactorSurvey\\-",
                         names(motifs2))]
## format the names
names(motifs2) <- gsub("Dmelanogaster_FlyFactorSurvey_", "",
                       gsub("_FBgn\\d+$", "",
                            gsub("[^a-zA-Z0-9]","_",
                                 gsub("(_\\d+)+$", "", names(motifs2)))))
motifs2 <- motifs2[unique(names(motifs2))]
pfms <- sample(motifs2, 30)
## create a list of pfm objects
motifs2 <- mapply(pfms, names(pfms), FUN=function(.ele, .name){
  new("pfm",mat=.ele, name=.name)}, SIMPLIFY = FALSE)
## trim the motifs
motifs2 <- lapply(motifs2, trimMotif, t=0.4)
## setting colors
library(RColorBrewer)
color <- brewer.pal(10, "Set3")
## plot logo stack with radial style
motifStack(motifs2, layout="radialPhylog", 
           circle=0.3, cleaves = 0.2, 
           clabel.leaves = 0.5, 
           col.bg=rep(color, each=3), col.bg.alpha=0.3, 
           col.leaves=rep(color, each=3),
           col.inner.label.circle=rep(color, each=3), 
           inner.label.circle.width=0.05,
           col.outer.label.circle=rep(color, each=3), 
           outer.label.circle.width=0.02, 
           circle.motif=1.2,
           angle=350)

Plot motifs in a radial style when there are too many motifs to show in a vertical stack
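The trimming step in the code above removes uninformative flanking positions before plotting. A minimal base-R sketch of that idea follows, assuming that trimming means dropping leading and trailing positions whose information content falls below a threshold. The helper `trim_flanks` and the matrix values are hypothetical, and the package's own trimMotif may handle details differently.

```r
# Toy frequency matrix with uninformative flanks (values invented).
freq <- cbind(rep(0.25, 4),                 # flat flank
              c(0.97, 0.01, 0.01, 0.01),    # strongly conserved A
              c(0.01, 0.97, 0.01, 0.01),    # strongly conserved C
              rep(0.25, 4))                 # flat flank
rownames(freq) <- c("A", "C", "G", "T")

# Per-position information content in bits.
ic <- apply(freq, 2, function(p) {
  nz <- p[p > 0]
  log2(length(p)) + sum(nz * log2(nz))
})

# Keep the span from the first to the last position whose IC reaches the
# threshold; low-IC positions inside that span are retained.
trim_flanks <- function(m, ic, threshold) {
  keep <- which(ic >= threshold)
  if (length(keep) == 0) return(NULL)   # nothing informative is left
  m[, min(keep):max(keep), drop = FALSE]
}
trimmed <- trim_flanks(freq, ic, threshold = 0.4)
print(ncol(trimmed))
```

Only the two conserved positions survive here, since the flat flanks carry 0 bits. Keeping interior low-information positions preserves the spacing between conserved blocks.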

plot a sequence logo cloud

We can also plot a sequence logo cloud for DNA motifs.

## assign groups for motifs
groups <- rep(paste("group",1:5,sep=""), each=6) ## 30 motifs in 5 groups
names(groups) <- names(pfms)
## assign group colors
group.col <- brewer.pal(5, "Set3")
names(group.col)<-paste("group",1:5,sep="")
## create a list of pfm objects
pfms <- mapply(names(pfms), pfms, FUN=function(.ele, .pfm){
  new("pfm",mat=.pfm, name=.ele)}
               ,SIMPLIFY = FALSE)
## use matalign to calculate the distances of motifs
hc <- clusterMotifs(pfms)
## convert the hclust to phylog object
library(ade4)
phylog <- ade4::hclust2phylog(hc)
## reorder the pfms by the order of hclust
leaves <- names(phylog$leaves)
pfms <- pfms[leaves]
## extract the motif signatures
motifSig <- motifSignature(pfms, phylog, cutoffPval=0.0001, min.freq=1)
## draw the motifs with a tag-cloud style.
motifCloud(motifSig, scale=c(6, .5), 
           layout="rectangles", 
           group.col=group.col, 
           groups=groups, 
           draw.legend=TRUE)

Sequence logo cloud with rectangle packing layout

motifCircos

We can also plot the motifs in circos style. In circos style, we can plot two groups of motifs together with multiple color rings.

## get the signatures from the motifSignature object
sig <- signatures(motifSig)
## get the group color for each signature
gpCol <- sigColor(motifSig)
## plot the logo stack with circos style.
motifCircos(phylog=phylog, pfms=pfms, pfms2=sig, 
            col.tree.bg=rep(color, each=3), col.tree.bg.alpha=0.3, 
            col.leaves=rep(rev(color), each=3),
            col.inner.label.circle=gpCol, 
            inner.label.circle.width=0.03,
            col.outer.label.circle=gpCol, 
            outer.label.circle.width=0.03,
            r.rings=c(0.02, 0.03, 0.04), 
            col.rings=list(sample(colors(), 30), 
                           sample(colors(), 30), 
                           sample(colors(), 30)),
            angle=350, motifScale="logarithmic")

Grouped sequence logo with circos style layout

motifPiles

We can also plot the motifs in pile style. In pile style, we can plot two groups of motifs with multiple types of annotation, for example a heatmap. The col.anno parameter should be set to a named list.

## plot the logo stack with heatmap.
df <- data.frame(A=runif(n = 30), B=runif(n = 30), C=runif(n = 30), D=runif(n = 30))
map2col <- function(x, pal){
  rg <- range(x)
  pal[findInterval(x, seq(rg[1], rg[2], length.out = length(pal)+1), 
                   all.inside = TRUE)]
}
dl <- lapply(df, map2col, pal=heat.colors(10))
## alignment of the pfms, this step will make the motif logos occupy 
## more space. Users can skip this alignment to see the difference.
pfmsAligned <- DNAmotifAlignment(pfms)
## plot motifs
motifPiles(phylog=phylog, pfms=pfmsAligned, 
            col.tree=rep(color, each=3),
            col.leaves=rep(rev(color), each=3),
            col.pfms2=gpCol, 
            r.anno=rep(0.02, length(dl)), 
            col.anno=dl,
            motifScale="logarithmic",
            plotIndex=TRUE,
            groupDistance=10)

Grouped sequence logo with a heatmap
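To see what the map2col helper in the code above does, the following base-R sketch applies it to a few hand-picked values. The placeholder labels in `pal` and the variable names `binned` and `anno` are chosen here for illustration only.

```r
# Same helper as in the example above: bin values into length(pal)
# equal-width intervals over their range and return the matching entry.
map2col <- function(x, pal) {
  rg <- range(x)
  pal[findInterval(x, seq(rg[1], rg[2], length.out = length(pal) + 1),
                   all.inside = TRUE)]
}

# Placeholder labels instead of real colors, to make the binning visible.
pal <- c("low", "mid-low", "mid-high", "high")
binned <- map2col(c(0, 0.5, 1), pal)
print(binned)

# A named list with one color vector per annotation track, which is the
# shape described for col.anno.
anno <- list(A = map2col(c(0.1, 0.9, 0.5), heat.colors(10)),
             B = map2col(c(0.3, 0.2, 0.8), heat.colors(10)))
print(names(anno))
```

The maximum value would fall just past the last break, so `all.inside = TRUE` is what assigns it to the final bin rather than returning an out-of-range index.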

plot motifs with d3.js

An interactive plot can be generated using the browseMotifs function, which leverages the d3.js library. All motifs on the plot are draggable, and the plot can be easily exported as a Scalable Vector Graphics (SVG) file.

browseMotifs(pfms = pfms, phylog = phylog, layout="tree", yaxis = FALSE, baseWidth=6, baseHeight = 15)

Plot the motifs in radialPhylog layout.

browseMotifs(pfms = pfms, phylog = phylog, layout="radialPhylog", yaxis = FALSE, xaxis = FALSE, baseWidth=6, baseHeight = 15)

docker container for motifStack

Docker allows software to be packaged into containers which can be run on any platform using a virtual machine called boot2docker. To ease the installation of motifStack and its dependencies, we have created a Docker image containing all the components needed to run motifStack. Users can download the motifStack Docker image using the following code snippet.

cd ~ ## in windows, please try cd c:\\ Users\\ username
docker pull jianhong/motifstack:latest
mkdir tmp4motifstack ## this will be the share folder for your host and container.
docker run -ti --rm -v ${PWD}/tmp4motifstack:/volume/data jianhong/motifstack:latest bash
## the following commands are run inside the motifstack:latest container
cd /volume/data
git clone https://github.com/jianhong/motifStack.documentation.git
cd motifStack.documentation/
cp /usr/bin/matalign app/matalign-v4a
cp /usr/bin/phylip/neighbor app/neighbor.app/Contents/MacOS/neighbor
R cmd -e "rmarkdown::render('suppFigure2.Rmd')"
R cmd -e "rmarkdown::render('suppFigure6.Rmd')"

You will see the test.pdf file in the folder of tmp4motifstack.

plot motifs with ggplot2

Motifs can also be plotted with the geom_motif function.

pcm <- read.table(file.path(find.package("motifStack"), 
                            "extdata", "bin_SOLEXA.pcm"))
pcm <- pcm[,3:ncol(pcm)]
rownames(pcm) <- c("A","C","G","T")
markerRect <- new("marker", type="rect", start=6, stop=7, gp=gpar(lty=2, fill=NA, col="orange"))
markerLine <- new("marker", type="line", start=3, stop=5, gp=gpar(lwd=2, col="red"))
markerText <- new("marker", type="text", start=1, label="*", gp=gpar(cex=2, col="red"))
motif <- new("pcm", mat=as.matrix(pcm), name="bin_SOLEXA", 
             markers=c(markerRect, markerLine, markerText))
pfm <- pcm2pfm(motif)
df <- data.frame(xmin=c(.25, .25), ymin=c(.25, .75), xmax=c(.75, .75), ymax=c(.5, 1), 
                 fontfamily=c("Helvetica", "mono,Courier"), fontface=c(2, 1))
df$motif <- list(pfm, pfm)

library(ggplot2)

ggplot(df, aes(xmin=xmin, ymin=ymin, xmax=xmax, ymax=ymax, motif=motif, 
               fontfamily=fontfamily, fontface=fontface)) + 
    geom_motif() + theme_bw() + ylim(0, 1) + xlim(0, 1)

df <- data.frame(x=.5, y=c(.25, .75), width=.5, height=.25, 
                 fontfamily=c("Helvetica", "mono,Courier"), fontface=c(2, 1))
df$motif <- list(pfm, pfm)

ggplot(df, aes(x=x, y=y, width=width, height=height, motif=motif, 
               fontfamily=fontfamily, fontface=fontface)) + 
    geom_motif(use.xy=TRUE) + theme_bw() + ylim(0, 1) + xlim(0, 1)

Session Info

## R Under development (unstable) (2021-03-18 r80099)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] stats4    parallel  grid      stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-2   ggplot2_3.3.3        ade4_1.7-16         
##  [4] MotifDb_1.33.0       Biostrings_2.59.2    XVector_0.31.1      
##  [7] GenomicRanges_1.43.3 GenomeInfoDb_1.27.8  IRanges_2.25.6      
## [10] S4Vectors_0.29.9     BiocGenerics_0.37.1  motifStack_1.35.1   
## [13] knitr_1.31          
## 
## loaded via a namespace (and not attached):
##  [1] MatrixGenerics_1.3.1        Biobase_2.51.0             
##  [3] jsonlite_1.7.2              highr_0.8                  
##  [5] BiocManager_1.30.12         GenomeInfoDbData_1.2.4     
##  [7] Rsamtools_2.7.1             yaml_2.2.1                 
##  [9] progress_1.2.2              pillar_1.5.1               
## [11] lattice_0.20-41             glue_1.4.2                 
## [13] digest_0.6.27               colorspace_2.0-0           
## [15] htmltools_0.5.1.1           Matrix_1.3-2               
## [17] XML_3.99-0.6                pkgconfig_2.0.3            
## [19] grImport2_0.2-0             zlibbioc_1.37.0            
## [21] scales_1.1.1                jpeg_0.1-8.1               
## [23] BiocParallel_1.25.5         tibble_3.1.0               
## [25] ellipsis_0.3.1              cachem_1.0.4               
## [27] withr_2.4.1                 SummarizedExperiment_1.21.1
## [29] splitstackshape_1.4.8       magrittr_2.0.1             
## [31] crayon_1.4.1                memoise_2.0.0              
## [33] evaluate_0.14               fs_1.5.0                   
## [35] fansi_0.4.2                 MASS_7.3-53.1              
## [37] textshaping_0.3.3           tools_4.1.0                
## [39] data.table_1.14.0           prettyunits_1.1.1          
## [41] hms_1.0.0                   BiocStyle_2.19.2           
## [43] BiocIO_1.1.2                lifecycle_1.0.0            
## [45] matrixStats_0.58.0          stringr_1.4.0              
## [47] munsell_0.5.0               DelayedArray_0.17.10       
## [49] compiler_4.1.0              pkgdown_1.6.1              
## [51] systemfonts_1.0.1           rlang_0.4.10               
## [53] debugme_1.1.0               RCurl_1.98-1.3             
## [55] rjson_0.2.20                htmlwidgets_1.5.3          
## [57] labeling_0.4.2              bitops_1.0-6               
## [59] base64enc_0.1-3             rmarkdown_2.7              
## [61] restfulr_0.0.13             gtable_0.3.0               
## [63] R6_2.5.0                    GenomicAlignments_1.27.2   
## [65] rtracklayer_1.51.5          fastmap_1.1.0              
## [67] utf8_1.2.1                  rprojroot_2.0.2            
## [69] ragg_1.1.2                  desc_1.3.0                 
## [71] stringi_1.5.3               png_0.1-7                  
## [73] vctrs_0.3.7                 xfun_0.22

References

1. Bembom, O. seqLogo: Sequence logos for DNA sequence alignments. R package version 1.5.4 (2006).
2. Foat, B. C., Morozov, A. V. & Bussemaker, H. J. Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE. Bioinformatics 22, e141–e149 (2006).
3. Mercier, E. & Gottardo, R. MotIV: Motif identification and validation. R package version 1.10.0 (2010).
4. Mahony, S. & Benos, P. V. STAMP: A web tool for exploring DNA-binding motif similarities. Nucleic Acids Res. 35 (Web Server issue), W253–W258 (2007).