Introduction

It is a routine way to show the distribution of mutation for genetic variations by lollipop-style (or needle-style) plot in a genome browser, along with a variety of genomic annotations, such as gene level or exon level models, CpG island, and so on. To show the SNP status or methylation data, a special plot, called lollipop plot or needle plot, is used to show the distribution along annotations. Many of currently available tools offline or online provide lollipop-like plot to show the mutation data such as cBioPortal Tools::MutationMapper1, EMBL-EBI::Pfam2, and BioJS::muts-needle-plot3, BiQ Analyzer4, and Methylation plotter5. The cBioPortal Tools::MutationMapper is a well-known and easy to use on-line genome browser that can output high quality of figures with mutations by input tab-delimited mutation data. In R/Bioconductor, there are options to use the flexibility of the Rgraphics system to display Methylation data such as MethVisual6, REDseq7, and GenVisR8.

Table 1: Tools availble for lollipop plot.

software inputs online description
MutationMapper1 tab-delimited text Yes interprets mutations with different heights along protein annotations in automatic color theme
Pfam2 JSON Yes could combine different line and head colors with different drawing styles
muts-needle-plot3 JSON No plots data point with different colors, heights, and size along annotations, and highlighted selcted coordinates
BiQ Analyzer4 BiQ methylation file Yes interprets methylation status in black & white
Methylation plotter5 tab-delimited text Yes stacked multiple methylation status for multiple samples
MethVisual6 R list No visualize the methylation status of CpGs according to their genomic position
REDseq7 R list No plot frequencies of methylations and SNPs along a chromosome
GenVisR8 R dataframe No plot most accurate graphic representation of the ensembl annotation version based on biomart service

All the tools available for the genomic data visualization mentioned above can meet the basic even complicated requirement. However, if there are multiple mutations in almost same position, it is very difficult to display the data using any available tools. What’s more there is a tendency that the figures it generated become more and more complex and busy. Thereby make it difficult in generating high quality pictures for publication in bunch. To fill this gap, we developed trackViewer, a R/Bioconductor package as an enhanced light-weight genome viewer for visualizing various types of high-throughput sequencing data, side by side or stacked with multiple annotation tracks. Besides the regular read coverage tracks supported by existing genome browsers/viewers, trackViewer can also be used to generate lollipop plot to depict the methylation and SNP/mutation status, together with coverage data and annotation tracks to facilitate integrated analysis of multi-omics data. In addition, figures generated by trackViewer are interactive, i.e., the feel-and-look such as the layout, the color scheme and the order of tracks can be easily customized by the users. Furthermore, trackViewer can be easily integrated into standard analysis pipeline for various high-throughput sequencing dataset such as ChIP-seq, RNA-seq, methylation-seq or DNA-seq. The images produced by trackViewer are highly customizable including labels, symbols, colors and size. Here, we illustrate its utilities and capabilities in deriving biological insights from multi-omics dataset from GEO/Encode.

Installation

BiocManager is used to install the released version of trackViewer.

library(BiocManager)
install("trackViewer")

To install the development version of trackViewer, please try

library(BiocManager)
install("jianhong/trackViewer")

To check the current version of trackViewer, please try

packageVersion("trackViewer")
## [1] '1.25.1'

Quick start for lollipop plot

Step 1. Prepare the methylation/variant/mutation data

The input data for lollipop plot by trackViewer is an object of GenomicRanges::GRanges.

## load library
library(trackViewer)
set.seed(123) ## set seed for random sampling to make sure it can be repeat.
## Here we use SNP sample data
SNP <- c(10, 100, 105, 108, 400, 410, 420, 600, 700, 805, 840, 1400, 1402)
## use GenomicRanges::GRanges function to create a GRanges object.
## for real data, users can import vcf data via VariantAnnotation::readVcf function.
sample.gr <- GRanges("chr1", IRanges(SNP, width=1,
                                     ## the name of GRanges object will be used as label
                                     names=paste0("snp", SNP)),
                     ## score value will be used to for the height of lollipop
                     score = sample.int(5, length(SNP), replace = TRUE),
                     ## set the color for lollipop node.
                     color = sample.int(6, length(SNP), replace = TRUE),
                     ## set the lollipop stem color
                     border = sample(c("black", "gray80", "gray30"),
                                     length(SNP), replace=TRUE)
                     )

Step 2. Prepare the gene/protein model

The gene/protein model for lollipop plot by trackViewer is also an object of GenomicRanges::GRanges.

features.gr <- GRanges("chr1", IRanges(c(1, 501, 1001),
                                       width=c(120, 400, 405),
                                       names=paste0("exon", 1:3)),
                       fill = c("#FF8833", "#51C6E6", "#DFA32D"), ## color for exon
                       height = c(0.02, 0.05, 0.08) ## height for exon
                    )

Step 3. Plot lollipop plot

The clustered events could be visualized one by one by jittered lollipop positions.

lolliplot(sample.gr, features.gr)

Plot data from Variant Call Format (VCF) file

VCF is a text file format that contains metadata and mutation information about genomic positions, original genotypes and optional genotypes. The trackViewer package could show single nucleotide polymorphisms (SNPs) from VCF file in lollipop-style plot. Figure @ref(fig:plotVCFdata) shows an example lollipop plot of real SNPs. Sample SNPs are a subset of 1000 variants and 50 samples from chromosome 22 taken from 1000 Genomes in VCF in the VariantAnnotation package. Different colors depict the new SNP events in the circles. The number of circles indicates the number of SNP events.

library(VariantAnnotation) ## load package for reading vcf file
library(TxDb.Hsapiens.UCSC.hg19.knownGene) ## load package for gene model
library(org.Hs.eg.db) ## load package for gene name
library(rtracklayer)
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
## set the track range
gr <- GRanges("22", IRanges(50968014, 50970514, names="TYMP"))
## read in vcf file
tab <- TabixFile(fl)
vcf <- readVcf(fl, "hg19", param=gr)
## get GRanges from VCF object 
mutation.frequency <- rowRanges(vcf)
## keep the metadata
mcols(mutation.frequency) <-
    cbind(mcols(mutation.frequency),
          VariantAnnotation::info(vcf))
## set colors
mutation.frequency$border <- "gray30"
mutation.frequency$color <-
    ifelse(grepl("^rs", names(mutation.frequency)),
           "lightcyan", "lavender")
## plot Global Allele Frequency based on AC/AN
mutation.frequency$score <- round(mutation.frequency$AF*100)
## change the SNPs label rotation angle
mutation.frequency$label.parameter.rot <- 45
## keep sequence level style same
seqlevelsStyle(gr) <- seqlevelsStyle(mutation.frequency) <- "UCSC"
seqlevels(mutation.frequency) <- "chr22"
## extract transcripts in the range
trs <- geneModelFromTxdb(TxDb.Hsapiens.UCSC.hg19.knownGene,
                         org.Hs.eg.db, gr=gr)
## subset the features to show the interested transcripts only
features <- c(range(trs[[1]]$dat), range(trs[[5]]$dat))
## define the feature labels
names(features) <- c(trs[[1]]$name, trs[[5]]$name)
## define the feature colors
features$fill <- c("lightblue", "mistyrose")
## define the feature heights
features$height <- c(.02, .04)
## set the legends
legends <- list(labels=c("known", "unkown"),
                fill=c("lightcyan", "lavender"),
                color=c("gray80", "gray80"))
## lollipop plot
lolliplot(mutation.frequency,
          features, ranges=gr, type="circle",
          legend=legends)
lollipop plot for VCF data

lollipop plot for VCF data

Plot methylation data from bed format file

Any data can imported into a GRanges object can be viewed by trackViewer package. Sample methylations are random data generated for illustration and saved in bed format file. The rtracklayer package is used to import the methylation data into a GRanges. The transcripts are extracted from TxDb object and assigned gene symbol with org database. Here also show multiple transcripts could be shown in different colors and tracks.

library(TxDb.Hsapiens.UCSC.hg19.knownGene) ## load package for gene model
library(org.Hs.eg.db) ## load package for gene name
library(rtracklayer)
## set the track range
gr <- GRanges("chr22", IRanges(50968014, 50970514, names="TYMP"))
## extract transcripts in the range
trs <- geneModelFromTxdb(TxDb.Hsapiens.UCSC.hg19.knownGene,
                         org.Hs.eg.db, gr=gr)
## subset the features to show the interested transcripts only
features <- GRangesList(trs[[1]]$dat, trs[[5]]$dat, trs[[6]]$dat)
flen <- elementNROWS(features)
features <- unlist(features)
## define the feature track layers
features$featureLayerID <- rep(1:2, c(sum(flen[-3]), flen[3]))
## define the feature labels
names(features) <- features$symbol
## define the feature colors
features$fill <- rep(c("lightblue", "mistyrose", "mistyrose"), flen)
## define the feature heights
features$height <- ifelse(features$feature=="CDS", .04, .02)
## import methylation data from a bed file
methy <- import(system.file("extdata", "methy.bed", package="trackViewer"), "BED")
## subset the data to simplify information
methy <- methy[methy$score > 20]
## for pie plot, there are must be at least two numeric columns
methy$score2 <- max(methy$score) - methy$score
## set the legends
legends <- list(labels=c("methylated", "unmethylated"),
                fill=c("white", "lightblue"),
                color=c("black", "black"))
## lollipop plot, pie layout
lolliplot(methy, features,
          ranges=gr, type="pie",
          legend=legends)
lollipop plot, pie layout

lollipop plot, pie layout

Plot lollipop plot for multiple patients in “pie.stack” layout

The percentage of methylation rates are shown by pie graph in different layers for different patients.

## simulate multiple patients
rand.id <- sample.int(length(methy), 3*length(methy), replace=TRUE)
rand.id <- sort(rand.id)
methy.mul.patient <- methy[rand.id]
## pie.stack require metadata "stack.factor", and the metadata can not be 
## stack.factor.order or stack.factor.first
len.max <- max(table(rand.id))
stack.factors <- paste0("patient",
                        formatC(1:len.max,
                                width=nchar(as.character(len.max)),
                                flag="0"))
methy.mul.patient$stack.factor <-
    unlist(lapply(table(rand.id), sample, x=stack.factors))
methy.mul.patient$score <-
    sample.int(100, length(methy.mul.patient), replace=TRUE)
## for a pie plot, two or more numeric meta-columns are required.
methy.mul.patient$score2 <- 100 - methy.mul.patient$score
## set different color set for different patient
patient.color.set <- as.list(as.data.frame(rbind(rainbow(length(stack.factors)),
                                                 "#FFFFFFFF"),
                                           stringsAsFactors=FALSE))
names(patient.color.set) <- stack.factors
methy.mul.patient$color <-
    patient.color.set[methy.mul.patient$stack.factor]
## set the legends
legends <- list(labels=stack.factors, col="gray80",
                fill=sapply(patient.color.set, `[`, 1))
## lollipop plot
lolliplot(methy.mul.patient,
          features, ranges=gr,
          type="pie.stack",
          legend=legends)
lollipop plot, pie.stack layout

lollipop plot, pie.stack layout

Plot lollipop plot in caterpillar layout to compare two samples

The caterpillar layout can be used to side by side comparing two samples or to display dense data.

## use SNPsideID to set the layer of event
sample.gr$SNPsideID <- sample(c("top", "bottom"),
                               length(sample.gr),
                               replace=TRUE)
lolliplot(sample.gr, features.gr)
lollipop plot, caterpillar layout

lollipop plot, caterpillar layout

Dandelion plot hundreds events

Sometimes, there are as many as hundreds of SNPs or methylation status involved in one gene. Dandelion plot can be used to depict such dense SNPs or methylation. Please note that the height of the dandelion indicates the density of the events.

methy <- import(system.file("extdata", "methy.bed", package="trackViewer"), "BED")
length(methy)
## [1] 38
## set the color of dandelion leaves.
methy$color <- 3
methy$border <- "gray"
## we suppose the total event are same (methy + unmethy)
## we rescale the methylation score by max value of the score 
m <- max(methy$score)
methy$score <- methy$score/m
# The area of the fan indicate the percentage of methylation or rate of mutation.
dandelion.plot(methy, features, ranges=gr, type="fan")
dandelion plot

dandelion plot

Browse genomic data by trackViewer

There are two packages available in Bioconductor for visualizing genomic data: rtracklayer and Gviz. The rtracklayer package provides an interface to genome browsers and associated annotation tracks. The Gviz package can be used to plot coverage and annotation tracks. The trackViewer is a lightweight visualization tool for generating interactive figures for publication.

TDP-43 cross-linking and immunoprecipitation coupled with high-throughput sequencing (CLIP-seq) and corresponding RNA-seq mapped files were downloaded from gene expression omnibus database (GSE27394)(9). The reads of replicates were combined. Figure @ref(fig:plotTDPdata) shows an example of alternative splicing change on exon 18 of sortilin 1 (Sort1). Cyan and blue arrows depict increased density of reads in the TDP-43 knockdown samples compared with controls. The CLIP-seq peaks upstream of exon 17 in between two dashed gray lines are potential functional peaks.

library(trackViewer) # load package
gr <- parse2GRanges("chr3:108,476,000-108,485,000")
library(GenomicFeatures) # load GenomicFeatures to create TxDb from UCSC
path <- system.file("extdata", "trackViewer", package = "workshop2020")
if(interactive()){
    mm8KG <- makeTxDbFromUCSC(genome="mm8", tablename="knownGene")
    saveDb(mm8KG, "mm8KG.sqlite")
}else{## mm8KG was saved as sqlite file
    mm8KG <- loadDb(file.path(path, "mm8KG.sqlite"))
}
library(org.Mm.eg.db) # load annotation database
## create the gene model tracks information
trs <- geneModelFromTxdb(mm8KG, org.Mm.eg.db, gr=gr)
## import data from bedGraph/bigWig/BED ... files, see ?importScore for details
CLIP <- importScore(file.path(path, "CLIP.bedGraph"), format="bedGraph", ranges=gr)
control <- importScore(file.path(path, "control.bedGraph"), format="bedGraph", ranges=gr)
knockdown <- importScore(file.path(path, "knockdown.bedGraph"), format="bedGraph", ranges=gr)
## create styles by preset theme
optSty <- optimizeStyle(trackList(trs, knockdown, control, CLIP), theme="col")
trackList <- optSty$tracks
viewerStyle <- optSty$style
## adjust the styles for this track
### rename the trackList for each track
names(trackList)[1:2] <- paste0("Sort1: ", names(trackList)[1:2])
names(trackList)[3] <- "RNA-seq TDP-43 KD"
names(trackList)[4] <- "RNA-seq control"
### change the lab positions for gene model track to bottomleft
setTrackStyleParam(trackList[[1]], "ylabpos", "bottomleft")
setTrackStyleParam(trackList[[2]], "ylabpos", "bottomleft")
### change the color of gene model track
setTrackStyleParam(trackList[[1]], "ylabgp", list(cex=1, col="red"))
setTrackStyleParam(trackList[[2]], "ylabgp", list(cex=1, col="green"))
### remove the xaxis
setTrackViewerStyleParam(viewerStyle, "xaxis", FALSE)
### add a scale bar in CLIP track
setTrackXscaleParam(trackList[[5]], "draw", TRUE)
## plot the tracks
vp <- viewTracks(trackList, gr=gr, viewerStyle=viewerStyle)
### add guide lines to show the range of CLIP-seq signal
addGuideLine(c(108481252, 108481887), vp=vp)
### add arrow mark to show the alternative splicing event
addArrowMark(list(x=c(108483570, 108483570),
                  y=c(3, 4)),
             label=c("Inclusive\nexon", ""),
             col=c("blue", "cyan"),
             vp=vp, quadrant=1)
Tracks for Sort1 exclusive exon

Tracks for Sort1 exclusive exon

Browse the tracks by a web browser

Users can also browse the tracks by a web browser. In addition, we implemented a web application of trackViewer leveraging Shiny package. The web application of trackViewer is available at https://github.com/jianhong/trackViewer.documentation/tree/master/trackViewerShinyApp.

browseTracks(trackList, gr=gr)

Plot epigenetic data by trackViewer

Epigenetics is the study of mechanisms that produce phenotype changes without alterations in the DNA sequence. It includes DNA methylation, histone modification and non-coding RNA. Above we showed how to use the trackViewer package to plot lollipop plots for DNA methylation. The trackViewer can also plot the lollipop plot with the coverage and annotation tracks for histone modification.

## we simulate 2 group of the DNA methylation data
len <- 30
TDPmethySimGP1 <- GRanges("chr3",
                          IRanges(sample(108476000:108485000, size = len),
                                  width = 1), strand = "+")
TDPmethySimGP1$type <- "pie" ## plot the methylation data in pie style
TDPmethySimGP2 <- TDPmethySimGP1
## set the methylation
TDPmethySimGP1$score <- sample.int(100, size = len, replace = TRUE)
TDPmethySimGP1$unmethy <- 100 - TDPmethySimGP1$score
TDPmethySimGP2$score <- sample.int(100, size = len, replace = TRUE)
TDPmethySimGP2$unmethy <- 100 - TDPmethySimGP2$score
## create the lollipopData track
lollipopData <- new("track",
                    dat=TDPmethySimGP1,
                    dat2=TDPmethySimGP2,
                    type = "lollipopData")
## create styles by preset theme
optSty <- optimizeStyle(trackList(trs, knockdown, control, CLIP, lollipopData),
                        theme="col")
trackList <- optSty$tracks
viewerStyle <- optSty$style
## plot the tracks
vp <- viewTracks(trackList, gr=gr, viewerStyle=viewerStyle)
Plot the lollipop plot with the coverage and annotation tracks

Plot the lollipop plot with the coverage and annotation tracks

1. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Science signaling 6, pl1 (2013).

2. Finn, R. D. et al. Pfam: Clans, web tools and services. Nucleic acids research 34, D247–D251 (2006).

3. Schroeder, M. P. Muts-needle-plot: Mutations needle plot v0.8.0. (2015). doi:10.5281/zenodo.14561

4. Bock, C. et al. BiQ analyzer: Visualization and quality control for dna methylation data from bisulfite sequencing. Bioinformatics 21, 4067–4068 (2005).

5. Mallona, I., Dı́ez-Villanueva, A. & Peinado, M. A. Methylation plotter: A web tool for dynamic visualization of dna methylation data. Source code for biology and medicine 9, 1 (2014).

6. Zackay, A. & Steinhoff, C. MethVisual-visualization and exploratory statistical analysis of dna methylation profiles from bisulfite sequencing. BMC research notes 3, 337 (2010).

7. Zhu, L. & others. REDseq: A bioconductor package for analyzing high throughput sequencing data from restriction enzyme digestion.

8. Skidmore, Z. L. et al. GenVisR: Genomic visualizations in r. Bioinformatics btw325 (2016).

9. Polymenidou, M. et al. Long pre-mRNA depletion and rna missplicing contribute to neuronal vulnerability from loss of tdp-43. Nature neuroscience 14, 459–468 (2011).