Bioconductor Packages

Jianhong Ou
Mar 30, 2017


GeneNetworkBuilder: Build Regulatory Network with the combination of the expression data and regulation data

An application for discovering direct or indirect targets of transcription factors (TFs) using ChIP-chip or ChIP-seq data together with microarray or RNA-seq gene expression data. Given a list of potential target genes of one TF from ChIP-chip or ChIP-seq, and the gene expression results, GeneNetworkBuilder generates a regulatory network of the TF.
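The idea of combining binding data with expression data can be sketched in base R. This is a toy illustration only, not the GeneNetworkBuilder API: the gene names, the interaction table, and the fold-change and p-value cutoffs are all hypothetical assumptions.

```r
## Hypothetical toy inputs; none of these values come from real data.
tf <- "TF1"
chip_targets <- c("geneA", "geneB", "geneC")   # genes bound by TF1 (ChIP)
expr <- data.frame(
  gene   = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  logFC  = c(2.1, -1.8, 0.1, 1.5, 0.2),
  pvalue = c(0.001, 0.01, 0.80, 0.02, 0.90),
  stringsAsFactors = FALSE
)
## Hypothetical regulation data: known regulator -> target pairs.
interactions <- data.frame(
  from = c("geneA", "geneA", "geneC"),
  to   = c("geneD", "geneE", "geneD"),
  stringsAsFactors = FALSE
)

## Genes whose expression changes (cutoffs are arbitrary assumptions).
changed <- expr$gene[abs(expr$logFC) >= 1 & expr$pvalue <= 0.05]

## Direct targets: bound by the TF and changed in expression.
direct <- intersect(chip_targets, changed)

## Indirect targets: changed genes that are not bound by the TF
## but are regulated by one of its direct targets.
second <- interactions[interactions$from %in% direct &
                       interactions$to %in% changed &
                       !(interactions$to %in% chip_targets), ]
indirect <- unique(second$to)

## Edge list of the resulting toy network.
network <- rbind(
  data.frame(from = tf, to = direct, type = "direct",
             stringsAsFactors = FALSE),
  data.frame(from = second$from, to = second$to, type = "indirect",
             stringsAsFactors = FALSE)
)
print(network)
```

In this toy case geneC is bound but unchanged, so it is dropped, and geneD is reached only through geneA.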

InPAS: Identification of Novel alternative PolyAdenylation Sites (PAS)

Alternative polyadenylation (APA) is an important post-transcriptional regulation mechanism that occurs in most human genes. InPAS facilitates the discovery of novel APA sites from RNA-seq data. It leverages cleanUpdTSeq to fine-tune the identified APA sites.

NADfinder: Call peaks for sequence data of nucleolar associated domains

Nucleolar-associated domains (NADs) are mapped by sequencing DNA from purified nucleoli. NADfinder is designed to call broad peaks from NAD sequencing data.

chr18

ATACqc: ATAC sequencing Quality Control

Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is an alternative or complementary technique to MNase-seq (sequencing of micrococcal nuclease sensitive sites). It is a rapid and sensitive method for chromatin accessibility analysis. We collected code for ATAC-seq analysis following the reports of the Greenleaf lab, to help users run quick quality control on their data, including fragment size distribution, nucleosome positioning, and footprints of CTCF or other transcription factors.
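As a rough illustration of the fragment size check, the base R sketch below bins fragment lengths into size classes. It does not use the ATACqc API; the fragment lengths are made up, and the cutoffs (under 100 bp for nucleosome-free, 180 to 247 bp for mononucleosome) are assumptions that should be adjusted to the data.

```r
## Hypothetical fragment lengths in bp (insert sizes of paired-end reads).
frag_len <- c(45, 60, 72, 88, 95, 150, 190, 200, 210, 230, 330, 400)

## Assumed size classes; the cutoffs are illustrative, not package defaults.
size_class <- ifelse(frag_len < 100, "nucleosome_free",
              ifelse(frag_len >= 180 & frag_len <= 247, "mononucleosome",
                     "other"))

## Number and percentage of fragments in each class.
counts <- table(size_class)
print(counts)
print(round(100 * prop.table(counts), 1))

## Fragment size distribution as counts per 50-bp bin.
bins <- cut(frag_len, breaks = seq(0, 450, by = 50))
print(table(bins))
```

With real data, the lengths would come from the aligned read pairs rather than a hand-written vector.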

The cumulative percentage tag allocation in nucleosome-free fragments and nucleosomes

ATAC-seq fragment sizes. Inset: the log-transformed histogram shows clear periodicity that persists to multiple nucleosomes.

ATAC-seq coverage for all active TSSs. TSSs are enriched for nucleosome-free fragments and show phased nucleosomes at the -2, -1, +1, +2 and +3 positions.

Aggregate ATAC-seq footprint for CTCF generated over binding sites within the genome.

ChIPpeakAnno: Batch annotation of the peaks identified from either ChIP-seq or ChIP-chip experiments

## First, load the ChIPpeakAnno package
library(ChIPpeakAnno)
## Step 1: Convert the peak data to GRanges with toGRanges
path <- system.file("extdata", "Tead4.broadPeak", package="ChIPpeakAnno")
peaks <- toGRanges(path, format="broadPeak")
## Step 2: Prepare annotation data with toGRanges
library(EnsDb.Hsapiens.v75)
annoData <- toGRanges(EnsDb.Hsapiens.v75)
## Step 3: Annotate the peaks with annotatePeakInBatch
## keep the seqnames in the same style
seqlevelsStyle(peaks) <- seqlevelsStyle(annoData)
## do annotation by nearest TSS
anno <- annotatePeakInBatch(peaks, AnnotationData=annoData)
## Step 4: Add additional annotation with addGeneIDs
library(org.Hs.eg.db)
anno <- addGeneIDs(anno, orgAnn="org.Hs.eg.db", 
                   feature_id_type="ensembl_gene_id",
                   IDs2Add=c("symbol"))
anno[1:2]
GRanges object with 2 ranges and 14 metadata columns:
                            seqnames           ranges strand |     score
                               <Rle>        <IRanges>  <Rle> | <integer>
  peak12338.ENSG00000227061     chr2 [175473, 176697]      * |       206
  peak12339.ENSG00000143727     chr2 [246412, 246950]      * |        31
                            signalValue    pValue    qValue        peak
                              <numeric> <numeric> <numeric> <character>
  peak12338.ENSG00000227061      668.42        -1        -1   peak12338
  peak12339.ENSG00000143727      100.23        -1        -1   peak12339
                                    feature start_position end_position
                                <character>      <integer>    <integer>
  peak12338.ENSG00000227061 ENSG00000227061         197569       202605
  peak12339.ENSG00000143727 ENSG00000143727         264140       278283
                            feature_strand insideFeature distancetoFeature
                               <character>      <factor>         <numeric>
  peak12338.ENSG00000227061              +      upstream            -22096
  peak12339.ENSG00000143727              +      upstream            -17728
                            shortestDistance fromOverlappingOrNearest
                                   <integer>              <character>
  peak12338.ENSG00000227061            20872          NearestLocation
  peak12339.ENSG00000143727            17190          NearestLocation
                                 symbol
                            <character>
  peak12338.ENSG00000227061        <NA>
  peak12339.ENSG00000143727        ACP1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
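The two distance columns in the output above can be checked by hand. The sketch below is inferred from the printed values only and is not a statement of the package's full definition: for these two '+' strand features, distancetoFeature matches the peak start minus the feature start, and shortestDistance matches the smallest gap between any peak edge and any feature edge.

```r
## Values copied from the first two rows of the output above.
peak_start    <- c(175473, 246412)
peak_end      <- c(176697, 246950)
feature_start <- c(197569, 264140)   # start_position, '+' strand features
feature_end   <- c(202605, 278283)   # end_position

## Signed distance from the feature start (TSS here) to the peak start.
distance_to_feature <- peak_start - feature_start

## Smallest absolute gap between any peak edge and any feature edge.
shortest_distance <- pmin(abs(peak_start - feature_start),
                          abs(peak_end   - feature_start),
                          abs(peak_start - feature_end),
                          abs(peak_end   - feature_end))

print(distance_to_feature)
print(shortest_distance)
```

Both peaks lie upstream of their nearest feature, which is why the signed distances are negative.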

Motif Visualization -- motifStack

pipeline

From single motif to multiple motifs

  • Plot aligned motifs
  • Powerful tool to visualize a bunch of sequence logos
  • Highlight grouped motifs by their signatures
  • Multiple styles and techniques to show and label motifs

motifPile

motifStack -- great examples of motifStack in top journals

Nature

Cell

motifStack -- More and more styles under continuous development

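## Assumes pfms (a list of pfm objects) and phylog (an ade4 phylog tree) already exist
library(motifStack)
browseMotifs(pfms = pfms, phylog = phylog, layout = "radialPhylog",
             yaxis = FALSE, xaxis = FALSE, baseWidth = 6, baseHeight = 15)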

Motif Visualization -- dagLogo

Highlight the characters of an AA sequence logo

  • Visualize significantly conserved amino acid sequence patterns in groups, based on probability theory

  • Positions 10, 14, 16, 21 and 25 are partially or completely buried and therefore tend to be populated by hydrophobic amino acids, a pattern that becomes very clear when the amino acids are grouped by chemistry.

Genomic Data Visualization -- trackViewer

trackViewer produces high-quality plots of data from genomic association studies

  • TDP-43 cross-linking and immunoprecipitation coupled with high-throughput sequencing (CLIP-seq) and the corresponding RNA-seq mapped reads are shown for an alternative splicing event in exon 18 of sortilin 1 (Sort1).

  • Methylation and SNPs are shown in two lollipop plots with annotation information along genomic coordinates. Different colors depict the new SNP events in the circles and the methylation status in the pie.stack plot.