Jianhong Ou
Mar 30, 2017
High-throughput data
Annotation
Visualization
An application for discovering direct or indirect targets of transcription factors (TFs) using ChIP-chip or ChIP-seq data together with microarray or RNA-seq gene expression data. Given a list of potential target genes of one TF from ChIP-chip or ChIP-seq, plus the gene expression results, GeneNetworkBuilder generates a regulatory network of the TF.
## First, load the ChIPpeakAnno package
library(ChIPpeakAnno)
## Step 1: Convert the peak data to GRanges with toGRanges
path <- system.file("extdata", "Tead4.broadPeak", package="ChIPpeakAnno")
peaks <- toGRanges(path, format="broadPeak")
## Step 2: Prepare annotation data with toGRanges
library(EnsDb.Hsapiens.v75)
annoData <- toGRanges(EnsDb.Hsapiens.v75)
## Step 3: Annotate the peaks with annotatePeakInBatch
## keep the seqnames in the same style
seqlevelsStyle(peaks) <- seqlevelsStyle(annoData)
## do annotation by nearest TSS
anno <- annotatePeakInBatch(peaks, AnnotationData=annoData)
## Step 4: Add additional annotation with addGeneIDs
library(org.Hs.eg.db)
anno <- addGeneIDs(anno, orgAnn="org.Hs.eg.db",
                   feature_id_type="ensembl_gene_id",
                   IDs2Add=c("symbol"))
anno[1:2]
GRanges object with 2 ranges and 14 metadata columns:
                            seqnames           ranges strand |     score
                               <Rle>        <IRanges>  <Rle> | <integer>
  peak12338.ENSG00000227061     chr2 [175473, 176697]      * |       206
  peak12339.ENSG00000143727     chr2 [246412, 246950]      * |        31
                            signalValue    pValue    qValue        peak
                              <numeric> <numeric> <numeric> <character>
  peak12338.ENSG00000227061      668.42        -1        -1   peak12338
  peak12339.ENSG00000143727      100.23        -1        -1   peak12339
                                    feature start_position end_position
                                <character>      <integer>    <integer>
  peak12338.ENSG00000227061 ENSG00000227061         197569       202605
  peak12339.ENSG00000143727 ENSG00000143727         264140       278283
                            feature_strand insideFeature distancetoFeature
                               <character>      <factor>         <numeric>
  peak12338.ENSG00000227061              +      upstream            -22096
  peak12339.ENSG00000143727              +      upstream            -17728
                            shortestDistance fromOverlappingOrNearest
                                   <integer>              <character>
  peak12338.ENSG00000227061            20872          NearestLocation
  peak12339.ENSG00000143727            17190          NearestLocation
                                 symbol
                            <character>
  peak12338.ENSG00000227061        <NA>
  peak12339.ENSG00000143727        ACP1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
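The distance columns in the output above can be reproduced by hand. The following is a minimal base-R sketch, not ChIPpeakAnno's own code. It assumes the settings that appear to apply here: the distance is measured from the peak start to the TSS, and for a plus-strand feature the TSS is `start_position`. The variable names are illustrative only, and the coordinates are copied from the two printed peaks.

```r
# Coordinates copied from the two annotated peaks printed above
peak_start    <- c(peak12338 = 175473, peak12339 = 246412)
peak_end      <- c(peak12338 = 176697, peak12339 = 246950)
feature_start <- c(197569, 264140)  # start_position (TSS, since both features are on "+")
feature_end   <- c(202605, 278283)  # end_position

# Assumption: distancetoFeature = peak start - TSS for plus-strand features;
# a negative value means the peak lies upstream of the TSS
distance_to_feature <- peak_start - feature_start

# Assumption: shortestDistance = smallest gap between any peak edge
# and any feature edge
shortest_distance <- pmin(abs(peak_start - feature_start),
                          abs(peak_start - feature_end),
                          abs(peak_end   - feature_start),
                          abs(peak_end   - feature_end))

print(distance_to_feature)
print(shortest_distance)
```

Under these assumptions the results agree with the `distancetoFeature` and `shortestDistance` columns shown for both peaks. A minus-strand feature would need the TSS taken from `end_position` and the sign reversed.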
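library(motifStack)
## pfms (a list of motif matrices) and phylog (their clustering tree) are assumed to be prepared beforehand
browseMotifs(pfms = pfms, phylog = phylog, layout="radialPhylog",
             yaxis = FALSE, xaxis = FALSE, baseWidth=6, baseHeight = 15)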
Visualize significantly conserved amino acid sequence patterns in groups, based on probability theory.
Positions 10, 14, 16, 21 and 25 are partially or completely buried and therefore tend to be populated by hydrophobic amino acids, which is very clear if we group the peptides by chemistry.
TDP-43 cross-linking and immunoprecipitation coupled with high-throughput sequencing (CLIP-seq) and the corresponding RNA-seq mapped reads are shown for an alternative splicing event on exon 18 of sortilin 1 (Sort1).
Methylation and SNPs are shown in two lollipop plots with annotation information along genomic coordinates. Different colors depict the new SNP events in the circles and the methylation status in the pie.stack plot.