Introduction

There is an increasing need for tools that identify putative mammalian orthologs of enhancers from species other than human and mouse, such as zebrafish, for which whole-genome comparison data are lacking. Taking zebrafish as an example, there are two major methods to identify the human and mouse orthologs of its enhancers:

  1. use whole-genome comparison and conservation data [1];

  2. use the spotted gar genome as a bridge genome to search for the orthologs [2].

Both methods work well in coding regions. However, comparative data are lacking for distal regulatory regions such as enhancers and silencers.

In 2020, Emily S. Wong et al. published a new method for identifying putative human orthologs of zebrafish enhancers [3]. They used the method to interrogate conserved syntenic regions in human and mouse using candidate sponge enhancer sequences. First, they looked for overlap with available functional genomics information; for example, they used mouse ENCODE data to infer enhancer activity from histone marks in specific tissues. Second, they selected the best-aligned region by whole-genome alignment from the candidate regions in human and mouse as the orthologs. This method makes it possible to search for orthologs of enhancers or silencers even when no comparative genomic data are available.

This package is modified from Wong’s method and provides an easy-to-use script for researchers to quickly search for putative mammalian orthologs of enhancers. The modified algorithm is as follows. Candidate regions are determined by ENCODE histone marks (H3K4me1 by default) in a specific tissue for human and mouse. A mapping score is calculated for each candidate by pairwise global sequence alignment between the enhancer sequence and the candidate [4]. A Z-score is calculated from the mapping scores and then converted to a p-value based on a two-sided test against a normal distribution. The candidates are filtered by p-value and by distance from the TSS of the target homologs. Finally, the top candidates from human and mouse are aligned to each other and exported as multiple alignments with the given enhancer.
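The scoring steps of the modified algorithm can be illustrated with a small base-R sketch. It is not the package's implementation: the function nw_score, the match/mismatch/gap values, the toy sequences, and the choice to standardize each score against the mean and standard deviation of all candidate scores are assumptions made only for illustration.

```r
# Global (Needleman-Wunsch) alignment score between two strings.
# The scoring values are illustrative assumptions, not package defaults.
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # 'prev' holds the previous row of the dynamic-programming matrix;
  # row 0 is the cost of aligning a prefix of 'b' against gaps only.
  prev <- (0:m) * gap
  for (i in seq_len(n)) {
    cur <- numeric(m + 1)
    cur[1] <- i * gap
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      cur[j + 1] <- max(prev[j] + s,       # match or mismatch
                        prev[j + 1] + gap, # gap in 'b'
                        cur[j] + gap)      # gap in 'a'
    }
    prev <- cur
  }
  prev[m + 1]
}

# Toy enhancer and candidate regions (hypothetical sequences).
enhancer <- "ACGTACGT"
candidates <- c(c1 = "ACGTACGT", c2 = "ACGTTCGT", c3 = "TTTTGGGG",
                c4 = "ACGACGT", c5 = "GGGGCCCC")

# Mapping score of each candidate against the enhancer.
scores <- vapply(candidates, nw_score, numeric(1), b = enhancer)

# Z-score of each mapping score, then a two-sided p-value
# from the standard normal distribution.
z <- (scores - mean(scores)) / sd(scores)
pval <- 2 * pnorm(-abs(z))

data.frame(score = scores, z = round(z, 2), pval = signif(pval, 3))
```

With real data the number of candidates is much larger, so the mean and standard deviation of the scores are estimated far more stably than in this five-sequence toy set.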

Installation

First, install enhancerHomologSearch and the other packages required to run the examples. Note that the example dataset used here is from zebrafish. To analyze a dataset from a different species or a different assembly, install the corresponding BSgenome and TxDb packages. For example, to analyze cattle data aligned to bosTau9, install BSgenome.Btaurus.UCSC.bosTau9 and TxDb.Btaurus.UCSC.bosTau9.refGene. You can also generate a TxDb object from a local GFF file with makeTxDbFromGFF, or from online resources with makeTxDbFromUCSC, makeTxDbFromBiomart, or makeTxDbFromEnsembl, all in the GenomicFeatures package.

if (!"BiocManager" %in% rownames(installed.packages()))
     install.packages("BiocManager")
library(BiocManager)
BiocManager::install(c("enhancerHomologSearch",
                       "BiocParallel",
                       "BSgenome.Drerio.UCSC.danRer10",
                       "BSgenome.Hsapiens.UCSC.hg38",
                       "BSgenome.Mmusculus.UCSC.mm10",
                       "TxDb.Hsapiens.UCSC.hg38.knownGene",
                       "TxDb.Mmusculus.UCSC.mm10.knownGene",
                       "org.Hs.eg.db",
                       "org.Mm.eg.db"))

If you have trouble installing enhancerHomologSearch, please check your R version first. The enhancerHomologSearch package requires R >= 4.1.0.

R.version
##                _                           
## platform       x86_64-pc-linux-gnu         
## arch           x86_64                      
## os             linux-gnu                   
## system         x86_64, linux-gnu           
## status                                     
## major          4                           
## minor          1.1                         
## year           2021                        
## month          08                          
## day            10                          
## svn rev        80725                       
## language       R                           
## version.string R version 4.1.1 (2021-08-10)
## nickname       Kick Things
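
The version requirement can also be checked programmatically. The following is a minimal base-R sketch; the variable names required and r_ok are illustrative, and the threshold is the one stated above.

```r
# Minimum R version stated in this document.
required <- "4.1.0"

# getRversion() returns the running R version as a comparable object.
r_ok <- getRversion() >= required

if (!r_ok) {
  message("R ", as.character(getRversion()), " is older than ", required,
          "; please upgrade before installing enhancerHomologSearch.")
}
```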

Step 1, prepare target enhancer sequences.

In this example, we will use an enhancer of the lepb gene in zebrafish.

# load genome sequences
library(BSgenome.Drerio.UCSC.danRer10)
# define the enhancer genomic coordinates
LEN <- GRanges("chr4", IRanges(19050041, 19051709))
# extract the sequences as Biostrings::DNAStringSet object
(seqEN <- getSeq(BSgenome.Drerio.UCSC.danRer10, LEN))
## DNAStringSet object of length 1:
##     width seq
## [1]  1669 TGGCATACACAGCAAACATCATGAATTTAATTTA...TAGATAAATAGAAACAGAAGCAAATTGGCGAGT

Step 2, download candidate regions of enhancers from ENCODE by H3K4me1 marks

By default, the histone mark is H3K4me1. Users can select other marks with the markers parameter of the getENCODEdata function. To make sure the marks are tissue specific, the data can be filtered with the biosample_name and biosample_type parameters. For additional filters, please refer to ?getENCODEdata.

# load library
library(enhancerHomologSearch)
library(BSgenome.Hsapiens.UCSC.hg38)
library(BSgenome.Mmusculus.UCSC.mm10)
# download enhancer candidates for human heart tissue
hs <- getENCODEdata(genome=Hsapiens,
                    partialMatch=c(biosample_summary = "heart"))
# download enhancer candidates for mouse heart tissue
mm <- getENCODEdata(genome=Mmusculus,
                    partialMatch=c(biosample_summary = "heart"))

Step 3, get alignment score for target enhancer and candidate enhancers.

This step is time consuming. For a quick run, users can subset the data by genomic coordinates.
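
The idea of restricting candidates to a window around the homolog gene can be sketched in base R without any genome packages. Everything here (the peak coordinates, the gene interval, the chromosome length, and the helper expand_window) is hypothetical and only mirrors the logic of expanding a range and keeping overlapping peaks.

```r
# Toy candidate peaks (hypothetical coordinates).
peaks <- data.frame(
  chr   = c("chr7", "chr7", "chr7", "chr5"),
  start = c(126500000, 128200000, 130500000, 128200000),
  end   = c(126501000, 128201000, 130501000, 128201000)
)

# Hypothetical gene interval and chromosome length.
gene <- list(chr = "chr7", start = 128241000, end = 128258000)
chr_len <- 159345973

# Extend the gene interval by 'ext' bases on both sides,
# clamped to the valid coordinate range of the chromosome.
expand_window <- function(gene, ext, chr_len) {
  list(chr   = gene$chr,
       start = max(1, gene$start - ext),
       end   = min(chr_len, gene$end + ext))
}

win <- expand_window(gene, ext = 1e6, chr_len = chr_len)

# Keep peaks on the same chromosome that overlap the window.
keep <- peaks$chr == win$chr &
  peaks$start <= win$end &
  peaks$end >= win$start
peaks_sub <- peaks[keep, ]
peaks_sub
```

Only the second peak falls inside the 1 Mb window in this toy set, so the alignment step would run on one candidate instead of four.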

# subset the data for a test run
# in human, the homolog LEP gene is located on chromosome 7
# you can try to subset the data by chromosome 7
gr <- as(seqinfo(Hsapiens), "GRanges")
hs <- subsetByOverlaps(hs, gr[seqnames(gr)=="chr7"])
# In this test run, we will only use upstream 1M and downstream 1M of homolog
# gene
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
eid <- mget("LEP", org.Hs.egALIAS2EG)[[1]]
g_hs <- select(TxDb.Hsapiens.UCSC.hg38.knownGene,
               keys=eid,
               columns=c("GENEID", "TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
               keytype="GENEID")
g_hs <- range(with(g_hs, GRanges(TXCHROM, IRanges(TXSTART, TXEND))))
expandGR <- function(x, ext){
  stopifnot(length(x)==1)
  start(x) <- max(1, start(x)-ext)
  end(x) <- end(x)+ext
  GenomicRanges::trim(x)
}
hs <- subsetByOverlaps(hs, expandGR(g_hs, ext=1000000))
# in mouse, the homolog Lep gene is located on chromosome 6
# as in the human script above, you can try to subset the data by chromosome.
gr <- as(seqinfo(Mmusculus), "GRanges")
mm <- subsetByOverlaps(mm, gr[seqnames(gr)=="chr6"])
# Here we use the subset of 1M upstream and downstream of homolog gene.
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)
eid <- mget("Lep", org.Mm.egALIAS2EG)[[1]]
g_mm <- select(TxDb.Mmusculus.UCSC.mm10.knownGene,
               keys=eid,
               columns=c("GENEID", "TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
               keytype="GENEID")
g_mm <- range(with(g_mm,
                   GRanges(TXCHROM,
                           IRanges(TXSTART, TXEND),
                           strand=TXSTRAND)))
g_mm <- g_mm[seqnames(g_mm) %in% "chr6" & strand(g_mm) %in% "+"]
mm <- subsetByOverlaps(mm, expandGR(g_mm, ext=1000000))

# use parallel computing to speed up.
library(BiocParallel)
bpparam <- MulticoreParam(tasks=200, progressbar=TRUE)
aln_hs <- alignmentOne(seqEN, hs, bpparam=bpparam)
aln_mm <- alignmentOne(seqEN, mm, bpparam=bpparam)

Step 4, filter the candidate regions.

Here we keep the candidate regions that are more than 5 kb from the TSS of the homolog but within 100 kb of the gene body. The candidates are also filtered by p-value (p < 0.05).
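
The filtering rule can be illustrated on a plain data frame. This is a simplified sketch under stated assumptions: the coordinates and p-values are made up, gap_distance is a hypothetical helper that measures the gap between a candidate and the gene interval (0 when they overlap), and the 100 kb limit is expressed as a distance bound rather than as an overlap with an extended gene range.

```r
# Gap between two intervals; 0 when they overlap or are adjacent.
gap_distance <- function(s1, e1, s2, e2) {
  pmax(0, pmax(s1, s2) - pmin(e1, e2) - 1)
}

# Hypothetical gene interval.
gene_start <- 1000000
gene_end   <- 1020000

# Hypothetical candidates with their alignment p-values.
cand <- data.frame(
  start = c(1001000, 1023000, 1050000, 1100000, 1200000),
  end   = c(1002000, 1024000, 1051000, 1101000, 1201000),
  pval  = c(0.001, 0.01, 0.20, 0.03, 0.001)
)

cand$distance <- gap_distance(cand$start, cand$end, gene_start, gene_end)

# Keep significant candidates that are distal (> 5 kb)
# but still within 100 kb of the gene.
kept <- subset(cand, pval < 0.05 & distance > 5000 & distance <= 100000)
kept
```

In this toy set only the fourth candidate survives: the first two are too close to the gene, the third is not significant, and the last is too far away.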

# Step4
ext <- 100000
aln_hs <- subsetByOverlaps(aln_hs, ranges = expandGR(g_hs, ext=ext))
## filter by distance
distance(aln_hs) <- distance(peaks(aln_hs), g_hs, ignore.strand=TRUE)
aln_hs <- subset(aln_hs, pval<0.05 & distance >5000)


aln_mm <- subsetByOverlaps(aln_mm, ranges = expandGR(g_mm, ext=ext))
## filter by distance
distance(aln_mm) <- distance(peaks(aln_mm), g_mm, ignore.strand=TRUE)
aln_mm <- subset(aln_mm, pval<0.05 & distance >5000)

Step 5, export the multiple alignments in order.

The selected candidates will be aligned across human and mouse and then output as a PHYLIP multiple alignment file in text format.
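
To make the layout of such a file easier to read, here is a small base-R sketch that writes aligned sequences in an interleaved PHYLIP-like form: a header with the number of sequences and the alignment length, then blocks of sequence split into groups. The function format_phylip and the toy sequences are hypothetical; this is not the writer used by the package, and details such as the label width are assumptions.

```r
# Format equal-length aligned sequences (a named character vector)
# as interleaved PHYLIP-like text lines.
format_phylip <- function(seqs, line_width = 50, group = 10) {
  aln_len <- nchar(seqs[[1]])
  stopifnot(aln_len > 0, all(nchar(seqs) == aln_len))
  name_width <- max(nchar(names(seqs))) + 3

  # Split one chunk of sequence into space-separated groups.
  split_groups <- function(chunk) {
    starts <- seq(1, nchar(chunk), by = group)
    ends <- pmin(starts + group - 1, nchar(chunk))
    paste(substring(chunk, starts, ends), collapse = " ")
  }

  # Header: number of sequences and alignment length.
  out <- paste0(" ", length(seqs), " ", aln_len)
  block_starts <- seq(1, aln_len, by = line_width)
  for (b in seq_along(block_starts)) {
    s <- block_starts[b]
    e <- min(s + line_width - 1, aln_len)
    # Sequence names appear only in the first block.
    labels <- if (b == 1) names(seqs) else rep("", length(seqs))
    labels <- format(labels, width = name_width)
    chunks <- vapply(substring(seqs, s, e), split_groups, character(1))
    out <- c(out, paste0(labels, chunks), "")
  }
  out[-length(out)]  # drop the trailing blank line
}

# Toy alignment (hypothetical sequences).
toy <- c(Enhancer = "ACGTACGTACGT", human = "ACGT----ACGT")
lines <- format_phylip(toy, line_width = 10, group = 5)
writeLines(lines)
```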

al <- alignment(seqEN, list(human=aln_hs, mouse=aln_mm),
                method="ClustalW", order="input")
al
## [[1]]
## DNAMultipleAlignment with 3 rows and 1686 columns
##      aln                                                    names               
## [1] TGGCATACACAGCAAACATCATGAAT...TAGAAACAGAAGCAAATTGGCGAGT Enhancer
## [2] --------------------------...------------------------- human_chr7:128340...
## [3] --------------------------...------------------------- mouse_chr6:291629...
library(MotifDb)
motifs <- query(MotifDb, "JASPAR_CORE")
consensus <- sapply(motifs, consensusString)
consensus <- DNAStringSet(gsub("\\?", "N", consensus))
tmpfolder <- tempdir()
saveAlignments(al, output_folder = tmpfolder, motifConsensus=consensus)
readLines(file.path(tmpfolder, "aln1.phylip.txt"))
##   [1] " 5 1686"                                                                                  
##   [2] "Enhancer                           TGGCATACAC AGCAAACATC ATGAATTTAA TTTAATTTAA TTTAATTTAA"
##   [3] "human_chr7:128340906-128341905:-   ---------- ---------- ---------- ---------- ----------"
##   [4] "mouse_chr6:29162916-29163915:+     ---------- ---------- ---------- ---------- ----------"
##   [5] "Consensus                          ---------- ---------- ---------- ---------- ----------"
##   [6] "motifConsensus                     ---------- ---------- ---------- ---------- ----------"
##   [7] ""                                                                                         
##   [8] "                                   TTTAATTTTT TTAATTTAAT TTTAATATTT TAAAATAAAA TAAAATAAAA"
##   [9] "                                   ---------- ---------- ---------- ---------- ----------"
##  [10] "                                   ---------- ---------- ---------- ---------- ----------"
##  [11] "                                   ---------- ---------- ---------- ---------- ----------"
##  [12] "                                   ---------- ---------- ---------- ---------- ----------"
##  [13] ""                                                                                         
##  [14] "                                   TAAAATAAAA TAAAAGATAA AGATAAAGAT AAAATAAAAT TCAACTCAAT"
##  [15] "                                   ---------- ---------- ---------- ---------- ----------"
##  [16] "                                   ---------- ---------- ---------- ---------- ----------"
##  [17] "                                   ---------- ---------- ---------- ---------- ----------"
##  [18] "                                   ---------- ---------- ---------- ---------- ----------"
##  [19] ""                                                                                         
##  [20] "                                   TAAATTAAAA CTAAGCTAAA ATAAAAATAC AATAAAATAA ATTTCAATTT"
##  [21] "                                   ---------- ---------- ---------- -------GGT GTTAGTAGCT"
##  [22] "                                   ---------- ---------- ---------- ---------- ----------"
##  [23] "                                   ---------- ---------- ---------- ---------- -TT---A--T"
##  [24] "                                   ---------- ---------- ---------- ---------- ----------"
##  [25] ""                                                                                         
##  [26] "                                   AATGTAATTT AATTTAAAAA GGGACTACGC CGAAAAGAAA ATGAATGAAT"
##  [27] "                                   GAATTAACTC CTCCTCACCA GCCCCCATGT TCTTCCATAG --GACTCCAC"
##  [28] "                                   ---------- ---------- ---------- ---------- ----------"
##  [29] "                                   -A--TAA-T- ----T-A--A G---C-A-G- --------A- --GA-T--A-"
##  [30] "                                   ---------- ---------- ---------- ---------- ----------"
##  [31] ""                                                                                         
##  [32] "                                   GGATGAATAA ATAATTTAAT TTAATTTAAT TTAATTTAAT TTAATTTAAT"
##  [33] "                                   AAA-GAATAA T--ATTAGGA CCAAGGAGAT GAAAACTACA AATATCAACT"
##  [34] "                                   ---------- ---ACAACGT CTTATATAAC CCAAGTAGGC CTCAACT-TC"
##  [35] "                                   --A-GAATAA ---ATT---T -TAAT-TAAT --AA-TTA-- -T-AT-TA-T"
##  [36] "                                   ---------- ---------- ---------- ---------- ----------"
##  [37] ""                                                                                         
##  [38] "                                   TTAATTTAAT TTAATTTAAT TTAATTTAAT TTAATTTAAT TTAATTTAAT"
##  [39] "                                   GTAACCCACC TTTGCCCTTT TTGGTTTAAC ---ATTTTGT ATAGTAAAAT"
##  [40] "                                   CTGATCCTCC TGCCTCCACT T--ACTAAGT ---GCTGGGA TTACAGTCAG"
##  [41] "                                   -TAAT--A-- TT--T--A-T TT-ATTTAAT ---ATTT--T TTA-T-TAAT"
##  [42] "                                   ---------- ---------- ---------- ---------- ----------"
##  [43] ""                                                                                         
##  [44] "                                   TTAATTTAAT TTGTTCGGCA CAGTAT-AAT ATGCTAGCAT CTCAGTTATT"
##  [45] "                                   TTTAAATAAT TTATT--GCA TCAAAA-AGG AAAAGGACAG ATCAAGAA--"
##  [46] "                                   ATACCACCAC TCCTG--GCA CAACATTAGT GTTGTAACAT -TTATTTA--"
##  [47] "                                   TTAA--TAAT TT-TT--GCA CA--AT-A-T AT--TA-CAT -TCA-TTA--"
##  [48] "                                   ---------- ---------- ---------- ---------- ----------"
##  [49] ""                                                                                         
##  [50] "                                   TCACGTGTGT TGTTACTATA AAATAAGCAA AACAGTGATA AAATAAGTTT"
##  [51] "                                   ----GTCTGC CTTTACCTGA AAAGGAC--- -------ACA TAGTTCATAT"
##  [52] "                                   ---CTTATGT GTGTGCAAAA GTATGGG--- -------AAA AATCTGGTAG"
##  [53] "                                   ---CGT-TGT --TTAC-A-A AAAT-AG--- -------A-A AA-T--GT-T"
##  [54] "                                   ---------- ---------- ---------- ---------- ----------"
##  [55] ""                                                                                         
##  [56] "                                   GTGTTGCTTA TCTTATGACT GG--TGGAAT GTAACAGGGA AAAAAAGCAC"
##  [57] "                                   GCTCTGCTTC TCTTTCCACT GCATTTGAAT CTCCTTCAGG GTATAAGCCC"
##  [58] "                                   GAGTTGGTTC T-TTCTGTCT AT--CAGGTT GAAGTGGGTC CTGAGATCTC"
##  [59] "                                   G-GTTGCTT- TCTT-TGACT G---T-GAAT GTA---GGG- --AAAAGC-C"
##  [60] "                                   ---------- ---------- ---------- ---------- ---AAAGCN-"
##  [61] ""                                                                                         
##  [62] "                                   ATACTGTGAC TTTGACAAAA CTGAGTGACT GATGATAATA AACTTCTCTT"
##  [63] "                                   ---------- TTTGACACAG ----GACACT GATCTACATA AACTGGT---"
##  [64] "                                   TT-------- CTCGATGTAC TC--GTGAAA GTTGGACATA ATTTATTCTT"
##  [65] "                                   -T-------- TTTGACA-A- ----GTGACT GATG---ATA AACT--TCTT"
##  [66] "                                   ---------N TTTGACA--- ---------- ---------- ----------"
##  [67] ""                                                                                         
##  [68] "                                   CTCGTAAG-C TGACAGTTCA TAAAACCTCT GCTTGTTTTT TTGTACTTTT"
##  [69] "                                   ----TAAA-G CAACATAACA TCAA---TTT TTCTCTTCTC TTGAACTCCT"
##  [70] "                                   CTGCCAAGTC TTTCATTTTA TCTG---TCT CTCAATTAGA TATTACTTTC"
##  [71] "                                   CT--TAAG-C T-ACA-TTCA T-AA---TCT ---T-TT-T- TTGTACTTTT"
##  [72] "                                   ---------- ---------- ---------- ---------- ----------"
##  [73] ""                                                                                         
##  [74] "                                   AATCTTAAGG TGACGCATGT AGCTTCCTGT CCTTCTCAGT TTACTGACAG"
##  [75] "                                   TATCATAA-- ------AAGC TGCCCACTCA CCTT-TCAGA AAATGAGCAC"
##  [76] "                                   AAACATGG-- ------GTGC TTCCTTCTAC AATT----GT TTGGGATTAA"
##  [77] "                                   AATC-TAA-- ------ATG- -GC-T-CT-- CCTT-TCAGT TTA----CA-"
##  [78] "                                   ---------- ---------- ---------- ---------- ----------"
##  [79] ""                                                                                         
##  [80] "                                   AGGTTAGGGT TTAATCCCAG ATATCCAGTC TGACTGTACA GTAGTTCAGG"
##  [81] "                                   AAGAAAATTC ATAGCCCTTC ATGATCAGTA CAACCA-AAA GTAAAGCAGC"
##  [82] "                                   AGGTGTGTGC TGAAGC---- -TGAGCCACA CAACTAGAAA TAGGATTTTC"
##  [83] "                                   AGGT-AG-G- TTAA-CC--- AT---CAGT- --ACT--A-A GTAG-TCAG-"
##  [84] "                                   ---------- ---------- ---------- ---------- ----------"
##  [85] ""                                                                                         
##  [86] "                                   AGACCGACGC AGATTTATAG CATCATTCGT CAAACCCTGA GGATAATCAT"
##  [87] "                                   TGA--AATGT AAA-----AG AGTTTGACAC CCAAATGAAA GGACATACAG"
##  [88] "                                   TAGTAAACAC CAC------- AATCTGAGGA CTCAGTGTGA TCG-AATCTC"
##  [89] "                                   -GA---ACGC A-A-----AG -ATC---CG- C-AA---TGA GGA-AATCA-"
##  [90] "                                   ---------- ---------- ---------- ---------- ----------"
##  [91] ""                                                                                         
##  [92] "                                   TTGTCACAGC TTCCTTTGGT -CATCATTAC TGTG-CAAAT AAACTGTTAG"
##  [93] "                                   TAAATACCAC ACATTTTACA -TGTCTGCAT AAAG-CATGC TCAC--ATAC"
##  [94] "                                   CTGTCACACC TTCATTCACT TCCTCGTCAC CCTGGCCACC AAGCTCTCTT"
##  [95] "                                   TTGTCACA-C TTC-TTT--T -C-TC-T-AC --TG-CAA-- AAACT-TTA-"
##  [96] "                                   ---------- ---------- ---------- ---------- ----------"
##  [97] ""                                                                                         
##  [98] "                                   AGCATGAGCC AGCAAAAACA GTGGGAAACG CAGCAATTTC CTGTATTTAA"
##  [99] "                                   ATTTAGAGTC CCTGACAACC CTGTGAAATT A------TTC CCATCTTAAA"
## [100] "                                   CTCACACAAG GACATACTTG CTTTAATATC CTCAAGGTTG CTGTTTCCTG"
## [101] "                                   A-CA-GAG-C --CAAAAAC- -TG-GAAA-- C---A--TTC CTGT-TT-AA"
## [102] "                                   ---------- ---------- ---------- ---------- ----------"
## [103] ""                                                                                         
## [104] "                                   TAGTCTGTGA GATATACTTT AATGAGATGA AATTGAAGAA AACTGAGTCA"
## [105] "                                   GAGGTGAAGT GACATGCACT GCT------- -ATTGAGGAA GACT---TCA"
## [106] "                                   AATTCAGACT GTTTTATCCC CCC------A GTTCTTAGTT CACT--CTCC"
## [107] "                                   -AGTC-G-G- GATATAC--T --T------A -ATTGAAGAA -ACT---TCA"
## [108] "                                   ---------- -ATATA---- ---------- ---------- ----------"
## [109] ""                                                                                         
## [110] "                                   TTAGAAAGGC ATTCACATAA ACTTTCCTG- GTGTATATTT CCTAACTCTC"
## [111] "                                   TTCCACATAC CCTCCCTCCC GTTTCACTGA GTGCAGCACT CTTTCCAGTA"
## [112] "                                   TCAAACAGGT CTTCCCT--A GCCACCCTAT CTACACACAC ACATTCCCTC"
## [113] "                                   TTA-A-AGGC -TTC-C---A -CTT-CCTG- GTG-A-A--T CCT--C-CTC"
## [114] "                                   ---------- ---------- ---------- ---------- ----------"
## [115] ""                                                                                         
## [116] "                                   TTCCAGTGTT TTCTACACCA GAAGAGTTCA TTAC-ATCAT TGAAGGACAA"
## [117] "                                   CATATCAGCT GTCTGGAT-A GGAAAGTGCA TAGT-TTTTT AAAAGGAAG-"
## [118] "                                   TTT-TGACTT TTCTCCTTAG CATAAATTTA TCACCATTAT CAACATATAG"
## [119] "                                   TT---G-GTT TTCT-CA--A GAA-AGTTCA T-AC-AT-AT --AAGGA-A-"
## [120] "                                   ---------- ---------- ---------- ---------- ----------"
## [121] ""                                                                                         
## [122] "                                   TGCTGAAAAA TAAGAACGCG TTTGGTTTTT CATAAACCAC ATGGTCTTGT"
## [123] "                                   ---TAAGAGA TTTTTAAGGA TTTAG----- -ACCAAGTAC ACAAGGTGTA"
## [124] "                                   T-TTGGCTAT TTGTTGGCTG TCTTT----- --CAAACCAT AAGTGTATGA"
## [125] "                                   T--TGA-AAA T----A-G-G TTT-G----- -A-AAACCAC A-G---TTG-"
## [126] "                                   ---------- ---------- ---------- ---------- ----------"
## [127] ""                                                                                         
## [128] "                                   GGGTCATGT- TGTTTTGTTT CTTTAGATTT GAGAGACGGG GAATGATGTG"
## [129] "                                   AGGTAAGAT- TGCATGTAGC CTATATTCTG AGGACTCAAA GGAGAAC---"
## [130] "                                   GAACAGAATC TGCGTCAGTG ATTTAGTGTC -ACACACCTG TTGTATC---"
## [131] "                                   GGGT-A--T- TG--T---T- CTTTAG--T- -AGA-AC--G G-AT-A----"
## [132] "                                   ---------- ---------- ---------- ---------- ----------"
## [133] ""                                                                                         
## [134] "                                   ATTTTGCCCA GTCAGCATGG ATATGATTTG GACTTC-CAT CTGTTTAAGA"
## [135] "                                   -TCCCATTCA AT-GGCCTAT -TCTTCCCTG GAATAC-AAC CACCTGGGTA"
## [136] "                                   -CTAAGGGCA GCCAGCAATG TTCTAAAATA CAGTTGGCAT CCAACAAGTA"
## [137] "                                   -TT--G--CA GTCAGCAT-G -T-T-A--TG GA-TTC-CAT C---T-A--A"
## [138] "                                   ---------- ---------- ---------- ---------- ----------"
## [139] ""                                                                                         
## [140] "                                   TTAAATGGTA GACAGAGAGA AATATTTCTG TTTTTTTTAT CCATGATTGC"
## [141] "                                   CGGAATGCTA CTTACACTGG AACCCAGTAC ATACATATAT TCTTTATTTA"
## [142] "                                   ATAAATAA-A TAAACAGATG AATGCATAGG ATGTGCTAAC GTGCTAATCA"
## [143] "                                   -TAAATG-TA -A-A-AGAG- AAT---T--G -T-T-TTTAT -C-T-ATT--"
## [144] "                                   ---------- ---------- ---------- ---------- ----------"
## [145] ""                                                                                         
## [146] "                                   AAATCTGTGG GTT---CAA- -AGTCTGCTT TTGTTCCAAA TAATCATTC-"
## [147] "                                   AAACTTAAAA GTTTTACAA- -ATACTTATC CTTTACCATA TATGGACGT-"
## [148] "                                   AACTGAGTCC TCTCACTAGC CAGATCAACT CCAGACCATT TAATGTCTTT"
## [149] "                                   AAAT-TGT-- GTT---CAA- -AG-CT--TT -T-T-CCA-A TAAT-A-T--"
## [150] "                                   ---------- ---------- ---------- ---------- ----------"
## [151] ""                                                                                         
## [152] "                                   -AAACCTGCC GTACTGTGTG GGGTGGGAAG TGAAGGAGGA TCTTATCTGG"
## [153] "                                   -ACTCCTATA CTATTTAATT TTTTAAGAAC TCTAGGCCAG GCGTGGTGGC"
## [154] "                                   AAAGATTATA TTCCTTTATT TAA-AAAAAA TGGGGTTAAG GCTCTTTGTC"
## [155] "                                   -AA-CCT--- -TACT-T-T- ---T--GAA- TG-AGG---- -CTT-T--G-"
## [156] "                                   ---------- ---------- ---------- ---------- ----------"
## [157] ""                                                                                         
## [158] "                                   AAATCATGTG CTGTATGATG AAGGCAGGAT ATGGAAAACT CCAAATATGG"
## [159] "                                   A--------- ---------- ---------- ---------- ----------"
## [160] "                                   CTATAAGATT GTTGAGAACA TTAAGTAAGT TAACCCATGT CAAACTAAGC"
## [161] "                                   A-AT-A--T- -T--A--A-- ---------T ------A--T C-AA-TA-G-"
## [162] "                                   ---------- ---------- ---------- ---------- ----------"
## [163] ""                                                                                         
## [164] "                                   ACACCTTTAT GTGTGCAAGG GAGAAAGTCT GAAGGATGCA ACCTGTTCAT"
## [165] "                                   ---------- ---------- ---------- ---------- ----------"
## [166] "                                   ACTC------ ---------- ---------- ---------- ----------"
## [167] "                                   AC-C------ ---------- ---------- ---------- ----------"
## [168] "                                   ---------- ---------- ---------- ---------- ----------"
## [169] ""                                                                                         
## [170] "                                   AACATTTTCA TTCAAATTTA AACTAGTTTG ATTAATTCCA AATGCACATT"
## [171] "                                   ---------- ---------- ---------- ---------- ----------"
## [172] "                                   ---------- ---------- ---------- ---------- ----------"
## [173] "                                   ---------- ---------- ---------- ---------- ----------"
## [174] "                                   ---------- ---------- ---------- ---------- ----------"
## [175] ""                                                                                         
## [176] "                                   TGATTTGTTG TGTTTTTATG ATGTATTTCA CAATACTGTT GCATAAAATA"
## [177] "                                   ---------- ---------- ---------- ---------- ----------"
## [178] "                                   ---------- ---------- ---------- ---------- ----------"
## [179] "                                   ---------- ---------- ---------- ---------- ----------"
## [180] "                                   ---------- ---------- ---------- ---------- ----------"
## [181] ""                                                                                         
## [182] "                                   TCTAAAAAAA ACATTTAGTT ATATGGAAGA CACTTGGACA ACTGGTTGTT"
## [183] "                                   ---------- ---------- ---------- ---------- ----------"
## [184] "                                   ---------- ---------- ---------- ---------- ----------"
## [185] "                                   ---------- ---------- ---------- ---------- ----------"
## [186] "                                   ---------- ---------- ---------- ---------- ----------"
## [187] ""                                                                                         
## [188] "                                   ATTTGTTTGT CTATTTTTAT GAATGCCTCA AAGATCAAAT AGTTACACAC"
## [189] "                                   ---------- ---------- ---------- ---------- ----------"
## [190] "                                   ---------- ---------- ---------- ---------- ----------"
## [191] "                                   ---------- ---------- ---------- ---------- ----------"
## [192] "                                   ---------- ---------- ---------- ---------- ----------"
## [193] ""                                                                                         
## [194] "                                   TTAATGCAAT CGAGCTTAGA GAGAGAAATT AAAAGTCTTA AATAAATTGT"
## [195] "                                   ---------- ---------- ---------- ---------- ----------"
## [196] "                                   ---------- ---------- ---------- ---------- ----------"
## [197] "                                   ---------- ---------- ---------- ---------- ----------"
## [198] "                                   ---------- ---------- ---------- ---------- ----------"
## [199] ""                                                                                         
## [200] "                                   GATTAGATAA ATAGAAACAG AAGCAAATTG GCGAGT"               
## [201] "                                   ---------- ---------- ---------- ------"               
## [202] "                                   ---------- ---------- ---------- ------"               
## [203] "                                   ---------- ---------- ---------- ------"               
## [204] "                                   ---------- ---------- ---------- ------"

Session info

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] MotifDb_1.35.5                           
##  [2] org.Mm.eg.db_3.13.0                      
##  [3] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0
##  [4] org.Hs.eg.db_3.13.0                      
##  [5] TxDb.Hsapiens.UCSC.hg38.knownGene_3.13.0 
##  [6] GenomicFeatures_1.45.2                   
##  [7] AnnotationDbi_1.55.1                     
##  [8] Biobase_2.53.0                           
##  [9] BSgenome.Mmusculus.UCSC.mm10_1.4.3       
## [10] BSgenome.Hsapiens.UCSC.hg38_1.4.3        
## [11] BSgenome.Drerio.UCSC.danRer10_1.4.2      
## [12] BSgenome_1.61.0                          
## [13] rtracklayer_1.53.1                       
## [14] Biostrings_2.61.2                        
## [15] XVector_0.33.0                           
## [16] GenomicRanges_1.45.0                     
## [17] GenomeInfoDb_1.29.8                      
## [18] IRanges_2.27.2                           
## [19] S4Vectors_0.31.3                         
## [20] BiocGenerics_0.39.2                      
## [21] BiocParallel_1.27.7                      
## [22] enhancerHomologSearch_0.99.12            
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7                matrixStats_0.60.1         
##  [3] fs_1.5.0                    bit64_4.0.5                
##  [5] filelock_1.0.2              progress_1.2.2             
##  [7] httr_1.4.2                  rprojroot_2.0.2            
##  [9] tools_4.1.1                 utf8_1.2.2                 
## [11] R6_2.5.1                    DBI_1.1.1                  
## [13] splitstackshape_1.4.8       withr_2.4.2                
## [15] tidyselect_1.1.1            prettyunits_1.1.1          
## [17] bit_4.0.4                   curl_4.3.2                 
## [19] compiler_4.1.1              textshaping_0.3.5          
## [21] xml2_1.3.2                  desc_1.3.0                 
## [23] DelayedArray_0.19.2         rappdirs_0.3.3             
## [25] pkgdown_1.6.1               systemfonts_1.0.2          
## [27] stringr_1.4.0               digest_0.6.27              
## [29] Rsamtools_2.9.1             rmarkdown_2.10             
## [31] pkgconfig_2.0.3             htmltools_0.5.2            
## [33] MatrixGenerics_1.5.4        dbplyr_2.1.1               
## [35] fastmap_1.1.0               rlang_0.4.11               
## [37] RSQLite_2.2.8               BiocIO_1.3.0               
## [39] generics_0.1.0              jsonlite_1.7.2             
## [41] dplyr_1.0.7                 RCurl_1.98-1.4             
## [43] magrittr_2.0.1              GenomeInfoDbData_1.2.6     
## [45] Matrix_1.3-4                Rcpp_1.0.7                 
## [47] fansi_0.5.0                 lifecycle_1.0.0            
## [49] stringi_1.7.4               yaml_2.2.1                 
## [51] SummarizedExperiment_1.23.4 zlibbioc_1.39.0            
## [53] BiocFileCache_2.1.1         grid_4.1.1                 
## [55] blob_1.2.2                  parallel_4.1.1             
## [57] crayon_1.4.1                lattice_0.20-44            
## [59] hms_1.1.0                   KEGGREST_1.33.0            
## [61] knitr_1.34                  pillar_1.6.2               
## [63] rjson_0.2.20                biomaRt_2.49.4             
## [65] XML_3.99-0.7                glue_1.4.2                 
## [67] evaluate_0.14               data.table_1.14.0          
## [69] vctrs_0.3.8                 png_0.1-7                  
## [71] purrr_0.3.4                 assertthat_0.2.1           
## [73] cachem_1.0.6                xfun_0.25                  
## [75] restfulr_0.0.13             ragg_1.1.3                 
## [77] tibble_3.1.4                GenomicAlignments_1.29.0   
## [79] memoise_2.0.0               ellipsis_0.3.2

1. Howe, K. et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498–503 (2013).
2. Braasch, I. et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nature Genetics 48, 427–437 (2016).
3. Wong, E. S. et al. Deep conservation of the enhancer regulatory code in animals. Science 370, (2020).
4. Pages, H., Aboyoun, P., Gentleman, R. & DebRoy, S. Package “biostrings”. R package version