There is an increasing need for tools that identify putative mammalian orthologs of enhancers from species other than human and mouse, such as zebrafish, for which whole-genome comparison data are lacking. Taking zebrafish as an example, there are two major methods for identifying the human and mouse orthologs of enhancers:
- use whole-genome comparison and conservation data [1],
- use the spotted gar genome as a bridge genome to search for the orthologs [2].
Both methods work well in coding regions. However, comparative data are lacking for distal regulatory regions such as enhancers and silencers.
In 2020, Emily S. Wong et al. provided a new method for identifying putative human orthologs of zebrafish enhancers [3]. They used the method to interrogate conserved syntenic regions in human and mouse using candidate sponge enhancer sequences. First, they looked for overlap with available functional genomics information. For example, they used mouse ENCODE data to infer enhancer activity based on histone marks in specific tissues. Second, they selected the best-aligned region by whole-genome alignment from the candidate regions for human and mouse as orthologs. This method makes it possible to search for orthologs of enhancers or silencers even when no comparative genome data are available.
This package is modified from Wong’s method and provides an easy-to-use script for researchers to quickly search for putative mammalian orthologs of enhancers. The modified algorithm is as follows. The candidate regions are determined by ENCODE histone marks (H3K4me1 by default) in a specific tissue for human and mouse. The mapping score is calculated from the pairwise Transcription Factor Binding Pattern Similarity (TFBPS) between the enhancer sequence and the candidates using fast motif matching [4]. The Z-score is calculated from the mapping score and then converted to a P-value based on a two-sided test against a normal distribution. The candidates are filtered by P-value and by distance from the TSS of the target homolog. Finally, the top candidates from human and mouse are aligned to each other and exported as multiple alignments with the given enhancer.
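The conversion from mapping score to Z-score to two-sided P-value can be sketched in base R. This is only an illustration under stated assumptions: the scores are made-up values, the names `scores`, `z`, and `pval` are hypothetical, and the package may estimate the mean and standard deviation differently.

```r
# Toy mapping scores for five hypothetical candidate regions
# (made-up values, not produced by the package).
scores <- c(cand1 = 12.1, cand2 = 9.8, cand3 = 10.4, cand4 = 25.3, cand5 = 11.0)

# Standardize: Z = (score - mean) / standard deviation.
z <- (scores - mean(scores)) / sd(scores)

# Two-sided P-value under a standard normal distribution.
pval <- 2 * pnorm(-abs(z))

round(data.frame(score = scores, z = z, pval = pval), 3)
```

In this toy set, the candidate whose score lies farthest from the mean receives the smallest P-value, which is the property the later filtering step relies on.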
First, install enhancerHomologSearch and the other packages required to run the examples. Please note that the example dataset used here is from zebrafish. To run the analysis with a dataset from a different species or a different assembly, please install the corresponding BSgenome and TxDb packages. For example, to analyze cattle data aligned to bosTau9, please install BSgenome.Btaurus.UCSC.bosTau9 and TxDb.Btaurus.UCSC.bosTau9.refGene. You can also generate a TxDb object with the function makeTxDbFromGFF from a local GFF file, or with makeTxDbFromUCSC, makeTxDbFromBiomart, and makeTxDbFromEnsembl from online resources; these functions are in the GenomicFeatures package.
if (!"BiocManager" %in% rownames(installed.packages()))
  install.packages("BiocManager")
library(BiocManager)
BiocManager::install(c("enhancerHomologSearch",
                       "BiocParallel",
                       "BSgenome.Drerio.UCSC.danRer10",
                       "BSgenome.Hsapiens.UCSC.hg38",
                       "BSgenome.Mmusculus.UCSC.mm10",
                       "TxDb.Hsapiens.UCSC.hg38.knownGene",
                       "TxDb.Mmusculus.UCSC.mm10.knownGene",
                       "org.Hs.eg.db",
                       "org.Mm.eg.db",
                       "MotifDb",
                       "motifmatchr"))
If you have trouble installing enhancerHomologSearch, please check your R version first. The enhancerHomologSearch package requires R >= 4.1.0.
R.version
## _
## platform x86_64-pc-linux-gnu
## arch x86_64
## os linux-gnu
## system x86_64, linux-gnu
## status RC
## major 4
## minor 2.0
## year 2022
## month 04
## day 19
## svn rev 82224
## language R
## version.string R version 4.2.0 RC (2022-04-19 r82224)
## nickname Vigorous Calisthenics
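The version requirement and the package list can also be checked programmatically. The sketch below uses only base R; the variable names `required`, `r_ok`, `pkgs`, and `missing_pkgs` are hypothetical, and the package list simply repeats the one from the install call above.

```r
# Minimum R version stated above for enhancerHomologSearch.
required <- "4.1.0"

# getRversion() returns a version object that compares correctly with a string.
r_ok <- getRversion() >= required
if (!r_ok) {
  message("R ", format(getRversion()), " is older than the required ", required)
}

# Packages from the install call above.
pkgs <- c("enhancerHomologSearch", "BiocParallel",
          "BSgenome.Drerio.UCSC.danRer10",
          "BSgenome.Hsapiens.UCSC.hg38",
          "BSgenome.Mmusculus.UCSC.mm10",
          "TxDb.Hsapiens.UCSC.hg38.knownGene",
          "TxDb.Mmusculus.UCSC.mm10.knownGene",
          "org.Hs.eg.db", "org.Mm.eg.db",
          "MotifDb", "motifmatchr")

# Packages not found in any library path (character(0) if all are installed).
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

r_ok
missing_pkgs
```

Any names listed in `missing_pkgs` can be passed to `BiocManager::install()` again.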
In this example, we will use an enhancer of the lepb gene in zebrafish.
# load genome sequences
library(BSgenome.Drerio.UCSC.danRer10)
# define the enhancer genomic coordinates
LEN <- GRanges("chr4", IRanges(19050041, 19051709))
# extract the sequences as Biostrings::DNAStringSet object
(seqEN <- getSeq(BSgenome.Drerio.UCSC.danRer10, LEN))
## DNAStringSet object of length 1:
## width seq
## [1] 1669 TGGCATACACAGCAAACATCATGAATTTAATTTA...TAGATAAATAGAAACAGAAGCAAATTGGCGAGT
By default, the histone marker is H3K4me1. Users can also define the markers with the markers parameter of the function getENCODEdata. To make sure the markers are tissue specific, we can filter the data with the biosample_name and biosample_type parameters. For additional filters, please refer to ?getENCODEdata.
# load library
library(enhancerHomologSearch)
library(BSgenome.Hsapiens.UCSC.hg38)
library(BSgenome.Mmusculus.UCSC.mm10)
# download enhancer candidates for human heart tissue
hs <- getENCODEdata(genome=Hsapiens,
                    partialMatch=c(biosample_summary = "heart"))
# download enhancer candidates for mouse heart tissue
mm <- getENCODEdata(genome=Mmusculus,
                    partialMatch=c(biosample_summary = "heart"))
This step is time consuming. For a quick run, users can subset the data by given genomic coordinates.
# subset the data for a test run
# in human, the homologous LEP gene is located on chromosome 7
# In this test run, we will only use the 1 Mb upstream and 1 Mb downstream
# of the homologous gene
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
eid <- mget("LEP", org.Hs.egALIAS2EG)[[1]]
g_hs <- select(TxDb.Hsapiens.UCSC.hg38.knownGene,
               keys=eid,
               columns=c("GENEID", "TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
               keytype="GENEID")
g_hs <- range(with(g_hs, GRanges(TXCHROM, IRanges(TXSTART, TXEND))))
# extend a single range by `ext` bases on both sides, keeping start >= 1
expandGR <- function(x, ext){
  stopifnot(length(x)==1)
  start(x) <- max(1, start(x)-ext)
  end(x) <- end(x)+ext
  GenomicRanges::trim(x)
}
hs <- subsetByOverlaps(hs, expandGR(g_hs, ext=1000000))
# in mouse, the homologous Lep gene is located on chromosome 6
# Here we use the subset within 1 Mb upstream and downstream of the homologous gene.
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)
eid <- mget("Lep", org.Mm.egALIAS2EG)[[1]]
g_mm <- select(TxDb.Mmusculus.UCSC.mm10.knownGene,
               keys=eid,
               columns=c("GENEID", "TXCHROM", "TXSTART", "TXEND", "TXSTRAND"),
               keytype="GENEID")
g_mm <- range(with(g_mm,
                   GRanges(TXCHROM,
                           IRanges(TXSTART, TXEND),
                           strand=TXSTRAND)))
g_mm <- g_mm[seqnames(g_mm) %in% "chr6" & strand(g_mm) %in% "+"]
mm <- subsetByOverlaps(mm, expandGR(g_mm, ext=1000000))
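The windowing used in the subsetting step can be illustrated with plain integer coordinates. The sketch below is a simplified base R analogue of extending a gene range and keeping overlapping candidates. All coordinates and variable names are made up for illustration, and it ignores chromosome names, strand, and chromosome ends, which the GRanges-based code handles.

```r
# Hypothetical gene interval on one chromosome (made-up coordinates).
gene_start <- 600000
gene_end   <- 620000
ext        <- 1000000

# Extend by `ext` on both sides; the start cannot go below position 1.
win_start <- max(1, gene_start - ext)
win_end   <- gene_end + ext

# Hypothetical candidate peaks on the same chromosome.
peaks <- data.frame(start = c(5000, 1500000, 1700000),
                    end   = c(6000, 1501000, 1701000))

# Two closed intervals overlap when each one starts before the other ends.
keep <- peaks$start <= win_end & peaks$end >= win_start
peaks[keep, ]
```

Here the gene lies closer than 1 Mb to the chromosome start, so the window start is clamped to 1, and only the last peak falls outside the window.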
# search the binding pattern
data(motifs)
## The package provides 10 sets of motif clusters.
## In this example, we use the motif clusters merged at distance 60, which
## is calculated by matalign (motifStack implementation)
PWMs <- motifs[["dist60"]]
aln_hs <- searchTFBPS(seqEN, hs, PWMs = PWMs,
                      queryGenome = Drerio)
aln_mm <- searchTFBPS(seqEN, mm, PWMs = PWMs,
                      queryGenome = Drerio)
## if you want to stick to a sequence similarity search, see ?alignmentOne
Here we keep the candidate regions that are more than 5 kb from the TSS of the homolog but within 100 kb of the gene body. The candidates are also filtered by P-value.
# Step4
ext <- 100000
aln_hs <- subsetByOverlaps(aln_hs, ranges = expandGR(g_hs, ext=ext))
## filter by distance
distance(aln_hs) <- distance(peaks(aln_hs), g_hs, ignore.strand=TRUE)
aln_hs <- subset(aln_hs, pval<0.1 & distance >5000)
aln_hs
## This is an object with 1 Enhancers for Homo sapiens
aln_mm <- subsetByOverlaps(aln_mm, ranges = expandGR(g_mm, ext=ext))
## filter by distance
distance(aln_mm) <- distance(peaks(aln_mm), g_mm, ignore.strand=TRUE)
aln_mm <- subset(aln_mm, pval<0.1 & distance >5000)
aln_mm
## This is an object with 3 Enhancers for Mus musculus
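The filtering logic can be sketched on a plain data frame. This is a simplified base R analogue, not the package's implementation: `interval_gap`, `gene`, and `cand` are hypothetical names, all values are made up, and the 100 kb window is approximated by an upper bound on the gap instead of an overlap test.

```r
# Gap between closed intervals [s1, e1] and [s2, e2]:
# 0 when they overlap or are adjacent, otherwise the number of bases between them.
interval_gap <- function(s1, e1, s2, e2) {
  pmax(0, pmax(s1, s2) - pmin(e1, e2) - 1)
}

# Hypothetical gene body and candidate table (made-up values).
gene <- c(start = 100000, end = 120000)
cand <- data.frame(start = c(121000, 130000, 90000, 300000),
                   end   = c(122000, 131000, 91000, 301000),
                   pval  = c(0.01, 0.03, 0.50, 0.02))
cand$distance <- interval_gap(cand$start, cand$end,
                              gene[["start"]], gene[["end"]])

# Same thresholds as in the step above: P < 0.1, more than 5 kb away,
# and no farther than 100 kb from the gene.
kept <- subset(cand, pval < 0.1 & distance > 5000 & distance <= 100000)
kept
```

Each of the three rejected candidates fails a different criterion (too close, P-value too high, too far), so only one candidate is kept.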
The selected candidates are aligned across human and mouse and then written out as a PHYLIP multiple alignment file in text format.
al <- alignment(seqEN, list(human=aln_hs, mouse=aln_mm),
                method="ClustalW", order="input")
al
## [[1]]
## DNAMultipleAlignment with 3 rows and 1676 columns
## aln names
## [1] TGGCATACACAGCAAACATCATGAAT...TAGAAACAGAAGCAAATTGGCGAGT Enhancer
## [2] --------------------------...------------------------- human_chr7:128264...
## [3] --------------------------...------------------------- mouse_chr6:290320...
##
## [[2]]
## DNAMultipleAlignment with 3 rows and 1707 columns
## aln names
## [1] TGGCATACACAGCAAACATCATGAAT...TAGAAACAGAAGCAAATTGGCGAGT Enhancer
## [2] --------------------------...------------------------- human_chr7:128264...
## [3] --------------------------...------------------------- mouse_chr6:291474...
##
## [[3]]
## DNAMultipleAlignment with 3 rows and 1683 columns
## aln names
## [1] TGGCATACACAGCAAACATCATGAAT...TAGAAACAGAAGCAAATTGGCGAGT Enhancer
## [2] --------------------------...------------------------- human_chr7:128264...
## [3] --------------------------...------------------------- mouse_chr6:290312...
library(MotifDb)
motifs <- query(MotifDb, "JASPAR_CORE")
consensus <- sapply(motifs, consensusString)
consensus <- DNAStringSet(gsub("\\?", "N", consensus))
tmpfolder <- tempdir()
saveAlignments(al, output_folder = tmpfolder, motifConsensus=consensus)
readLines(file.path(tmpfolder, "aln1.phylip.txt"))
## [1] " 5 1676"
## [2] "Enhancer TGGCATACAC AGCAAACATC ATGAATTTAA TTTAATTTAA TTTAATTTAA"
## [3] "human_chr7:128264261-128265260:+ ---------- ---------- ---------- ---------- ----------"
## [4] "mouse_chr6:29032045-29033044:+ ---------- ---------- ---------- ---------- ----------"
## [5] "Consensus ---------- ---------- ---------- ---------- ----------"
## [6] "motifConsensus ---------- ---------- ---------- ---------- ----------"
## [7] ""
## [8] " TTTAATTTTT TTAATTTAAT TTTAATATTT TAAAATAAAA TAAAATAAAA"
## [9] " ---------- ---------- ---------- ---------- ----------"
## [10] " ---------- ---------- ---------- ---------- ----------"
## [11] " ---------- ---------- ---------- ---------- ----------"
## [12] " ---------- ---------- ---------- ---------- ----------"
## [13] ""
## [14] " TAAAATAAAA TAAAAGATAA AGATAAAGAT AAAATAAAAT TCAACTCAAT"
## [15] " ---------- ---------- ---------- ---------- ----------"
## [16] " ---------- ---------- ---------- ---------- ----------"
## [17] " ---------- ---------- ---------- ---------- ----------"
## [18] " ---------- ---------- ---------- ---------- ----------"
## [19] ""
## [20] " TAAATTAAAA CTAAGCTAAA ATAAAAATAC AATAAAATAA ATTTCAATTT"
## [21] " ---------- ---------- ---------- ---------- ----------"
## [22] " ---------- ---------- ---------- ---------- ----------"
## [23] " ---------- ---------- ---------- ---------- ----------"
## [24] " ---------- ---------- ---------- ---------- ----------"
## [25] ""
## [26] " AATGTAATTT AATTTAAAAA GGGACTACGC CGAAAAGAAA ATGAATGAAT"
## [27] " ---------- ---------- ---------- ---------- ----------"
## [28] " ---------- ---------- ---------- ---------- ----------"
## [29] " ---------- ---------- ---------- ---------- ----------"
## [30] " ---------- ---------- ---------- ---------- ----------"
## [31] ""
## [32] " GGATGAATAA ATAATTTAAT TTAATTTAAT TTAATTTAAT TTAATTTAAT"
## [33] " ---------- ---------- ---------- ---------- ----------"
## [34] " ---------- ---------- ---------- ---------- ----------"
## [35] " ---------- ---------- ---------- ---------- ----------"
## [36] " ---------- ---------- ---------- ---------- ----------"
## [37] ""
## [38] " TTAATTTAAT TTAATTTAAT TTAATTTAAT TTAATTTAAT TTAATTTAAT"
## [39] " ---------- ---------- ---------- ---------- ----------"
## [40] " ---------- ---------- ---------- ---------- ----------"
## [41] " ---------- ---------- ---------- ---------- ----------"
## [42] " ---------- ---------- ---------- ---------- ----------"
## [43] ""
## [44] " TTAATTTAAT TTGTTCGGCA CAGTATAATA TGCTAGCATC TCAGTTATTT"
## [45] " ---------- ---------- ---------- ---------- ----------"
## [46] " ---------- ---------- ---------- ---------- ----------"
## [47] " ---------- ---------- ---------- ---------- ----------"
## [48] " ---------- ---------- ---------- ---------- ----------"
## [49] ""
## [50] " CACGTGTGTT GTTACTATAA AATAAGCAAA ACAGTGATAA AATAAGTTTG"
## [51] " ---------- ---------- ---------- ---------- ----------"
## [52] " ---------- ---------- ---------- ---------- ----------"
## [53] " ---------- ---------- ---------- ---------- ----------"
## [54] " ---------- ---------- ---------- ---------- ----------"
## [55] ""
## [56] " TGTTGCTTAT CTTATGACTG GTGGAATGTA ACAGGGAAAA AAAGCACATA"
## [57] " ---------- ---------- ---------- ---------- ----------"
## [58] " ---------- ---------- ---------- ---------- ----------"
## [59] " ---------- ---------- ---------- ---------- ----------"
## [60] " ---------- ---------- ---------- ---------- ----------"
## [61] ""
## [62] " CTGTGACTTT GACAAAACTG AGTGACTGAT GATAATAAAC TTCTCTTCTC"
## [63] " ---------- ---------- ---------- ------GAAC AGAGAGGATT"
## [64] " ---------- ---------- ---------- ---TCTGAAC TGCCTTGTTT"
## [65] " ---------- ---------- ---------- -----T-AAC T-C--T--T-"
## [66] " ---------- ---------- ---------- ---------- ----------"
## [67] ""
## [68] " GTAAGCTGAC AGTTCATAAA ACCTCTGCTT GTTTTTTTGT ACTTTTAATC"
## [69] " GCCTGGAG-- -GTTCCTAGG ACCACAGCAA GAGGTGTT-- GTGGGGGGCT"
## [70] " ACCAGCTG-- -TCTTGCTAA ACCTCAGCCT ATGCCGGTA- GCAGGCTGTT"
## [71] " G--AGCTG-- -GTTC-TAAA ACCTC-GC-T GT--T-TT-- -C------T-"
## [72] " ---------- ---------- ---------- ---------- ----------"
## [73] ""
## [74] " TTAAGGTGAC GCATGTAGCT TCCTGTCCTT CT-CAGTTTA CTGACAGAGG"
## [75] " T---CCCGGC T-TCCCGGAG GCCCTTCTTC CCATGACAGG AGG---GACA"
## [76] " TAGGCCGGGT TATTCTAGAG CTTGGATCTT TAGTGGTTTA ATGTTTAAGG"
## [77] " T------G-C ---T-TAG-- -CC-GTCCTT C----GTTTA -TG---GAGG"
## [78] " ---------- ---------- ---------- ---------- ----------"
## [79] ""
## [80] " TTAGGGTTTA -ATCCCAGAT ATCCAGTCTG ACTGTACAGT AGTTCAGGAG"
## [81] " GTGGAGATAG CACCTACTCT GGTGACCTTG C--TCTCTGC TTTT---GTC"
## [82] " ATGCATCAGG GGCTTCCGAT GGCCAGCCTG CGGCTGCTTC CGAT---GAC"
## [83] " -T-G-G-T-- -A-C-C-GAT --CCAG-CTG ----T-C-G- -GTT---GA-"
## [84] " ---------- ------NGAT N--------- ---------- ----------"
## [85] ""
## [86] " ACCGACGCAG ATTTATAGCA TCATTCGTCA AACCCTGAGG ATAATCATTT"
## [87] " TTGTGACC-T GCTTGTGG-- --GTGAGCCC TACCCCTTGG -TTTCCACGT"
## [88] " AGATATCCAT GCGTATGGC- TCGTGACACT CTTCCCGTGG -CATCCAGCA"
## [89] " A---A--CA- --TTAT-GC- TC-T--G-C- -ACCC-G-GG -TA--CA--T"
## [90] " ---------- ---------- ---------- ---------- ----------"
## [91] ""
## [92] " GTCACAGCTT CCTTTGGTCA TCATTACTGT GCAAATAAAC TGTTAGAGCA"
## [93] " GACCCAT-AA AGTCTACTCA TTTTCTGTCC AG-GGTTCA- -GGCACAGAG"
## [94] " GAATTG--GA GCTCCAAGCA GACCTGGTCT GCCAGTTGC- -AAAACGTCT"
## [95] " G-C-CA---- -CT-T--TCA T--TT--T-T GC-A-T--A- -G--A-AGC-"
## [96] " ---------- ---------- ---------- ---------- ----------"
## [97] ""
## [98] " TGAGCCAGCA AAAACAGTGG GAAACGCAGC AATTTCCTGT ATTTAATAGT"
## [99] " AGAAAAGGCA AGGATGTGGA GGGAGAGGAG AGTC-CCTGG AAGGAATGTC"
## [100] " TCCATACAAA GAGGCATGTC AGAATGCAAG CAAC-CGTCT GGGCATCATT"
## [101] " TGA----GCA AA-ACA--G- G-AA-GCA-- AAT--CCTGT A---AATA-T"
## [102] " ---------- ---------- ---------- ---------- ----------"
## [103] ""
## [104] " CTGTGAGATA TACTTTAATG AGATGAAATT GAAGAAAACT GAGTCATTAG"
## [105] " CCC-AGCATA T-CAGCCTGG AGTTTCCATT CAGGAAATTT CCCCAGCTCC"
## [106] " ACG-GGGTTA T-GTGGAATC TGTTCTAATT GATTAAAACC AGGAAGATCA"
## [107] " C-G-G-GATA T-CT--AATG AG-T--AATT GA-GAAAACT --G----T--"
## [108] " ---------- ---------- ---------- ---------- ----------"
## [109] ""
## [110] " AAAGGCATTC ACATAAACTT TCCTGGTGTA TATTTCCTAA CTCTCTTCCA"
## [111] " CCAGACCCTC CCA--GACTT GCCTCCTCCC TCCTGACAAA GCCCCAGCCC"
## [112] " TCAATCATTC GAA--AGCCT CGACAGTTTG TTATTAGGTA GCCACTTGCC"
## [113] " --AG-CATTC -CA--AACTT -CCT-GT-T- T--TT-C-AA --C-CTTCC-"
## [114] " ---------- ---------- ---------- ---------- ----------"
## [115] ""
## [116] " GTGTTTTCTA CACCAGAAGA GTTCATTACA TCATTGAAGG ACAATGCTGA"
## [117] " -TACC-CATA GCCCCAGCCA CCACCCCCTT T-GAGGAGAG AAGGTGCTGG"
## [118] " -TATGACACA CCTCAAAACT GCTCCAGGCA T-GCAGCATG TTAGGACACC"
## [119] " -T-T----TA C-CCA-AA-A G-TC----CA T----GAA-G A-A-TGCTG-"
## [120] " ---------- ---------- ---------- ---------- ----------"
## [121] ""
## [122] " AAAATAAGAA CGCGTTTGGT TTTTCATAAA CCACATGGTC TTGTGGGTCA"
## [123] " GTTGGGTGAG GAGCTCTGCG GGAGGACTGA AC-CA-AGCT --GTGGGCTC"
## [124] " GCTGTGGGAA AAACCGCTGG TCACGATGAT GC-CAGAGCC --GCCAGTTG"
## [125] " ----T--GAA ----T-TGG- T----AT-AA -C-CA--G-C --GTGGGT--"
## [126] " ---------- ---------- ---------- ---------- ----------"
## [127] ""
## [128] " TGTTGTTTTG TTTCTTTAGA TTTGAGAGAC GGGGAATGAT GTGATTTTGC"
## [129] " CTGGGC---A CCTGCCCA-- --TG--AGCC AGGCCTGCAT CTCCCAGACC"
## [130] " TGAAAT---G AAAACCAA-- --TGCCAGCT AACTAATTGT TTCAACTACC"
## [131] " TG--GT---G --T----A-- --TG--AG-C -GG-AAT-AT -T-A--T--C"
## [132] " ---------- ---------- ---------- ---------- ----------"
## [133] ""
## [134] " CCAGTCAGCA TGGATATGAT TTGGACTTCC ATCTGTTTAA GATTAAATGG"
## [135] " TGCTTGTGGG TGAGCCTGCA TGCTGACATG CTTGGCTGGG CTCTAGCC--"
## [136] " C-TGTGCCAA AGAATCGATT GGGGAGAACA TTTTGTCCAC ACTCACCCGA"
## [137] " C--GT--G-A TG-AT-TG-T T-GGA---C- -T-TGTT-A- --TTA---G-"
## [138] " ---------- ---------- ---------- ---------- ----------"
## [139] ""
## [140] " TAGACAGAGA GAAATATTTC TGTTTTTTTT ATCCATGATT GCAAATCTGT"
## [141] " TTGGCTATTG GTGGC--CCA GGGTGTGTGT GTGTGTGCGT GCATGTGTGT"
## [142] " TTGGC--TTG GTGGT--CTT GGTCCTGTAG AGATGCAGCC GGAGTATCAG"
## [143] " T-G-C----- G---T---T- -GTT-T-T-T AT---TG--T GCA--T-TGT"
## [144] " ---------- ---------- ---------- ---------- ----------"
## [145] ""
## [146] " GGGTTCAAAG TCTGCTTTTG TTCCAAATAA TCATTCAAAC CTGCCGTACT"
## [147] " GTGTGTCC-C TTTGGAGCTA TGTAGTTTGT GTACCAAAAT AAGAAGGGAG"
## [148] " GAATGTGG-G CCTAGTGTCA GGTA--TCAG AATTCGAGGC TGGAAGGAGC"
## [149] " G-GT-----G TCTG-T-TT- T------TA- --AT--AAAC --G--G-A--"
## [150] " ---------- ---------- ---------- ---------- ----------"
## [151] ""
## [152] " GTGTGGGGTG GGAAGTGAAG GAGGATCTTA TCTGGAA--A TCATGTGCTG"
## [153] " CTGGGGAGTT GAGCCCTGAT GAACTTTGTA ACAATAA--A GCCTTATCTT"
## [154] " CTCCAGAGAG AGAGTTTCAT GCATTCCCTT CATGTCACCA ACAAGGATCA"
## [155] " -TG-GG-GTG GGA--T--A- GA---TC-TA -CTG-AA--A -CATG--CT-"
## [156] " ---------- ---------- ---------- ---------- ----------"
## [157] ""
## [158] " TATGATGAAG GCAGGATATG GAAAACTCC- AAATATGGAC ACC-TTTATG"
## [159] " GTTAAATAAA TCAAGCCTTG AATAAATGC- CTGCTTATTC CTTGCCAATA"
## [160] " GAGGGAGCAG CCAGGCTGCT GAAGGGCGCG CCACACAATT GTGGACTATT"
## [161] " -ATGA-GAAG -CAGG-T-TG GAAAA-T-C- --A-AT---C ------TAT-"
## [162] " ---------- ---------- ---------- ---------- ----------"
## [163] ""
## [164] " TGTGCAAGGG AGAAAGTCTG AAGGATGCAA CCTGTTCATA ACATTTTCAT"
## [165] " TTTTCCTGCT ACCTAAGGCA ACATAGCTCT GTGACCTGTA ACCAGTCACT"
## [166] " GTTTTCAGAA A--GAGGTCA ATGAGGGCAC ATCTTTGCCC ACAGCATTTT"
## [167] " T-T-C-AG-- A---AG---- A-G-A-GCA- ----TT--TA ACA--TT--T"
## [168] " ---------- ---------- ---------- --------TA ACA-------"
## [169] ""
## [170] " TCAAATTTAA ACTAGTTTGA TTAATTCCAA ATGCACATTT GATTTGTTGT"
## [171] " TAACCTCTCT GG------GT CTATTTCCAT ATCTATAGAA AGTGGATATA"
## [172] " TAACATTTTC GC------AG CTATTTCAGA ATC---ACGG TGCCCACTGA"
## [173] " T-A-ATTT-- -C------G- -TA-TTCCAA AT--A-A--- --T---TTG-"
## [174] " ---------- ---------- ---------- ---------- ----------"
## [175] ""
## [176] " GTTTTTATGA TGTATTTCAC AATACTGTTG CATAAAATAT CTAAAAAAAA"
## [177] " ATACT-ATT- TGTTCTCCTC AATGCTGCTT ATTGTAGAGA TCAAATGTAA"
## [178] " ATGGC-ATTG CGTACTTTCT AAAAGTAGAC A----AGAAT TTGAGAAACA"
## [179] " -T--T-AT-- TGTA-TTC-C AATACTG-T- --T--A--AT -TAAAAAAAA"
## [180] " ---------- ---------- ---------- ---------- ----------"
## [181] ""
## [182] " CATTTAGTTA TATGGAAGAC ACTTGGACAA CTGGTTGTTA TTTGTTTGTC"
## [183] " TAACG---TG TGTGAAAG-- ---TGCTTTA TAGATGGCAA GGTGCTGTTC"
## [184] " CCCCG---TG -GTGCAAC-- ---TG-TTCA TTGCTTA-AA CCTACTGCTG"
## [185] " CA------T- T-TG-AAG-- ---TG----A -TG-TTG--A --TG-T--TC"
## [186] " ---------- ---------- ---------- ---------- ----------"
## [187] ""
## [188] " TATTTTTATG AATGCCTCAA AGATCAAATA GTTACACAC- TTAATGCAAT"
## [189] " TGCTGTAAGG AATCCTTCTG ATCATTACGC TTTTTGTAAA GCAATCAGAA"
## [190] " ACTCATTTG- ---CCCTTCG ATACTT---C TTTACCAACA TCGGTGGTGG"
## [191] " T-TT-TTA-G AAT-CCTC-- A-A---A--- -TTAC--AC- T-AATG--A-"
## [192] " ---------- ---------- ---------- ---------- ----------"
## [193] ""
## [194] " CGAGCTTAGA GAGAGAAATT AAAAGTCTTA AATAAATTGT GATTAGATAA"
## [195] " CAGCCATC-- ---------- ---------- ---------- ----------"
## [196] " CATCT----- ---------- ---------- ---------- ----------"
## [197] " C---C-T--- ---------- ---------- ---------- ----------"
## [198] " ---------- ---------- ---------- ---------- ----------"
## [199] ""
## [200] " ATAGAAACAG AAGCAAATTG GCGAGT"
## [201] " ---------- ---------- ------"
## [202] " ---------- ---------- ------"
## [203] " ---------- ---------- ------"
## [204] " ---------- ---------- ------"
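In the listing above, the header line `5 1676` gives the number of rows and the number of alignment columns. A minimal base R sketch of reading such a header and counting gaps per row follows. It assumes a toy, single-block alignment with placeholder names and sequences; the real file is interleaved in blocks, so this is not a general PHYLIP parser.

```r
# Toy alignment in a PHYLIP-like layout (placeholder names and sequences).
phy <- c(" 3 10",
         "seqA ACGTACGTAC",
         "seqB AC--ACGTAC",
         "seqC ACGTAC--AC")

# The header holds two integers: number of rows and number of columns.
dims <- as.integer(strsplit(trimws(phy[1]), "\\s+")[[1]])
names(dims) <- c("n_rows", "n_columns")

# Split each record into a name and its sequence.
parts <- strsplit(phy[-1], "\\s+")
seqs  <- vapply(parts, function(p) paste(p[-1], collapse = ""), character(1))
names(seqs) <- vapply(parts, `[`, character(1), 1)

# Count gap characters per row.
gaps <- nchar(seqs) - nchar(gsub("-", "", seqs, fixed = TRUE))

dims
gaps
```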
conservedMotifs(al[[1]], aln_hs, aln_mm, PWMs, Drerio)
## $Enhancer
## Views on a 1669-letter DNAString subject
## subject: TGGCATACACAGCAAACATCATGAATTTAATTTA...TTAGATAAATAGAAACAGAAGCAAATTGGCGAGT
## views:
## start end width
## [1] 399 406 8 [TTCACGTG]
## [2] 469 481 13 [TGGTGGAATGTAA]
## [3] 1563 1577 15 [AAAGATCAAATAGTT]
## [4] 1616 1635 20 [AAGTCTTAAATAAATTGTGA]
##
## $`human_chr7:128264261-128265260:+`
## Views on a 1000-letter DNAString subject
## subject: GAACAGAGAGGATTGCCTGGAGGTTCCTAGGACC...TTACGCTTTTTGTAAAGCAATCAGAACAGCCATC
## views:
## start end width
## [1] 187 194 8 [CACGTGAC]
## [2] 275 287 13 [GGAAGGAATGTCC]
## [3] 886 912 27 [AGAGATCAAATGTAATAACGTGTGTGA]
##
## $`mouse_chr6:29032045-29033044:+`
## Views on a 1000-letter DNAString subject
## subject: TCTGAACTGCCTTGTTTACCAGCTGTCTTGCTAA...CGATACTTCTTTACCAACATCGGTGGTGGCATCT
## views:
## start end width
## [1] 80 94 15 [AGAGCTTGGATCTTT]
## [2] 179 186 8 [CTCGTGAC]
## [3] 628 640 13 [ATCAGGAATGTGG]
## [4] 816 835 20 [CCACAGCATTTTTAACATTT]
sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] motifmatchr_1.18.0
## [2] MotifDb_1.38.0
## [3] org.Mm.eg.db_3.15.0
## [4] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0
## [5] org.Hs.eg.db_3.15.0
## [6] TxDb.Hsapiens.UCSC.hg38.knownGene_3.15.0
## [7] GenomicFeatures_1.48.0
## [8] AnnotationDbi_1.58.0
## [9] Biobase_2.56.0
## [10] BSgenome.Mmusculus.UCSC.mm10_1.4.3
## [11] BSgenome.Hsapiens.UCSC.hg38_1.4.4
## [12] BSgenome.Drerio.UCSC.danRer10_1.4.2
## [13] BSgenome_1.64.0
## [14] rtracklayer_1.56.0
## [15] Biostrings_2.64.0
## [16] XVector_0.36.0
## [17] GenomicRanges_1.48.0
## [18] GenomeInfoDb_1.32.0
## [19] IRanges_2.30.0
## [20] S4Vectors_0.34.0
## [21] BiocGenerics_0.42.0
## [22] enhancerHomologSearch_1.2.0
##
## loaded via a namespace (and not attached):
## [1] colorspace_2.0-3 rjson_0.2.21
## [3] ellipsis_0.3.2 bit64_4.0.5
## [5] fansi_1.0.3 xml2_1.3.3
## [7] R.methodsS3_1.8.1 cachem_1.0.6
## [9] knitr_1.38 splitstackshape_1.4.8
## [11] jsonlite_1.8.0 Rsamtools_2.12.0
## [13] seqLogo_1.62.0 annotate_1.74.0
## [15] GO.db_3.15.0 dbplyr_2.1.1
## [17] png_0.1-7 R.oo_1.24.0
## [19] readr_2.1.2 compiler_4.2.0
## [21] httr_1.4.2 assertthat_0.2.1
## [23] Matrix_1.4-1 fastmap_1.1.0
## [25] cli_3.3.0 htmltools_0.5.2
## [27] prettyunits_1.1.1 tools_4.2.0
## [29] gtable_0.3.0 glue_1.6.2
## [31] TFMPvalue_0.0.8 GenomeInfoDbData_1.2.8
## [33] reshape2_1.4.4 dplyr_1.0.8
## [35] rappdirs_0.3.3 Rcpp_1.0.8.3
## [37] jquerylib_0.1.4 vctrs_0.4.1
## [39] xfun_0.30 CNEr_1.32.0
## [41] stringr_1.4.0 lifecycle_1.0.1
## [43] restfulr_0.0.13 poweRlaw_0.70.6
## [45] gtools_3.9.2 XML_3.99-0.9
## [47] zlibbioc_1.42.0 scales_1.2.0
## [49] hms_1.1.1 MatrixGenerics_1.8.0
## [51] parallel_4.2.0 SummarizedExperiment_1.26.0
## [53] yaml_2.3.5 curl_4.3.2
## [55] memoise_2.0.1 ggplot2_3.3.5
## [57] sass_0.4.1 biomaRt_2.52.0
## [59] stringi_1.7.6 RSQLite_2.2.12
## [61] BiocIO_1.6.0 caTools_1.18.2
## [63] filelock_1.0.2 BiocParallel_1.30.0
## [65] rlang_1.0.2 pkgconfig_2.0.3
## [67] matrixStats_0.62.0 bitops_1.0-7
## [69] pracma_2.3.8 evaluate_0.15
## [71] lattice_0.20-45 purrr_0.3.4
## [73] GenomicAlignments_1.32.0 bit_4.0.4
## [75] tidyselect_1.1.2 plyr_1.8.7
## [77] magrittr_2.0.3 R6_2.5.1
## [79] generics_0.1.2 DelayedArray_0.22.0
## [81] DBI_1.1.2 pillar_1.7.0
## [83] KEGGREST_1.36.0 RCurl_1.98-1.6
## [85] tibble_3.1.6 crayon_1.5.1
## [87] utf8_1.2.2 BiocFileCache_2.4.0
## [89] tzdb_0.3.0 rmarkdown_2.14
## [91] progress_1.2.2 TFBSTools_1.34.0
## [93] grid_4.2.0 data.table_1.14.2
## [95] blob_1.2.3 digest_0.6.29
## [97] xtable_1.8-4 R.utils_2.11.0
## [99] munsell_0.5.0 DirichletMultinomial_1.38.0
## [101] bslib_0.3.1
1. Howe, K. et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498–503 (2013).
2. Braasch, I. et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nature Genetics 48, 427–437 (2016).
3. Wong, E. S. et al. Deep conservation of the enhancer regulatory code in animals. Science 370, (2020).
4. Schep, A. motifmatchr: Fast motif matching in R. R package version