See ?OikScrambling:::loadAllGenomes() and vignette("LoadGenomicBreaks", package = "OikScrambling") for how the different objects are prepared.
library('OikScrambling') |> suppressPackageStartupMessages()
genomes <- OikScrambling:::loadAllGenomes(compat = FALSE)
## Warning in runHook(".onLoad", env, package.lib, package): input string 'Génoscope' cannot be translated from 'ANSI_X3.4-1968' to UTF-8, but is valid UTF-8
load("BreakPoints.Rdata")
Annotated versions of the alignments can be found in the PAC3.svg file, which should be in the same directory as the PAC3.Rmd source file of this document.
The gene g3146 (Oidioi.mRNA.OKI2018_I69.chr1.g3144.t1.cds or CAG5107096 in OKI2018, CBY24606 in OdB3) is on chr1 in Oki and Kum but on the XSR in Osa and Bar.
The OdB3 protein has a non-specific hit to the pfam10178 (Proteasome assembly chaperone 3) PAC3 domain, and PSI-BLAST also detects this similarity.
O_dio MHPTTILREIELENEPVALSIMKFAEETLVTISDNGTFGAFHDI-DIA-EMRPGKEPLIT
O_dio_Oska MHPETILREVEIDNSPVALAIVKFADQTMVTISDNGTFGAFHDI-EIA-EMRPGTDPVVT
O_van ----SIVESTQIEGVDVSLAIMGFADCTMIAVSDTGTFGSFYKV-QVAKGGHTNQEAVVT
O_alb ---------LDLENADIHLAAMSFSDCLMLAVSDTGTFGSFYQV-EKSETSHQNIEPVIS
O_lon ---------IYLGEEKVHIAVMGFGDATMITVSDGALFGPIYKV-TKS--QKGANEPVIT
O_lon_N --------EIEIDNEVCYISVTQFGDTNFVTISDTGTFGSFYQV-EKK--EKMGNEPIFN
B_sty ---------FEIDDLVIYFSFMEFDDYVFLALSNTGTFGSFYKV-QKN--FRMDSNPIIS
M_ery ------IRIIVIDNTEIYLSVMKFGDCNFIVISDTGTFGSFFKI-EKN--FKMDNNPVVN
O_ruf_Y ------LKRLRLMSQKVHLSSMKFDDHIMLGVSDSGTFWLILSCFFKSTNQHKDKEPVFE
O_dio IKPF-FGYSDDFTKV------------VCRNLATSLQVKNK-LMISIGLKQEKVNKQNLD
O_dio_Oska VKPF-FGYSDDFAKVV-----------VCRNLATGLEVKNK-LMISIGLKQERINKNN--
O_van IDPI-FGINDHYFQVLIR------------------------------------------
O_alb IRPI-FGVDDQYFQVFLQVLPDSYFKVVCRYLGVHLAESRR-LLVSITLQKDKINKDN--
O_lon VDGV-FGEQHDYAQVV------------CRHLRLHCRVETEQLLASVLLKRDLVTREN--
O_lon_N IQPI-FGQAEDYLQVCKFQIETS--IVVCRYLCQNCTLQGPM-MVSVILSREKVNKKN--
B_sty TEQI-FGNKKDLLEVINK------------------------------------------
M_ery IEPI-FGENKDYLQV---------------------------------------------
O_ruf_Y IVPIFFGCDDQYFQV---------------------------------------------
In every pair except Oki_Kum, PAC3 lies on the second of the three ranges overlapping it; in Oki_Kum a single range covers it, because the two genomes are syntenic in this region.
PAC3 <- GRanges("chr1:11497331-11498497")
PAC3_in_Kum <- GRanges("contig_3_1:3247998-3248575")
coa$Oki_Osa |> subsetByOverlaps(PAC3)
## GBreaks object with 3 ranges and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## [1] chr1 11478580-11497681 - | Chr1:8156015-8174791 19102 long rnd 0 g3141.t1 Tra FALSE
## [2] chr1 11497718-11498101 - | XSR:3167645-3168027 384 long <NA> 0 <NA> <NA> TRUE
## [3] chr1 11498259-11511831 - | Chr1:8145427-8155956 13573 long unknown 121 <NA> <NA> FALSE
## -------
## seqinfo: 19 sequences from OKI2018.I69 genome
coa$Oki_Aom |> subsetByOverlaps(PAC3)
## GBreaks object with 3 ranges and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## [1] chr1 11478560-11497681 - | contig_3_1:2560164-2578352 19122 long rnd 0 g3141.t1 Tra FALSE
## [2] chr1 11497718-11498101 + | contig_4_1:8752930-8753312 384 long <NA> 0 g3146.t1 <NA> TRUE
## [3] chr1 11498259-11512002 - | contig_3_1:2549534-2560105 13744 long unknown 121 <NA> <NA> FALSE
## -------
## seqinfo: 19 sequences from OKI2018.I69 genome
coa$Oki_Bar |> subsetByOverlaps(PAC3)
## GBreaks object with 3 ranges and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## [1] chr1 11478590-11497665 + | Chr1:7645053-7663388 19076 long rnd 84 g3142.t1;g3143.t1;g3.. Tra FALSE
## [2] chr1 11497717-11498108 + | XSR:9802763-9803154 392 long <NA> 0 g3146.t1 <NA> TRUE
## [3] chr1 11498256-11511831 + | Chr1:7663467-7673701 13576 long unknown 314 g3147.t1;g3148.t1;g3.. <NA> FALSE
## -------
## seqinfo: 19 sequences from OKI2018.I69 genome
coa$Oki_Nor |> subsetByOverlaps(PAC3)
## GBreaks object with 3 ranges and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## [1] chr1 11478590-11497665 + | scaffold_3:1851080-1868702 19076 long rnd 84 g3142.t1;g3143.t1;g3.. Tra FALSE
## [2] chr1 11497717-11498108 - | scaffold_42:106169-106560 392 long <NA> 0 <NA> <NA> TRUE
## [3] chr1 11498256-11511831 + | scaffold_3:1868781-1879542 13576 long unknown 314 g3147.t1;g3148.t1;g3.. <NA> FALSE
## -------
## seqinfo: 19 sequences from OKI2018.I69 genome
coa$Oki_Kum |> subsetByOverlaps(PAC3)
## GBreaks object with 1 range and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## [1] chr1 11472434-11508602 + | contig_3_1:3221478-3260386 36169 long rnd,unknown 310 g3140.t1;g3142.t1;g3.. <NA> FALSE
## -------
## seqinfo: 19 sequences from OKI2018.I69 genome
On chromosome 1, PAC3 is flanked by UBXN6 and a gene resembling mediator of RNA polymerase II transcription subunit 30. In genomes where PAC3 is on the XSR, these flanking genes are separated by a short bridge region, where little similarity remains between the Atlantic and Pacific branches of the “North” species. The conserved AG dinucleotides are trans-splicing sites according to CAGE data.
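How short the bridge is can be read directly from the coordinates printed above. The sketch below (plain base R, no Bioconductor objects) takes, for each genome, the end of the lower flanking block and the start of the upper flanking block, and counts the unaligned bases between them. For Oki it uses the chr1 target coordinates from coa$Oki_Osa, where the gap contains PAC3; for the other genomes it uses the query coordinates of the same blocks. The column names are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```r
# Coordinates copied from the coa$Oki_* outputs above (1-based, inclusive).
# lower_end:   end of the flanking block with the smaller coordinates
# upper_start: start of the flanking block with the larger coordinates
flanks <- data.frame(
  genome      = c("Oki", "Osa", "Aom", "Bar", "Nor"),
  lower_end   = c(11497681, 8155956, 2560105, 7663388, 1868702),
  upper_start = c(11498259, 8156015, 2560164, 7663467, 1868781)
)

# Number of bases lying strictly between the two flanking blocks.
flanks$gap <- flanks$upper_start - flanks$lower_end - 1
flanks
```

This gives 577 bp for Oki, where the gap includes the ~384-bp block that aligns to the XSR in the other genomes, against 58 bp (Osa, Aom) and 78 bp (Bar, Nor) in the genomes where PAC3 is on the XSR.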
PAC3_bridge_Osa <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> swap() |> cleanGaps() |> plyranges::mutate(strand = '-')
PAC3_bridge_Aom <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> swap() |> cleanGaps() |> plyranges::mutate(strand = '-')
PAC3_bridge_Bar <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> swap() |> cleanGaps()
PAC3_bridge_Nor <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> swap() |> cleanGaps()
list( Osa = PAC3_bridge_Osa
, Aom = PAC3_bridge_Aom
, Bar = PAC3_bridge_Bar
, Nor = PAC3_bridge_Nor) |> lapply(\(gb) gb + 0) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment")
## use default substitution matrix
## DNAMultipleAlignment with 4 rows and 81 columns
## aln names
## [1] ACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCGTTCTTACAGAATTGATTCCACAACGCATTTTTTCAGACAGA--- Bar
## [2] ACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCGTTCTTACAGAATTGATTCCAGAACGCATTTTTTCAGACAGA--- Nor
## [3] -----------------------AAAGTCGTAAAAAATCTTTTCGGAGAACTTTTTCGATCGCAATCTTTCAGACAGGTAA Osa
## [4] -----------------------AAAGTCGTAAAATTTTTTTTCGGAGAACTTTTTCGATCGTTATCTTTCAGACAGGTAA Aom
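The conserved AG dinucleotides can be located in this alignment by splitting the rows into a character matrix and keeping the columns where every row reads A followed by G. This is a minimal base-R sketch: the helper conserved_dinucleotide is hypothetical, the rows are copied from the printed alignment, and an AG interrupted by a gap column in some row is not counted. It only finds shared dinucleotides; the trans-splicing interpretation rests on the CAGE data, not on this check.

```r
# Rows copied from the bridge alignment printed above ("-" = gap).
aln <- c(
  Bar = "ACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCGTTCTTACAGAATTGATTCCACAACGCATTTTTTCAGACAGA---",
  Nor = "ACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCGTTCTTACAGAATTGATTCCAGAACGCATTTTTTCAGACAGA---",
  Osa = "-----------------------AAAGTCGTAAAAAATCTTTTCGGAGAACTTTTTCGATCGCAATCTTTCAGACAGGTAA",
  Aom = "-----------------------AAAGTCGTAAAATTTTTTTTCGGAGAACTTTTTCGATCGTTATCTTTCAGACAGGTAA"
)

# Hypothetical helper: alignment columns i such that every row has the first
# letter of `motif` in column i and the second letter in column i + 1.
conserved_dinucleotide <- function(aln, motif = "AG") {
  m <- do.call(rbind, strsplit(aln, ""))  # rows = sequences, columns = positions
  first  <- substr(motif, 1, 1)
  second <- substr(motif, 2, 2)
  hits <- vapply(seq_len(ncol(m) - 1), function(i) {
    all(m[, i] == first & m[, i + 1] == second)
  }, logical(1))
  which(hits)
}

conserved_dinucleotide(aln)
```

For these four rows the conserved AGs start at columns 72 and 76, the two AGs of the CAGACAG motif shared near the right end of the alignment.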
An alignment of 70 nucleotides on each side of the alignment breakpoint to the left of PAC3 shows that the upstream sequence is conserved in all 6 genomes, and that the “Oki” sequences have no similarity with the “North” bridge region.
PAC3_left_Oki_Osa <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> plyranges::slice(1) |> get_bps("right")
PAC3_left_Kum_Osa <- coa$Osa_Kum |> swap() |> subsetByOverlaps(PAC3_in_Kum + 1) |> sort(ignore.strand = TRUE) |> plyranges::slice(1) |> get_bps("right")
PAC3_left_Osa <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(1) |> get_bps("left") |> plyranges::mutate(strand = '-')
PAC3_left_Aom <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(1) |> get_bps("left") |> plyranges::mutate(strand = '-')
PAC3_left_Oki_Bar <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> plyranges::slice(1) |> get_bps("right")
PAC3_left_Bar <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(1) |> get_bps("right")
PAC3_left_Nor <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(1) |> get_bps("right")
# Shift the Bar and Nor windows by 20 bp so that all six sequences align well
PAC3_left <-
list( Aom = PAC3_left_Aom
, Osa = PAC3_left_Osa
, Oki = PAC3_left_Oki_Osa
, Kum = PAC3_left_Kum_Osa
, Bar = PAC3_left_Bar |> shift(20)
, Nor = PAC3_left_Nor |> shift(20))
PAC3_left |> lapply(\(gb) gb + 70) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment")
## use default substitution matrix
## DNAMultipleAlignment with 6 rows and 157 columns
## aln names
## [1] AGCTTGAAATTTTTACTTTCTTCAA------ACATTTAACTGCAATAAAATACATGATAATCAAAGT--------TCCGCACAATAAAGTCGTAAAATTTTTTTTCGGAGAACTTTTTCGATCGTTATCTTTCAGACAGGTAATCGA-ATGTCTAG- Aom
## [2] TGCATGTCAGCTTTACTTTCTTCAA------ACATTTAACTGCAATAAAATACATGATAATCAATGT--------TCCGCACAATAAAGTCGTAAAAAATCTTTTCGGAGAACTTTTTCGATCGCAATCTTTCAGACAGGTAATCGA-ATGTCTAG- Osa
## [3] --TTTCTAATTTTAACTTTCTTCAA------ACATTTAACTGCAATAAATCATGTTTAAACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCG---TTCTTACAGAATTGATTCCACAACGCATTTTTTCAGACAGATTATCGA-ATGTC---- Bar
## [4] --TTTTTAATTTTAACTTTCTTCAA------ACATTTAACTGCAATAAATCATGTTTAAACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCG---TTCTTACAGAATTGATTCCAGAACGCATTTTTTCAGACAGATTATCGA-ATGTC---- Nor
## [5] --ATTTTTTTTTTCACTTTCTTCAACACACCACTTTTAACTGCAATAA--CATGTTTAATC--AAGT--------TCCGCATGATCTCGTTTTTAACCTTTCTTCTTAGAACAAAAAAAAAAGATGCACCCGACAACGATTCTCCGAGAAATCGA-- Oki
## [6] --AATTTTTCTTTCACTTTCTTCAACACACTACTTTTAACTGCAATAA--CATGTTTAATC--AAGT--------TCCGCATGATCTTGTTTTTAACCTTTCTTCTTAGAATAACAAAA--AGATGCACCCAGCAACGATTCTCCGAGAAATCGAGC Kum
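The window arithmetic used above can be checked on plain integers. In GenomicRanges, shift(x, 20) moves both ends of a range by 20 bp without regard to strand, and x + 70 extends it by 70 bp on each side. Assuming that get_bps() returns width-1 breakpoint ranges, each sequence fed to ClustalW is therefore 141 bp long, and the 20-bp shift for Bar and Nor only moves that window. The helper names below are hypothetical and only mimic this arithmetic; the example coordinate is the end of the first Oki_Osa block printed earlier.

```r
# Hypothetical helpers that mimic, on a plain c(start, end) vector, what
# shift(x, by) and x + n do to a range (1-based, inclusive coordinates).
shift_interval <- function(iv, by) iv + by          # both ends move by `by`
widen_interval <- function(iv, n)  iv + c(-n, n)    # n bases added on each side
interval_width <- function(iv) unname(iv[["end"]] - iv[["start"]] + 1)

# Example breakpoint, assumed to be a width-1 range.
bp <- c(start = 11497681, end = 11497681)

win_plain   <- widen_interval(bp, 70)                      # like gb + 70
win_shifted <- widen_interval(shift_interval(bp, 20), 70)  # like shift(gb, 20) + 70

interval_width(win_plain)   # 141
win_shifted                 # start 11497631, end 11497771
```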
The right side tells the same story. Very near the alignment break (at the center of the alignment), there is the ATG, and upstream of it there are two conserved AG dinucleotides that may be trans-splicing sites (with CAGE support, at least in Oki).
PAC3_right_Oki_Osa <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> plyranges::slice(3) |> get_bps("left")
PAC3_right_Kum_Osa <- coa$Osa_Kum |> swap() |> subsetByOverlaps(PAC3_in_Kum + 1) |> sort(ignore.strand = TRUE) |> plyranges::slice(3) |> get_bps("left")
PAC3_right_Osa <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(3) |> get_bps("right") |> plyranges::mutate(strand = '-')
PAC3_right_Aom <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(3) |> get_bps("right") |> plyranges::mutate(strand = '-')
PAC3_right_Oki_Bar <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> plyranges::slice(3) |> get_bps("left")
PAC3_right_Bar <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(3) |> get_bps("left")
PAC3_right_Nor <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(3) |> get_bps("left")
PAC3_right <-
list( Aom = PAC3_right_Aom
, Osa = PAC3_right_Osa
, Oki = PAC3_right_Oki_Osa
, Kum = PAC3_right_Kum_Osa
, Bar = PAC3_right_Bar
, Nor = PAC3_right_Nor)
PAC3_right |> lapply(\(gb) gb + 70) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment")
## use default substitution matrix
## DNAMultipleAlignment with 6 rows and 150 columns
## aln names
## [1] ------GTTCCGCACAATAAA-GTCGTAAAATTTTTTTTCGGAGAACTTTTTCGATCGTTATCTTTCAGACAGG--TAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGGGAAATACGGTCTGCAGCTCT Aom
## [2] ------GTTCCGCACAATAAA-GTCGTAAAAAATCTTTTCGGAGAACTTTTTCGATCGCAATCTTTCAGACAGG--TAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGGGAAATACGGTCTGCAGCTCT Osa
## [3] AACAAATTTCTGCAAAATGTA-GTCGTTCG---TTCTTACAGAATTGATTCCACAACGCATTTTTTCAGACAGA--TTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCTTGGAAAATACGGTCTGCAGC--- Bar
## [4] AACAAATTTCTGCAAAATGTA-GTCGTTCG---TTCTTACAGAATTGATTCCAGAACGCATTTTTTCAGACAGA--TTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCTTGGAAAATACGGTCTGCAGC--- Nor
## [5] -------TGCAGCAATTTAT--TTCAGTGTCCAGCAAAGCCTTATTTTTCCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACCAGACTGGGCAAGTTGGGCCAGGAGTTGC Oki
## [6] ---------CAGCAATTTATATTTCAGTGTCCAGCAAAGCCTTATTTTTCCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACTAGACTGGGCAAGTTGGGCCAGGAGTTGC Kum
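The ATG and the two upstream AG dinucleotides can be located programmatically by building a strict consensus of the six rows (a column keeps its base only if all rows share it and it is not a gap) and searching that string with gregexpr(). This is a base-R sketch: the helpers strict_consensus and motif_starts are hypothetical, the rows are copied from the printed alignment, and the column numbers refer to this particular alignment, so they would change if it were recomputed.

```r
# Rows copied from the right-side alignment printed above ("-" = gap).
rows <- c(
  Aom = "------GTTCCGCACAATAAA-GTCGTAAAATTTTTTTTCGGAGAACTTTTTCGATCGTTATCTTTCAGACAGG--TAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGGGAAATACGGTCTGCAGCTCT",
  Osa = "------GTTCCGCACAATAAA-GTCGTAAAAAATCTTTTCGGAGAACTTTTTCGATCGCAATCTTTCAGACAGG--TAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGGGAAATACGGTCTGCAGCTCT",
  Bar = "AACAAATTTCTGCAAAATGTA-GTCGTTCG---TTCTTACAGAATTGATTCCACAACGCATTTTTTCAGACAGA--TTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCTTGGAAAATACGGTCTGCAGC---",
  Nor = "AACAAATTTCTGCAAAATGTA-GTCGTTCG---TTCTTACAGAATTGATTCCAGAACGCATTTTTTCAGACAGA--TTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCTTGGAAAATACGGTCTGCAGC---",
  Oki = "-------TGCAGCAATTTAT--TTCAGTGTCCAGCAAAGCCTTATTTTTCCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACCAGACTGGGCAAGTTGGGCCAGGAGTTGC",
  Kum = "---------CAGCAATTTATATTTCAGTGTCCAGCAAAGCCTTATTTTTCCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACTAGACTGGGCAAGTTGGGCCAGGAGTTGC"
)

# Strict consensus: a column keeps its base only if every row has that same
# non-gap base; otherwise it becomes ".".
strict_consensus <- function(rows) {
  m <- do.call(rbind, strsplit(rows, ""))
  cols <- apply(m, 2, function(col) {
    if (all(col == col[1]) && col[1] != "-") col[1] else "."
  })
  paste(cols, collapse = "")
}

# Start columns of a literal motif in a string (empty if absent).
motif_starts <- function(s, motif) {
  hits <- gregexpr(motif, s, fixed = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

cons <- strict_consensus(rows)
atg  <- motif_starts(cons, "ATG")
ag   <- motif_starts(cons, "AG")
atg                 # column of the conserved ATG
ag[ag < atg[1]]     # conserved AGs upstream of it
```

With these rows the only fully conserved ATG starts at column 84, near the center of the 150-column alignment, and the conserved AGs upstream of it start at columns 68 and 72.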
The combined view shows that the sequences to the left and right of PAC3 have no visible similarity with each other or with the bridge region.
PAC3_left_right <- list(
Aom = range(PAC3_left_Aom, PAC3_right_Aom),
Osa = range(PAC3_left_Osa, PAC3_right_Osa),
OkiL = PAC3_left_Oki_Osa,
OkiR = PAC3_right_Oki_Osa,
KumL = PAC3_left_Kum_Osa,
KumR = PAC3_right_Kum_Osa,
Bar = range(PAC3_left_Bar |> shift(20), PAC3_right_Bar),
Nor = range(PAC3_left_Nor |> shift(20), PAC3_right_Nor)
)
PAC3_left_right.aln <- PAC3_left_right |>
lapply(\(gb) gb + 50) |>
lapply(getSeq) |>
lapply(unlist) |>
as("DNAStringSet") |>
msa::msaClustalOmega() |>
as("DNAMultipleAlignment")
## using Gonnet
# Re-order based on my preference
PAC3_left_right.aln@unmasked[c(1,2,5,6,7,8,3,4)]
## DNAStringSet object of length 8:
## width seq names
## [1] 172 CAACACACCACTTTTAACTGCAATAACATGTT------------TAATCAAGTTCCGCATGATCTCGTTTTTAACCTTTCTT-...------------------CTTAGAACAA---AAAAAAAAGATGCACCCGACA------------------------------- OkiL
## [2] 172 CAACACACTACTTTTAACTGCAATAACATGTT------------TAATCAAGTTCCGCATGATCTTGTTTTTAACCTTTCTT-...------------------CTTAGAATAA---CAAAAAGATGCACCCAGCAAC------------------------------- KumL
## [3] 172 ------CAAACATTTAACTGCAATAAATCATGTTTAAACTAAAGTAACAAATTTCTGCAAA---ATGTAGTCGTTCGTTCTTA...TGATTCCACAACGCATTTTTTCAGACAGATTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCT--- Bar
## [4] 172 ------CAAACATTTAACTGCAATAAATCATGTTTAAACTAAAGTAACAAATTTCTGCAAA---ATGTAGTCGTTCGTTCTTA...TGATTCCAGAACGCATTTTTTCAGACAGATTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCT--- Nor
## [5] 172 ----TTCAAACATTTAACTGCAATAAAATACATGATAATCAA--------AGTTCCGCACAATAAAGTCGTAAAATTTTTTTT...ACTTTTTCGATCGTTATCTTTCAGACAGGTAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGG Aom
## [6] 172 ----TTCAAACATTTAACTGCAATAAAATACATGATAATCAA--------TGTTCCGCACAATAAAGTCGTAAAAAATCTTTT...ACTTTTTCGATCGCAATCTTTCAGACAGGTAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGG Osa
## [7] 172 --------------------------------------------------------------TCCAGCAAAGCCTTATTTTT-...--CCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACCAGACTGGG OkiR
## [8] 172 --------------------------------------------------------------TCCAGCAAAGCCTTATTTTT-...--CCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACTAGACTGGG KumR
In Okinawa, the bridge region on the XSR contains a bidirectional promoter.
The flanking genes may encode a leucine-rich repeat domain-containing protein on the left (in the same operon) and a glutamine-tRNA ligase on the right (on the opposite strand).
PAC3_XSR_Osa_range <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> range()
PAC3_XSR_Aom_range <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> range()
PAC3_XSR_Bar_range <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> range()
PAC3_XSR_Nor_range <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> range()
PAC3_XSR_Osa <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> plyranges::slice(2) |> swap()
PAC3_XSR_Aom <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> plyranges::slice(2) |> swap()
PAC3_XSR_Bar <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> plyranges::slice(2) |> swap()
PAC3_XSR_Nor <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> plyranges::slice(2) |> swap()
(PAC3_XSR_Osa_Oki_triple <- coa$Osa_Oki |> subsetByOverlaps(PAC3_XSR_Osa + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## GBreaks object with 3 ranges and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## [1] XSR 3162876-3167494 + | XSR:6227187-6231744 4619 XSR <NA> 0 <NA> Tra TRUE
## [2] XSR 3167645-3168027 - | chr1:11497718-11498101 383 XSR <NA> 0 g12442.t1 <NA> TRUE
## [3] XSR 3168339-3245366 + | XSR:6232134-6307121 77028 XSR rnd,unknown,tandem,... 3497 g12443.t1;g12444.t1;.. <NA> FALSE
## -------
## seqinfo: 483 sequences from OSKA2016v1.9 genome
(PAC3_XSR_Osa_Kum_triple <- coa$Osa_Kum |> subsetByOverlaps(PAC3_XSR_Osa + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
## suppressWarnings() to suppress this warning.)
## GBreaks object with 3 ranges and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## [1] XSR 3162876-3167494 + | contig_42_1:6217599-6222157 4619 XSR <NA> 0 <NA> Tra TRUE
## [2] XSR 3167635-3168032 - | contig_3_1:3248027-3248425 398 XSR <NA> 0 g12442.t1 <NA> TRUE
## [3] XSR 3168339-3245347 + | contig_42_1:6222547-6298065 77009 XSR rnd,unknown,tandem,... 3497 g12443.t1;g12444.t1;.. <NA> FALSE
## -------
## seqinfo: 483 sequences from OSKA2016v1.9 genome
(PAC3_XSR_Aom_Oki_triple <- coa$Oki_Aom |> swap() |> subsetByOverlaps(PAC3_XSR_Aom + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## GBreaks object with 3 ranges and 4 metadata columns:
## seqnames ranges strand | rep repOvlp transcripts query
## <Rle> <IRanges> <Rle> | <CharacterList> <integer> <Rle> <GRanges>
## [1] contig_4_1 8681814-8752620 - | unknown,rnd,LowComplexity,... 1129 g10296.t1;g10297.t1;.. XSR:6232134-6307091
## [2] contig_4_1 8752930-8753312 + | <NA> 0 g10322.t1 chr1:11497718-11498101
## [3] contig_4_1 8753461-8758085 - | <NA> 0 g10323.t1 XSR:6227184-6231744
## -------
## seqinfo: 33 sequences from AOM.5.5f genome
# Bar-Oki alignment is in the opposite orientation
(PAC3_XSR_Bar_Oki_triple <- coa$Bar_Oki |> subsetByOverlaps(PAC3_XSR_Bar + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## GBreaks object with 3 ranges and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## [1] XSR 9725709-9802129 - | XSR:6232388-6307088 76421 XSR tandem,rnd,unknown 2123 g12618.t1;g12619.t1;.. Tra FALSE
## [2] XSR 9802763-9803154 + | chr1:11497717-11498108 392 XSR <NA> 0 g12642.t1;g12642.t2 <NA> TRUE
## [3] XSR 9803275-9808246 - | XSR:6227190-6231744 4972 XSR rnd 240 <NA> <NA> FALSE
## -------
## seqinfo: 68 sequences from Bar2.p4 genome
(PAC3_XSR_Nor_Oki_triple <- coa$Oki_Nor |> swap() |> subsetByOverlaps(PAC3_XSR_Nor + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## GBreaks object with 3 ranges and 4 metadata columns:
## seqnames ranges strand | rep repOvlp transcripts query
## <Rle> <IRanges> <Rle> | <CharacterList> <integer> <Rle> <GRanges>
## [1] scaffold_42 100573-106048 + | rnd,MITE 663 GSOIDT00012492001;GS.. XSR:6227187-6231744
## [2] scaffold_42 106169-106560 - | <NA> 0 GSOIDT00012497001 chr1:11497717-11498108
## [3] scaffold_42 107194-181908 + | LowComplexity,rnd,tandem,... 3967 GSOIDT00012498001;GS.. XSR:6232388-6307088
## -------
## seqinfo: 1260 sequences from OdB3 genome
coa$Osa_Nor |> subsetByOverlaps(PAC3_XSR_Osa)
## Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
## suppressWarnings() to suppress this warning.)
## GBreaks object with 0 ranges and 8 metadata columns:
## seqnames ranges strand | query score Arm rep repOvlp transcripts flag nonCoa
## <Rle> <IRanges> <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer> <Rle> <character> <logical>
## -------
## seqinfo: 483 sequences from OSKA2016v1.9 genome
PAC3_XSR_Osa_left <- PAC3_XSR_Osa_Oki_triple[1] |> get_bps(dir = 'r')
PAC3_XSR_Osa_right <- PAC3_XSR_Osa_Oki_triple[3] |> get_bps(dir = 'l')
PAC3_XSR_Oki_left <- PAC3_XSR_Osa_Oki_triple[1] |> swap() |> get_bps(dir = 'r')
PAC3_XSR_Oki_right <- PAC3_XSR_Osa_Oki_triple[3] |> swap() |> get_bps(dir = 'l')
PAC3_XSR_Kum_left <- PAC3_XSR_Osa_Kum_triple[1] |> swap() |> get_bps(dir = 'r')
PAC3_XSR_Kum_right <- PAC3_XSR_Osa_Kum_triple[3] |> swap() |> get_bps(dir = 'l')
PAC3_XSR_Nor_left <- PAC3_XSR_Nor_Oki_triple[1] |> get_bps(dir = 'r')
PAC3_XSR_Nor_right <- PAC3_XSR_Nor_Oki_triple[3] |> get_bps(dir = 'l')
# Remember that Bar-Oki alignment is in the opposite orientation
PAC3_XSR_Bar_left <- PAC3_XSR_Bar_Oki_triple[3] |> get_bps(dir = 'l') |> plyranges::mutate(strand = '-')
PAC3_XSR_Bar_right <- PAC3_XSR_Bar_Oki_triple[1] |> get_bps(dir = 'r') |> plyranges::mutate(strand = '-')
# Remember that Oki-Aom alignment is in the opposite orientation
PAC3_XSR_Aom_left <- PAC3_XSR_Aom_Oki_triple[3] |> get_bps(dir = 'l') |> plyranges::mutate(strand = '-')
PAC3_XSR_Aom_right <- PAC3_XSR_Aom_Oki_triple[1] |> get_bps(dir = 'r') |> plyranges::mutate(strand = '-')
In Okinawa and Kume, the bridge region is ~400 bp long, so it will not be possible to make a combined left-right alignment like the one on chr1.
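The ~400 bp figure can be checked against the query coordinates of ranges [1] and [3] in PAC3_XSR_Osa_Oki_triple and PAC3_XSR_Osa_Kum_triple printed above: these are the Oki and Kum blocks that flank the PAC3 insertion on the Osa XSR. A minimal base-R sketch; the helper bridge_length is hypothetical and coordinates are assumed to be 1-based and inclusive.

```r
# Hypothetical helper: number of bases strictly between two blocks, given the
# end of the lower block and the start of the upper block.
bridge_length <- function(lower_end, upper_start) upper_start - lower_end - 1

# Query coordinates copied from the two "triple" outputs above.
bridge_length(lower_end   = c(Oki = 6231744, Kum = 6222157),
              upper_start = c(Oki = 6232134, Kum = 6222547))
```

Both genomes give 389 bp, in line with the ~400 bp stated above.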
# Aom and Bar windows were set to the minus strand above, so they are shifted by -n instead of n
n <- -40
PAC3_XSR_Osa.aln.left <- list(
Osa_left = PAC3_XSR_Osa_left |> shift(n),
Aom_left = PAC3_XSR_Aom_left |> shift(-n),
Oki_left = PAC3_XSR_Oki_left |> shift(n),
Kum_left = PAC3_XSR_Kum_left |> shift(n),
Nor_left = PAC3_XSR_Nor_left |> shift(n),
Bar_left = PAC3_XSR_Bar_left |> shift(-n))
PAC3_XSR_Osa.aln.left |> lapply(\(gb) gb + 90) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment")
## use default substitution matrix
## DNAMultipleAlignment with 6 rows and 198 columns
## aln names
## [1] --------ATAGCCCCATTGGCAAAGAGTGTA-GAAATGTTAAGTTTTGTAAACTTT-GTCGATATCTAGCTTGTTTCCGTGATTC...AAAGAT--TGAACAATACCCTGATTGTTTTTTTCTTCAGGAAGTCTATCTTATACGAAGCGTGTTGAATACACGTGATAAG---- Osa_left
## [2] ---------TAGCCCCATTGGCAAAGAGTGTA-GAAGTGTTAAGTTTTGTAAACTTTTGTCGATATCTAGCTTGTTTCGGTGATTC...AAAGAT--TGAACAATACCCTGATTGTTTTTT-CTTCAGGAATTCTATCTTATACGAAGCGTGTTGAATACACGTGATAAGC--- Aom_left
## [3] TCTCTTGAAAATCCCCATCGTCGAAAAATGGATGTAGTGTTATGTTTCGTAAACTTTTGTTGATATCT----------CGTGATTC...AAAAGT--TCAACAATACACCAATTGTTTTTA----AGGGAATTCTGTTTTATAAACAGTGTGTAACTTGCAATTCGTCATTTCT Nor_left
## [4] TCTCTTGAAAATCCCCATCGTCGAAAAATGGATGTAGTGTTATGTTTCGTAAACTTTTGTTGATTTCT----------CGTGATTC...AAAAGT--TCAACAATACACCAATTGTTTTTA----AGGGAATTCTGTTTTATAAACAGTGTGTAACTTGCAATTCGTCATTTCT Bar_left
## [5] ---------AATACCCATCTTTGAAATTTTCCACTAGC-TCAGTTCTAGTTGGACAACGTCG-TATGGAG--AGTTCTAGTGATTC...AAATACGACGAACAATACGCGGATTGTTATTCACTGGGGATCACTCATGGCCTTGTGAATGCGTCTGGAGACCATTGTCGA---- Oki_left
## [6] ---------AATACCCATCTTTGAAATTTTCCACTAGC-TCAGTTCTAGTTGGACAACGTCG-TATGGAG--AGTTCTAGTGATTC...AAATACGACGAACAATACGCGGATTGTTATTCACTGGGGATCACTCATGGCCTTGTGAATGCGTCTGGAGACCATTGTCGA---- Kum_left
The right-side region is near a bidirectional promoter in both the Oki and Osa genomes.
PAC3_XSR_Osa.aln.right <- list(
Osa_right = PAC3_XSR_Osa_right,
Aom_right = PAC3_XSR_Aom_right,
Oki_right = PAC3_XSR_Oki_right,
Kum_right = PAC3_XSR_Kum_right,
Nor_right = PAC3_XSR_Nor_right |> shift(-306),
Bar_right = PAC3_XSR_Bar_right |> shift(306))
PAC3_XSR_Osa.aln.right |> lapply(\(gb) gb + 90) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment")
## use default substitution matrix
## DNAMultipleAlignment with 6 rows and 188 columns
## aln names
## [1] ---AGCA----AGCGTGAGAAAAAGTGAGCCCGCACTCCAAATGAATATTATCAACCCATCGCCAGACGTACTGTCAGAGAAACAC...ACTCTTAGAACAGCACAGAGATTTATCGCTCTACGAACCAAAAGTATACTGCGCCCTGTGGCTGTTCAAATAAAGGTTTACCCTT Nor_right
## [2] ---AGCA----AGCGTGAGAAAAAGTGAGCCCGCACTCCAAATGAATATTATCAACCCATCGCCAGACGTACTGTCAGAGAAACAC...ACTCTTAGAACAGCACAGAGATTTATCGCTCTACGAACCAAAAGTATACTGCGCCCTGTGGCTGTTCAAATAAAGGTTTACCCTT Bar_right
## [3] ---AGCC----AGCGTGAGAAAAAGTGCGCCCGCGCTCCAAATGAATATCATCAACCCATCGCCACTCGTACTGTCAGAGAAACAG...ACTATTAGAATAGCACAGAGCATAATCCGCCTACGAACAAAAAGTATTCTGCGCCCCGTGGCTGTCCAAATAAAGGTTTATCTTT Aom_right
## [4] ---AGCC----AGCGTGAGAAAAAGTGCGCCCGCGCTCCAAATGAATATCATCAACCCATCGCCACTCGTACTGTCAGAGAAACAG...ACTATTAGAATAGCACAGAGCATAATCCGCCTACGAACAAAAAGTATTCTGCGCCCTGTGGCTGTCCAAATAAAGGTTTATCTTT Osa_right
## [5] AGGAGCGCTGCGGCGTGAGAAAAGCTGCGCCCGCGCGC--ACTG--CTGCGCTGATGAGTATTTGGAC---TTGATTGGGAGAGAA...TCGATCAGAGTTGCGCAGAACTTTGTCCGCGTGAGAACGAAAAGTATTTTGCGCCCAGTGGCAGTTCAGATAAAGGTTATTCTTT Oki_right
## [6] AGGAGCGCTGCGGCGTGAGAAAAGCTGCGCCCGCGCGC--ACTG--CTGCGCTGATGAGTATTTGGAC---TTGATTGGGAGAGAA...TCGATCAGAGTTGCGCAGAACTTTGTCCGCGTGAGAACGAAAAGTATTTTGCGCCCAGTGGCAGTTCAGATAAAGGTTATTCTTT Kum_right
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 12 (bookworm)
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BSgenome.Oidioi.genoscope.OdB3_1.0.0 BSgenome.Oidioi.OIST.AOM.5.5f_1.0.1 BSgenome.Oidioi.OIST.KUM.M3.7f_1.0.1 BSgenome.Oidioi.OIST.Bar2.p4_1.0.1
## [5] BSgenome.Oidioi.OIST.OSKA2016v1.9_1.0.0 BSgenome.Oidioi.OIST.OKI2018.I69_1.0.1 OikScrambling_5.0.0 ggplot2_3.4.3
## [9] GenomicBreaks_0.14.2 BSgenome_1.68.0 rtracklayer_1.60.0 Biostrings_2.68.1
## [13] XVector_0.40.0 GenomicRanges_1.52.0 GenomeInfoDb_1.36.1 IRanges_2.34.1
## [17] S4Vectors_0.38.1 BiocGenerics_0.46.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.3.1 BiocIO_1.10.0 bitops_1.0-7 tibble_3.2.1 R.oo_1.25.0 XML_3.99-0.14
## [7] rpart_4.1.19 lifecycle_1.0.3 rprojroot_2.0.3 lattice_0.20-45 MASS_7.3-58.2 backports_1.4.1
## [13] magrittr_2.0.3 Hmisc_5.1-0 sass_0.4.7 rmarkdown_2.23 jquerylib_0.1.4 yaml_2.3.7
## [19] plotrix_3.8-2 DBI_1.1.3 CNEr_1.36.0 minqa_1.2.5 RColorBrewer_1.1-3 ade4_1.7-22
## [25] abind_1.4-5 zlibbioc_1.46.0 purrr_1.0.2 R.utils_2.12.2 RCurl_1.98-1.12 nnet_7.3-18
## [31] pracma_2.4.2 GenomeInfoDbData_1.2.10 gdata_2.19.0 annotate_1.78.0 pkgdown_2.0.7 codetools_0.2-19
## [37] DelayedArray_0.26.7 tidyselect_1.2.0 shape_1.4.6 lme4_1.1-34 matrixStats_1.0.0 base64enc_0.1-3
## [43] GenomicAlignments_1.36.0 jsonlite_1.8.7 msa_1.32.0 mitml_0.4-5 Formula_1.2-5 survival_3.5-3
## [49] iterators_1.0.14 systemfonts_1.0.5 foreach_1.5.2 tools_4.3.1 ragg_1.2.5 Rcpp_1.0.11
## [55] glue_1.6.2 gridExtra_2.3 pan_1.8 xfun_0.40 MatrixGenerics_1.12.2 EBImage_4.42.0
## [61] dplyr_1.1.3 withr_2.5.1 fastmap_1.1.1 boot_1.3-28.1 fansi_1.0.5 digest_0.6.33
## [67] R6_2.5.1 mice_3.16.0 textshaping_0.3.7 colorspace_2.1-0 GO.db_3.17.0 gtools_3.9.4
## [73] poweRlaw_0.70.6 jpeg_0.1-10 RSQLite_2.3.1 weights_1.0.4 R.methodsS3_1.8.2 utf8_1.2.3
## [79] tidyr_1.3.0 generics_0.1.3 data.table_1.14.8 httr_1.4.7 htmlwidgets_1.6.2 S4Arrays_1.0.5
## [85] pkgconfig_2.0.3 gtable_0.3.4 blob_1.2.4 htmltools_0.5.6.1 fftwtools_0.9-11 plyranges_1.20.0
## [91] scales_1.2.1 Biobase_2.60.0 png_0.1-8 knitr_1.44 heatmaps_1.24.0 rstudioapi_0.15.0
## [97] tzdb_0.4.0 reshape2_1.4.4 rjson_0.2.21 checkmate_2.2.0 nlme_3.1-162 nloptr_2.0.3
## [103] cachem_1.0.8 stringr_1.5.0 KernSmooth_2.23-20 parallel_4.3.1 genoPlotR_0.8.11 foreign_0.8-84
## [109] AnnotationDbi_1.62.2 restfulr_0.0.15 desc_1.4.2 pillar_1.9.0 grid_4.3.1 vctrs_0.6.3
## [115] jomo_2.7-6 xtable_1.8-4 cluster_2.1.4 htmlTable_2.4.1 evaluate_0.22 readr_2.1.4
## [121] cli_3.6.1 locfit_1.5-9.8 compiler_4.3.1 Rsamtools_2.16.0 rlang_1.1.1 crayon_1.5.2
## [127] plyr_1.8.8 fs_1.6.3 stringi_1.7.12 BiocParallel_1.34.2 munsell_0.5.0 tiff_0.1-11
## [133] glmnet_4.1-7 Matrix_1.5-3 hms_1.1.3 bit64_4.0.5 KEGGREST_1.40.0 SummarizedExperiment_1.30.2
## [139] broom_1.0.5 memoise_2.0.1 bslib_0.5.1 bit_4.0.5