knitr::opts_chunk$set(cache = TRUE)
options(width = 200)

Load packages and data

See ?OikScrambling:::loadAllGenomes() and vignette("LoadGenomicBreaks", package = "OikScrambling") for how the different objects are prepared.

library('OikScrambling')   |> suppressPackageStartupMessages()
genomes <- OikScrambling:::loadAllGenomes(compat = FALSE)
## Warning in runHook(".onLoad", env, package.lib, package): input string 'Génoscope' cannot be translated from 'ANSI_X3.4-1968' to UTF-8, but is valid UTF-8

## Warning in runHook(".onLoad", env, package.lib, package): input string 'Génoscope' cannot be translated from 'ANSI_X3.4-1968' to UTF-8, but is valid UTF-8
load("BreakPoints.Rdata")

PAC3 moved between chr1 and XSR; its ancestral state is unknown

Annotated versions of the alignments can be found in the PAC3.svg file that should be in the same directory as the PAC3.Rmd source file of this document.

General information.

The gene g3146 (Oidioi.mRNA.OKI2018_I69.chr1.g3144.t1.cds or CAG5107096 in OKI2018, CBY24606 in OdB3) is on chr1 in Oki and Kum but on the XSR in Osa and Bar. The OdB3 protein has a non-specific hit for the pfam10178 (Proteasome assembly chaperone 3, PAC3) domain, and PSI-BLAST also detects it.

Provisional alignment with similar sequences in other larvacean genomes. Proper attention to splicing would be needed to make it more accurate.
O_dio       MHPTTILREIELENEPVALSIMKFAEETLVTISDNGTFGAFHDI-DIA-EMRPGKEPLIT
O_dio_Oska  MHPETILREVEIDNSPVALAIVKFADQTMVTISDNGTFGAFHDI-EIA-EMRPGTDPVVT
O_van       ----SIVESTQIEGVDVSLAIMGFADCTMIAVSDTGTFGSFYKV-QVAKGGHTNQEAVVT
O_alb       ---------LDLENADIHLAAMSFSDCLMLAVSDTGTFGSFYQV-EKSETSHQNIEPVIS
O_lon       ---------IYLGEEKVHIAVMGFGDATMITVSDGALFGPIYKV-TKS--QKGANEPVIT
O_lon_N     --------EIEIDNEVCYISVTQFGDTNFVTISDTGTFGSFYQV-EKK--EKMGNEPIFN
B_sty       ---------FEIDDLVIYFSFMEFDDYVFLALSNTGTFGSFYKV-QKN--FRMDSNPIIS
M_ery       ------IRIIVIDNTEIYLSVMKFGDCNFIVISDTGTFGSFFKI-EKN--FKMDNNPVVN
O_ruf_Y     ------LKRLRLMSQKVHLSSMKFDDHIMLGVSDSGTFWLILSCFFKSTNQHKDKEPVFE


O_dio       IKPF-FGYSDDFTKV------------VCRNLATSLQVKNK-LMISIGLKQEKVNKQNLD
O_dio_Oska  VKPF-FGYSDDFAKVV-----------VCRNLATGLEVKNK-LMISIGLKQERINKNN--
O_van       IDPI-FGINDHYFQVLIR------------------------------------------
O_alb       IRPI-FGVDDQYFQVFLQVLPDSYFKVVCRYLGVHLAESRR-LLVSITLQKDKINKDN--
O_lon       VDGV-FGEQHDYAQVV------------CRHLRLHCRVETEQLLASVLLKRDLVTREN--
O_lon_N     IQPI-FGQAEDYLQVCKFQIETS--IVVCRYLCQNCTLQGPM-MVSVILSREKVNKKN--
B_sty       TEQI-FGNKKDLLEVINK------------------------------------------
M_ery       IEPI-FGENKDYLQV---------------------------------------------
O_ruf_Y     IVPIFFGCDDQYFQV---------------------------------------------



PAC3 in five genome pairs

PAC3 lies in range number 2 in every pair except Oki_Kum, where the two genomes are syntenic and a single range covers the whole region.

PAC3 <- GRanges("chr1:11497331-11498497")
PAC3_in_Kum <- GRanges("contig_3_1:3247998-3248575")

coa$Oki_Osa |> subsetByOverlaps(PAC3)
## GBreaks object with 3 ranges and 8 metadata columns:
##       seqnames            ranges strand |                query     score      Arm             rep   repOvlp transcripts        flag    nonCoa
##          <Rle>         <IRanges>  <Rle> |            <GRanges> <integer> <factor> <CharacterList> <integer>       <Rle> <character> <logical>
##   [1]     chr1 11478580-11497681      - | Chr1:8156015-8174791     19102     long             rnd         0    g3141.t1         Tra     FALSE
##   [2]     chr1 11497718-11498101      - |  XSR:3167645-3168027       384     long            <NA>         0        <NA>        <NA>      TRUE
##   [3]     chr1 11498259-11511831      - | Chr1:8145427-8155956     13573     long         unknown       121        <NA>        <NA>     FALSE
##   -------
##   seqinfo: 19 sequences from OKI2018.I69 genome
coa$Oki_Aom |> subsetByOverlaps(PAC3)
## GBreaks object with 3 ranges and 8 metadata columns:
##       seqnames            ranges strand |                      query     score      Arm             rep   repOvlp transcripts        flag    nonCoa
##          <Rle>         <IRanges>  <Rle> |                  <GRanges> <integer> <factor> <CharacterList> <integer>       <Rle> <character> <logical>
##   [1]     chr1 11478560-11497681      - | contig_3_1:2560164-2578352     19122     long             rnd         0    g3141.t1         Tra     FALSE
##   [2]     chr1 11497718-11498101      + | contig_4_1:8752930-8753312       384     long            <NA>         0    g3146.t1        <NA>      TRUE
##   [3]     chr1 11498259-11512002      - | contig_3_1:2549534-2560105     13744     long         unknown       121        <NA>        <NA>     FALSE
##   -------
##   seqinfo: 19 sequences from OKI2018.I69 genome
coa$Oki_Bar |> subsetByOverlaps(PAC3)
## GBreaks object with 3 ranges and 8 metadata columns:
##       seqnames            ranges strand |                query     score      Arm             rep   repOvlp            transcripts        flag    nonCoa
##          <Rle>         <IRanges>  <Rle> |            <GRanges> <integer> <factor> <CharacterList> <integer>                  <Rle> <character> <logical>
##   [1]     chr1 11478590-11497665      + | Chr1:7645053-7663388     19076     long             rnd        84 g3142.t1;g3143.t1;g3..         Tra     FALSE
##   [2]     chr1 11497717-11498108      + |  XSR:9802763-9803154       392     long            <NA>         0               g3146.t1        <NA>      TRUE
##   [3]     chr1 11498256-11511831      + | Chr1:7663467-7673701     13576     long         unknown       314 g3147.t1;g3148.t1;g3..        <NA>     FALSE
##   -------
##   seqinfo: 19 sequences from OKI2018.I69 genome
coa$Oki_Nor |> subsetByOverlaps(PAC3)
## GBreaks object with 3 ranges and 8 metadata columns:
##       seqnames            ranges strand |                      query     score      Arm             rep   repOvlp            transcripts        flag    nonCoa
##          <Rle>         <IRanges>  <Rle> |                  <GRanges> <integer> <factor> <CharacterList> <integer>                  <Rle> <character> <logical>
##   [1]     chr1 11478590-11497665      + | scaffold_3:1851080-1868702     19076     long             rnd        84 g3142.t1;g3143.t1;g3..         Tra     FALSE
##   [2]     chr1 11497717-11498108      - |  scaffold_42:106169-106560       392     long            <NA>         0                   <NA>        <NA>      TRUE
##   [3]     chr1 11498256-11511831      + | scaffold_3:1868781-1879542     13576     long         unknown       314 g3147.t1;g3148.t1;g3..        <NA>     FALSE
##   -------
##   seqinfo: 19 sequences from OKI2018.I69 genome
coa$Oki_Kum |> subsetByOverlaps(PAC3)
## GBreaks object with 1 range and 8 metadata columns:
##       seqnames            ranges strand |                      query     score      Arm             rep   repOvlp            transcripts        flag    nonCoa
##          <Rle>         <IRanges>  <Rle> |                  <GRanges> <integer> <factor> <CharacterList> <integer>                  <Rle> <character> <logical>
##   [1]     chr1 11472434-11508602      + | contig_3_1:3221478-3260386     36169     long     rnd,unknown       310 g3140.t1;g3142.t1;g3..        <NA>     FALSE
##   -------
##   seqinfo: 19 sequences from OKI2018.I69 genome
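
The three-range pattern seen in the non-syntenic pairs can be reproduced with a toy object. The sketch below is only an illustration: `toy_Oki_Osa` and `PAC3_locus` are hypothetical names, and the toy is a plain `GRanges` that carries only the Oki-side coordinates printed above for Oki_Osa, not the query ranges or metadata of a real `GBreaks` object.

```r
library(GenomicRanges) |> suppressPackageStartupMessages()

# Toy stand-in for coa$Oki_Osa (assumption: Oki-side coordinates only,
# copied from the printed output; no query ranges, no metadata columns).
toy_Oki_Osa <- GRanges(c("chr1:11478580-11497681:-",
                         "chr1:11497718-11498101:-",
                         "chr1:11498259-11511831:-"))

# Same coordinates as the PAC3 object defined in this document.
PAC3_locus <- GRanges("chr1:11497331-11498497")

# The unstranded locus overlaps the two flanking ranges and the short
# middle range, so three ranges are returned.
hits <- subsetByOverlaps(toy_Oki_Osa, PAC3_locus)

# Number of unaligned bases between consecutive ranges.
unaligned <- start(hits)[-1] - end(hits)[-length(hits)] - 1
unaligned
```

With these coordinates, the middle range is separated from its neighbours by 36 and 157 unaligned bases, which is where the alignment breakpoints examined in the following sections sit.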

PAC3 on chr1

The bridge region in the “Northern” genomes.

On chromosome 1, PAC3 is flanked by UBXN6 and a gene resembling mediator of RNA polymerase II transcription subunit 30. In genomes where PAC3 is on the XSR, these flanking genes are separated by a short bridge region, where little similarity remains between the Atlantic and Pacific branches of the “North” species. The conserved AG residues are trans-splicing sites according to CAGE data.

PAC3_bridge_Osa <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> swap() |> cleanGaps() |> plyranges::mutate(strand = '-')
PAC3_bridge_Aom <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> swap() |> cleanGaps() |> plyranges::mutate(strand = '-')
PAC3_bridge_Bar <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> swap() |> cleanGaps()
PAC3_bridge_Nor <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> swap() |> cleanGaps()

list( Osa = PAC3_bridge_Osa
    , Aom = PAC3_bridge_Aom
    , Bar = PAC3_bridge_Bar
    , Nor = PAC3_bridge_Nor) |> lapply(\(gb) gb + 0)  |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment")
## use default substitution matrix
## DNAMultipleAlignment with 4 rows and 81 columns
##      aln                                                                                                                                                                            names               
## [1] ACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCGTTCTTACAGAATTGATTCCACAACGCATTTTTTCAGACAGA---                                                                                              Bar
## [2] ACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCGTTCTTACAGAATTGATTCCAGAACGCATTTTTTCAGACAGA---                                                                                              Nor
## [3] -----------------------AAAGTCGTAAAAAATCTTTTCGGAGAACTTTTTCGATCGCAATCTTTCAGACAGGTAA                                                                                              Osa
## [4] -----------------------AAAGTCGTAAAATTTTTTTTCGGAGAACTTTTTCGATCGTTATCTTTCAGACAGGTAA                                                                                              Aom
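
In the code above, the strand of the Osa and Aom bridge ranges is set to `'-'` while Bar and Nor are left untouched. A plausible reading (an assumption, not stated in the source) is that the Oki_Osa and Oki_Aom alignments are on the minus strand, so the sequences must be reverse-complemented to read in the same orientation as Bar and Nor. Extracting a minus-strand range from a BSgenome object with `getSeq()` returns the reverse complement, which the toy sketch below imitates with Biostrings only; `toy_genome` is a hypothetical sequence.

```r
library(Biostrings) |> suppressPackageStartupMessages()

# Hypothetical 12-nt "genome" used only for illustration.
toy_genome <- DNAString("AAACCGTTTACG")

# Sequence of positions 4-9 as read on the plus strand.
plus_seq  <- subseq(toy_genome, start = 4, end = 9)

# The same positions as read on the minus strand.
minus_seq <- reverseComplement(plus_seq)

as.character(plus_seq)   # "CCGTTT"
as.character(minus_seq)  # "AAACGG"
```

Feeding a mixture of the two orientations to a multiple aligner would hide any real similarity, so the orientation needs to be made consistent before calling `msa::msaClustalW()`.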

The left region of PAC3 in all genomes

An alignment of the 70 nucleotides on each side of the alignment breakpoint to the left of PAC3 shows that the upstream sequence is conserved in all six genomes, whereas the “Oki” sequences and the “North” bridge region share no visible similarity.

PAC3_left_Oki_Osa  <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> plyranges::slice(1) |> get_bps("right")
PAC3_left_Kum_Osa  <- coa$Osa_Kum |> swap() |> subsetByOverlaps(PAC3_in_Kum + 1) |> sort(ignore.strand = TRUE) |> plyranges::slice(1) |> get_bps("right")
PAC3_left_Osa      <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(1) |> get_bps("left") |> plyranges::mutate(strand = '-')
PAC3_left_Aom      <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(1) |> get_bps("left") |> plyranges::mutate(strand = '-')
PAC3_left_Oki_Bar  <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> plyranges::slice(1) |> get_bps("right")
PAC3_left_Bar      <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> swap()  |> plyranges::slice(1) |> get_bps("right")
PAC3_left_Nor      <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> swap()  |> plyranges::slice(1) |> get_bps("right")

# Shifting Bar and Nor so that they all align well

PAC3_left <- 
list( Aom = PAC3_left_Aom
    , Osa = PAC3_left_Osa
    , Oki = PAC3_left_Oki_Osa
    , Kum = PAC3_left_Kum_Osa
    , Bar = PAC3_left_Bar |> shift(20)
    , Nor = PAC3_left_Nor |> shift(20))

PAC3_left |> lapply(\(gb) gb + 70) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment")
## use default substitution matrix
## DNAMultipleAlignment with 6 rows and 157 columns
##      aln                                                                                                                                                                            names               
## [1] AGCTTGAAATTTTTACTTTCTTCAA------ACATTTAACTGCAATAAAATACATGATAATCAAAGT--------TCCGCACAATAAAGTCGTAAAATTTTTTTTCGGAGAACTTTTTCGATCGTTATCTTTCAGACAGGTAATCGA-ATGTCTAG-                  Aom
## [2] TGCATGTCAGCTTTACTTTCTTCAA------ACATTTAACTGCAATAAAATACATGATAATCAATGT--------TCCGCACAATAAAGTCGTAAAAAATCTTTTCGGAGAACTTTTTCGATCGCAATCTTTCAGACAGGTAATCGA-ATGTCTAG-                  Osa
## [3] --TTTCTAATTTTAACTTTCTTCAA------ACATTTAACTGCAATAAATCATGTTTAAACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCG---TTCTTACAGAATTGATTCCACAACGCATTTTTTCAGACAGATTATCGA-ATGTC----                  Bar
## [4] --TTTTTAATTTTAACTTTCTTCAA------ACATTTAACTGCAATAAATCATGTTTAAACTAAAGTAACAAATTTCTGCAAAATGTAGTCGTTCG---TTCTTACAGAATTGATTCCAGAACGCATTTTTTCAGACAGATTATCGA-ATGTC----                  Nor
## [5] --ATTTTTTTTTTCACTTTCTTCAACACACCACTTTTAACTGCAATAA--CATGTTTAATC--AAGT--------TCCGCATGATCTCGTTTTTAACCTTTCTTCTTAGAACAAAAAAAAAAGATGCACCCGACAACGATTCTCCGAGAAATCGA--                  Oki
## [6] --AATTTTTCTTTCACTTTCTTCAACACACTACTTTTAACTGCAATAA--CATGTTTAATC--AAGT--------TCCGCATGATCTTGTTTTTAACCTTTCTTCTTAGAATAACAAAA--AGATGCACCCAGCAACGATTCTCCGAGAAATCGAGC                  Kum
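
The window arithmetic used above (`gb + 70`, `shift(20)`) can be checked on a toy range. This is a minimal sketch that assumes a breakpoint is represented as a width-1 range; that is an assumption about the `get_bps()` output, and `bp` is a hypothetical object.

```r
library(GenomicRanges) |> suppressPackageStartupMessages()

# Hypothetical width-1 breakpoint at position 1000.
bp <- GRanges("chr1", IRanges(start = 1000, width = 1))

# Adding a number to a GRanges extends it on both sides.
window <- bp + 70
c(start(window), end(window), width(window))   # 930 1070 141

# Shifting first moves the centre of the window by 20 bases.
shifted <- shift(bp, 20) + 70
c(start(shifted), end(shifted))                # 950 1090
```

Under that assumption, each sequence sent to the aligner is 141 nt long, and the 20-base shift applied to Bar and Nor simply recentres their windows before extraction.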

The right region of PAC3 in all genomes

The same pattern holds on the right side. The ATG lies very near the alignment break (at the center), and upstream of it are two conserved AG dinucleotides that may be trans-splicing sites (with CAGE support at least in Oki).

PAC3_right_Oki_Osa  <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> plyranges::slice(3) |> get_bps("left")
PAC3_right_Kum_Osa  <- coa$Osa_Kum |> swap() |> subsetByOverlaps(PAC3_in_Kum + 1) |> sort(ignore.strand = TRUE) |> plyranges::slice(3) |> get_bps("left")
PAC3_right_Osa      <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(3) |> get_bps("right") |> plyranges::mutate(strand = '-')
PAC3_right_Aom      <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(3) |> get_bps("right") |> plyranges::mutate(strand = '-')
PAC3_right_Oki_Bar  <- coa$Oki_Bar |> subsetByOverlaps(PAC3)           |> plyranges::slice(3) |> get_bps("left")
PAC3_right_Bar      <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(3) |> get_bps("left")
PAC3_right_Nor      <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> swap() |> plyranges::slice(3) |> get_bps("left")

PAC3_right <-
list( Aom = PAC3_right_Aom
    , Osa = PAC3_right_Osa
    , Oki = PAC3_right_Oki_Osa
    , Kum = PAC3_right_Kum_Osa
    , Bar = PAC3_right_Bar
    , Nor = PAC3_right_Nor)
PAC3_right |> lapply(\(gb) gb + 70) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment") 
## use default substitution matrix
## DNAMultipleAlignment with 6 rows and 150 columns
##      aln                                                                                                                                                                            names               
## [1] ------GTTCCGCACAATAAA-GTCGTAAAATTTTTTTTCGGAGAACTTTTTCGATCGTTATCTTTCAGACAGG--TAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGGGAAATACGGTCTGCAGCTCT                         Aom
## [2] ------GTTCCGCACAATAAA-GTCGTAAAAAATCTTTTCGGAGAACTTTTTCGATCGCAATCTTTCAGACAGG--TAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGGGAAATACGGTCTGCAGCTCT                         Osa
## [3] AACAAATTTCTGCAAAATGTA-GTCGTTCG---TTCTTACAGAATTGATTCCACAACGCATTTTTTCAGACAGA--TTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCTTGGAAAATACGGTCTGCAGC---                         Bar
## [4] AACAAATTTCTGCAAAATGTA-GTCGTTCG---TTCTTACAGAATTGATTCCAGAACGCATTTTTTCAGACAGA--TTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCTTGGAAAATACGGTCTGCAGC---                         Nor
## [5] -------TGCAGCAATTTAT--TTCAGTGTCCAGCAAAGCCTTATTTTTCCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACCAGACTGGGCAAGTTGGGCCAGGAGTTGC                         Oki
## [6] ---------CAGCAATTTATATTTCAGTGTCCAGCAAAGCCTTATTTTTCCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACTAGACTGGGCAAGTTGGGCCAGGAGTTGC                         Kum

Combined view of the left and right regions

The combined view shows that the sequences on the left and right of PAC3 have no visible similarity with each other or with the bridge region.

PAC3_left_right <- list(
  Aom  = range(PAC3_left_Aom, PAC3_right_Aom),
  Osa  = range(PAC3_left_Osa, PAC3_right_Osa),
  OkiL = PAC3_left_Oki_Osa,
  OkiR = PAC3_right_Oki_Osa,
  KumL = PAC3_left_Kum_Osa,
  KumR = PAC3_right_Kum_Osa,
  Bar  = range(PAC3_left_Bar |> shift(20), PAC3_right_Bar),
  Nor  = range(PAC3_left_Nor |> shift(20), PAC3_right_Nor)
)
PAC3_left_right.aln <- PAC3_left_right |>
  lapply(\(gb) gb + 50) |>
  lapply(getSeq) |>
  lapply(unlist) |>
  as("DNAStringSet") |>
  msa::msaClustalOmega() |>
  as("DNAMultipleAlignment")
## using Gonnet
# Re-order based on my preference
PAC3_left_right.aln@unmasked[c(1,2,5,6,7,8,3,4)]
## DNAStringSet object of length 8:
##     width seq                                                                                                                                                                       names               
## [1]   172 CAACACACCACTTTTAACTGCAATAACATGTT------------TAATCAAGTTCCGCATGATCTCGTTTTTAACCTTTCTT-...------------------CTTAGAACAA---AAAAAAAAGATGCACCCGACA------------------------------- OkiL
## [2]   172 CAACACACTACTTTTAACTGCAATAACATGTT------------TAATCAAGTTCCGCATGATCTTGTTTTTAACCTTTCTT-...------------------CTTAGAATAA---CAAAAAGATGCACCCAGCAAC------------------------------- KumL
## [3]   172 ------CAAACATTTAACTGCAATAAATCATGTTTAAACTAAAGTAACAAATTTCTGCAAA---ATGTAGTCGTTCGTTCTTA...TGATTCCACAACGCATTTTTTCAGACAGATTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCT--- Bar
## [4]   172 ------CAAACATTTAACTGCAATAAATCATGTTTAAACTAAAGTAACAAATTTCTGCAAA---ATGTAGTCGTTCGTTCTTA...TGATTCCAGAACGCATTTTTTCAGACAGATTATCGAATGTCTAGTCAGCGACCCGGCTCGACGCTGACCCACATCCGGCT--- Nor
## [5]   172 ----TTCAAACATTTAACTGCAATAAAATACATGATAATCAA--------AGTTCCGCACAATAAAGTCGTAAAATTTTTTTT...ACTTTTTCGATCGTTATCTTTCAGACAGGTAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGG Aom
## [6]   172 ----TTCAAACATTTAACTGCAATAAAATACATGATAATCAA--------TGTTCCGCACAATAAAGTCGTAAAAAATCTTTT...ACTTTTTCGATCGCAATCTTTCAGACAGGTAATCGAATGTCTAGTCAGACACCCGGCTCGACGCTGACACAATTCCGGCTTGG Osa
## [7]   172 --------------------------------------------------------------TCCAGCAAAGCCTTATTTTT-...--CCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACCAGACTGGG OkiR
## [8]   172 --------------------------------------------------------------TCCAGCAAAGCCTTATTTTT-...--CCACAAATTTGATATTTAAGAAAGGCGTTGTCGAATGTCTAGCCACCGGTCCGGCTCAACGCTCACTCATACTAGACTGGG KumR
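
The combined view relies on `range()` to merge the left and right breakpoints of a genome into a single span. The sketch below uses hypothetical width-1 breakpoints (`left_bp`, `right_bp`) to show what `range()` returns, and why both breakpoints need to be on the same strand for this to work.

```r
library(GenomicRanges) |> suppressPackageStartupMessages()

# Hypothetical breakpoints, 80 bases apart, both on the minus strand.
left_bp  <- GRanges("chrA", IRanges(100, width = 1), strand = "-")
right_bp <- GRanges("chrA", IRanges(180, width = 1), strand = "-")

# One range spanning both breakpoints.
span <- range(left_bp, right_bp)
c(start(span), end(span))                 # 100 180

# Flanks are then added on both sides, as with `gb + 50` above.
c(start(span + 50), end(span + 50))       # 50 230

# With mismatched strands, range() returns one range per strand instead.
mixed <- range(left_bp, GRanges("chrA", IRanges(180, width = 1), strand = "+"))
length(mixed)                             # 2
```

This only works as a single span when the bridge is short, which is the case for the “North” genomes on chr1; for Oki and Kum the left and right sides are kept as separate entries.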

PAC3 on the X-specific region of chrX (XSR)

The PAC3 (North) or bridge region (Oki) on the XSR.

In Okinawa, the bridge region contains a bidirectional promoter.

Flanking genes might be a leucine-rich repeat domain-containing protein on the left (same operon) and a glutamine-tRNA ligase on the right (different strand).

PAC3_XSR_Osa_range <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> range()
PAC3_XSR_Aom_range <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> range()
PAC3_XSR_Bar_range <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> range()
PAC3_XSR_Nor_range <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> range()

PAC3_XSR_Osa    <- coa$Oki_Osa |> subsetByOverlaps(PAC3) |> plyranges::slice(2) |> swap()
PAC3_XSR_Aom    <- coa$Oki_Aom |> subsetByOverlaps(PAC3) |> plyranges::slice(2) |> swap()
PAC3_XSR_Bar    <- coa$Oki_Bar |> subsetByOverlaps(PAC3) |> plyranges::slice(2) |> swap()
PAC3_XSR_Nor    <- coa$Oki_Nor |> subsetByOverlaps(PAC3) |> plyranges::slice(2) |> swap()


(PAC3_XSR_Osa_Oki_triple <- coa$Osa_Oki |> subsetByOverlaps(PAC3_XSR_Osa + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## GBreaks object with 3 ranges and 8 metadata columns:
##       seqnames          ranges strand |                  query     score      Arm                    rep   repOvlp            transcripts        flag    nonCoa
##          <Rle>       <IRanges>  <Rle> |              <GRanges> <integer> <factor>        <CharacterList> <integer>                  <Rle> <character> <logical>
##   [1]      XSR 3162876-3167494      + |    XSR:6227187-6231744      4619      XSR                   <NA>         0                   <NA>         Tra      TRUE
##   [2]      XSR 3167645-3168027      - | chr1:11497718-11498101       383      XSR                   <NA>         0              g12442.t1        <NA>      TRUE
##   [3]      XSR 3168339-3245366      + |    XSR:6232134-6307121     77028      XSR rnd,unknown,tandem,...      3497 g12443.t1;g12444.t1;..        <NA>     FALSE
##   -------
##   seqinfo: 483 sequences from OSKA2016v1.9 genome
(PAC3_XSR_Osa_Kum_triple <- coa$Osa_Kum |> subsetByOverlaps(PAC3_XSR_Osa + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
##   suppressWarnings() to suppress this warning.)
## GBreaks object with 3 ranges and 8 metadata columns:
##       seqnames          ranges strand |                       query     score      Arm                    rep   repOvlp            transcripts        flag    nonCoa
##          <Rle>       <IRanges>  <Rle> |                   <GRanges> <integer> <factor>        <CharacterList> <integer>                  <Rle> <character> <logical>
##   [1]      XSR 3162876-3167494      + | contig_42_1:6217599-6222157      4619      XSR                   <NA>         0                   <NA>         Tra      TRUE
##   [2]      XSR 3167635-3168032      - |  contig_3_1:3248027-3248425       398      XSR                   <NA>         0              g12442.t1        <NA>      TRUE
##   [3]      XSR 3168339-3245347      + | contig_42_1:6222547-6298065     77009      XSR rnd,unknown,tandem,...      3497 g12443.t1;g12444.t1;..        <NA>     FALSE
##   -------
##   seqinfo: 483 sequences from OSKA2016v1.9 genome
(PAC3_XSR_Aom_Oki_triple <- coa$Oki_Aom |> swap() |> subsetByOverlaps(PAC3_XSR_Aom + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## GBreaks object with 3 ranges and 4 metadata columns:
##         seqnames          ranges strand |                           rep   repOvlp            transcripts                  query
##            <Rle>       <IRanges>  <Rle> |               <CharacterList> <integer>                  <Rle>              <GRanges>
##   [1] contig_4_1 8681814-8752620      - | unknown,rnd,LowComplexity,...      1129 g10296.t1;g10297.t1;..    XSR:6232134-6307091
##   [2] contig_4_1 8752930-8753312      + |                          <NA>         0              g10322.t1 chr1:11497718-11498101
##   [3] contig_4_1 8753461-8758085      - |                          <NA>         0              g10323.t1    XSR:6227184-6231744
##   -------
##   seqinfo: 33 sequences from AOM.5.5f genome
# Bar-Oki alignment is in the opposite orientation
(PAC3_XSR_Bar_Oki_triple <- coa$Bar_Oki |> subsetByOverlaps(PAC3_XSR_Bar + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## GBreaks object with 3 ranges and 8 metadata columns:
##       seqnames          ranges strand |                  query     score      Arm                rep   repOvlp            transcripts        flag    nonCoa
##          <Rle>       <IRanges>  <Rle> |              <GRanges> <integer> <factor>    <CharacterList> <integer>                  <Rle> <character> <logical>
##   [1]      XSR 9725709-9802129      - |    XSR:6232388-6307088     76421      XSR tandem,rnd,unknown      2123 g12618.t1;g12619.t1;..         Tra     FALSE
##   [2]      XSR 9802763-9803154      + | chr1:11497717-11498108       392      XSR               <NA>         0    g12642.t1;g12642.t2        <NA>      TRUE
##   [3]      XSR 9803275-9808246      - |    XSR:6227190-6231744      4972      XSR                rnd       240                   <NA>        <NA>     FALSE
##   -------
##   seqinfo: 68 sequences from Bar2.p4 genome
(PAC3_XSR_Nor_Oki_triple <- coa$Oki_Nor |> swap() |> subsetByOverlaps(PAC3_XSR_Nor + 1000, ignore.strand = TRUE) |> sort(ignore.strand = TRUE))
## GBreaks object with 3 ranges and 4 metadata columns:
##          seqnames        ranges strand |                          rep   repOvlp            transcripts                  query
##             <Rle>     <IRanges>  <Rle> |              <CharacterList> <integer>                  <Rle>              <GRanges>
##   [1] scaffold_42 100573-106048      + |                     rnd,MITE       663 GSOIDT00012492001;GS..    XSR:6227187-6231744
##   [2] scaffold_42 106169-106560      - |                         <NA>         0      GSOIDT00012497001 chr1:11497717-11498108
##   [3] scaffold_42 107194-181908      + | LowComplexity,rnd,tandem,...      3967 GSOIDT00012498001;GS..    XSR:6232388-6307088
##   -------
##   seqinfo: 1260 sequences from OdB3 genome
coa$Osa_Nor |> subsetByOverlaps(PAC3_XSR_Osa)
## Warning in .merge_two_Seqinfo_objects(x, y): The 2 combined objects have no sequence levels in common. (Use
##   suppressWarnings() to suppress this warning.)
## GBreaks object with 0 ranges and 8 metadata columns:
##    seqnames    ranges strand |     query     score      Arm             rep   repOvlp transcripts        flag    nonCoa
##       <Rle> <IRanges>  <Rle> | <GRanges> <integer> <factor> <CharacterList> <integer>       <Rle> <character> <logical>
##   -------
##   seqinfo: 483 sequences from OSKA2016v1.9 genome
PAC3_XSR_Osa_left  <- PAC3_XSR_Osa_Oki_triple[1] |> get_bps(dir = 'r')
PAC3_XSR_Osa_right <- PAC3_XSR_Osa_Oki_triple[3] |> get_bps(dir = 'l')
PAC3_XSR_Oki_left  <- PAC3_XSR_Osa_Oki_triple[1] |> swap() |> get_bps(dir = 'r')
PAC3_XSR_Oki_right <- PAC3_XSR_Osa_Oki_triple[3] |> swap() |> get_bps(dir = 'l')

PAC3_XSR_Kum_left  <- PAC3_XSR_Osa_Kum_triple[1] |> swap() |> get_bps(dir = 'r')
PAC3_XSR_Kum_right <- PAC3_XSR_Osa_Kum_triple[3] |> swap() |> get_bps(dir = 'l')

PAC3_XSR_Nor_left  <- PAC3_XSR_Nor_Oki_triple[1] |> get_bps(dir = 'r')
PAC3_XSR_Nor_right <- PAC3_XSR_Nor_Oki_triple[3] |> get_bps(dir = 'l')

# Remember that Bar-Oki alignment is in the opposite orientation
PAC3_XSR_Bar_left  <- PAC3_XSR_Bar_Oki_triple[3] |> get_bps(dir = 'l') |> plyranges::mutate(strand = '-')
PAC3_XSR_Bar_right <- PAC3_XSR_Bar_Oki_triple[1] |> get_bps(dir = 'r') |> plyranges::mutate(strand = '-')

# Remember that Oki-Aom alignment is in the opposite orientation
PAC3_XSR_Aom_left  <- PAC3_XSR_Aom_Oki_triple[3] |> get_bps(dir = 'l') |> plyranges::mutate(strand = '-')
PAC3_XSR_Aom_right <- PAC3_XSR_Aom_Oki_triple[1] |> get_bps(dir = 'r') |> plyranges::mutate(strand = '-')


Left side

In Okinawa and Kume, the bridge region is ~400 bp long, so a combined left-right alignment like the one made for chr1 is not possible here.

n <- -40
PAC3_XSR_Osa.aln.left <- list(
  Osa_left = PAC3_XSR_Osa_left |> shift(n),
  Aom_left = PAC3_XSR_Aom_left |> shift(-n),
  Oki_left = PAC3_XSR_Oki_left |> shift(n),
  Kum_left = PAC3_XSR_Kum_left |> shift(n),
  Nor_left = PAC3_XSR_Nor_left |> shift(n),
  Bar_left = PAC3_XSR_Bar_left |> shift(-n))

PAC3_XSR_Osa.aln.left |> lapply(\(gb) gb + 90) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment") 
## use default substitution matrix
## DNAMultipleAlignment with 6 rows and 198 columns
##      aln                                                                                                                                                                            names               
## [1] --------ATAGCCCCATTGGCAAAGAGTGTA-GAAATGTTAAGTTTTGTAAACTTT-GTCGATATCTAGCTTGTTTCCGTGATTC...AAAGAT--TGAACAATACCCTGATTGTTTTTTTCTTCAGGAAGTCTATCTTATACGAAGCGTGTTGAATACACGTGATAAG---- Osa_left
## [2] ---------TAGCCCCATTGGCAAAGAGTGTA-GAAGTGTTAAGTTTTGTAAACTTTTGTCGATATCTAGCTTGTTTCGGTGATTC...AAAGAT--TGAACAATACCCTGATTGTTTTTT-CTTCAGGAATTCTATCTTATACGAAGCGTGTTGAATACACGTGATAAGC--- Aom_left
## [3] TCTCTTGAAAATCCCCATCGTCGAAAAATGGATGTAGTGTTATGTTTCGTAAACTTTTGTTGATATCT----------CGTGATTC...AAAAGT--TCAACAATACACCAATTGTTTTTA----AGGGAATTCTGTTTTATAAACAGTGTGTAACTTGCAATTCGTCATTTCT Nor_left
## [4] TCTCTTGAAAATCCCCATCGTCGAAAAATGGATGTAGTGTTATGTTTCGTAAACTTTTGTTGATTTCT----------CGTGATTC...AAAAGT--TCAACAATACACCAATTGTTTTTA----AGGGAATTCTGTTTTATAAACAGTGTGTAACTTGCAATTCGTCATTTCT Bar_left
## [5] ---------AATACCCATCTTTGAAATTTTCCACTAGC-TCAGTTCTAGTTGGACAACGTCG-TATGGAG--AGTTCTAGTGATTC...AAATACGACGAACAATACGCGGATTGTTATTCACTGGGGATCACTCATGGCCTTGTGAATGCGTCTGGAGACCATTGTCGA---- Oki_left
## [6] ---------AATACCCATCTTTGAAATTTTCCACTAGC-TCAGTTCTAGTTGGACAACGTCG-TATGGAG--AGTTCTAGTGATTC...AAATACGACGAACAATACGCGGATTGTTATTCACTGGGGATCACTCATGGCCTTGTGAATGCGTCTGGAGACCATTGTCGA---- Kum_left
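
In the list above, Aom and Bar are shifted by `-n` while the other genomes are shifted by `n`. A likely reason (an assumption, inferred from the code rather than stated in the source) is that `shift()` moves ranges along the reference coordinates regardless of strand, so ranges whose strand was set to `'-'` need the opposite sign to move in the same direction relative to the extracted sequence. The toy objects `plus_bp` and `minus_bp` below are hypothetical.

```r
library(GenomicRanges) |> suppressPackageStartupMessages()

n <- -40

# Hypothetical breakpoints on opposite strands.
plus_bp  <- GRanges("chrA", IRanges(1000, width = 1), strand = "+")
minus_bp <- GRanges("chrB", IRanges(5000, width = 1), strand = "-")

# shift() ignores the strand: both ranges move to lower coordinates.
start(shift(plus_bp,  n))    # 960, i.e. 40 bases upstream on the plus strand
start(shift(minus_bp, n))    # 4960, i.e. 40 bases downstream on the minus strand

# Flipping the sign moves the minus-strand range upstream instead.
start(shift(minus_bp, -n))   # 5040
```

The same reasoning would explain why Nor and Bar receive shifts of opposite sign (`-306` and `306`) in the right-side alignment.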

Right side

The right-side region is near a bidirectional promoter in both the Oki and Osa genomes.

PAC3_XSR_Osa.aln.right <- list(
  Osa_right = PAC3_XSR_Osa_right,
  Aom_right = PAC3_XSR_Aom_right,
  Oki_right = PAC3_XSR_Oki_right,
  Kum_right = PAC3_XSR_Kum_right,
  Nor_right = PAC3_XSR_Nor_right |> shift(-306),
  Bar_right = PAC3_XSR_Bar_right |> shift(306))

PAC3_XSR_Osa.aln.right |> lapply(\(gb) gb + 90) |> lapply(getSeq) |> lapply(unlist) |> as("DNAStringSet") |> msa::msaClustalW() |> as("DNAMultipleAlignment") 
## use default substitution matrix
## DNAMultipleAlignment with 6 rows and 188 columns
##      aln                                                                                                                                                                            names               
## [1] ---AGCA----AGCGTGAGAAAAAGTGAGCCCGCACTCCAAATGAATATTATCAACCCATCGCCAGACGTACTGTCAGAGAAACAC...ACTCTTAGAACAGCACAGAGATTTATCGCTCTACGAACCAAAAGTATACTGCGCCCTGTGGCTGTTCAAATAAAGGTTTACCCTT Nor_right
## [2] ---AGCA----AGCGTGAGAAAAAGTGAGCCCGCACTCCAAATGAATATTATCAACCCATCGCCAGACGTACTGTCAGAGAAACAC...ACTCTTAGAACAGCACAGAGATTTATCGCTCTACGAACCAAAAGTATACTGCGCCCTGTGGCTGTTCAAATAAAGGTTTACCCTT Bar_right
## [3] ---AGCC----AGCGTGAGAAAAAGTGCGCCCGCGCTCCAAATGAATATCATCAACCCATCGCCACTCGTACTGTCAGAGAAACAG...ACTATTAGAATAGCACAGAGCATAATCCGCCTACGAACAAAAAGTATTCTGCGCCCCGTGGCTGTCCAAATAAAGGTTTATCTTT Aom_right
## [4] ---AGCC----AGCGTGAGAAAAAGTGCGCCCGCGCTCCAAATGAATATCATCAACCCATCGCCACTCGTACTGTCAGAGAAACAG...ACTATTAGAATAGCACAGAGCATAATCCGCCTACGAACAAAAAGTATTCTGCGCCCTGTGGCTGTCCAAATAAAGGTTTATCTTT Osa_right
## [5] AGGAGCGCTGCGGCGTGAGAAAAGCTGCGCCCGCGCGC--ACTG--CTGCGCTGATGAGTATTTGGAC---TTGATTGGGAGAGAA...TCGATCAGAGTTGCGCAGAACTTTGTCCGCGTGAGAACGAAAAGTATTTTGCGCCCAGTGGCAGTTCAGATAAAGGTTATTCTTT Oki_right
## [6] AGGAGCGCTGCGGCGTGAGAAAAGCTGCGCCCGCGCGC--ACTG--CTGCGCTGATGAGTATTTGGAC---TTGATTGGGAGAGAA...TCGATCAGAGTTGCGCAGAACTTTGTCCGCGTGAGAACGAAAAGTATTTTGCGCCCAGTGGCAGTTCAGATAAAGGTTATTCTTT Kum_right

Session information

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux 12 (bookworm)
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
##  [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] BSgenome.Oidioi.genoscope.OdB3_1.0.0    BSgenome.Oidioi.OIST.AOM.5.5f_1.0.1     BSgenome.Oidioi.OIST.KUM.M3.7f_1.0.1    BSgenome.Oidioi.OIST.Bar2.p4_1.0.1     
##  [5] BSgenome.Oidioi.OIST.OSKA2016v1.9_1.0.0 BSgenome.Oidioi.OIST.OKI2018.I69_1.0.1  OikScrambling_5.0.0                     ggplot2_3.4.3                          
##  [9] GenomicBreaks_0.14.2                    BSgenome_1.68.0                         rtracklayer_1.60.0                      Biostrings_2.68.1                      
## [13] XVector_0.40.0                          GenomicRanges_1.52.0                    GenomeInfoDb_1.36.1                     IRanges_2.34.1                         
## [17] S4Vectors_0.38.1                        BiocGenerics_0.46.0                    
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.3.1               BiocIO_1.10.0               bitops_1.0-7                tibble_3.2.1                R.oo_1.25.0                 XML_3.99-0.14              
##   [7] rpart_4.1.19                lifecycle_1.0.3             rprojroot_2.0.3             lattice_0.20-45             MASS_7.3-58.2               backports_1.4.1            
##  [13] magrittr_2.0.3              Hmisc_5.1-0                 sass_0.4.7                  rmarkdown_2.23              jquerylib_0.1.4             yaml_2.3.7                 
##  [19] plotrix_3.8-2               DBI_1.1.3                   CNEr_1.36.0                 minqa_1.2.5                 RColorBrewer_1.1-3          ade4_1.7-22                
##  [25] abind_1.4-5                 zlibbioc_1.46.0             purrr_1.0.2                 R.utils_2.12.2              RCurl_1.98-1.12             nnet_7.3-18                
##  [31] pracma_2.4.2                GenomeInfoDbData_1.2.10     gdata_2.19.0                annotate_1.78.0             pkgdown_2.0.7               codetools_0.2-19           
##  [37] DelayedArray_0.26.7         tidyselect_1.2.0            shape_1.4.6                 lme4_1.1-34                 matrixStats_1.0.0           base64enc_0.1-3            
##  [43] GenomicAlignments_1.36.0    jsonlite_1.8.7              msa_1.32.0                  mitml_0.4-5                 Formula_1.2-5               survival_3.5-3             
##  [49] iterators_1.0.14            systemfonts_1.0.5           foreach_1.5.2               tools_4.3.1                 ragg_1.2.5                  Rcpp_1.0.11                
##  [55] glue_1.6.2                  gridExtra_2.3               pan_1.8                     xfun_0.40                   MatrixGenerics_1.12.2       EBImage_4.42.0             
##  [61] dplyr_1.1.3                 withr_2.5.1                 fastmap_1.1.1               boot_1.3-28.1               fansi_1.0.5                 digest_0.6.33              
##  [67] R6_2.5.1                    mice_3.16.0                 textshaping_0.3.7           colorspace_2.1-0            GO.db_3.17.0                gtools_3.9.4               
##  [73] poweRlaw_0.70.6             jpeg_0.1-10                 RSQLite_2.3.1               weights_1.0.4               R.methodsS3_1.8.2           utf8_1.2.3                 
##  [79] tidyr_1.3.0                 generics_0.1.3              data.table_1.14.8           httr_1.4.7                  htmlwidgets_1.6.2           S4Arrays_1.0.5             
##  [85] pkgconfig_2.0.3             gtable_0.3.4                blob_1.2.4                  htmltools_0.5.6.1           fftwtools_0.9-11            plyranges_1.20.0           
##  [91] scales_1.2.1                Biobase_2.60.0              png_0.1-8                   knitr_1.44                  heatmaps_1.24.0             rstudioapi_0.15.0          
##  [97] tzdb_0.4.0                  reshape2_1.4.4              rjson_0.2.21                checkmate_2.2.0             nlme_3.1-162                nloptr_2.0.3               
## [103] cachem_1.0.8                stringr_1.5.0               KernSmooth_2.23-20          parallel_4.3.1              genoPlotR_0.8.11            foreign_0.8-84             
## [109] AnnotationDbi_1.62.2        restfulr_0.0.15             desc_1.4.2                  pillar_1.9.0                grid_4.3.1                  vctrs_0.6.3                
## [115] jomo_2.7-6                  xtable_1.8-4                cluster_2.1.4               htmlTable_2.4.1             evaluate_0.22               readr_2.1.4                
## [121] cli_3.6.1                   locfit_1.5-9.8              compiler_4.3.1              Rsamtools_2.16.0            rlang_1.1.1                 crayon_1.5.2               
## [127] plyr_1.8.8                  fs_1.6.3                    stringi_1.7.12              BiocParallel_1.34.2         munsell_0.5.0               tiff_0.1-11                
## [133] glmnet_4.1-7                Matrix_1.5-3                hms_1.1.3                   bit64_4.0.5                 KEGGREST_1.40.0             SummarizedExperiment_1.30.2
## [139] broom_1.0.5                 memoise_2.0.1               bslib_0.5.1                 bit_4.0.5