For scaffolding or plotting purposes, it may be useful to merge some sequences into larger ones.
Arguments
- gr
A GenomicRanges::GRanges object.
- seqs
A character vector of Seqinfo::seqlevels from gr.
- name
The name of the new sequence level to be added.
Value
Returns a modified GRanges object in which the sequences have been
merged. Its Seqinfo::seqinfo has a new entry for the new level, and the old
levels are not removed. If no seqlengths were present in the original
object, they are arbitrarily set as the maximal end value for each seqlevel.
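The coordinate arithmetic behind such a merge can be illustrated in base R. This is a minimal sketch, not the package's implementation: the helper name merge_coords_sketch and the plain data frame representation are assumptions made for illustration. Each merged sequence is shifted by the cumulated lengths of the sequences placed before it.

```r
# Hypothetical helper (not part of the package): concatenates the sequences
# listed in `seqs`, in that order, and shifts their ranges accordingly.
merge_coords_sketch <- function(df, seqlengths, seqs, name) {
  # Offset of each merged sequence = total length of those placed before it
  offsets <- setNames(cumsum(c(0, head(seqlengths[seqs], -1))), seqs)
  hit     <- df$seqnames %in% seqs
  shift   <- unname(offsets[df$seqnames[hit]])
  df$start[hit]    <- df$start[hit] + shift
  df$end[hit]      <- df$end[hit]   + shift
  df$seqnames[hit] <- name
  df
}

# Same toy coordinates as the query ranges used in the Examples section
ranges <- data.frame(
  seqnames = c("S1", "S2", "S3"),
  start    = c(101, 1, 1),
  end      = c(200, 100, 100),
  stringsAsFactors = FALSE
)
lens <- c(S1 = 200, S2 = 100, S3 = 100)

merged <- merge_coords_sketch(ranges, lens, c("S2", "S3"), "Scaf1")
merged
```

In this sketch the ranges of S3 end up at positions 101-200 of the merged level because S2, which is 100 bases long, is placed first.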
The mergeSeqLevels_to_DF function returns a DataFrame in which
the start and end columns are in numeric mode. This circumvents
the fact that GenomicRanges objects hardcode the mode of start and end
positions to integer, which cannot hold values larger than
2,147,483,647 and therefore prevents merging the sequence levels of
mammalian or larger-scale genomes.
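The integer limit mentioned above can be checked in base R. The short sketch below does not use the package; it only shows why coordinates beyond 2,147,483,647 need numeric (double) mode.

```r
# Largest value that R's integer type can represent
.Machine$integer.max

# Adding 1 in integer mode overflows to NA (R emits a warning)
overflowed <- suppressWarnings(.Machine$integer.max + 1L)
is.na(overflowed)

# The same sum in numeric (double) mode is represented exactly
beyond_limit <- as.numeric(.Machine$integer.max) + 1
beyond_limit
```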
Note
Be careful: in some cases the sequence feature must be "flipped" with
reverse before merging, for instance when the colinearity is with its
reverse strand.
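The effect of such a flip on coordinates can be sketched in base R. This is an illustration only, assuming the usual reverse-complement convention; flip_coords_sketch is a hypothetical helper and is not the package's reverse function, whose exact behaviour should be checked in its own documentation.

```r
# Hypothetical helper: coordinates of a range after reverse-complementing
# the sequence that carries it (1-based, closed intervals).
flip_coords_sketch <- function(start, end, strand, seqlength) {
  data.frame(
    start  = seqlength - end + 1,
    end    = seqlength - start + 1,
    # Swap "+" and "-"; leave any other strand value (such as "*") unchanged
    strand = ifelse(strand == "+", "-", ifelse(strand == "-", "+", strand)),
    stringsAsFactors = FALSE
  )
}

# A range at 1-30 on the minus strand of a 100-base sequence
flipped <- flip_coords_sketch(start = 1, end = 30, strand = "-", seqlength = 100)
flipped
```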
See also
Other modifier functions:
bridgeRegions(),
coalesce_contigs(),
flipInversions(),
forceSeqLengths(),
guessSeqLengths(),
keepLongestPair(),
matchPairs(),
removeDoubleInversions(),
removeInversions(),
removeTranslocations(),
reverse(),
splitSeqLevel(),
swap()
Other scaffolding functions:
flipStrandNames(),
longestMatchesInTarget(),
scaffoldByFlipAndMerge(),
splitSeqLevel(),
strandNames()
Examples
gb <- GRanges(c("XSR:101-180:+", "XSR:201-300:+", "XSR:320-400:+"))
gb$query <- GRanges(c("S1:101-200", "S2:1-100", "S3:1-100"))
seqlengths(gb$query) <- c(200, 100, 100)
genome(gb$query) <- "GenomeX"
isCircular(gb$query) <- rep(FALSE, 3)
seqinfo(gb$query)
#> Seqinfo object with 3 sequences from GenomeX genome:
#> seqnames seqlengths isCircular genome
#> S1 200 FALSE GenomeX
#> S2 100 FALSE GenomeX
#> S3 100 FALSE GenomeX
gb <- GBreaks(gb)
gb$query <- mergeSeqLevels(gb$query, c("S2", "S3"), "Scaf1")
gb
#> GBreaks object with 3 ranges and 1 metadata column:
#> seqnames ranges strand | query
#> <Rle> <IRanges> <Rle> | <GRanges>
#> [1] XSR 101-180 + | S1:101-200
#> [2] XSR 201-300 + | Scaf1:1-100
#> [3] XSR 320-400 + | Scaf1:101-200
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
seqinfo(gb$query)
#> Seqinfo object with 4 sequences from GenomeX genome:
#> seqnames seqlengths isCircular genome
#> Scaf1 200 FALSE GenomeX
#> S1 200 FALSE GenomeX
#> S2 100 FALSE GenomeX
#> S3 100 FALSE GenomeX
mergeSeqLevels(gb, seqlevelsInUse(gb), "AllMerged")
#> GBreaks object with 3 ranges and 1 metadata column:
#> seqnames ranges strand | query
#> <Rle> <IRanges> <Rle> | <GRanges>
#> [1] AllMerged 101-180 + | S1:101-200
#> [2] AllMerged 201-300 + | Scaf1:1-100
#> [3] AllMerged 320-400 + | Scaf1:101-200
#> -------
#> seqinfo: 2 sequences from an unspecified genome