Skip to contents

For scaffolding or plotting purposes, it may be useful to merge some sequences into larger ones.

Usage

mergeSeqLevels(gr, seqs, name)

mergeSeqLevels_to_DF(gr, seqs, name)

Arguments

gr

A GRanges object.

seqs

A character vector of seqlevels from gr

name

The name of the new sequence level to be added

Value

Returns a modified GRanges object in which the sequences have been merged. Its seqinfo has a new entry for the new level, and the old levels are not removed. If no seqlengths were present in the original object, they are arbitrarily set as the maximal end value for each seqlevel.

The mergeSeqLevels_to_DF function returns a DataFrame in which the start and end columns are in numeric mode. This is to cirvumvent the fact that GenomicRanges object hardcode the mode of start and end

positions to integer, which does not allow values larger than 2,147,483,647, which does not allow to merge sequence levels of mammalian or larger-scale genomes.

Note

Be careful that in some cases it is needed to "flip" the sequence feature with reverse before merging, for instance when colinearity is with its reverse strand.

Author

Charles Plessy

Examples

gb       <- GRanges(c("XSR:101-180:+", "XSR:201-300:+",  "XSR:320-400:+"))
gb$query <- GRanges(c( "S1:101-200",      "S2:1-100",    "S3:1-100"))
seqlengths(gb$query) <- c(200, 100, 100)
genome(gb$query) <- "GenomeX"
isCircular(gb$query) <- rep(FALSE, 3)
seqinfo(gb$query)
#> Seqinfo object with 3 sequences from GenomeX genome:
#>   seqnames seqlengths isCircular  genome
#>   S1              200      FALSE GenomeX
#>   S2              100      FALSE GenomeX
#>   S3              100      FALSE GenomeX
gb <- GBreaks(gb)
gb$query <- mergeSeqLevels(gb$query, c("S2", "S3"), "Scaf1")
gb
#> GBreaks object with 3 ranges and 1 metadata column:
#>       seqnames    ranges strand |         query
#>          <Rle> <IRanges>  <Rle> |     <GRanges>
#>   [1]      XSR   101-180      + |    S1:101-200
#>   [2]      XSR   201-300      + |   Scaf1:1-100
#>   [3]      XSR   320-400      + | Scaf1:101-200
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
seqinfo(gb$query)
#> Seqinfo object with 4 sequences from GenomeX genome:
#>   seqnames seqlengths isCircular  genome
#>   Scaf1           200      FALSE GenomeX
#>   S1              200      FALSE GenomeX
#>   S2              100      FALSE GenomeX
#>   S3              100      FALSE GenomeX

mergeSeqLevels(gb, seqlevelsInUse(gb), "AllMerged")
#> GBreaks object with 3 ranges and 1 metadata column:
#>        seqnames    ranges strand |         query
#>           <Rle> <IRanges>  <Rle> |     <GRanges>
#>   [1] AllMerged   101-180      + |    S1:101-200
#>   [2] AllMerged   201-300      + |   Scaf1:1-100
#>   [3] AllMerged   320-400      + | Scaf1:101-200
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome