Reads a phylogenetic hierarchical orthogroup (HOG) table produced by OrthoFinder, optionally subsets it to selected species, and returns only the one-to-one orthogroups.

load_one_to_ones(file, species = NULL)

Arguments

file

Path to the OrthoFinder table.

species

A character vector of species annotation names that exactly match column names in the OrthoFinder table. With the default, NULL, all species columns are kept.

Value

Returns a DataFrame with the columns HOG, OG and Gene Tree Parent Clade, followed by one column per species containing protein identifiers.

Details

An orthogroup is not one-to-one if a species has either a missing entry, represented as an empty string, or multiple entries, separated by commas. Species that are not part of the clade covered by the table have NA in every row.
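
The rule above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy table `hogs`, its species columns `spA`, `spB` and `spC`, and the helper `is_single()` are hypothetical names introduced only for this illustration.

```r
# Hypothetical toy table mimicking the layout described above
# (not real OrthoFinder output).
hogs <- data.frame(
  HOG = c("HOG1", "HOG2", "HOG3", "HOG4"),
  spA = c("a1", "a2, a3", "a4", "a5"),  # HOG2: multiple entries
  spB = c("b1", "b2", "", "b5"),        # HOG3: missing entry
  spC = NA_character_,                  # species outside the clade
  stringsAsFactors = FALSE
)

# A cell holds exactly one protein if it is not NA, not empty,
# and contains no comma.
is_single <- function(x) {
  !is.na(x) & nzchar(x) & !grepl(",", x, fixed = TRUE)
}

# Keep the rows where every selected species has exactly one entry.
species <- c("spA", "spB")
keep <- Reduce(`&`, lapply(hogs[species], is_single))
one_to_ones <- hogs[keep, c("HOG", species)]
```

In this toy table only HOG1 and HOG4 pass the filter for `spA` and `spB`. Adding `spC` to the selection would leave no rows, because a species outside the clade has NA everywhere.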

Examples

# Example for loading the orthogroups that are one-to-one across all species:
OikScrambling:::load_one_to_ones(
  system.file("extdata/OrthoFinder/N19.tsv", package = "BreakpointsData"))
#> DataFrame with 5162 rows and 9 columns
#>                 HOG          OG Gene Tree Parent Clade
#>         <character> <character>            <character>
#> 1    N19.HOG0000000   OG0000000                    n15
#> 2    N19.HOG0000002   OG0000000                    n88
#> 3    N19.HOG0000003   OG0000000                    n96
#> 4    N19.HOG0000004   OG0000000                   n103
#> 5    N19.HOG0000008   OG0000000                   n156
#> ...             ...         ...                    ...
#> 5158 N19.HOG0014641   OG0014948                     n0
#> 5159 N19.HOG0014642   OG0014949                     n0
#> 5160 N19.HOG0014644   OG0014951                     n0
#> 5161 N19.HOG0014646   OG0014953                     n0
#> 5162 N19.HOG0014647   OG0014954                     n0
#>      AOM-5-5f.prot.longest.fa_1 Bar2_p4.Flye.prot.longest.fa_1
#>                     <character>                    <character>
#> 1             AOM-5-5f.g9811.t1         Bar2_p4_Flye.g12146.t1
#> 2            AOM-5-5f.g10661.t1         Bar2_p4_Flye.g12951.t1
#> 3            AOM-5-5f.g10128.t1         Bar2_p4_Flye.g12454.t1
#> 4             AOM-5-5f.g9214.t1         Bar2_p4_Flye.g11186.t1
#> 5             AOM-5-5f.g2181.t1          Bar2_p4_Flye.g8280.t1
#> ...                         ...                            ...
#> 5158          AOM-5-5f.g9913.t1         Bar2_p4_Flye.g12245.t1
#> 5159          AOM-5-5f.g9950.t1         Bar2_p4_Flye.g12277.t1
#> 5160          AOM-5-5f.g9966.t1         Bar2_p4_Flye.g12292.t1
#> 5161          AOM-5-5f.g9981.t1         Bar2_p4_Flye.g12306.t1
#> 5162          AOM-5-5f.g9987.t1         Bar2_p4_Flye.g12313.t1
#>      KUM-M3-7f.prot.longest.fa_1 OKI2018_I69.v2.prot.longest.fa_1
#>                      <character>                      <character>
#> 1            KUM-M3-7f.g12730.t1           OKI2018_I69.v2.g1680..
#> 2            KUM-M3-7f.g14801.t1           OKI2018_I69.v2.g7899..
#> 3            KUM-M3-7f.g11014.t1           OKI2018_I69.v2.g1507..
#> 4            KUM-M3-7f.g12676.t1           OKI2018_I69.v2.g1675..
#> 5             KUM-M3-7f.g5128.t1           OKI2018_I69.v2.g1156..
#> ...                          ...                              ...
#> 5158          KUM-M3-7f.g9515.t1           OKI2018_I69.v2.g1353..
#> 5159         KUM-M3-7f.g12745.t1           OKI2018_I69.v2.g1681..
#> 5160         KUM-M3-7f.g12085.t1           OKI2018_I69.v2.g1614..
#> 5161         KUM-M3-7f.g12067.t1           OKI2018_I69.v2.g1613..
#> 5162         KUM-M3-7f.g12060.t1           OKI2018_I69.v2.g1612..
#>      OSKA2016v1.9.prot.longest.fa_1 OdB3.v1.0.prot.fa_1.nohaplo
#>                         <character>                 <character>
#> 1            OSKA2016v1.9.g12947.t1      OdB3.GSOIDT00001753001
#> 2            OSKA2016v1.9.g12103.t1      OdB3.GSOIDT00003131001
#> 3            OSKA2016v1.9.g12629.t1      OdB3.GSOIDT00010574001
#> 4            OSKA2016v1.9.g13527.t1      OdB3.GSOIDT00005080001
#> 5             OSKA2016v1.9.g9587.t1      OdB3.GSOIDT00003802001
#> ...                             ...                         ...
#> 5158         OSKA2016v1.9.g12845.t1      OdB3.GSOIDT00006454001
#> 5159         OSKA2016v1.9.g12805.t1      OdB3.GSOIDT00006471001
#> 5160         OSKA2016v1.9.g12790.t1      OdB3.GSOIDT00006489001
#> 5161         OSKA2016v1.9.g12773.t1      OdB3.GSOIDT00006508001
#> 5162         OSKA2016v1.9.g12766.t1      OdB3.GSOIDT00006515001

# Example for loading the orthogroups that are one-to-one for a pair of species:
OikScrambling:::load_one_to_ones(
  system.file("extdata/OrthoFinder/N19.tsv", package = "BreakpointsData"),
  c("Bar2_p4.Flye.prot.longest.fa_1", "OSKA2016v1.9.prot.longest.fa_1"))
#> DataFrame with 9172 rows and 5 columns
#>                 HOG          OG Gene Tree Parent Clade
#>         <character> <character>            <character>
#> 1    N19.HOG0000000   OG0000000                    n15
#> 2    N19.HOG0000002   OG0000000                    n88
#> 3    N19.HOG0000003   OG0000000                    n96
#> 4    N19.HOG0000004   OG0000000                   n103
#> 5    N19.HOG0000005   OG0000000                   n122
#> ...             ...         ...                    ...
#> 9168 N19.HOG0017512   OG0028110                      -
#> 9169 N19.HOG0017517   OG0028116                      -
#> 9170 N19.HOG0017521   OG0028120                      -
#> 9171 N19.HOG0017528   OG0028128                      -
#> 9172 N19.HOG0017530   OG0028130                      -
#>      Bar2_p4.Flye.prot.longest.fa_1 OSKA2016v1.9.prot.longest.fa_1
#>                         <character>                    <character>
#> 1            Bar2_p4_Flye.g12146.t1         OSKA2016v1.9.g12947.t1
#> 2            Bar2_p4_Flye.g12951.t1         OSKA2016v1.9.g12103.t1
#> 3            Bar2_p4_Flye.g12454.t1         OSKA2016v1.9.g12629.t1
#> 4            Bar2_p4_Flye.g11186.t1         OSKA2016v1.9.g13527.t1
#> 5            Bar2_p4_Flye.g11443.t1         OSKA2016v1.9.g14131.t1
#> ...                             ...                            ...
#> 9168          Bar2_p4_Flye.g7540.t1          OSKA2016v1.9.g6477.t1
#> 9169           Bar2_p4_Flye.g801.t1          OSKA2016v1.9.g1518.t1
#> 9170          Bar2_p4_Flye.g8448.t1          OSKA2016v1.9.g9813.t1
#> 9171          Bar2_p4_Flye.g9382.t1         OSKA2016v1.9.g10500.t1
#> 9172          Bar2_p4_Flye.g9616.t1          OSKA2016v1.9.g9098.t1