Loads a phylogenetic hierarchical orthogroup (HOG) table produced by OrthoFinder, optionally restricts it to a subset of species, and returns only the one-to-one orthogroups.
load_one_to_ones(file, species = NULL)
file: Path to the OrthoFinder table.
species: A character vector of species annotation names that exactly match column names in the OrthoFinder table. If NULL (the default), all species columns are used.
Returns a DataFrame with one column per species, containing protein identifiers, plus the columns HOG, OG and Gene.Tree.Parent.Clade.
In the OrthoFinder table, an orthogroup is not one-to-one when, for at least one species, its entry is either missing (represented as an empty string) or contains multiple protein identifiers separated by commas. Species that are not part of the clade covered by the table have NA in every row.
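The filtering rule above can be sketched in base R as follows. This is a hypothetical helper (`one_to_one_sketch`), not the package's own implementation: it works on a plain `data.frame` instead of a `DataFrame`, assumes the metadata columns are named `HOG`, `OG` and `Gene.Tree.Parent.Clade`, and ignores species columns that are NA in every row when deciding which rows to keep.

```r
# Hypothetical sketch of the one-to-one filter; not OikScrambling's code.
one_to_one_sketch <- function(tab, species = NULL) {
  meta <- c("HOG", "OG", "Gene.Tree.Parent.Clade")  # assumed metadata columns
  if (is.null(species)) species <- setdiff(names(tab), meta)
  tab <- tab[, c(meta, species), drop = FALSE]
  # One-to-one cell: not NA, not empty, and not a comma-separated list
  single <- function(x) !is.na(x) & x != "" & !grepl(",", x, fixed = TRUE)
  # Species outside the clade are NA in every row; skip them in the test
  in_clade <- species[vapply(tab[species], function(x) !all(is.na(x)), logical(1))]
  keep <- Reduce(`&`, lapply(tab[in_clade], single), TRUE)
  tab[keep, , drop = FALSE]
}

# Toy table (invented data): HOG2 misses spB, HOG3 has two spA paralogs,
# and spC lies outside the clade.
toy <- data.frame(
  HOG = c("N19.HOG1", "N19.HOG2", "N19.HOG3"),
  OG = c("OG1", "OG2", "OG3"),
  Gene.Tree.Parent.Clade = c("n1", "n2", "-"),
  spA = c("a1", "a2", "a3, a4"),
  spB = c("b1", "", "b3"),
  spC = NA_character_,
  stringsAsFactors = FALSE
)
one_to_one_sketch(toy)$HOG         # "N19.HOG1"
one_to_one_sketch(toy, "spA")$HOG  # "N19.HOG1" "N19.HOG2"
```

Restricting to fewer species relaxes the filter, which is consistent with the pair example below returning more rows than the all-species example.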
# Load the one-to-one orthogroups of all species in the table:
OikScrambling:::load_one_to_ones(system.file("extdata/OrthoFinder/N19.tsv", package = "BreakpointsData"))
#> DataFrame with 5162 rows and 9 columns
#> HOG OG Gene Tree Parent Clade
#> <character> <character> <character>
#> 1 N19.HOG0000000 OG0000000 n15
#> 2 N19.HOG0000002 OG0000000 n88
#> 3 N19.HOG0000003 OG0000000 n96
#> 4 N19.HOG0000004 OG0000000 n103
#> 5 N19.HOG0000008 OG0000000 n156
#> ... ... ... ...
#> 5158 N19.HOG0014641 OG0014948 n0
#> 5159 N19.HOG0014642 OG0014949 n0
#> 5160 N19.HOG0014644 OG0014951 n0
#> 5161 N19.HOG0014646 OG0014953 n0
#> 5162 N19.HOG0014647 OG0014954 n0
#> AOM-5-5f.prot.longest.fa_1 Bar2_p4.Flye.prot.longest.fa_1
#> <character> <character>
#> 1 AOM-5-5f.g9811.t1 Bar2_p4_Flye.g12146.t1
#> 2 AOM-5-5f.g10661.t1 Bar2_p4_Flye.g12951.t1
#> 3 AOM-5-5f.g10128.t1 Bar2_p4_Flye.g12454.t1
#> 4 AOM-5-5f.g9214.t1 Bar2_p4_Flye.g11186.t1
#> 5 AOM-5-5f.g2181.t1 Bar2_p4_Flye.g8280.t1
#> ... ... ...
#> 5158 AOM-5-5f.g9913.t1 Bar2_p4_Flye.g12245.t1
#> 5159 AOM-5-5f.g9950.t1 Bar2_p4_Flye.g12277.t1
#> 5160 AOM-5-5f.g9966.t1 Bar2_p4_Flye.g12292.t1
#> 5161 AOM-5-5f.g9981.t1 Bar2_p4_Flye.g12306.t1
#> 5162 AOM-5-5f.g9987.t1 Bar2_p4_Flye.g12313.t1
#> KUM-M3-7f.prot.longest.fa_1 OKI2018_I69.v2.prot.longest.fa_1
#> <character> <character>
#> 1 KUM-M3-7f.g12730.t1 OKI2018_I69.v2.g1680..
#> 2 KUM-M3-7f.g14801.t1 OKI2018_I69.v2.g7899..
#> 3 KUM-M3-7f.g11014.t1 OKI2018_I69.v2.g1507..
#> 4 KUM-M3-7f.g12676.t1 OKI2018_I69.v2.g1675..
#> 5 KUM-M3-7f.g5128.t1 OKI2018_I69.v2.g1156..
#> ... ... ...
#> 5158 KUM-M3-7f.g9515.t1 OKI2018_I69.v2.g1353..
#> 5159 KUM-M3-7f.g12745.t1 OKI2018_I69.v2.g1681..
#> 5160 KUM-M3-7f.g12085.t1 OKI2018_I69.v2.g1614..
#> 5161 KUM-M3-7f.g12067.t1 OKI2018_I69.v2.g1613..
#> 5162 KUM-M3-7f.g12060.t1 OKI2018_I69.v2.g1612..
#> OSKA2016v1.9.prot.longest.fa_1 OdB3.v1.0.prot.fa_1.nohaplo
#> <character> <character>
#> 1 OSKA2016v1.9.g12947.t1 OdB3.GSOIDT00001753001
#> 2 OSKA2016v1.9.g12103.t1 OdB3.GSOIDT00003131001
#> 3 OSKA2016v1.9.g12629.t1 OdB3.GSOIDT00010574001
#> 4 OSKA2016v1.9.g13527.t1 OdB3.GSOIDT00005080001
#> 5 OSKA2016v1.9.g9587.t1 OdB3.GSOIDT00003802001
#> ... ... ...
#> 5158 OSKA2016v1.9.g12845.t1 OdB3.GSOIDT00006454001
#> 5159 OSKA2016v1.9.g12805.t1 OdB3.GSOIDT00006471001
#> 5160 OSKA2016v1.9.g12790.t1 OdB3.GSOIDT00006489001
#> 5161 OSKA2016v1.9.g12773.t1 OdB3.GSOIDT00006508001
#> 5162 OSKA2016v1.9.g12766.t1 OdB3.GSOIDT00006515001
# Load the one-to-one orthogroups for a pair of species:
OikScrambling:::load_one_to_ones( system.file("extdata/OrthoFinder/N19.tsv", package = "BreakpointsData")
, c("Bar2_p4.Flye.prot.longest.fa_1", "OSKA2016v1.9.prot.longest.fa_1"))
#> DataFrame with 9172 rows and 5 columns
#> HOG OG Gene Tree Parent Clade
#> <character> <character> <character>
#> 1 N19.HOG0000000 OG0000000 n15
#> 2 N19.HOG0000002 OG0000000 n88
#> 3 N19.HOG0000003 OG0000000 n96
#> 4 N19.HOG0000004 OG0000000 n103
#> 5 N19.HOG0000005 OG0000000 n122
#> ... ... ... ...
#> 9168 N19.HOG0017512 OG0028110 -
#> 9169 N19.HOG0017517 OG0028116 -
#> 9170 N19.HOG0017521 OG0028120 -
#> 9171 N19.HOG0017528 OG0028128 -
#> 9172 N19.HOG0017530 OG0028130 -
#> Bar2_p4.Flye.prot.longest.fa_1 OSKA2016v1.9.prot.longest.fa_1
#> <character> <character>
#> 1 Bar2_p4_Flye.g12146.t1 OSKA2016v1.9.g12947.t1
#> 2 Bar2_p4_Flye.g12951.t1 OSKA2016v1.9.g12103.t1
#> 3 Bar2_p4_Flye.g12454.t1 OSKA2016v1.9.g12629.t1
#> 4 Bar2_p4_Flye.g11186.t1 OSKA2016v1.9.g13527.t1
#> 5 Bar2_p4_Flye.g11443.t1 OSKA2016v1.9.g14131.t1
#> ... ... ...
#> 9168 Bar2_p4_Flye.g7540.t1 OSKA2016v1.9.g6477.t1
#> 9169 Bar2_p4_Flye.g801.t1 OSKA2016v1.9.g1518.t1
#> 9170 Bar2_p4_Flye.g8448.t1 OSKA2016v1.9.g9813.t1
#> 9171 Bar2_p4_Flye.g9382.t1 OSKA2016v1.9.g10500.t1
#> 9172 Bar2_p4_Flye.g9616.t1 OSKA2016v1.9.g9098.t1