Tamura 1992 distance

The Tamura 1992 (T92) distance extends the Kimura two-parameter (K80) distance by taking GC content into account. It is calculated as \(-h \ln \left(1 - \frac{p}{h} - q\right) - \frac{1}{2} (1 - h) \ln\left(1 - 2 q\right)\), where \(p\) is the probability of a transition, \(q\) is the probability of a transversion, \(h = 2\theta (1 - \theta)\), and \(\theta\) is the GC content. When \(\theta = 0.5\), \(h = 0.5\) and the formula reduces to the K80 distance. See Wikipedia for more details.
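The formula above can be sketched directly in R. This is a minimal illustration, not the package's implementation: `t92_from_pq()` and `k80_from_pq()` are hypothetical helper names, and the values of `p`, `q` and `theta` are made up for the example.

```r
# Hypothetical helper (not part of the package): T92 distance from
# p (transition probability), q (transversion probability) and
# theta (GC content), following the formula in the description.
t92_from_pq <- function(p, q, theta) {
  h <- 2 * theta * (1 - theta)
  -h * log(1 - p / h - q) - 0.5 * (1 - h) * log(1 - 2 * q)
}

# K80 distance, for comparison: it has no GC-content term.
k80_from_pq <- function(p, q) {
  -0.5 * log(1 - 2 * p - q) - 0.25 * log(1 - 2 * q)
}

# Made-up illustrative values.
p <- 0.12
q <- 0.08

# With balanced GC content (theta = 0.5) the two distances should agree.
all.equal(t92_from_pq(p, q, theta = 0.5), k80_from_pq(p, q))

# With skewed GC content the T92 distance departs from K80.
t92_from_pq(p, q, theta = 0.3)
```

The logarithms are only defined while \(1 - p/h - q > 0\) and \(1 - 2q > 0\), so a sketch like this returns `NaN` (with a warning) for highly diverged inputs.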

Usage

T92_distance(m, gc = c("average", "target", "query"))

Arguments

m: A matrix of counts or probabilities for bases of the target genome to be aligned to bases on the query genome. As a convenience it can also receive a list produced by the readTrainFile() function, containing this matrix.
gc: Whether to calculate the GC content from the target genome, from the query genome, or as the average of both (the default).
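To illustrate how the three `gc` choices could differ, here is a rough sketch that derives a GC content from a 4 × 4 matrix. It rests on assumptions that this page does not confirm: that rows are target bases and columns are query bases, and that the matrix has `A`, `C`, `G`, `T` dimnames. `gc_content_sketch()` is a hypothetical name, and the package may compute this differently.

```r
# Hypothetical helper (not part of the package).
# Assumption: rows = target bases, columns = query bases.
gc_content_sketch <- function(m, gc = c("average", "target", "query")) {
  gc <- match.arg(gc)
  P <- m / sum(m)                      # works for counts or probabilities
  gc_target <- sum(P[c("G", "C"), ])   # G or C on the target side
  gc_query  <- sum(P[, c("G", "C")])   # G or C on the query side
  switch(gc,
         target  = gc_target,
         query   = gc_query,
         average = (gc_target + gc_query) / 2)
}

# Made-up count matrix with an excess of target-A aligned to query-G.
bases <- c("A", "C", "G", "T")
m <- matrix(1, nrow = 4, ncol = 4, dimnames = list(bases, bases))
diag(m) <- 20
m["A", "G"] <- 5

gc_content_sketch(m, gc = "target")
gc_content_sketch(m, gc = "query")
gc_content_sketch(m)                   # average of the two
```

In this toy matrix the query is richer in G than the target, so the three choices give slightly different values of \(\theta\) and therefore slightly different distances, which is consistent with the small differences seen in the examples at the end of this page.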

Value

Returns a numeric value giving the evolutionary distance between the two genomes. The larger the value, the more the two genomes differ.

References

Tamura, K. (1992). "Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C-content biases." Molecular Biology and Evolution, 9(4), 678–687. DOI: 10.1093/oxfordjournals.molbev.a040752

Author

Zikun Yang

Charles Plessy

Examples

T92_distance(exampleSubstitutionMatrix)
#> [1] 0.2794627
T92_distance(exampleSubstitutionMatrix, gc="target")
#> [1] 0.2795033
T92_distance(exampleSubstitutionMatrix, gc="query")
#> [1] 0.2794238