Equilibrium GC fraction (GC*)

Computes the predicted equilibrium GC content GC* from a nucleotide substitution count matrix. GC* is the fraction of G/C toward which base composition will evolve under a two-class (weak vs. strong) mutation-bias model, using the asymmetry between AT→GC (W→S) and GC→AT (S→W) changes (Sueoka, 1962).

Usage

GCequilibrium(m)

Arguments

m: A matrix of counts or probabilities for bases of the target genome to be aligned to bases on the query genome. As a convenience it can also receive a list produced by the readTrainFile() function, containing this matrix.

Value

A numeric value between 0 and 1.

Details

The classic directional-mutation equilibrium for GC content is:

$$\mathrm{GC}^* = \frac{\text{W}\to\text{S}}{\text{W}\to\text{S} \quad+\quad \text{S}\to\text{W}}$$

Note

If the target genome is not the true ancestor (which is likely in simple pairwise comparisons of extant genomes), GC* should be interpreted cautiously. It does not predict future GC content, but its position relative to the GC of the target can still indicate the direction of substitution bias: if GC* is higher than the target GC, the bias favors G/C; if lower, it favors A/T. Transpose the input matrix to study the query genome the same way.

References

Noboru Sueoka. On the genetic basis of variation and heterogeneity of DNA base composition. (1962) Proc Natl Acad Sci U S A 48(4):582-92. doi:10.1073/pnas.48.4.582

Author

Charles Plessy

Examples

GCequilibrium(exampleSubstitutionMatrix)
#> [1] 0.5144417
GCequilibrium(t(exampleSubstitutionMatrix))
#> [1] 0.4855583