Simulate the coevolution of multiple variables in discrete time steps

This function simulates the coevolution of multiple continuous variables in discrete time steps, following a first-order vector autoregressive (VAR(1)) model. Users set the sample size, the variable names, the strength of selection and drift, and the probability of a speciation event in any given time step. The function returns a phylogeny, the full log of the simulation run, and a dataset of contemporary trait values.

coev_simulate_coevolution(
  n,
  variables,
  selection_matrix,
  drift,
  prob_split,
  intercepts = NULL,
  ancestral_states = NULL
)

Arguments

n: Number of data points (species) in the resulting data frame.
variables: A character vector of variable names (e.g., c("x","y")).
selection_matrix: A numeric matrix determining the strength of selection between variables. The matrix must have a number of rows and columns equal to the number of variables and its row and column names must contain all specified variables. Each cell determines the strength of selection from the column variable to the row variable. For example, the cell on the "x" column and the "x" row indicates how much previous values of x influence future values of x (autocorrelation). By contrast, the cell on the "x" column and the "y" row indicates how much previous values of x influence future values of y (cross-lagged effect).
drift: A named numeric vector specifying the strength of drift for different variables. Names must include all specified variables.
prob_split: A numeric probability of a species split in any given time step.
intercepts: Intercepts for the VAR(1) model. If NULL (default), intercepts are set to zero for all variables. Otherwise, a named numeric vector specifying the intercepts for different variables. Names must include all specified variables.
ancestral_states: Ancestral states for different variables. If NULL (default), ancestral states are set to zero for all variables. Otherwise, a named numeric vector specifying the ancestral states for different variables. Names must include all specified variables.
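
As a sketch of how the arguments above fit together, the following base R snippet builds a selection matrix and a drift vector for two variables and reads off individual cells using the row (affected variable) and column (source variable) convention. The helper `check_names()` is hypothetical and only illustrates the naming requirements; it is not part of the package.

```r
variables <- c("x", "y")

# Rows are the affected variable, columns are the source variable
selection_matrix <- matrix(
  c(
    0.95, 0.00,  # "x" row: x -> x, y -> x
    0.80, 0.95   # "y" row: x -> y, y -> y
  ),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(variables, variables)
)

# Autocorrelation of x: "x" row, "x" column
auto_x <- selection_matrix["x", "x"]
# Cross-lagged effect of x on y: "y" row, "x" column
cross_x_to_y <- selection_matrix["y", "x"]

drift <- c(x = 0.05, y = 0.05)

# Hypothetical helper (not part of the package):
# do the supplied names cover every specified variable?
check_names <- function(obj_names, variables) all(variables %in% obj_names)

names_ok <- check_names(rownames(selection_matrix), variables) &&
  check_names(colnames(selection_matrix), variables) &&
  check_names(names(drift), variables)
```

Indexing by name rather than by position keeps the direction of each effect explicit, which helps avoid transposing the matrix by accident.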

Value

A list with three elements: the dataset at the final time step (data), the full simulation log (simulation), and the pruned phylogenetic tree (tree).

Details

The model underlying this simulation is a first-order vector autoregressive (VAR(1)) model, in which the values of all variables at the previous time step predict the values at the current time step. In the case of two variables, the model is as follows:

$$Y_t = \alpha_{y}+\beta_{y,y}Y_{t-1}+\beta_{y,x}X_{t-1} + \mathcal{N}(0,\epsilon_{y})$$
$$X_t = \alpha_{x}+\beta_{x,x}X_{t-1}+\beta_{x,y}Y_{t-1} + \mathcal{N}(0,\epsilon_{x})$$

where $\alpha$ represents the intercepts, $\beta$ represents the selection matrix, and $\epsilon$ represents the vector of drift parameters. At each time step, with probability $p$ (prob_split), a speciation event creates two independent evolutionary branches. The simulation continues until the intended number of species has been reached.
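
To make the update rule above concrete, here is a minimal base R sketch of one time step for a single lineage. It assumes the drift parameters are used as the standard deviations of the normal noise, which the description does not state explicitly. `var1_step()` is a hypothetical helper written for illustration, not the package's internal code, and it leaves out speciation entirely.

```r
# One VAR(1) update for a named state vector (hypothetical helper)
var1_step <- function(state, selection_matrix, intercepts, drift) {
  vars <- names(state)
  # Deterministic part: intercepts plus selection matrix times previous state
  # (rows of the matrix are the affected variables)
  mean_next <- intercepts[vars] +
    as.vector(selection_matrix[vars, vars] %*% state[vars])
  # Stochastic part: independent normal noise for each variable
  # (assumption: drift is treated as a standard deviation)
  noise <- rnorm(length(vars), mean = 0, sd = drift[vars])
  setNames(mean_next + noise, vars)
}

variables <- c("x", "y")
B <- matrix(
  c(0.95, 0.00,
    0.80, 0.95),
  nrow = 2, byrow = TRUE,
  dimnames = list(variables, variables)
)

# With zero drift the step is deterministic, which makes it easy to check
state <- c(x = 1, y = 0)
next_state <- var1_step(
  state, B,
  intercepts = c(x = 0, y = 0),
  drift = c(x = 0, y = 0)
)
```

With these inputs, `next_state` holds x = 0.95 (autocorrelation only) and y = 0.80 (the cross-lagged effect of x on y), matching the two equations term by term.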

Author

Scott Claessens scott.claessens@gmail.com, Erik Ringen erikjacob.ringen@uzh.ch

Examples

# simulate coevolution of x and y
n <- 100
variables <- c("x","y")
# x -> y but not vice versa
selection_matrix <- matrix(
  c(
    0.95, 0.00,
    0.80, 0.95
  ),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(variables, variables)
)
drift <- c("x" = 0.05, "y" = 0.05)
prob_split <- 0.05
# run simulation
sim <-
  coev_simulate_coevolution(
    n, variables, selection_matrix,
    drift, prob_split
  )