1.7 Model equations

In all the examples so far, the predictors are simulated, multiplied by their respective beta value, and then added together. When simulating more complex models, we may want to prevent some of this behaviour or add in additional parameters, interactions or general complexity.

To introduce this increased complexity, we can specify a model formula. This explicitly tells simulate_population() how to put the simulated predictors together to form the response variable. We can first demonstrate this with a simple linear model.

squid_data <- simulate_population(
  parameters=list(
    observation= list(
      names = c("temperature", "rainfall"),
      beta =c(0.5,0.3)  
    ),
    residual = list(
      names="residual",
      vcov=1
    )
  ),
  n=2000,
  model = "y = temperature + rainfall + residual"
)

data <- get_population_data(squid_data)

coef(lm(y ~ temperature + rainfall, data))
## (Intercept) temperature    rainfall 
## 0.004605396 0.506914670 0.288952532

In the formula, we write out how the variables are added up. Everything that you want exported needs to be defined and named (e.g. y=...). By default, all predictors are multiplied by their respective beta values before this happens. Sometimes it is useful to prevent this multiplication (e.g. multiply two traits together without them being multiplied by their respective beta). We can do this by using I().

squid_data <- simulate_population(
  parameters=list(
    observation= list(
      names = c("temperature", "rainfall"),
      beta =c(0.5,0.3) 
    ),
    residual = list(
      names="residual",
      vcov=1
    )
  ),
  n=2000,
  model = "y = temperature + I(rainfall) + residual"
)

data <- get_population_data(squid_data)

coef(lm(y ~ temperature + rainfall, data))
##  (Intercept)  temperature     rainfall 
## -0.004985291  0.495841550  0.960772034

We can also add extra parameters to the parameter list, which we can call from within the function. In combination with I() we can then customise the model formula a bit

squid_data <- simulate_population(
  parameters=list(
    observation= list(
      names = c("temperature", "rainfall"),
      beta =c(0.5,0.3),
      extra_beta = 0.1  
      ),
    residual = list(
      names="residual",
      vcov=1
      )
  ),
  n=2000,
  model = "y = temperature + extra_beta*I(rainfall) + residual"
)

data <- get_population_data(squid_data)

coef(lm(y ~ temperature + rainfall, data))
##  (Intercept)  temperature     rainfall 
## -0.000221756  0.481460783  0.085752465

Finally, we can use [] to index the levels of the random effects within the formula.. An example of this is given Section 4.4, along with use of the index_link argument.