1.4 Transformations

We may want to simulate predictors that are not normally distributed. Although the underlying simulation procedure assumes multivariate normality, the predictors can be transformed before they are multiplied by their beta values. To do this, we provide the names of the transformation functions, as a character vector, to the functions element of the relevant parameter list, with NA for any predictor that should be left untransformed. Each name must refer to a function that exists in R. The code below exponentiates rainfall (using the exp function) before it is scaled by its beta (here 0.3).

squid_data <- simulate_population(
  n=2000,
  response_name = "body_mass",
  parameters=list(
    observation=list(
      names=c("temperature","rainfall"),
      functions=c(NA,"exp"),
      beta = c(0.5,0.3)
    ),
    residual=list(
      vcov=0.3
    )
  )
)

data <- get_population_data(squid_data)
head(data)
##     body_mass temperature  rainfall   residual squid_pop
## 1  0.17389503  -0.7657209 3.6142011 -0.5275048         1
## 2  0.31171085  -0.9411953 2.2069842  0.1202132         1
## 3  0.21164317  -0.8870048 1.0539333  0.3389656         1
## 4 -0.05429141   0.5958834 0.4053547 -0.4738395         1
## 5  0.63219255   1.7752999 1.4155625 -0.6801262         1
## 6  2.36817352   0.2084304 7.3877935  0.0476203         1
hist(data$rainfall, xlab="Rainfall",main="", breaks=100)
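To make the order of operations explicit, here is a minimal base-R sketch of the model the call above describes. It is an illustration, not a copy of squidSim's internals: it assumes both predictors are drawn as independent standard normals, that there is no intercept, and the variable names simply mirror the example. The key point is that exp is applied to rainfall first, and only then is the result multiplied by its beta.

```r
# Base-R sketch (assumed model, not squidSim's actual code):
# draw Gaussian predictors, transform, then multiply by beta.
set.seed(1)
n <- 2000
temperature <- rnorm(n)                      # left untransformed (NA in functions)
rainfall_gaussian <- rnorm(n)                # rainfall on the normal scale
rainfall <- exp(rainfall_gaussian)           # transformation applied first...
residual <- rnorm(n, sd = sqrt(0.3))         # residual variance of 0.3
# ...then each predictor is scaled by its beta (0.5 and 0.3)
body_mass <- 0.5 * temperature + 0.3 * rainfall + residual
sketch <- data.frame(body_mass, temperature, rainfall, residual)
head(sketch)
```

Because rainfall is exponentiated, it is strictly positive and right-skewed, matching the shape of the histogram produced above.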

If a covariance between predictors is specified, it applies on the untransformed (Gaussian) scale, because the predictors are first simulated as multivariate normal. It does NOT hold on the transformed scale, so care should be taken when interpreting it. For example:

squid_data <- simulate_population(
  n=2000,
  response_name = "body_mass",
  parameters=list(
    observation=list(
      names=c("temperature","rainfall"),
      vcov=matrix(c(1,0.7,0.7,1), nrow=2,byrow=TRUE),
      functions=c(NA,"exp"),
      beta = c(0.5,0.3)
    ),
    residual=list(
      vcov=0.3
    )
  )
)

data <- get_population_data(squid_data)

cov(data$temperature,data$rainfall)
## [1] 1.167142
cov(data$temperature,log(data$rainfall))
## [1] 0.6892072

The simulated covariance (0.7) is recovered only when rainfall is back-transformed to the Gaussian scale, here by taking its log.
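The inflation of the covariance on the transformed scale can be predicted. For a bivariate normal pair (X, Y), Stein's lemma gives cov(X, exp(Y)) = cov(X, Y) * exp(mu_Y + var_Y / 2), so with a covariance of 0.7 and a standard normal Y the expected value is 0.7 * exp(0.5), about 1.15, close to the 1.167 seen above. The base-R sketch below (which does not call squidSim) checks this with a large sample, generating correlated normals from the Cholesky factor of the same covariance matrix.

```r
set.seed(42)
n <- 1e5
Sigma <- matrix(c(1, 0.7, 0.7, 1), nrow = 2, byrow = TRUE)
# chol() returns upper-triangular R with t(R) %*% R == Sigma,
# so rows of Z %*% R have covariance Sigma when Z is iid N(0, 1)
z <- matrix(rnorm(2 * n), ncol = 2) %*% chol(Sigma)
temperature <- z[, 1]
rainfall <- exp(z[, 2])                          # transformed predictor
# Analytic covariance on the transformed scale: cov(X, Y) * exp(mu_Y + var_Y / 2)
expected_cov <- 0.7 * exp(0.5)
observed_cov <- cov(temperature, rainfall)       # close to expected_cov
recovered_cov <- cov(temperature, log(rainfall)) # close to 0.7
c(expected = expected_cov, observed = observed_cov, recovered = recovered_cov)
```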

Likewise, the output of the simulated_variance() function will no longer be accurate, because its calculations are based on the predictors on the untransformed (Gaussian) scale.
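As a rough illustration of how large the discrepancy can be, consider only the rainfall term from the first example. On the Gaussian scale its contribution to the variance of body_mass would be beta^2 * 1 = 0.09, but exp(rainfall) is lognormal with variance (e - 1) * e, about 4.67, so the contribution actually simulated is about 0.42. The base-R sketch below computes both values plus a Monte Carlo check. It does not call simulated_variance(); that the function reports the untransformed-scale value is taken from the sentence above, not verified here, and correlated predictors would add covariance terms this sketch ignores.

```r
beta_rain <- 0.3
# Contribution if rainfall were left as N(0, 1)
var_untransformed <- beta_rain^2 * 1
# Lognormal variance for Y ~ N(mu, s2): (exp(s2) - 1) * exp(2 * mu + s2); here mu = 0, s2 = 1
var_lognormal <- (exp(1) - 1) * exp(1)
var_transformed <- beta_rain^2 * var_lognormal
# Monte Carlo check of the transformed-scale contribution
set.seed(7)
y <- rnorm(1e6)
var_mc <- var(beta_rain * exp(y))
c(untransformed = var_untransformed, transformed = var_transformed, monte_carlo = var_mc)
```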