1.6 Non-Gaussian phenotypes

To simulate non-Gaussian data, we can specify a family and a link function as arguments to simulate_population(). Under the hood, the predictors are simulated as multivariate normal and combined into a linear predictor (the latent scale). The inverse of the link function then transforms this linear predictor onto the expected scale, and binomial or Poisson sampling is applied to give the phenotype on the observed scale.

Here is an example that simulates Poisson-distributed data:

\[ y_i \sim \mathrm{Poisson}(\lambda_i) \]
\[ \lambda_i = \exp( \beta_0 + \boldsymbol{x}_{i} \boldsymbol{\beta} + \epsilon_i ) \]
\[ \boldsymbol{x}_i \sim \mathcal{N}(\boldsymbol{\mu}_x, \Sigma_x) \]
\[ \epsilon_i \sim \mathcal{N}(0,\sigma^2_\epsilon) \]
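For intuition, the model above can be written out by hand in base R. This is a minimal sketch, not squidSim's internal code. It assumes the two predictors are independent standard normals (mean 0, variance 1), and it lets a residual mean of 1.75 play the role of the intercept \(\beta_0\), matching the parameter values used in this section. The variable names latent, expected and y are chosen here for illustration only.

```r
set.seed(1)
n <- 2000

# Predictors: assumed to be independent standard normals
temperature <- rnorm(n)
rainfall    <- rnorm(n)

# Residual with mean 1.75 (acting as the intercept) and variance 0.2
residual <- rnorm(n, mean = 1.75, sd = sqrt(0.2))

# Latent scale: the linear predictor plus the residual
latent <- 0.2 * temperature + 0.1 * rainfall + residual

# Expected scale: apply the inverse of the log link
expected <- exp(latent)

# Observed scale: Poisson sampling around the expected value
y <- rpois(n, lambda = expected)

head(data.frame(y, temperature, rainfall, residual))
```

Each step corresponds to one of the three scales: latent is Gaussian, expected is strictly positive, and y is a non-negative integer count.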

In squidSim, the only change needed relative to a Gaussian simulation is the addition of the family and link arguments to simulate_population().

squid_data <- simulate_population(
  parameters = list(
    observation = list(
      names = c("temperature","rainfall"),
      beta = c(0.2,0.1)
    ),
    residual = list(
      mean = 1.75,
      vcov = 0.2
    )
  ),
  n = 2000,
  family = "poisson", 
  link = "log"
)

data <- get_population_data(squid_data)
head(data)

##    y temperature   rainfall residual squid_pop
## 1  2  -0.5092091 -0.5464299 1.152254         1
## 2  8  -0.2682265  0.6459884 2.084300         1
## 3  4  -1.0337567 -0.1713375 1.579655         1
## 4  7   0.6838892  0.4909524 1.517778         1
## 5 16   1.0748364  0.4885968 2.185156         1
## 6 11  -0.9866906  0.1611376 1.788293         1

plot(table(data$y), ylab="Frequency", xlab="y")

glm(y ~ temperature + rainfall, data, family="poisson")

## 
## Call:  glm(formula = y ~ temperature + rainfall, family = "poisson", 
##     data = data)
## 
## Coefficients:
## (Intercept)  temperature     rainfall  
##      1.8464       0.1895       0.1207  
## 
## Degrees of Freedom: 1999 Total (i.e. Null);  1997 Residual
## Null Deviance:       5448 
## Residual Deviance: 4759  AIC: 11780
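One detail in the output above can be checked by hand: the fitted intercept (1.8464) is larger than the simulated residual mean of 1.75. Assuming the residual mean plays the role of \(\beta_0\), and noting that this GLM has no term for the residual variation \(\epsilon_i\), the model estimates the mean of \(y_i\) conditional on the predictors only. A sketch of the reasoning, using the mean of a log-normal variable:

\[ E[y_i \mid \boldsymbol{x}_i] = E[\exp( \beta_0 + \boldsymbol{x}_{i} \boldsymbol{\beta} + \epsilon_i )] = \exp( \beta_0 + \boldsymbol{x}_{i} \boldsymbol{\beta} + \sigma^2_\epsilon / 2 ) \]
\[ \beta_0 + \sigma^2_\epsilon / 2 = 1.75 + 0.2 / 2 = 1.85 \]

This is close to the estimated intercept, while the slopes are expected to stay near their simulated values of 0.2 and 0.1.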

The available families are ‘gaussian’, ‘poisson’ and ‘binomial’, and the available link functions are ‘identity’, ‘log’, ‘inverse’, ‘sqrt’, ‘logit’ and ‘probit’.
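To see what each link implies for the step from the latent scale to the expected scale, the inverse link functions can be inspected with make.link() from base R's stats package. This is only an illustrative sketch: it does not show how squidSim applies the links internally, and the latent value of 0.5 is an arbitrary choice.

```r
# Link names as listed above
links <- c("identity", "log", "inverse", "sqrt", "logit", "probit")

# An arbitrary value on the latent scale
latent <- 0.5

# Apply each inverse link to move from the latent to the expected scale
expected <- sapply(links, function(l) make.link(l)$linkinv(latent))

print(round(expected, 3))
```

The logit and probit links map any latent value into the interval (0, 1), which is why they pair naturally with the binomial family, whereas the log link guarantees a positive expected value, as needed for Poisson sampling.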