1.6 Non-Gaussian phenotypes
To simulate non-Gaussian data, we can specify a link function and a family as arguments to simulate_population(). Underneath, the predictors are simulated as multivariate normal (on the latent scale), the resulting phenotype is then transformed (onto the expected scale), and finally binomial or Poisson sampling is applied (the observed scale).
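The three scales can be made concrete with a small base R sketch. This is only an illustration of the idea, not how simulate_population() is implemented. It assumes independent standard normal predictors, a log link and Poisson sampling; the seed and the variable names `latent` and `expected` are hypothetical, and the parameter values are those of the Poisson example in this section.

```r
set.seed(1)  # hypothetical seed, only for reproducibility
n <- 2000

# Latent scale: normally distributed predictors and residual
# (assumption: predictors are independent standard normal)
temperature <- rnorm(n)
rainfall    <- rnorm(n)
residual    <- rnorm(n, mean = 1.75, sd = sqrt(0.2))
latent <- 0.2 * temperature + 0.1 * rainfall + residual

# Expected scale: apply the inverse of the log link
expected <- exp(latent)

# Observed scale: Poisson sampling around each expected value
y <- rpois(n, lambda = expected)

head(data.frame(y, temperature, rainfall, residual))
```

Each row of the resulting data frame holds a count `y` together with the latent-scale quantities that generated it.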
Here is an example that simulates Poisson-distributed data:
\[ y_i \sim \mathrm{Poisson}(\lambda_i) \]
\[ \lambda_i = \exp( \beta_0 + \boldsymbol{x}_{i} \boldsymbol{\beta} + \epsilon_i ) \]
\[ \boldsymbol{x}_i \sim \mathcal{N}(\boldsymbol{\mu}_x, \Sigma_x) \]
\[ \epsilon_i \sim \mathcal{N}(0,\sigma^2_\epsilon) \]
The only change in the code that is needed is the addition of the link and family arguments.
squid_data <- simulate_population(
  parameters = list(
    observation = list(
      names = c("temperature","rainfall"),
      beta = c(0.2,0.1)
    ),
    residual = list(
      mean = 1.75,
      vcov = 0.2
    )
  ),
  n = 2000,
  family = "poisson",
  link = "log"
)

data <- get_population_data(squid_data)
head(data)
##    y temperature   rainfall residual squid_pop
## 1  2  -0.5092091 -0.5464299 1.152254         1
## 2  8  -0.2682265  0.6459884 2.084300         1
## 3  4  -1.0337567 -0.1713375 1.579655         1
## 4  7   0.6838892  0.4909524 1.517778         1
## 5 16   1.0748364  0.4885968 2.185156         1
## 6 11  -0.9866906  0.1611376 1.788293         1
plot(table(data$y), ylab="Frequency", xlab="y")
glm(y ~ temperature + rainfall, data, family="poisson")
##
## Call: glm(formula = y ~ temperature + rainfall, family = "poisson",
## data = data)
##
## Coefficients:
## (Intercept)  temperature     rainfall
##      1.8464       0.1895       0.1207
##
## Degrees of Freedom: 1999 Total (i.e. Null); 1997 Residual
## Null Deviance: 5448
## Residual Deviance: 4759 AIC: 11780
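Note that the estimated intercept (1.8464) is larger than the simulated residual mean (1.75). This is expected: the glm above does not model the residual variation on the latent scale, so its intercept reflects the log of the mean count averaged over the residuals. A short derivation, assuming the residual is normally distributed and independent of the predictors, and treating the residual mean of 1.75 as \(\beta_0\):

\[ E[y_i \mid \boldsymbol{x}_i] = \exp( \beta_0 + \boldsymbol{x}_{i} \boldsymbol{\beta} ) \, E[\exp(\epsilon_i)] = \exp\left( \beta_0 + \frac{\sigma^2_\epsilon}{2} + \boldsymbol{x}_{i} \boldsymbol{\beta} \right) \]

\[ \beta_0 + \frac{\sigma^2_\epsilon}{2} = 1.75 + \frac{0.2}{2} = 1.85 \]

This is close to the estimated intercept, while the slopes are expected to remain near their simulated values of 0.2 and 0.1.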
The available families are ‘gaussian’, ‘poisson’ and ‘binomial’, and the available link functions are ‘identity’, ‘log’, ‘inverse’, ‘sqrt’, ‘logit’ and ‘probit’.
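For intuition about what each link does, base R's make.link() provides the inverse link function for all six of these names. The sketch below applies each one to a hypothetical latent value, showing how the same latent value maps onto the expected scale under different links. It assumes that the link names used by simulate_population() have the same meaning as the base R links of the same name, which is not confirmed here.

```r
# Link names as listed above; make.link() comes from base R's stats package
links <- c("identity", "log", "inverse", "sqrt", "logit", "probit")

latent <- 0.5  # hypothetical value on the latent scale

# Apply each inverse link to move from the latent to the expected scale
expected <- sapply(links, function(l) make.link(l)$linkinv(latent))
round(expected, 3)
```

The logit and probit links constrain the expected value to lie between 0 and 1, which is why they are the natural choices for binomial data, whereas the log link keeps it positive, as required for Poisson data.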