simulate_population() function
The heart of the {squidSim} R package is the simulate_population() function, which we can use to simulate hierarchical, population-level data. We provide the function with a set of parameters, a hierarchical data structure (if we are simulating hierarchical data), and various other optional arguments, which are listed below.
The simulate_population() function simulates predictors at each hierarchical level from a multivariate normal distribution, using the provided mean and variance-covariance (vcov) parameters. These predictors are then scaled by the beta parameters and added together to create the response. The arguments that can be provided to the simulate_population() function (along with their defaults) are:
simulate_population(
  data_structure,
  n,
  parameters,
  n_response=1,
  response_names,
  family="gaussian",
  link="identity",
  model,
  known_predictors,
  pedigree,
  pedigree_type,
  phylogeny,
  phylogeny_type,
  cov_str,
  sample_type,
  sample_param,
  n_pop=1
)
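The simulation process described above (draw predictors from a multivariate normal distribution, scale them by the betas, and add them together) can be illustrated with a short base R sketch. This is a conceptual illustration only, not {squidSim}'s internal code; the object names (means, vcov_x, beta, res_var) and all values are hypothetical, and only one level of predictors plus a residual is shown.

```r
# Conceptual sketch only: NOT squidSim's internal implementation.
# All names and values below are hypothetical.
set.seed(42)

n       <- 1000                        # number of observations
means   <- c(0, 0)                     # predictor means
vcov_x  <- matrix(c(1.0, 0.3,
                    0.3, 1.0),
                  nrow = 2, byrow = TRUE)  # predictor variance-covariance matrix
beta    <- c(0.5, -0.3)                # effect of each predictor on the response
res_var <- 0.8                         # residual variance

# Draw predictors from a multivariate normal distribution:
# if z has identity covariance, z %*% chol(vcov_x) has covariance vcov_x.
z <- matrix(rnorm(n * length(means)), nrow = n)
predictors <- sweep(z %*% chol(vcov_x), 2, means, "+")
colnames(predictors) <- c("x1", "x2")

# Residuals are drawn in the same way (here a single normal variable).
residual <- rnorm(n, mean = 0, sd = sqrt(res_var))

# Scale the predictors by their betas and add everything together.
y <- as.vector(predictors %*% beta) + residual

sim_data <- data.frame(y = y, predictors, residual = residual)
head(sim_data)
```

In a hierarchical simulation the same steps would be repeated at each level of the data structure, with the level-specific values then matched to the observations they belong to.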
Each of these arguments is covered in more detail in the following sections. Briefly, n and data_structure refer to the size and structure of the data being simulated; data_structure is covered in more detail in Section 2. parameters is a list of parameters to be used in the simulation and is described in detail in Section 1. n_response refers to the number of response variables to be simulated and is covered in detail in the section on multivariate models (Section 3). response_names controls what the simulated response variables are named, and is described in Sections 1 and 3. family and link refer to simulating non-Gaussian response variables and are covered in Section 1.6. model allows for the specification of more complex models and is covered in Section 1.7. known_predictors allows for existing data to be incorporated into the simulations and is covered in Section 1.5.
pedigree and pedigree_type relate to simulating genetic effects and are covered in Section 4. phylogeny and phylogeny_type relate to simulating phylogenetic effects and are covered in Section 5. cov_str relates to simulating a general covariance structure and is covered in multiple sections, including 4, 5, 6.3 and 6.4.
sample_type and sample_param relate to different sampling methods and are covered in Section 7.
n_pop relates to the number of populations, or datasets, that you want to simulate for each parameter set. This is covered in Section 1.8.
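To make this overview concrete, a minimal call might look like the sketch below. It assumes that the {squidSim} package is installed, that the levels in the parameters list are named observation and residual and take names, beta and vcov entries, and that get_population_data() extracts the simulated data as a data frame. The predictor names and values are purely illustrative, and the details of each argument are left to the sections referenced above.

```r
# Minimal sketch of a non-hierarchical simulation.
# Assumes squidSim is installed; predictor names and values are illustrative.
library(squidSim)

squid_data <- simulate_population(
  n = 2000,                              # number of observations to simulate
  parameters = list(
    observation = list(
      names = c("temperature", "rainfall", "wind"),  # hypothetical predictors
      beta  = c(0.5, -0.3, 0.4)          # effect of each predictor on the response
    ),
    residual = list(
      vcov = 1                           # residual variance
    )
  )
)

# Extract the simulated data (assumed accessor) and inspect it
data <- get_population_data(squid_data)
head(data)
```

All other arguments are left at their defaults here, so a single Gaussian response is simulated for one population.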