4.1 Additive genetics effects

In order to simulate breeding values (additive genetic effects), we can provide the simulate_population() function with the relatedness structure in the population. The simplest way to do this is providing a pedigree using the the pedigree argument (a genetic relatedness matrix could also be given to the cov_str argument). The input to this argument needs to be a list, and the name of the pedigree in the list links it with the item in the parameter list.

NOTE the simulate_population function has very little error checking of pedigree structure at the moment

When simulating breeding values, all individuals in pedigree need to be in the data_structure and vice versa. Having unsampled individuals (for example the base population) can be achieved in the sampling stage (not implemented yet).

Lets start by importing a pedigree from the pedtricks package

library(squidSim)
library(nadiv)
library(gremlin)
library(pedtricks)

data(gryphons)

ped <- fix_ped(gryphons[,1:3])
head(ped)
##    id  dam sire
## 1 204 <NA> <NA>
## 2 205 <NA> <NA>
## 3 206 <NA> <NA>
## 4 207 <NA> <NA>
## 5 208 <NA> <NA>
## 6 209 <NA> <NA>
names(ped)[1]<-"animal"

We can use this pedigree as a data_structure

squid_data <- simulate_population(
  data_structure = ped[,1:3], 
 
  pedigree = list(animal=ped[,1:3]),
  
  parameters =list(
    animal = list(
      vcov = 0.2
    ),
    residual = list(
      vcov = 0.5
    )
  )
)

data <- get_population_data(squid_data)
head(data)
##            y animal_effect   residual animal  dam sire squid_pop
## 1 -2.3553680    -0.2981992 -2.0571688    204 <NA> <NA>         1
## 2 -0.0831304     0.2218079 -0.3049383    205 <NA> <NA>         1
## 3 -0.1690798    -0.2820988  0.1130190    206 <NA> <NA>         1
## 4  1.6868459     0.6530291  1.0338168    207 <NA> <NA>         1
## 5 -0.4311466    -0.5338738  0.1027271    208 <NA> <NA>         1
## 6 -0.1819541    -0.4747195  0.2927654    209 <NA> <NA>         1

We can use the grelim pacakge to run a REML animal model

Ainv <- makeAinv(ped)$Ainv

mod <- gremlin(y~1, random=~ animal,data=data,ginverse=list(animal=Ainv))
## gremlin started:      18:14:56
## 'as(<dsyMatrix>, "dsCMatrix")' is deprecated.
## Use 'as(., "CsparseMatrix")' instead.
## See help("Deprecated") and help("Matrix-deprecated").
##   1 of max 20        lL:-7685.481081     took 0.0037 sec.
##   2 of max 20        lL:-1671.313492     took 0.0026 sec.
##   3 of max 20        lL:-1604.018169     took 0.0026 sec.
##   4 of max 20        lL:-1601.696981     took 0.0026 sec.
##   5 of max 20        lL:-1601.692845     took 0.0026 sec.
##   6 of max 20        lL:-1601.692845     took 0.0026 sec.
##   7 of max 20        lL:-1601.692845     took 0.0026 sec.
## 
## ***  REML converged  ***
## 
## gremlin ended:        18:14:56
summary(mod)
## 
##  Linear mixed model fit by REML [' gremlin ']
##  REML log-likelihood: -1601.693 
##   lambda: FALSE 
## 
##  elapsed time for model: 0.0564 
## 
##  Scaled residuals:
##      Min       1Q   Median       3Q      Max 
## -3.05039 -0.54998  0.00371  0.54487  3.14710 
## 
##  (co)variance parameters: ~animal 
##          rcov: ~units 
##          Estimate Std. Error
## G.animal   0.2854    0.02908
## ResVar1    0.4477    0.02554
## 
##  (co)variance parameter sampling correlations:
##          G.animal ResVar1
## G.animal   1.0000 -0.8469
## ResVar1   -0.8469  1.0000
## 
##  Fixed effects: y ~ 1 
##             Estimate Std. Error z value
## (Intercept) -0.02358    0.01526  -1.545

We don’t have to simulate a phenotype for everyone in the pedigree, so can include a subset of IDs in the data strcuture

ds <- data.frame(animal =ped[sample(1:nrow(ped),1000,replace=FALSE),1])

squid_data <- simulate_population(
  data_structure = ds, 
 
  pedigree = list(animal=ped[,1:3]),
  
  parameters =list(
    animal = list(
      vcov = 0.2
    ),
    residual = list(
      vcov = 0.5
    )
  )
)

data <- get_population_data(squid_data)
head(data)
##             y animal_effect    residual animal squid_pop
## 1 -0.46968357   -0.17338301 -0.29630056   3380         1
## 2 -0.08352477   -0.17213302  0.08860825    362         1
## 3 -1.48977040   -0.85945395 -0.63031645   1918         1
## 4 -0.20485162   -0.50879642  0.30394481   2708         1
## 5  0.39879740   -0.08200417  0.48080157   4287         1
## 6 -0.59120933   -0.19983313 -0.39137620   3418         1
Ainv <- makeAinv(ped)$Ainv

mod <- gremlin(y~1, random=~ animal,data=data,ginverse=list(animal=Ainv))
## gremlin started:      11:24:44
## 'as(<dsyMatrix>, "dsCMatrix")' is deprecated.
## Use 'as(., "CsparseMatrix")' instead.
## See help("Deprecated") and help("Matrix-deprecated").
##   1 of max 20        lL:-1551.356413     took 0.0034 sec.
##   2 of max 20        lL:-418.535368      took 0.0023 sec.
##   3 of max 20        lL:-365.175306      took 0.0022 sec.
##   4 of max 20        lL:-359.174824      took 0.0022 sec.
##   5 of max 20        lL:-359.005605      took 0.0022 sec.
##   6 of max 20        lL:-359.005415      took 0.0022 sec.
##   7 of max 20        lL:-359.005415      took 0.0022 sec.
## 
## ***  REML converged  ***
## 
## gremlin ended:        11:24:44
summary(mod)
## 
##  Linear mixed model fit by REML [' gremlin ']
##  REML log-likelihood: -359.0054 
##   lambda: FALSE 
## 
##  elapsed time for model: 0.0457 
## 
##  Scaled residuals:
##      Min       1Q   Median       3Q      Max 
## -2.43366 -0.45542  0.02354  0.43253  2.86515 
## 
##  (co)variance parameters: ~animal 
##          rcov: ~units 
##          Estimate Std. Error
## G.animal   0.4207    0.09582
## ResVar1    0.3480    0.08755
## 
##  (co)variance parameter sampling correlations:
##          G.animal ResVar1
## G.animal   1.0000 -0.9299
## ResVar1   -0.9299  1.0000
## 
##  Fixed effects: y ~ 1 
##             Estimate Std. Error z value
## (Intercept) -0.04197    0.03176  -1.321

We might want to simulate repeated measurements to allow estimation of permanent environment effects. The simplest way to do this is to create a duplicated column in the data structure of the individual IDs. Permanent environment effects that are not linked to the pedigree can then be simulated.

NOTICE!

The instructions given for simulating permanent environment effects using squidSim were incorrect in the vignette prior to version 0.2.0 (updated in September 2025).

## make data structure with two observations per individual, with ID duplicated in two columns, animal and individual, and link the animal column in the data structure to the pedigree
ds <- data.frame(animal=rep(ped[,1], 2),individual=rep(ped[,1], 2))

squid_data <- simulate_population(
  data_structure = ds, 
  pedigree=list(animal=ped),
  parameters = list(
    animal = list(
      vcov = 0.2
    ),
    individual = list(
      vcov = 0.3
    ),
    residual = list(
      vcov = 0.5
    )
  )
)

data <- get_population_data(squid_data)
head(data)
##            y animal_effect individual_effect   residual animal individual
## 1 -0.7812849     0.4665620        -1.0389875 -0.2088594    204        204
## 2 -0.3499417    -0.9695562         0.2860528  0.3335617    205        205
## 3 -0.7490084    -0.5394729         0.2739732 -0.4835087    206        206
## 4 -1.1704516    -0.2701808         0.1314451 -1.0317159    207        207
## 5  0.2931743     0.1025247        -0.6129429  0.8035925    208        208
## 6 -0.1245506     0.7061392        -0.1828131 -0.6478767    209        209
##   squid_pop
## 1         1
## 2         1
## 3         1
## 4         1
## 5         1
## 6         1
Ainv <- makeAinv(ped)$Ainv

mod_pe <- gremlin(y~1, random=~ animal + individual,data=data,ginverse=list(animal=Ainv))
## gremlin started:      11:24:44 
##   1 of max 20        lL:-12123.358564        took 0.0051 sec.
##   2 of max 20        lL:-4380.793795     took 0.0043 sec.
##   3 of max 20        lL:-4212.077157     took 0.0042 sec.
##   4 of max 20        lL:-4204.524508     took 0.0042 sec.
##   5 of max 20        lL:-4204.496825     took 0.0042 sec.
##   6 of max 20        lL:-4204.496825     took 0.0042 sec.
##   7 of max 20        lL:-4204.496825     took 0.0041 sec.
## 
## ***  REML converged  ***
## 
## gremlin ended:        11:24:44
summary(mod_pe)
## 
##  Linear mixed model fit by REML [' gremlin ']
##  REML log-likelihood: -4204.497 
##   lambda: FALSE 
## 
##  elapsed time for model: 0.0596 
## 
##  Scaled residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8558 -0.5524 -0.0036  0.5431  3.0406 
## 
##  (co)variance parameters: ~animal + individual 
##          rcov: ~units 
##              Estimate Std. Error
## G.animal       0.2237   0.029221
## G.individual   0.3022   0.027781
## ResVar1        0.4947   0.009977
## 
##  (co)variance parameter sampling correlations:
##                G.animal G.individual    ResVar1
## G.animal      1.000e+00      -0.8268 -1.794e-15
## G.individual -8.268e-01       1.0000 -1.796e-01
## ResVar1      -1.977e-15      -0.1796  1.000e+00
## 
##  Fixed effects: y ~ 1 
##             Estimate Std. Error z value
## (Intercept) 0.002897    0.01537  0.1885