Title: | Simulate Forensic DNA Mixtures |
---|---|
Description: | Mixed DNA profiles can be sampled according to models for probabilistic genotyping. Peak height variability is modelled using a log normal distribution or a gamma distribution. Sample contributors may be related according to a pedigree. |
Authors: | Maarten Kruijver [aut, cre] |
Maintainer: | Maarten Kruijver <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.1 |
Built: | 2025-02-16 05:32:38 UTC |
Source: | https://github.com/mkruijver/simdnamixtures |
Stutter model where the expected stutter rate depends on the allele and locus
allele_specific_stutter_model( stutter_types, size_regression, sex_locus_name = "AMEL" )
allele_specific_stutter_model( stutter_types, size_regression, sex_locus_name = "AMEL" )
stutter_types |
List. See stutter_type. |
size_regression |
Function, see read_size_regression. |
sex_locus_name |
Character vector, defaults to "AMEL". |
When a pg_model is constructed (see gamma_model), a stutter model can optionally be applied. The allele specific stutter model is commonly used with a log normal model. The expected stutter ratio for a parent allele at a locus is obtained from a linear regression of observed stutter ratios against allele length. For some loci or alleles the linear model may not be satisfactory. To override the expected stutter rates for specific alleles, a list of exceptions can be used. See stutter_type for more detail.
Object of class stutter_model
to be used by e.g. log_normal_model.
global_stutter_model for a stutter model where the expected stutter ratio does not depend on the locus or parent allele.
# we will define an allele specific stutter model for back stutter only # prepare stutter regression filename_bs_regression <- system.file("extdata", "GlobalFiler_Stutter_3500.txt",package = "simDNAmixtures") bs_regression <- read_stutter_regression(filename_bs_regression) # prepare exceptions, i.e. where does the regression not apply? filename_bs_exceptions <- system.file("extdata", "GlobalFiler_Stutter_Exceptions_3500.csv",package = "simDNAmixtures") bs_exceptions <- read_stutter_exceptions(filename_bs_exceptions) # prepare a stutter type backstutter <- stutter_type(name = "BackStutter", delta = -1, stutter_regression = bs_regression, stutter_exceptions = bs_exceptions) # assign stutter model size_regression <- read_size_regression(system.file("extdata", "GlobalFiler_SizeRegression.csv",package = "simDNAmixtures")) bs_model <- allele_specific_stutter_model(list(backstutter), size_regression) bs_model
# we will define an allele specific stutter model for back stutter only # prepare stutter regression filename_bs_regression <- system.file("extdata", "GlobalFiler_Stutter_3500.txt",package = "simDNAmixtures") bs_regression <- read_stutter_regression(filename_bs_regression) # prepare exceptions, i.e. where does the regression not apply? filename_bs_exceptions <- system.file("extdata", "GlobalFiler_Stutter_Exceptions_3500.csv",package = "simDNAmixtures") bs_exceptions <- read_stutter_exceptions(filename_bs_exceptions) # prepare a stutter type backstutter <- stutter_type(name = "BackStutter", delta = -1, stutter_regression = bs_regression, stutter_exceptions = bs_exceptions) # assign stutter model size_regression <- read_size_regression(system.file("extdata", "GlobalFiler_SizeRegression.csv",package = "simDNAmixtures")) bs_model <- allele_specific_stutter_model(list(backstutter), size_regression) bs_model
Defines a (semi-continuous) drop model
drop_model(dropout_probabilities, drop_in_rate = 0, freqs, model_settings)
drop_model(dropout_probabilities, drop_in_rate = 0, freqs, model_settings)
dropout_probabilities |
Numeric vector with values between 0 and 1. Dropout probabilities for each contributor. |
drop_in_rate |
Numeric vector of length one. Expected number of drop-ins per locus. Default is 0. |
freqs |
Optionally a list with allele frequencies (needed when drop_in_rate > 0). See read_allele_freqs. |
model_settings |
List. Possible parameters:
|
Define the classes semi-continuous drop-model. The model may then be used to sample DNA profiles using the sample_mixture_from_genotypes function. Alternatively, to sample many models and profiles in one go with parameters according to a specified distribution, the sample_mixtures function can be used.
Object of class pg_model
.
Slooten, K. (2017). Accurate assessment of the weight of evidence for DNA mixtures by integrating the likelihood ratio. Forensic Science International: Genetics, 27, 1-16. doi:10.1016/j.fsigen.2016.11.001
sample_mixture_from_genotypes, sample_mixtures, gamma_model, log_normal_model.
gf <- gf_configuration() freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) k2 <- sample_log_normal_stutter_variance(gf$log_normal_settings$stutter_variability) model <- log_normal_model(template = 1e3, c2 = 15, k2 = k2, model_settings = gf$log_normal_settings) model
gf <- gf_configuration() freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) k2 <- sample_log_normal_stutter_variance(gf$log_normal_settings$stutter_variability) model <- log_normal_model(template = 1e3, c2 = 15, k2 = k2, model_settings = gf$log_normal_settings) model
Defines a gamma model for peak height variability
gamma_model( mixture_proportions, mu, cv, degradation_beta = rep(1, length(mixture_proportions)), LSAE = stats::setNames(rep(1, length(model_settings$locus_names)), model_settings$locus_names), model_settings )
gamma_model( mixture_proportions, mu, cv, degradation_beta = rep(1, length(mixture_proportions)), LSAE = stats::setNames(rep(1, length(model_settings$locus_names)), model_settings$locus_names), model_settings )
mixture_proportions |
Numeric vector with the mixture proportion for each contributor. |
mu |
Numeric. Expectation of a full heterozygote contributing allele peak height. |
cv |
Numeric. Coefficient of variation of a full heterozygote contributing allele peak height |
degradation_beta |
Numeric Vector of same length as mixture_proportions. Degradation slope parameters for each contributor. Defaults to 1 for each contributor (i.e. not degraded) |
LSAE |
Numeric vector (named) with Locus Specific Amplification Efficiencies. See sample_LSAE. Defaults to 1 for each locus. |
model_settings |
List. Possible parameters:
|
Define a gamma model for peak height variability with the parametrisation as described by Bleka et al. The model may then be used to sample DNA profiles using the sample_mixture_from_genotypes function. Alternatively, to sample many models and profiles in one go with parameters according to a specified distribution, the sample_mixtures function can be used.
Object of class pg_model
.
Bleka, Ø., Storvik, G., & Gill, P. (2016). EuroForMix: An open source software based on a continuous model to evaluate STR DNA profiles from a mixture of contributors with artefacts. Forensic Science International: Genetics, 21, 35-44. doi:10.1016/j.fsigen.2015.11.008
# read allele frequencies freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() # define the gamma model for peak heights model <- gamma_model(mixture_proportions = 1, mu = 1000., cv = 0.1, model_settings = gf$gamma_settings_no_stutter) # sample a single source profile (1-person 'mixture') u1 <- sample_contributor_genotypes("U1", freqs, loci = gf$autosomal_markers) sample <- sample_mixture_from_genotypes(u1, model) # peaks follow a gamma distribution with an expected height of # 1,000 for heterozygous alleles; 2,000 for homozygotes hist(sample$Height) # the gamma distribution is more obvious if many samples are taken many_samples <- replicate(n = 1e2, sample_mixture_from_genotypes(u1, model), simplify = FALSE) hist(sapply(many_samples, function(x) x$Height))
# read allele frequencies freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() # define the gamma model for peak heights model <- gamma_model(mixture_proportions = 1, mu = 1000., cv = 0.1, model_settings = gf$gamma_settings_no_stutter) # sample a single source profile (1-person 'mixture') u1 <- sample_contributor_genotypes("U1", freqs, loci = gf$autosomal_markers) sample <- sample_mixture_from_genotypes(u1, model) # peaks follow a gamma distribution with an expected height of # 1,000 for heterozygous alleles; 2,000 for homozygotes hist(sample$Height) # the gamma distribution is more obvious if many samples are taken many_samples <- replicate(n = 1e2, sample_mixture_from_genotypes(u1, model), simplify = FALSE) hist(sapply(many_samples, function(x) x$Height))
A dataset containing default parameters and settings for a GlobalFiler 3500 kit.
gf_configuration()
gf_configuration()
A list of:
Names of autosomal markers in the GlobalFiler kit
Named numeric with STR repeat length by locus name
List of 4 stutter types, to be used with allele_specific_stutter_model
For convenience, a pre-defined allele_specific_stutter_model
Settings corresponding to a log normal model with all stutter types
Settings corresponding to a log normal model with backward and forward stutter only
Settings corresponding to a gamma model with all stutter types
Settings for a gamma model without stutter
Global stutter model where the expected stutter rate is constant across alleles and loci
global_stutter_model( back_stutter_rate, forward_stutter_rate, size_regression, sex_locus_name = "AMEL" )
global_stutter_model( back_stutter_rate, forward_stutter_rate, size_regression, sex_locus_name = "AMEL" )
back_stutter_rate |
Numeric. (Optional) |
forward_stutter_rate |
Numeric. (Optional) |
size_regression |
Function, see read_size_regression. |
sex_locus_name |
Character vector, defaults to "AMEL". |
When a pg_model is constructed (see gamma_model), a stutter model can optionally be applied. In the global stutter model, the expected stutter rate is constant across all loci and for all parent alleles.
Object of class stutter_model
to be used by e.g. gamma_model.
allele_specific_stutter_model for a stutter model where the expected stutter rate depends on the allele and locus.
# the stutter model needs a size regression to determine fragment length # of stutter products size_regression <- read_size_regression(system.file("extdata", "GlobalFiler_SizeRegression.csv",package = "simDNAmixtures")) # define a stutter model with an expected back stutter rate of 10% stutter_model <- global_stutter_model(back_stutter_rate = 0.1, size_regression = size_regression) stutter_model
# the stutter model needs a size regression to determine fragment length # of stutter products size_regression <- read_size_regression(system.file("extdata", "GlobalFiler_SizeRegression.csv",package = "simDNAmixtures")) # define a stutter model with an expected back stutter rate of 10% stutter_model <- global_stutter_model(back_stutter_rate = 0.1, size_regression = size_regression) stutter_model
A dataset containing the properties of forensic DNA kits.
kits
kits
A list of data frames containing variables such as:
Kit name
Fragment length
Dye colour
https://github.com/oyvble/euroformix/blob/master/inst/extdata/kit.txt
Defines a log normal model for peak height variability
log_normal_model( template, degradation = rep(0, length(template)), LSAE = stats::setNames(rep(1, length(model_settings$locus_names)), model_settings$locus_names), c2, k2, model_settings )
log_normal_model( template, degradation = rep(0, length(template)), LSAE = stats::setNames(rep(1, length(model_settings$locus_names)), model_settings$locus_names), c2, k2, model_settings )
template |
Numeric vector |
degradation |
Numeric vector of same length as template. Degradation parameters for each contributor. |
LSAE |
Numeric vector (named) with Locus Specific Amplification Efficiencies. See sample_LSAE. Defaults to 1 for each locus. |
c2 |
Numeric. Allele variance parameter. |
k2 |
Optionally a numeric vector with stutter variance parameters. See sample_log_normal_stutter_variance. |
model_settings |
List. Possible parameters:
|
Define a log normal model for peak height variability with the parametrisation as described by Bright et al. The model may then be used to sample DNA profiles using the sample_mixture_from_genotypes function. Alternatively, to sample many models and profiles in one go with parameters according to a specified distribution, the sample_mixtures function can be used.
Object of class pg_model
.
Bright, J.A. et al. (2016). Developmental validation of STRmix™, expert software for the interpretation of forensic DNA profiles. Forensic Science International: Genetics, 23, 226-239. doi:10.1016/j.fsigen.2016.05.007
gf <- gf_configuration() freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) k2 <- sample_log_normal_stutter_variance(gf$log_normal_settings$stutter_variability) model <- log_normal_model(template = 1e3, c2 = 15, k2 = k2, model_settings = gf$log_normal_settings) model
gf <- gf_configuration() freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) k2 <- sample_log_normal_stutter_variance(gf$log_normal_settings$stutter_variability) model <- log_normal_model(template = 1e3, c2 = 15, k2 = k2, model_settings = gf$log_normal_settings) model
Read allele frequencies in FSIgen format (.csv)
read_allele_freqs(filename, remove_zeroes = TRUE, normalise = TRUE)
read_allele_freqs(filename, remove_zeroes = TRUE, normalise = TRUE)
filename |
Path to csv file. |
remove_zeroes |
Logical. Should frequencies of 0 be removed from the return value? Default is TRUE. |
normalise |
Logical. Should frequencies be normalised to sum to 1? Default is TRUE. |
Reads allele frequencies from a .csv file. The file should be in FSIgen format, i.e. comma separated with the first column specifying the allele labels and one column per locus. The last row should be the number of observations. No error checking is done since the file format is only loosely defined, e.g. we do not restrict the first column name or the last row name.
Named list with frequencies by locus. The frequencies at a locus are returned as a named numeric vector with names corresponding to alleles.
# below we read an allele freqs file that comes with the package filename <- system.file("extdata","FBI_extended_Cauc_022024.csv",package = "simDNAmixtures") freqs <- read_allele_freqs(filename) freqs # the output is a list with an attribute named \code{N} giving the sample size.
# below we read an allele freqs file that comes with the package filename <- system.file("extdata","FBI_extended_Cauc_022024.csv",package = "simDNAmixtures") freqs <- read_allele_freqs(filename) freqs # the output is a list with an attribute named \code{N} giving the sample size.
Reads a size regression file
read_size_regression(filename, exceptions, repeat_length_by_marker)
read_size_regression(filename, exceptions, repeat_length_by_marker)
filename |
Path to file (character). |
exceptions |
Optionally a list providing sizes for alleles not covered by the regression. See examples for how this can be used to assign sizes to X and Y at the Amelogenin locus. |
repeat_length_by_marker |
Optionally a named integer vector with repeat lengths by marker. If not provided, then a .3 allele will not convert to e.g. .75 for a tetranucleotide. |
Read a regression file from disk and returns a function that provides the fragment length (bp) for a given locus and allele.
DNA profiles consist of the observed peaks (alleles or stutter products) at several loci as well as the peak heights and sizes. The size refers to the fragment length (bp). A linear relationship exists between the size of a peak and the size. When peaks are sampled in the sample_mixture_from_genotypes function, a size is assigned using a size regression. The read_size_regression
function reads such a regression from disk.
A function that takes a locus name and allele as arguments and returns the size.
filename <- system.file("extdata", "GlobalFiler_SizeRegression.csv", package = "simDNAmixtures") regression <- read_size_regression(filename) # obtain size for the 12 allele at the vWA locus regression("vWA", 12) # now add AMEL sizes regression_with_AMEL <- read_size_regression(filename, exceptions = list( AMEL = setNames(c(98.5, 104.5), nm = c("X", "Y")))) # check that we can obtain size for X at AMEL stopifnot(regression_with_AMEL("AMEL", "X") == 98.5) # pass in repeat_length_by_marker for more precise estimates gf <- gf_configuration() regression_with_repeat_length <- read_size_regression(filename, repeat_length_by_marker = gf$repeat_length_by_marker) # obtain size for the 15.3 allele at the D1S1656 locus stopifnot(regression_with_repeat_length("D1S1656", 15.3) == 121.628203912362 + 15.75 * 4.2170043570021)
filename <- system.file("extdata", "GlobalFiler_SizeRegression.csv", package = "simDNAmixtures") regression <- read_size_regression(filename) # obtain size for the 12 allele at the vWA locus regression("vWA", 12) # now add AMEL sizes regression_with_AMEL <- read_size_regression(filename, exceptions = list( AMEL = setNames(c(98.5, 104.5), nm = c("X", "Y")))) # check that we can obtain size for X at AMEL stopifnot(regression_with_AMEL("AMEL", "X") == 98.5) # pass in repeat_length_by_marker for more precise estimates gf <- gf_configuration() regression_with_repeat_length <- read_size_regression(filename, repeat_length_by_marker = gf$repeat_length_by_marker) # obtain size for the 15.3 allele at the D1S1656 locus stopifnot(regression_with_repeat_length("D1S1656", 15.3) == 121.628203912362 + 15.75 * 4.2170043570021)
Read STRmix kit settings from an XML file and associated stutter directory.
read_STRmix_kit_settings(filename, stutters_dir, include_y_loci = FALSE)
read_STRmix_kit_settings(filename, stutters_dir, include_y_loci = FALSE)
filename |
Character vector specifying the path to the XML file containing the profiling kit settings. |
stutters_dir |
Character vector specifying the directory path where stutter settings files are located. |
include_y_loci |
Should Y loci be included in the list of locus names? Default is |
A list containing the following components:
locus_names. Character vector.
degradation_parameter_cap. Numeric.
c2_prior. Numeric of length two with shape and scale.
LSAE_variance_prior. Numeric of length one.
detection_threshold. Numeric vector (named) with Detection Thresholds. Defaults to 50 for each locus.
size_regression. Function, see read_size_regression.
stutter_model. A list representing the stutter model.
stutter_variability. A list representing the stutter variability.
Reads a stutter exceptions file with overrides for expected stutter ratios
read_stutter_exceptions(filename)
read_stutter_exceptions(filename)
filename |
Character. Path to file. |
Reads the file from disk and returns a named numeric vector with stutter ratio exceptions for a given locus and allele.
A named list with the stutter exceptions by locus. For each loucs, the exceptions are given as a named numeric with the names corresponding to the parent alleles and the expected stutter rates given as the values.
filename <- system.file("extdata","GlobalFiler_Stutter_Exceptions_3500.csv", package = "simDNAmixtures") exceptions <- read_stutter_exceptions(filename) exceptions$TH01["9.3"]
filename <- system.file("extdata","GlobalFiler_Stutter_Exceptions_3500.csv", package = "simDNAmixtures") exceptions <- read_stutter_exceptions(filename) exceptions$TH01["9.3"]
Reads a stutter regression file
read_stutter_regression(filename, min_stutter_ratio = 0.001)
read_stutter_regression(filename, min_stutter_ratio = 0.001)
filename |
Character. Path to file. |
min_stutter_ratio |
Numeric. |
Reads the file from disk and returns a function that provides the expected stutter ratio for a given locus and allele.
A function that takes a locus name and allele as arguments and returns the expected stutter ratio.
filename <- system.file("extdata","GlobalFiler_Stutter_3500.txt", package = "simDNAmixtures") regression <- read_stutter_regression(filename) # obtain the expected stutter ratio for a 12 parent allele at the vWA locus regression("vWA", 12)
filename <- system.file("extdata","GlobalFiler_Stutter_3500.txt", package = "simDNAmixtures") regression <- read_stutter_regression(filename) # obtain the expected stutter ratio for a 12 parent allele at the vWA locus regression("vWA", 12)
Read wide table (.txt) with Allele1, Allele2, ... columns as is
read_wide_table(filename)
read_wide_table(filename)
filename |
Path to txt file. |
Dataframe
Sample genotypes for mixture contributors according to allele frequencies
sample_contributor_genotypes( contributors, freqs, linkage_map, pedigree, loci = names(freqs), return_non_contributors = FALSE, sex_locus_name = "AMEL" )
sample_contributor_genotypes( contributors, freqs, linkage_map, pedigree, loci = names(freqs), return_non_contributors = FALSE, sex_locus_name = "AMEL" )
contributors |
Character vector with unique names of contributors. Valid names are "U1", "U2", ... for unrelated contributors or the names of pedigree members for related contributors. |
freqs |
Allele frequencies (see read_allele_freqs) |
linkage_map |
(optional) A linkage map specifying the recombination fractions between loci. If missing, loci are assumed to be independent. See also sample_many_pedigree_genotypes. |
pedigree |
(optional) ped object |
loci |
Character vector of locus names (defaults to |
return_non_contributors |
Logical. Should genotypes of non-contributing pedigree members also be returned? |
sex_locus_name |
Character vector, defaults to "AMEL" |
For each founder or unrelated person, a genotype is sampled randomly by drawing two alleles from allele frequencies. The non-founders get genotypes by allele dropping, see sample_pedigree_genotypes for details.
List of DataFrames with genotypes for each pedigree member. See sample_genotype for the DataFrame format.
# read allele frequencies freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) # define a pedigree of siblings S1 and S2 (and their parents) ped_sibs <- pedtools::nuclearPed(children = c("S1", "S2")) # sample genotypes for a mixture of S1 + U1 + S2 # where U1 is an unrelated person sample_contributor_genotypes(contributors = c("S1","U1","S2"), freqs, pedigree = ped_sibs, loci = gf_configuration()$autosomal_markers) # now also include AMEL sample_contributor_genotypes(contributors = c("S1","S2", "U1"), freqs, pedigree = ped_sibs, loci = c(gf_configuration()$autosomal_markers, "AMEL"))
# read allele frequencies freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) # define a pedigree of siblings S1 and S2 (and their parents) ped_sibs <- pedtools::nuclearPed(children = c("S1", "S2")) # sample genotypes for a mixture of S1 + U1 + S2 # where U1 is an unrelated person sample_contributor_genotypes(contributors = c("S1","U1","S2"), freqs, pedigree = ped_sibs, loci = gf_configuration()$autosomal_markers) # now also include AMEL sample_contributor_genotypes(contributors = c("S1","S2", "U1"), freqs, pedigree = ped_sibs, loci = c(gf_configuration()$autosomal_markers, "AMEL"))
Sample drop model(s) with parameters according to priors
sample_drop_model( number_of_contributors, sampling_parameters, drop_in_rate = 0, model_settings )
sample_drop_model( number_of_contributors, sampling_parameters, drop_in_rate = 0, model_settings )
number_of_contributors |
Integer |
sampling_parameters |
List. Needs to contain:
|
drop_in_rate |
Numeric vector of length one. Expected number of drop-ins per locus. Default is 0. |
model_settings |
List. See drop_model. |
In simulation studies involving many mixed DNA profiles, one often needs to generate various samples with different model parameters. This function samples a drop model with parameters according to prior distributions. The dropout probability for each contributor is sampled uniformly between min_dropout_probability.
and max_dropout_probability
.
When length(number_of_contributors)==1
, a single drop_model of class pg_model
. Otherwise, a list of these.
sample_mixtures_fixed_parameters to directly supply parameters of choice for more control
gf <- gf_configuration() sampling_parameters <- list(min_dropout_probability. = 0., max_dropout_probability. = 0.5) model <- sample_drop_model(number_of_contributors = 1, sampling_parameters = sampling_parameters, model_settings = list(locus_names = gf$autosomal_markers, size_regression = gf$size_regression))
gf <- gf_configuration() sampling_parameters <- list(min_dropout_probability. = 0., max_dropout_probability. = 0.5) model <- sample_drop_model(number_of_contributors = 1, sampling_parameters = sampling_parameters, model_settings = list(locus_names = gf$autosomal_markers, size_regression = gf$size_regression))
Sample gamma model(s) with parameters according to priors
sample_gamma_model(number_of_contributors, sampling_parameters, model_settings)
sample_gamma_model(number_of_contributors, sampling_parameters, model_settings)
number_of_contributors |
Integer |
sampling_parameters |
List. Needs to contain:
|
model_settings |
List. See gamma_model. |
In simulation studies involving many mixed DNA profiles, one often needs to generate various samples with different model parameters. This function samples a gamma model with parameters according to prior distributions. The mean peak height parameter mu
is sampled uniformly between min_mu
and max_mu
. Likewise, the variability parameter cv
is sampled uniformly between min_cv
and max_cv
. The degradation slope parameter beta
is sampled according to a Beta distribution with parameters degradation_shape1
and degradation_shape2
.
When length(number_of_contributors)==1
, a single gamma_model of class pg_model
. Otherwise, a list of these.
gf <- gf_configuration() sampling_parameters <- list(min_mu = 50., max_mu = 5e3, min_cv = 0.05, max_cv = 0.35, degradation_shape1 = 10, degradation_shape2 = 1) model_no_stutter <- sample_gamma_model(number_of_contributors = 2, sampling_parameters = sampling_parameters, model_settings = gf$gamma_settings_no_stutter) model_no_stutter$parameters
gf <- gf_configuration() sampling_parameters <- list(min_mu = 50., max_mu = 5e3, min_cv = 0.05, max_cv = 0.35, degradation_shape1 = 10, degradation_shape2 = 1) model_no_stutter <- sample_gamma_model(number_of_contributors = 2, sampling_parameters = sampling_parameters, model_settings = gf$gamma_settings_no_stutter) model_no_stutter$parameters
Sample a genotype according to allele frequencies
sample_genotype(freqs, loci = names(freqs), label = "U")
sample_genotype(freqs, loci = names(freqs), label = "U")
freqs |
Allele frequencies (see read_allele_freqs) |
loci |
Character vector of locus names (defaults to |
label |
Sample name |
A genotype is sampled randomly by drawing two alleles from allele frequencies for each locus.
DataFrame with columns Sample Name
, Locus
, Allele1
and Allele2
.
# below we read an allele freqs and sample a genotype filename <- system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures") freqs <- read_allele_freqs(filename) sample_genotype(freqs, loci = c("D3S1358", "vWA"))
# below we read an allele freqs and sample a genotype filename <- system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures") freqs <- read_allele_freqs(filename) sample_genotype(freqs, loci = c("D3S1358", "vWA"))
Sample log normal model(s) with parameters according to priors
sample_log_normal_model( number_of_contributors, sampling_parameters, model_settings )
sample_log_normal_model( number_of_contributors, sampling_parameters, model_settings )
number_of_contributors |
Integer |
sampling_parameters |
List. Needs to contain:
|
model_settings |
List. See log_normal_model. |
In simulation studies involving many mixed DNA profiles, one often needs to generate various samples with different model parameters. This function samples a log normal model with parameters according to prior distributions. The template parameter for each contributor is sampled uniformly between min_template
and max_template
. The degradation parameter for each contributor is sampled from a gamma distribution with parameters degradation_shape
and degradation_scale
.
When length(number_of_contributors)==1
, a single log_normal_model of class pg_model
. Otherwise, a list of these.
gf <- gf_configuration() sampling_parameters <- list(min_template = 50., max_template = 1000., degradation_shape = 2.5, degradation_scale = 1e-3) model_no_stutter <- sample_log_normal_model(number_of_contributors = 1, sampling_parameters = sampling_parameters, model_settings = gf$log_normal_settings)
gf <- gf_configuration() sampling_parameters <- list(min_template = 50., max_template = 1000., degradation_shape = 2.5, degradation_scale = 1e-3) model_no_stutter <- sample_log_normal_model(number_of_contributors = 1, sampling_parameters = sampling_parameters, model_settings = gf$log_normal_settings)
Sample log normal model parameters according to priors
sample_log_normal_parameters( number_of_contributors, sampling_parameters, model_settings )
sample_log_normal_parameters( number_of_contributors, sampling_parameters, model_settings )
number_of_contributors |
Integer |
sampling_parameters |
List. Needs to contain:
|
model_settings |
List. See log_normal_model. |
In simulation studies involving many mixed DNA profiles, one often needs to generate various samples with different model parameters. This function samples a log normal model with parameters according to prior distributions. The template parameter for each contributor is sampled uniformly between min_template
and max_template
. The degradation parameter for each contributor is sampled from a gamma distribution with parameters degradation_shape
and degradation_scale
.
When length(number_of_contributors)==1
, a single log_normal_model of class pg_model
. Otherwise, a list of these.
gf <- gf_configuration() sampling_parameters <- list(min_template = 50., max_template = 1000., degradation_shape = 2.5, degradation_scale = 1e-3) parameters <- sample_log_normal_parameters(number_of_contributors = 1, sampling_parameters = sampling_parameters, model_settings = gf$log_normal_settings)
gf <- gf_configuration() sampling_parameters <- list(min_template = 50., max_template = 1000., degradation_shape = 2.5, degradation_scale = 1e-3) parameters <- sample_log_normal_parameters(number_of_contributors = 1, sampling_parameters = sampling_parameters, model_settings = gf$log_normal_settings)
Sample log normal stutter variance parameters according to priors
sample_log_normal_stutter_variance(log_normal_stutter_variability)
sample_log_normal_stutter_variance(log_normal_stutter_variability)
log_normal_stutter_variability |
List of variability parameters. See gf_configuration for an example. |
Named numeric with stutter variance parameter for all stutter types. Names are k2
concatenated with the name of the stuter type. See example.
gf <- gf_configuration() log_normal_stutter_variability <- gf$log_normal_settings$stutter_variability k2 <- sample_log_normal_stutter_variance(log_normal_stutter_variability)
gf <- gf_configuration() log_normal_stutter_variability <- gf$log_normal_settings$stutter_variability k2 <- sample_log_normal_stutter_variance(log_normal_stutter_variability)
Sample Locus Specific Amplification Efficiency (LSAE) according to prior
sample_LSAE(LSAE_variance, locus_names)
sample_LSAE(LSAE_variance, locus_names)
LSAE_variance |
Numeric. See gf_configuration for an example. |
locus_names |
Character vector. |
In the Bright et al. log normal model, the expected peak height includes a multiplicative factor for the locus (marker). These factors are called the LSAEs (Locus Specific Amplification Efficiencies). In the model, the prior for the log10 of LSAE is normal with mean 0. The variance can be specified.
Named numeric with LSAEs for each locus (names).
gf <- gf_configuration() lsae <- sample_LSAE(LSAE_variance = gf$log_normal_settings$LSAE_variance_prior, locus_names = gf$autosomal_markers) # the barplot shows that some loci amplify better than others barplot(lsae, las=2)
gf <- gf_configuration() lsae <- sample_LSAE(LSAE_variance = gf$log_normal_settings$LSAE_variance_prior, locus_names = gf$autosomal_markers) # the barplot shows that some loci amplify better than others barplot(lsae, las=2)
Sample (many) genotypes for pedigree members according to allele frequencies by allele dropping and possibly taking linkage into account by simulating recombination.
sample_many_pedigree_genotypes( pedigree, freqs, loci = names(freqs), unrelated_names = character(), linkage_map, number_of_replicates = 1L, sex_locus_name = "AMEL" )
sample_many_pedigree_genotypes( pedigree, freqs, loci = names(freqs), unrelated_names = character(), linkage_map, number_of_replicates = 1L, sex_locus_name = "AMEL" )
pedigree |
ped object |
freqs |
Allele frequencies (see read_allele_freqs) |
loci |
Character vector of locus names (defaults to |
unrelated_names |
Character vector with names of any additional unrelated persons. Defaults to length zero. |
linkage_map |
A linkage map specifying the recombination fractions between loci. If NULL, loci are assumed to be independent. |
number_of_replicates |
An integer specifying the number of replicate genotype samples to generate. Defaults to 1. |
sex_locus_name |
Character vector, defaults to "AMEL" |
Sample mixture profile with provided genotypes
sample_mixture_from_genotypes(genotypes, model, sample_name = "mixture")
sample_mixture_from_genotypes(genotypes, model, sample_name = "mixture")
genotypes |
List of contributor genotypes. See sample_contributor_genotypes. |
model |
pg_model object. |
sample_name |
Character. Defaults to "mixture". |
A mixture profile is sampled according to the provided pg_model
(see gamma_model, log_normal_model and genotypes (see sample_contributor_genotypes).
DataFrame with at least SMASH columns (see SMASH_to_wide_table). Depending on the chosen pg_model
(e.g. gamma_model or log_normal_model), other columns with further details about the simulation are returned as well.
sample_mixtures for a function that samples many mixtures in one go.
# read allele frequencies and kit data freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() # define a pedigree of siblings S1 and S2 (and their parents) ped_sibs <- pedtools::nuclearPed(children = c("S1", "S2")) # sample genotypes for a mixture of S1 + U1 + S2 # where U1 is an unrelated person genotypes <- sample_contributor_genotypes(contributors = c("S1","U1","S2"), freqs, pedigree = ped_sibs, loci = gf$autosomal_markers) # define a gamma model for peak heights gamma_model <- gamma_model(mixture_proportions = c(0.5, 0.3, 0.2), mu = 1000., cv = 0.1, model_settings = gf$gamma_settings_no_stutter) # sample mixture from genotypes mix <- sample_mixture_from_genotypes(genotypes, gamma_model)
# read allele frequencies and kit data freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() # define a pedigree of siblings S1 and S2 (and their parents) ped_sibs <- pedtools::nuclearPed(children = c("S1", "S2")) # sample genotypes for a mixture of S1 + U1 + S2 # where U1 is an unrelated person genotypes <- sample_contributor_genotypes(contributors = c("S1","U1","S2"), freqs, pedigree = ped_sibs, loci = gf$autosomal_markers) # define a gamma model for peak heights gamma_model <- gamma_model(mixture_proportions = c(0.5, 0.3, 0.2), mu = 1000., cv = 0.1, model_settings = gf$gamma_settings_no_stutter) # sample mixture from genotypes mix <- sample_mixture_from_genotypes(genotypes, gamma_model)
Sample mixtures with random genotypes and random parameters according to priors
sample_mixtures( n, contributors, freqs, linkage_map, sampling_parameters, model_settings, sample_model, pedigree, results_directory, seed, number_of_replicates = 1L, write_non_contributors = FALSE, tag = "simulation", silent = FALSE )
sample_mixtures( n, contributors, freqs, linkage_map, sampling_parameters, model_settings, sample_model, pedigree, results_directory, seed, number_of_replicates = 1L, write_non_contributors = FALSE, tag = "simulation", silent = FALSE )
n |
Integer. Number of samples. |
contributors |
Character vector with unique names of contributors. Valid names are "U1", "U2", ... for unrelated contributors or the names of pedigree members for related contributors. |
freqs |
Allele frequencies (see read_allele_freqs) |
linkage_map |
(optional) A linkage map specifying the recombination fractions between loci. If missing, loci are assumed to be independent. See also sample_many_pedigree_genotypes. |
sampling_parameters |
List. Passed to the sample_model function. |
model_settings |
List. Passed to the sample_model function. |
sample_model |
Function such as sample_log_normal_model. |
pedigree |
(optionally) ped object. Contributors can be named pedigree members. |
results_directory |
(optionally) Character with path to directory where results are written to disk. |
seed |
(optionally) Integer seed value that can be used to get reproducible runs. If results are written to disk, the 'Run details.txt' file will contain a seed that can be used for reproducing the result. |
number_of_replicates |
Integer. Number of replicate simulations for each sample. |
write_non_contributors |
Logical. If TRUE, sampled genotypes for non-contributing pedigree members will also be written to disk. Defaults to FALSE. |
tag |
Character. Used for sub directory name when results_directory is provided. |
silent |
Logical. If TRUE, then no message will be printed about where the output (if any) was written to disk. |
If results_directory
is provided, this function has the side effect of writing results to disk.
Return value is a list with simulation results:
call
matched call
smash
DataFrame with all samples in SMASH format (see SMASH_to_wide_table)
samples
Detailed results for each sample
parameter_summary
DataFrame with parameters for each sample
freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() sampling_parameters <- list(min_mu = 50., max_mu = 5e3, min_cv = 0.05, max_cv = 0.35, degradation_shape1 = 10, degradation_shape2 = 1) mixtures <- sample_mixtures(n = 2, contributors = c("U1", "U2"), freqs = freqs, sampling_parameters = sampling_parameters, model_settings = gf$gamma_settings_no_stutter, sample_model = sample_gamma_model) # sample a mixture of two siblings taking into account linkage_map <- data.frame(chromosome = c("12","12"), locus = c("vWA", "D12391"), position = c(16.56662766, 29.48590551)) ped_sibs <- pedtools::nuclearPed(children = c("Sib1", "Sib2")) sibs_mix <- sample_mixtures(n = 1, contributors = c("Sib1", "Sib2"), freqs = freqs, linkage_map = linkage_map, pedigree = ped_sibs, sampling_parameters = sampling_parameters, model_settings = gf$gamma_settings_no_stutter, sample_model = sample_gamma_model) # an example using the semi-continuous drop model drop_model_sampling_parameters <- list(min_dropout_probability. = 0., max_dropout_probability. = 0.5) drop_model_settings <- list(locus_names = gf$autosomal_markers, size_regression = gf$size_regression) mixtures <- sample_mixtures(n = 2, contributors = c("U1", "U2"), freqs = freqs, sampling_parameters = drop_model_sampling_parameters, model_settings = drop_model_settings, sample_model = sample_drop_model)
freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() sampling_parameters <- list(min_mu = 50., max_mu = 5e3, min_cv = 0.05, max_cv = 0.35, degradation_shape1 = 10, degradation_shape2 = 1) mixtures <- sample_mixtures(n = 2, contributors = c("U1", "U2"), freqs = freqs, sampling_parameters = sampling_parameters, model_settings = gf$gamma_settings_no_stutter, sample_model = sample_gamma_model) # sample a mixture of two siblings taking into account linkage_map <- data.frame(chromosome = c("12","12"), locus = c("vWA", "D12391"), position = c(16.56662766, 29.48590551)) ped_sibs <- pedtools::nuclearPed(children = c("Sib1", "Sib2")) sibs_mix <- sample_mixtures(n = 1, contributors = c("Sib1", "Sib2"), freqs = freqs, linkage_map = linkage_map, pedigree = ped_sibs, sampling_parameters = sampling_parameters, model_settings = gf$gamma_settings_no_stutter, sample_model = sample_gamma_model) # an example using the semi-continuous drop model drop_model_sampling_parameters <- list(min_dropout_probability. = 0., max_dropout_probability. = 0.5) drop_model_settings <- list(locus_names = gf$autosomal_markers, size_regression = gf$size_regression) mixtures <- sample_mixtures(n = 2, contributors = c("U1", "U2"), freqs = freqs, sampling_parameters = drop_model_sampling_parameters, model_settings = drop_model_settings, sample_model = sample_drop_model)
Sample mixtures with known genotypes and fixed parameters
sample_mixtures_fixed_parameters( genotypes, parameter_summary, model_settings, results_directory, seed, tag = "simulation", silent = FALSE )
sample_mixtures_fixed_parameters( genotypes, parameter_summary, model_settings, results_directory, seed, tag = "simulation", silent = FALSE )
genotypes |
List of contributor genotypes. See sample_contributor_genotypes. |
parameter_summary |
DataFrame with parameters for each sample. |
model_settings |
List. Passed to the sample_model function. |
results_directory |
(optionally) Character with path to directory where results are written to disk. |
seed |
(optionally) Integer seed value that can be used to get reproducible runs. If results are written to disk, the 'Run details.txt' file will contain a seed that can be used for reproducing the result. |
tag |
Character. Used for sub directory name when results_directory is provided. |
silent |
Logical. If TRUE, then no message will be printed about where the output (if any) was written to disk. |
If results_directory
is provided, this function has the side effect of writing results to disk.
Return value is a list with simulation results:
call
matched call
smash
DataFrame with all samples in SMASH format (see SMASH_to_wide_table)
samples
Detailed results for each sample
parameter_summary
DataFrame with parameters for each sample
# simulate autosomal samples with fixed parameters (from csv) and refs (from csv) parameter_summary <- utils::read.csv(system.file( "extdata","Example_2p_Parameter_Summary.csv", package = "simDNAmixtures")) gf <- gf_configuration() filename_refs <- system.file("extdata","Example_References_DB.csv", package = "simDNAmixtures") db_refs <- utils::read.csv(filename_refs, check.names = FALSE) genotypes <- simDNAmixtures:::.wide_references_to_allele_tables(db_refs) samples <- sample_mixtures_fixed_parameters(genotypes = genotypes, parameter_summary = parameter_summary, model_settings = gf$log_normal_bwfw_settings, seed = 1)
# simulate autosomal samples with fixed parameters (from csv) and refs (from csv) parameter_summary <- utils::read.csv(system.file( "extdata","Example_2p_Parameter_Summary.csv", package = "simDNAmixtures")) gf <- gf_configuration() filename_refs <- system.file("extdata","Example_References_DB.csv", package = "simDNAmixtures") db_refs <- utils::read.csv(filename_refs, check.names = FALSE) genotypes <- simDNAmixtures:::.wide_references_to_allele_tables(db_refs) samples <- sample_mixtures_fixed_parameters(genotypes = genotypes, parameter_summary = parameter_summary, model_settings = gf$log_normal_bwfw_settings, seed = 1)
Sample mixtures with known genotypes and random parameters according to priors
sample_mixtures_from_genotypes( n, genotypes, sampling_parameters, model_settings, sample_model, results_directory, seed, number_of_replicates = 1L, tag = "simulation", silent = FALSE )
sample_mixtures_from_genotypes( n, genotypes, sampling_parameters, model_settings, sample_model, results_directory, seed, number_of_replicates = 1L, tag = "simulation", silent = FALSE )
n |
Integer. Number of samples. |
genotypes |
List of contributor genotypes. See sample_contributor_genotypes. |
sampling_parameters |
List. Passed to the sample_model function. |
model_settings |
List. Passed to the sample_model function. |
sample_model |
Function such as sample_log_normal_model. |
results_directory |
(optionally) Character with path to directory where results are written to disk. |
seed |
(optionally) Integer seed value that can be used to get reproducible runs. If results are written to disk, the 'Run details.txt' file will contain a seed that can be used for reproducing the result. |
number_of_replicates |
Integer. Number of replicate simulations for each sample. |
tag |
Character. Used for sub directory name when results_directory is provided. |
silent |
Logical. If TRUE, then no message will be printed about where the output (if any) was written to disk. |
If results_directory
is provided, this function has the side effect of writing results to disk.
Return value is a list with simulation results:
call
matched call
smash
DataFrame with all samples in SMASH format (see SMASH_to_wide_table)
samples
Detailed results for each sample
parameter_summary
DataFrame with parameters for each sample
# define two known GlobalFiler genotypes K1 <- data.frame( `Sample Name` = "K1", Locus = c("D3S1358", "vWA", "D16S539", "CSF1PO", "TPOX", "D8S1179", "D21S11", "D18S51", "D2S441", "D19S433", "TH01", "FGA", "D22S1045", "D5S818", "D13S317", "D7S820", "SE33", "D10S1248", "D1S1656", "D12S391", "D2S1338"), Allele1 = c("14", "15", "11", "11", "8", "11", "28", "14", "11", "14", "9.3", "21", "15", "9", "9", "10", "14", "14", "12", "21", "20"), Allele2 = c("16", "18", "13", "11", "11", "12", "30", "16", "12", "14", "9.3", "24", "16", "11", "12", "10", "18", "14", "15", "22", "24"), check.names = FALSE ) K2 <- data.frame( `Sample Name` = "K2", Locus = c("D3S1358", "vWA", "D16S539", "CSF1PO", "TPOX", "D8S1179", "D21S11", "D18S51", "D2S441", "D19S433", "TH01", "FGA", "D22S1045", "D5S818", "D13S317", "D7S820", "SE33", "D10S1248", "D1S1656", "D12S391", "D2S1338"), Allele1 = c("16", "16", "10", "13", "8", "11", "27", "17", "10", "13", "6", "23", "16", "11", "11", "9", "22.2", "14", "12", "22", "21"), Allele2 = c("16", "17", "12", "14", "8", "13", "30", "18", "11", "14", "6", "25", "17", "11", "11", "12", "28.2", "15", "14", "22", "23"), check.names = FALSE ) # first sample three replicates of a low-level profile of K1 only gf <- gf_configuration() sampling_parameters <- list(min_template = 75., max_template = 75, degradation_shape = 2.5, degradation_scale = 1e-3) single_source_results <- sample_mixtures_from_genotypes(n = 1, genotypes = list(K1), sampling_parameters = sampling_parameters, number_of_replicates = 3, sample_model = sample_log_normal_model, model_settings = gf$log_normal_bwfw_settings) # now sample two mixtures of K1 and K2 with two replicates each mix_results <- sample_mixtures_from_genotypes(n = 2, genotypes = list(K1, K2), sampling_parameters = sampling_parameters, number_of_replicates = 3, sample_model = sample_log_normal_model, model_settings = gf$log_normal_bwfw_settings)
# define two known GlobalFiler genotypes K1 <- data.frame( `Sample Name` = "K1", Locus = c("D3S1358", "vWA", "D16S539", "CSF1PO", "TPOX", "D8S1179", "D21S11", "D18S51", "D2S441", "D19S433", "TH01", "FGA", "D22S1045", "D5S818", "D13S317", "D7S820", "SE33", "D10S1248", "D1S1656", "D12S391", "D2S1338"), Allele1 = c("14", "15", "11", "11", "8", "11", "28", "14", "11", "14", "9.3", "21", "15", "9", "9", "10", "14", "14", "12", "21", "20"), Allele2 = c("16", "18", "13", "11", "11", "12", "30", "16", "12", "14", "9.3", "24", "16", "11", "12", "10", "18", "14", "15", "22", "24"), check.names = FALSE ) K2 <- data.frame( `Sample Name` = "K2", Locus = c("D3S1358", "vWA", "D16S539", "CSF1PO", "TPOX", "D8S1179", "D21S11", "D18S51", "D2S441", "D19S433", "TH01", "FGA", "D22S1045", "D5S818", "D13S317", "D7S820", "SE33", "D10S1248", "D1S1656", "D12S391", "D2S1338"), Allele1 = c("16", "16", "10", "13", "8", "11", "27", "17", "10", "13", "6", "23", "16", "11", "11", "9", "22.2", "14", "12", "22", "21"), Allele2 = c("16", "17", "12", "14", "8", "13", "30", "18", "11", "14", "6", "25", "17", "11", "11", "12", "28.2", "15", "14", "22", "23"), check.names = FALSE ) # first sample three replicates of a low-level profile of K1 only gf <- gf_configuration() sampling_parameters <- list(min_template = 75., max_template = 75, degradation_shape = 2.5, degradation_scale = 1e-3) single_source_results <- sample_mixtures_from_genotypes(n = 1, genotypes = list(K1), sampling_parameters = sampling_parameters, number_of_replicates = 3, sample_model = sample_log_normal_model, model_settings = gf$log_normal_bwfw_settings) # now sample two mixtures of K1 and K2 with two replicates each mix_results <- sample_mixtures_from_genotypes(n = 2, genotypes = list(K1, K2), sampling_parameters = sampling_parameters, number_of_replicates = 3, sample_model = sample_log_normal_model, model_settings = gf$log_normal_bwfw_settings)
Sample offspring from two parental genotypes
sample_offspring(father, mother, label = "Child")
sample_offspring(father, mother, label = "Child")
father |
DataFrame (see sample_genotype) |
mother |
DataFrame (see sample_genotype) |
label |
SampleName of child (character) |
A genotype is sampled according to Mendelian inheritance. That is, one of two alleles of a parent is passed down to the offspring.
DataFrame (see sample_genotype)
# below we read an allele freqs and sample a genotype filename <- system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures") freqs <- read_allele_freqs(filename) # sample parents father <- sample_genotype(freqs, loci = c("D3S1358", "vWA")) mother <- sample_genotype(freqs, loci = c("D3S1358", "vWA")) # sample child child <- sample_offspring(father, mother)
# below we read an allele freqs and sample a genotype filename <- system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures") freqs <- read_allele_freqs(filename) # sample parents father <- sample_genotype(freqs, loci = c("D3S1358", "vWA")) mother <- sample_genotype(freqs, loci = c("D3S1358", "vWA")) # sample child child <- sample_offspring(father, mother)
Sample genotypes for pedigree according to allele frequencies by allele dropping.
sample_pedigree_genotypes(pedigree, freqs, loci = names(freqs))
sample_pedigree_genotypes(pedigree, freqs, loci = names(freqs))
pedigree |
ped object |
freqs |
Allele frequencies (see read_allele_freqs) |
loci |
Character vector of locus names (defaults to |
For each founder, a genotype is sampled randomly by drawing two alleles according to allele frequencies. Alleles for the rest of the pedigree are then obtained by allele dropping: sample_offspring is invoked for each non-founder.
List of DataFrames with genotypes for each pedigree member. See sample_genotype for the DataFrame format.
sample_many_pedigree_genotypes for a function that takes linkage into account.
freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() ped_sibs <- pedtools::nuclearPed(children = c("S1", "S2")) sibs_genotypes <- sample_pedigree_genotypes(ped = ped_sibs, freqs = freqs, loci = gf$autosomal_markers)
freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() ped_sibs <- pedtools::nuclearPed(children = c("S1", "S2")) sibs_genotypes <- sample_pedigree_genotypes(ped = ped_sibs, freqs = freqs, loci = gf$autosomal_markers)
Converts SMASH (SampleName, Marker, Allele, Size, Height) data to a wide table
SMASH_to_wide_table(x)
SMASH_to_wide_table(x)
x |
DataFrame with SampleName, Marker, Allele, Size, Height columns |
DataFrame with columns: Sample Name, Marker, Allele 1, Allele 2, ..., Size 1, Size 2, ..., Height 1, Height 2, ...
# generate example data in SMASH form freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() sampling_parameters <- list(min_mu = 50., max_mu = 5e3, min_cv = 0.05, max_cv = 0.35, degradation_shape1 = 10, degradation_shape2 = 1) mixtures <- sample_mixtures(n = 2, contributors = c("U1"), freqs = freqs, sampling_parameters = sampling_parameters, model_settings = gf$gamma_settings_no_stutter, sample_model = sample_gamma_model) # convert from SMASH to wide table wide_table <- SMASH_to_wide_table(mixtures$smash)
# generate example data in SMASH form freqs <- read_allele_freqs(system.file("extdata","FBI_extended_Cauc_022024.csv", package = "simDNAmixtures")) gf <- gf_configuration() sampling_parameters <- list(min_mu = 50., max_mu = 5e3, min_cv = 0.05, max_cv = 0.35, degradation_shape1 = 10, degradation_shape2 = 1) mixtures <- sample_mixtures(n = 2, contributors = c("U1"), freqs = freqs, sampling_parameters = sampling_parameters, model_settings = gf$gamma_settings_no_stutter, sample_model = sample_gamma_model) # convert from SMASH to wide table wide_table <- SMASH_to_wide_table(mixtures$smash)
Defines a stutter type to be used in the allele specific stutter model.
stutter_type( name, delta, applies_to_all_loci = TRUE, stutter_regression, stutter_exceptions, applies_to_loci, repeat_length_by_marker )
stutter_type( name, delta, applies_to_all_loci = TRUE, stutter_regression, stutter_exceptions, applies_to_loci, repeat_length_by_marker )
name |
Character. Name of the stutter, e.g. "BackStutter" |
delta |
Numeric. When length one, repeat units gained (lost when negative). When length two, the second element is the number of base pairs gained (lost). |
applies_to_all_loci |
Logical. Defaults to TRUE. |
stutter_regression |
Function. See read_stutter_regression. |
stutter_exceptions |
Optionally a list. See read_stutter_exceptions. |
applies_to_loci |
Optionally a character vector of locus names to which this stutter type applies. |
repeat_length_by_marker |
Optionally a named integer vector with repeat lengths by marker. Only needed when delta is of length two. |
When a pg_model is constructed (see log_normal_model), a stutter model can optionally be applied.
Object of class stutter_type
to be passed to allele_specific_stutter_model.
filename_bs_exceptions <- system.file("extdata", "GlobalFiler_Stutter_Exceptions_3500.csv",package = "simDNAmixtures") bs_exceptions <- read_stutter_exceptions(filename_bs_exceptions) filename_bs_regression <- system.file("extdata", "GlobalFiler_Stutter_3500.txt",package = "simDNAmixtures") bs_regression <- read_stutter_regression(filename_bs_regression) backstutter <- stutter_type(name = "BackStutter", delta = -1, stutter_regression = bs_regression, stutter_exceptions = bs_exceptions)
filename_bs_exceptions <- system.file("extdata", "GlobalFiler_Stutter_Exceptions_3500.csv",package = "simDNAmixtures") bs_exceptions <- read_stutter_exceptions(filename_bs_exceptions) filename_bs_regression <- system.file("extdata", "GlobalFiler_Stutter_3500.txt",package = "simDNAmixtures") bs_regression <- read_stutter_regression(filename_bs_regression) backstutter <- stutter_type(name = "BackStutter", delta = -1, stutter_regression = bs_regression, stutter_exceptions = bs_exceptions)