| Title: | Interpretation of Forensic DNA Mixtures |
|---|---|
| Description: | Statistical methods and simulation tools for the interpretation of forensic DNA mixtures. The methods implemented are described in Haned et al. (2011) <doi:10.1111/j.1556-4029.2010.01550.x>, Haned et al. (2012) <doi:10.1016/j.fsigen.2012.11.002> and Gill & Haned (2013) <doi:10.1016/j.fsigen.2012.08.008>. |
| Authors: | Hinda Haned [aut], Oyvind Bleka [ctb], Maarten Kruijver [cre] (ORCID: <https://orcid.org/0000-0002-6890-7632>) |
| Maintainer: | Maarten Kruijver <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 4.3.3 |
| Built: | 2026-06-04 09:04:08 UTC |
| Source: | https://github.com/mkruijver/forensim |
forensim is dedicated to the interpretation of forensic DNA mixtures through statistical methods.
It relies on three S4 classes that facilitate the manipulation and the storage of genetic data produced in
forensic casework: tabfreq, simugeno and simumix.
tabfreq objects are used to store allele frequencies, simugeno objects are
used to store genotypes and
simumix objects are used to store DNA mixtures.
For more information about these classes type 'class ?tabfreq', 'class ?simugeno' and 'class ?simumix'.
Hinda Haned <[email protected]>
The A2.simu function launches a Tcl/Tk graphical interface for resolving two-person DNA mixtures
when two alleles are observed at a given locus.
A2.simu()
When two alleles are observed at a given locus in the DNA stain, seven genotype combinations
are possible for the two contributors: (AA,AB), (AB,AB), (AA,BB), (AB,AA), (BB,AA), (AB,BB) and (BB,AB), where A and B are
the two observed alleles (in ascending order of molecular weight).
Given a prior estimate of the mixture proportion,
the number of possible genotype combinations can be reduced by keeping only those supported by the
observed data. This is achieved by computing the sum of squared differences between the expected
allelic ratio and the observed allelic ratio, for all possible mixture combinations.
The likelihood of peak heights (or areas), given the combination of genotypes, is
high if the residuals are low.
Genotype combinations are thus selected according to the peak heights with the highest likelihoods.
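The residual calculation described above can be sketched in base R. This is a simplified illustration, not the code behind A2.simu: the peak heights, the mixture proportion `mx` and the helper names `expected_prop` and `residual` are all hypothetical, and allelic proportions are used in place of whatever ratio the GUI computes internally.

```r
# Hypothetical peak heights for the two observed alleles A and B
heights <- c(A = 1200, B = 400)
observed <- heights / sum(heights)          # observed allelic proportions

# The seven genotype combinations (contributor 1, contributor 2)
combos <- list(c("AA", "AB"), c("AB", "AB"), c("AA", "BB"), c("AB", "AA"),
               c("BB", "AA"), c("AB", "BB"), c("BB", "AB"))

# Expected proportion of each allele when contributor 1 gives a share mx
# of the DNA (assumption: each allele copy contributes equally)
expected_prop <- function(combo, mx) {
  allele_share <- function(g, a) sum(strsplit(g, "")[[1]] == a) / 2
  sapply(c(A = "A", B = "B"), function(a)
    mx * allele_share(combo[1], a) + (1 - mx) * allele_share(combo[2], a))
}

# Sum of squared differences between expected and observed proportions
residual <- function(combo, mx) sum((expected_prop(combo, mx) - observed)^2)

mx <- 0.7                                   # assumed mixture proportion estimate
res <- sapply(combos, residual, mx = mx)
names(res) <- sapply(combos, paste, collapse = ",")
sort(res)                                   # lowest residuals first
```

Under these made-up inputs the combinations at the top of the sorted vector are the ones best supported by the peak heights.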
The A2.simu() function launches a dialog window with three buttons:
-Plot simulations: plot of the residuals of each possible genotype combination for varying values of the mixture proportion across the interval [0.1, 0.9].
The observed mixture proportion is also reported on the plot.
-Simulation details: a matrix containing the simulation results. Simulation details and genotype combinations
with the lowest residuals can be saved as a text file by clicking the
"Save" button. It is also possible to choose specific paths and names for the save files.
-Genotypes filter: a matrix giving the mixture proportion conditional on the genotype combination. This conditional mixture proportion helps filter the most
plausible genotypes among the seven possible combinations. The matrix can be saved as a text file by clicking the "Save" button.
It is also possible to choose a specific path and a name for the save file.
No return value, called to show GUI.
-Linux users may have to download the libtktable package to their system before using the A2.simu function.
This is due to the Tktable widget, used in forensim, which is not (always)
downloaded with the Tcl/Tk package.
-For the computational details, please see forensim tutorial at http://forensim.r-forge.r-project.org/misc/forensim-tutorial.pdf.
Hinda Haned [email protected]
Gill P, Sparkes P, Pinchin R, Clayton, Whitaker J, Buckleton J. Interpreting simple STR mixtures using allele peak areas. Forensic Sci Int 1998;91:41-53.
A3.simu: the three-allele model, and A4.simu: the four-allele model
A2.simu()
The A3.simu function launches a Tcl/Tk graphical interface for resolving two-person
DNA mixtures when three alleles are observed at a given locus.
A3.simu()
When three alleles are observed at a given locus in the DNA stain, twelve genotype combinations
are possible for the two contributors: (AA,BC), (BB,AC), (CC,AB), (AB,AC), (BC,AC), (AB,BC), (BC,AA), (AC,BB), (AB,CC), (AC,AB), (AC,BC) and (BC,AB) where A, B and C are the three
observed alleles (in ascending order of molecular weights).
Given a prior estimate of the mixture proportion,
the number of possible genotype combinations can be reduced by keeping only those supported by the
observed data. This is achieved by computing the sum of squared differences between the expected
allelic ratio and the observed allelic ratio, for all possible mixture combinations.
The likelihood of peak heights (or areas), given the combination of genotypes, is
high if the residuals are low.
Genotype combinations are thus selected according to the peak heights with the highest likelihoods.
The A3.simu() function launches a dialog window with three buttons:
-Plot simulations: plot of the residuals of each possible genotype combination for varying values of the mixture proportion across the interval [0.1, 0.9].
The observed mixture proportion is also reported on the plot.
-Simulation details: a matrix containing the simulation results. Simulation details and genotype combinations
with the lowest residuals can be saved as a text file by clicking the
"Save" button. It is also possible to choose specific paths and names for the save files.
-Genotypes filter: a matrix giving the mixture proportion conditional on the genotype combination. This conditional mixture proportion helps filter the most
plausible genotypes among the twelve possible combinations. The matrix can be saved as a text file by clicking the "Save" button.
It is also possible to choose a specific path and a name for the save file.
No return value, called to show GUI.
-Linux users may have to download the libtktable package to their system before using the A3.simu function.
This is due to the Tktable widget, used in forensim, which is not (always)
downloaded with the Tcl/Tk package.
-For the computational details, please see forensim tutorial at http://forensim.r-forge.r-project.org/misc/forensim-tutorial.pdf.
Hinda Haned [email protected]
Gill P, Sparkes P, Pinchin R, Clayton, Whitaker J, Buckleton J. Interpreting simple STR mixtures using allele peak areas. Forensic Sci Int 1998;91:41-53.
A2.simu: the two-allele model, and A4.simu: the four-allele model
A3.simu()
The A4.simu function launches a Tcl/Tk graphical interface for resolving two-person DNA mixtures
when four alleles are observed at a given locus.
A4.simu()
When four alleles are observed at a given locus in the DNA stain, six genotype combinations
are possible for the two contributors: (AB,CD),(AC,BD),(AD,BC),(BC,AD),(BD,AC) and (CD,AB) where A, B, C and D are the four
observed alleles
(in ascending order of molecular weights). Given a prior estimate of the mixture proportion,
the number of possible genotype combinations can be reduced by keeping only those supported by the
observed data. This is achieved by computing the sum of squared differences between the expected
allelic ratio and the observed allelic ratio, for all possible mixture combinations.
The likelihood of peak heights (or areas), given the combination of genotypes, is
high if the residuals are low.
Genotype combinations are thus selected according to the peak heights with the highest likelihoods.
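For the four-allele case, the mixture proportion conditional on a genotype combination can be illustrated with a few lines of base R. This is only a sketch under a stated assumption: contributor 1's share is taken to be the fraction of the total peak height carried by their two alleles. The peak heights are hypothetical and the code is not taken from A4.simu.

```r
# Hypothetical peak heights for the four observed alleles
heights <- c(A = 900, B = 1000, C = 420, D = 380)

# The six genotype combinations (contributor 1, contributor 2)
combos <- list(c("AB", "CD"), c("AC", "BD"), c("AD", "BC"),
               c("BC", "AD"), c("BD", "AC"), c("CD", "AB"))

# Conditional mixture proportion: share of total peak height carried by
# the two alleles assigned to contributor 1 (assumption, see lead-in)
cond_mx <- sapply(combos, function(combo) {
  alleles1 <- strsplit(combo[1], "")[[1]]
  sum(heights[alleles1]) / sum(heights)
})
names(cond_mx) <- sapply(combos, paste, collapse = ",")
round(cond_mx, 3)
```

Combinations whose conditional proportion lies far from the mixture proportion estimated beforehand would be the less plausible ones.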
The A4.simu() function launches a dialog window with three buttons:
-Plot simulations: plot of the residuals of each possible genotype combination for varying values of the mixture proportion across the interval [0.1, 0.9].
The observed mixture proportion is also reported on the plot.
-Simulation details: a matrix containing the simulation results. Simulation details and genotype combinations
with the lowest residuals can be saved as a text file by clicking the
"Save" button. It is also possible to choose specific paths and names for the save files.
-Genotypes filter: a matrix giving the mixture proportion conditional on the genotype combination. This conditional mixture proportion helps filter the most
plausible genotypes among the six possible combinations. The matrix can be saved as a text file by clicking the "Save" button.
It is also possible to choose a specific path and a name for the save file.
No return value, called to show GUI.
-Linux users may have to download the libtktable package to their system before using the A4.simu function.
This is due to the Tktable widget, used in forensim, which is not (always)
downloaded with the Tcl/Tk package.
-For the computational details, please see forensim tutorial at http://forensim.r-forge.r-project.org/misc/forensim-tutorial.pdf.
Hinda Haned [email protected]
Gill P, Sparkes P, Pinchin R, Clayton, Whitaker J, Buckleton J. Interpreting simple STR mixtures using allele peak areas. Forensic Sci Int 1998;91:41-53.
A2.simu: the two-allele model, and A3.simu: the three-allele model
A4.simu()
Accessors for forensim objects: simugeno, simumix and tabfreq.
"$" and "$<-" are used to access the slots of an object, they are equivalent
to "@" and "@<-".
A simugeno, a simumix or a tabfreq object.
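How a "$" accessor can be made equivalent to "@" for an S4 class is sketched below with the methods package. The class `toyfreq` is a hypothetical stand-in, not one of the forensim classes, and forensim's own method definitions may differ.

```r
library(methods)

# Hypothetical S4 class with a single slot
setClass("toyfreq", representation(pop.names = "character"))

# "$" reads a slot, "$<-" writes it; both simply delegate to slot()
setMethod("$", "toyfreq", function(x, name) slot(x, name))
setMethod("$<-", "toyfreq", function(x, name, value) {
  slot(x, name) <- value
  x
})

obj <- new("toyfreq", pop.names = c("Afri", "Cauc"))
identical(obj$pop.names, obj@pop.names)   # both accessors read the same slot
obj$pop.names <- c("POP1", "POP2")        # same effect as obj@pop.names <- ...
obj@pop.names
```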
Hinda Haned [email protected]
data(strusa)
class(strusa)
strusa@pop.names
#equivalent
strusa$pop.names
The changepop function changes population-related information in tabfreq, simugeno and
simumix objects
changepop(obj, oldpop, newpop)
| Argument | Description |
|---|---|
| obj | a forensim object, either a tabfreq, a simugeno or a simumix object |
| oldpop | a character vector giving the population names to be changed |
| newpop | a character vector giving the new population names |
a forensim object where the slots containing population-related information have been modified
Hinda Haned [email protected]
data(strveneto)
tab1 <- simugeno(strveneto,n=100)
tab2 <- changepop(tab1,"Veneto","VENE")
tab1$pop.names
tab2$pop.names
The number of all possible combinations of m elements among n with repetitions.
Cmn(m, n)
| Argument | Description |
|---|---|
| m | the |
| n | the |
There are (n+m-1)!/(m!(n-1)!) ways to combine m elements among n with repetitions.
Numeric with the number of combinations.
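The count formula can be checked in base R, since (n+m-1)!/(m!(n-1)!) is the binomial coefficient choose(n+m-1, m). The helper name `n_comb_rep` is hypothetical and is not part of forensim.

```r
# Number of ways to combine m elements among n with repetitions
n_comb_rep <- function(m, n) choose(n + m - 1, m)

n_comb_rep(2, 3)
# Same value from the factorial form of the formula
factorial(3 + 2 - 1) / (factorial(2) * factorial(3 - 1))
```

Both expressions evaluate to 6 for m = 2 and n = 3.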
Cmn was implemented as an auxiliary function for the dataL function which computes the
likelihood of the observed alleles in a mixed DNA stain conditional on the number of contributors.
Hinda Haned <[email protected]>
comb for all possible combinations of m elements among n with repetitions
Cmn(2,3)
comb(2,3)
Generate all possible combinations of m elements among n with repetitions.
comb(m, n)
| Argument | Description |
|---|---|
| m | the number of elements to combine |
| n | the number of elements from which to combine the |
There are (n+m-1)!/(m!(n-1)!) ways to combine m elements among n with repetitions; comb generates
all these possible combinations.
A matrix of (n+m-1)!/(m!(n-1)!) rows and n columns, where each row is a possible combination of m elements among n.
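One way to enumerate such combinations in base R is shown below. It is a sketch that assumes each row records how many times each of the n elements is picked, so that every row sums to m; the exact row layout and ordering returned by comb are not verified here, and `comb_rep` is a hypothetical name.

```r
# Enumerate all ways to pick m elements among n with repetitions,
# encoded as count vectors of length n that sum to m
comb_rep <- function(m, n) {
  grid <- expand.grid(rep(list(0:m), n))          # every count vector in {0..m}^n
  out <- as.matrix(grid[rowSums(grid) == m, , drop = FALSE])
  dimnames(out) <- NULL
  out
}

res <- comb_rep(2, 3)
res
nrow(res) == choose(3 + 2 - 1, 2)                 # matches the count formula
```

The brute-force filter over expand.grid is fine for small m and n; it grows as (m+1)^n, so it is not meant for large inputs.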
Hinda Haned [email protected]
Cmn for the calculation of the number of all possible combinations of m elements among n with repetitions
#combine 2 objects among 3 with repetitions
Cmn(2,3)
comb(2,3)
The function dataL gives the likelihood of a set of alleles observed at a specific locus conditional
on the number of contributors that gave these alleles. Calculation is based upon the frequencies
of the observed alleles.
dataL(x = 1, p, theta = 0)
| Argument | Description |
|---|---|
| x | an integer giving the number of contributors |
| p | a numeric vector giving the frequencies of the observed alleles in the mixture |
| theta | a float in [0,1[. |
Numeric likelihood value.
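For the special case theta = 0, the probability that 2x independently drawn alleles show exactly the observed set can be written with inclusion-exclusion over subsets of the observed alleles. The base R sketch below illustrates that special case only; `lik_alleles` is a hypothetical helper, it ignores theta entirely, and it is not claimed to reproduce the internals of dataL.

```r
# P(exactly the alleles with frequencies p are seen among 2x random alleles),
# assuming independent draws (theta = 0)
lik_alleles <- function(x, p) {
  k <- length(p)
  total <- 0
  for (mask in 1:(2^k - 1)) {                        # every non-empty subset
    in_subset <- (mask %/% 2^(0:(k - 1))) %% 2 == 1  # decode subset from mask
    sign <- (-1)^(k - sum(in_subset))
    total <- total + sign * sum(p[in_subset])^(2 * x)
  }
  total
}

# Two alleles at frequencies 0.1 and 0.01, two contributors
lik_alleles(2, c(0.1, 0.01))
```

For two observed alleles this reduces to (p1 + p2)^(2x) - p1^(2x) - p2^(2x), the probability that all 2x alleles fall in the observed set minus the cases where only one of them appears.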
dataL has several similarities with the Pevid.gen function
of the forensic package, which computes the probability of the DNA evidence; dataL
implements a particular case of this probability. Please see https://cran.r-project.org/package=forensic
Hinda Haned [email protected]
Haned H, Pene L, Lobry JR, Dufour AB, Pontier D.
Estimating the number of contributors to forensic DNA mixtures: Does maximum likelihood
perform better than maximum allele count? J Forensic Sci, accepted 2010.
Curran JM, Triggs CM, Buckleton J, Weir BS. Interpreting DNA Mixtures in Structured Populations. J Forensic Sci 1999;44(5): 987-995
lik.loc and lik for calculating the likelihood of a given simumix object
#likelihood of observing two alleles at frequencies 0.1 and 0.01 when the number of
#contributors is 2, in two cases: theta=0 and theta=0.03
dataL(x=2,p=c(0.1,0.01), theta=0)
dataL(x=2,p=c(0.1,0.01), theta=0.03)
The findfreq function finds the allele frequencies of a mixture stored in a simumix object, from a given tabfreq object.
If the tabfreq object contains multiple populations, a reference population from which to extract the
frequencies must be specified.
findfreq(mix, freq, refpop = NULL)
| Argument | Description |
|---|---|
| mix | a |
| freq | a |
| refpop | a factor giving the reference population in |
A list giving the allele frequencies for each locus.
Hinda Haned <[email protected]>
data(strusa)
s2<-simumix(simugeno(strusa,n=c(0,2000,0)),ncontri=c(0,2,0))
findfreq(s2,strusa,refpop="Cauc")
The findmax function finds the maximum of a vector and its position.
findmax(vec)
| Argument | Description |
|---|---|
| vec | a numeric vector |
findmax finds the maximum value of a vector and its position.
A matrix of two columns:
max: the position of the maximum in vec
maxval: the maximum
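A base R equivalent of this return shape could look like the following. It is a minimal sketch built on which.max, with the hypothetical name `find_max`; the one-row layout is an assumption based on the description above.

```r
# Position and value of the maximum, returned as a one-row matrix
find_max <- function(vec) {
  pos <- which.max(vec)                       # index of the first maximum
  matrix(c(pos, vec[pos]), nrow = 1,
         dimnames = list(NULL, c("max", "maxval")))
}

res <- find_max(c(3, 9, 4))
res
```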
findmax is an auxiliary function for the dataL function,
used to compute the
likelihood of the observed alleles in a mixed DNA stain given the number of contributors.
Hinda Haned <[email protected]>
findmax(1:10)
Hbsimu is a user-friendly graphical interface simulating the
heterozygote balance of heterozygous profiles generated according
to the simulation model described in Gill et al. (2005).
Hbsimu()
No return value, called to show GUI.
Hinda Haned [email protected]
Gill P, Curran J and Elliot K. A graphical simulation model of the entire DNA process associated with the analysis of short tandem repeat loci. Nucleic Acids Research 2005, 33(2): 632-643.
Hbsimu()
The lik function computes the likelihood of the observed alleles in a forensic DNA mixture, for a set
of loci, conditional on the number of contributors to the mixture. The overall likelihood is computed as the
product of loci likelihoods.
lik(x = 1, mix, freq, refpop = NULL, theta = NULL, loc = NULL)
| Argument | Description |
|---|---|
| x | the number of contributors to the DNA mixture, default is 1 |
| mix | a |
| freq | a |
| refpop | a factor giving the reference population in |
| theta | a float from [0,1[ giving Wright's Fst coefficient. |
| loc | loci for which the overall likelihood shall be computed. Default (NULL) corresponds to all loci |
lik computes the likelihood of the alleles observed at all loci conditional on the number of contributors.
This function implements the general formula for the interpretation of DNA mixtures
in case of population subdivision (Curran et al, 1999), in the particular case where all contributors are unknown
and belong to the same subpopulation.
The likelihood for multiple loci is computed as the product of loci likelihoods.
Numeric likelihood value.
Hinda Haned <[email protected]>
Haned H, Pene L, Lobry JR, Dufour AB, Pontier D.
Estimating the number of contributors to forensic DNA mixtures: Does maximum likelihood
perform better than maximum allele count? J Forensic Sci, accepted 2010.
Curran JM, Triggs CM, Buckleton J, Weir BS. Interpreting DNA Mixtures in Structured Populations.
J Forensic Sci 1999;44(5): 987-995
lik.loc for the likelihood per locus, likestim and
likestim.loc for the estimation of the number of contributors to a DNA mixture through
likelihood maximization
data(strusa)
#simulation of 1000 genotypes from the African American allele frequencies
gen<-simugeno(strusa,n=c(1000,0,0))
#3-person mixture
mix3<-simumix(gen,ncontri=c(3,0,0))
sapply(1:3, function(i) lik(x=i,mix3, strusa, refpop="Afri"))
The lik.loc function computes the likelihood of the observed data in a forensic DNA mixture, for each of the loci involved, conditional on the number of contributors to
the mixture.
lik.loc(x = 1, mix, freq, refpop = NULL, theta = NULL, loc = NULL)
| Argument | Description |
|---|---|
| x | the number of contributors to the DNA mixture |
| mix | a |
| freq | a |
| refpop | a factor giving the reference population in |
| theta | a float from [0,1[ giving Wright's Fst coefficient. |
| loc | the loci for which the likelihood shall be computed. Default (set to NULL) corresponds to all loci. |
lik.loc computes the likelihood per locus of the observed alleles.
This function implements the general formula for the interpretation
of DNA mixtures in case of subdivided populations (Curran et al, 1999), in the particular case where all contributors
are unknown and belong to the same subpopulation.
The Fst coefficient given in the theta argument allows accounting for population subdivision when all
contributors belong to the same subpopulation.
The function lik.loc returns a vector, of length the number of loci in loc,
giving the likelihood of the data for each locus.
Hinda Haned <[email protected]>
Haned H, Pene L, Lobry JR, Dufour AB, Pontier D.
Estimating the number of contributors to forensic DNA mixtures: Does maximum likelihood
perform better than maximum allele count? J Forensic Sci, accepted 2010.
Curran JM, Triggs CM, Buckleton J, Weir BS. Interpreting DNA Mixtures in Structured Populations. J Forensic Sci 1999;44(5): 987-995
lik for the overall loci likelihood, likestim and
likestim.loc for the estimation of the number of contributors to a DNA mixture through
likelihood maximization
data(strusa)
#simulation of 100 genotypes from the Caucasian allele frequencies
gen<-simugeno(strusa,n=c(0,100,0))
#4-person mixture
mix4 <- simumix(gen,ncontri=c(0,4,0))
lik.loc(x=2,mix4, strusa, refpop="Cauc")
lik.loc(x=2,mix4, strusa, refpop="Afri")
#You may also want to try:
#likestim(mix4,strusa,refpop="Cauc")
The likestim function gives multiloci estimation of the number of contributors to a forensic DNA mixture
using likelihood maximization.
likestim(mix, freq, refpop = NULL, theta = NULL, loc = NULL)
| Argument | Description |
|---|---|
| mix | a |
| freq | a |
| refpop | the reference population from which to extract the allele frequencies used in the likelihood calculation. If |
| theta | a float from [0,1[ giving Wright's Fst coefficient. |
| loc | loci to be considered in the estimation. Default (set to NULL) corresponds to all loci. |
The number of contributors which maximizes the likelihood of the data observed in the mixture is searched in the discrete interval [1,6]. In most cases this interval is a plausible range for the number of contributors.
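The search over the discrete interval [1,6] can be illustrated with a small table of per-locus likelihoods. The numbers and locus labels below are entirely hypothetical (they are not output from lik.loc); the sketch only shows how a product across loci and a maximization over x fit together, both overall and per locus.

```r
# Hypothetical per-locus likelihoods: rows are loci, columns are x = 1..6
loc_lik <- rbind(
  D3  = c(0,    2e-3, 5e-3, 4e-3, 3e-3, 2e-3),
  vWA = c(0,    1e-3, 6e-3, 7e-3, 5e-3, 3e-3),
  FGA = c(1e-2, 8e-3, 6e-3, 4e-3, 2e-3, 1e-3)
)

# Overall likelihood for each x: product of the locus likelihoods
overall <- apply(loc_lik, 2, prod)

# Multilocus estimate: the x with the highest overall likelihood
c(max = which.max(overall), maxvalue = max(overall))

# Per-locus estimates: the x maximizing each row
per_locus <- apply(loc_lik, 1, which.max)
per_locus
```

With many loci the product can underflow, so summing log-likelihoods is a common alternative; that is a general numerical remark, not a statement about how forensim computes it.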
A matrix of dimension 1 x 2. The first column, max, gives the maximum likelihood estimate of the number of contributors;
the second column, maxvalue, gives the corresponding likelihood value.
Hinda Haned <[email protected]>
Haned H, Pene L, Lobry JR, Dufour AB, Pontier D.
Estimating the number of contributors to forensic DNA mixtures: Does maximum likelihood
perform better than maximum allele count? J Forensic Sci, accepted 2010.
Egeland T, Dalen I, Mostad PF.
Estimating the number of contributors to a DNA profile. Int J Legal Med 2003, 117: 271-275
Curran JM, Triggs CM, Buckleton J, Weir BS. Interpreting DNA Mixtures in Structured Populations. J Forensic Sci 1999, 44(5): 987-995
likestim.loc for maximum of likelihood estimations per locus
data(strusa)
#simulation of 100 genotypes from the Hispanic allele frequencies
gen<-simugeno(strusa,n=c(0,0,100))
#4-person mixture
mix4 <- simumix(gen,ncontri=c(0,0,4))
likestim(mix4,strusa,refpop="Hisp")
The likestim.loc function returns the estimation of the number of contributors,
at each locus, obtained by maximizing the likelihood.
likestim.loc(mix, freq, refpop = NULL, theta = NULL, loc = NULL)
| Argument | Description |
|---|---|
| mix | a |
| freq | a |
| refpop | the reference population from which to extract the allele frequencies used in the likelihood calculation. Default set to NULL, if |
| theta | a float from [0,1[ giving Wright's Fst coefficient. |
| loc | loci to be considered in the estimation. Default (set to NULL) corresponds to all loci. |
The number of contributors which maximizes the likelihood of the data observed in the mixture is searched in the discrete interval [1,6]. In most cases this interval is a plausible range for the number of contributors.
A matrix of dimension loc x 2. The first column, max, gives the maximum likelihood estimate
of the number of contributors for each locus (one locus per row). The second column, maxvalue,
gives the corresponding likelihood value.
Hinda Haned <[email protected]>
Haned H, Pene L, Lobry JR, Dufour AB, Pontier D.
Estimating the number of contributors to forensic DNA mixtures: Does maximum likelihood
perform better than maximum allele count? J Forensic Sci, accepted 2010.
Egeland T, Dalen I, Mostad PF.
Estimating the number of contributors to a DNA profile. Int J Legal Med 2003, 117: 271-275
Curran JM, Triggs CM, Buckleton J, Weir BS. Interpreting DNA Mixtures in Structured Populations. J Forensic Sci 1999, 44(5): 987-995
likestim for multiloci estimations
data(strusa)
#simulation of 100 genotypes from the Hispanic allele frequencies
gen<-simugeno(strusa,n=c(0,0,100))
#4-person mixture
mix4 <- simumix(gen,ncontri=c(0,0,4))
likestim.loc(mix4,strusa,refpop="Hisp")
likEvid allows the calculation of likelihood for a piece of DNA evidence, for any number of replicates, any number of contributors, and when drop-in and drop-out are possible.
likEvid(Repliste, Tg, Vg, x, theta, prDHet, prDHom, prC, freq)
| Argument | Description |
|---|---|
| Repliste | vector of alleles present at a given locus for any number of replicates. If there are two replicates, showing alleles 12,13, and 14 respectively, then |
| Tg | vector of genotypes for the known contributors under Hp. Genotype 12/17 should be given as a vector c(12,17), and genotypes 12/17, 14/16 should be given as a single vector: c(12,17,14,16). If Tg is empty, set it to 0. |
| Vg | vector of genotypes for the known non-contributors (see References section) under Hp. See |
| x | Number of unknown individuals under H. Set to 0 if there are no unknown contributors. |
| theta | theta correction, value must be taken in [0,1) |
| prDHet | probability of dropout for heterozygotes. It is possible to assign different values per contributor. In this case, |
| prDHom | probability of dropout for homozygotes. See description of argument |
| prC | probability of drop-in applied per locus |
| freq | vector of the corresponding allele frequencies of the analysed locus in the target population |
Numeric likelihood value.
Hinda Haned [email protected]
Gill, P.; Kirkham, A. & Curran, J. LoComatioN: A software tool for the analysis of low copy number DNA profiles Forensic Science International, 2007, 166(2-3), 128-138
Curran, J. M.; Gill, P. & Bill, M. R. Interpretation of repeat measurement DNA evidence allowing for multiple contributors and population substructure Forensic Science International, 2005, 148, 47-53
#load allele frequencies
library(forensim)
data(ngm)
#create vector of allele frequencies
d10<-ngm$tab$D10
# evaluate the evidence under Hp; contributors are the suspect and one unknown,
# dropout probabilities for the suspect and the unknown are the same: 0.2 for heterozygotes,
# and 0.04 for homozygotes.
likEvid(Repliste=c(12,13,14),Tg=c(12,13),Vg=0,x=1,theta=0,prDHet=c(0.2,0.2),
prDHom=c(0.04,0.04),prC=0, freq=d10)
# evaluate the evidence under Hd; contributors are two unknown people, the dropout
# probabilities for the unknowns are kept the same under Hd
likEvid(Repliste=c(12,13,14),Tg=0,Vg=0,x=2,theta=0,prDHet=c(0.2,0.2),
prDHom=c(0.04,0.04),prC=0,freq=d10)
LR allows the calculation of likelihood ratios for a piece of DNA evidence, for any number of replicates, any number of contributors, and when drop-in and drop-out are possible.
LR(Repliste, Tp, Td, Vp, Vd, xp, xd, theta, prDHet, prDHom, prC, freq)
| Argument | Description |
|---|---|
| Repliste | vector of alleles present at a given locus for any number of replicates. If there are two replicates, showing alleles 12,13, and 14 respectively, then |
| Tp | vector of genotypes for the known contributors under Hp. Genotype 12/17 should be given as a vector c(12,17), and genotypes 12/17, 14/16 should be given as a single vector: c(12,17,14,16). |
| Td | vector of genotypes for the known contributors under Hd. Should be in the same format as Tp. If there are no known contributors under Hd, then set Td to 0. |
| Vp | vector of genotypes for the known non-contributors (see References section) under Hp. See |
| Vd | vector of genotypes for the known non-contributors (see References section) under Hd. Should be in the same format as Vp; if empty, set to 0. |
| xp | Number of unknown individuals under Hp. Set to 0 if there are no unknown contributors. |
| xd | Number of unknown individuals under Hd. Set to 0 if there are no unknown contributors. |
| theta | theta correction, value must be taken in [0,1) |
| prDHet | probability of dropout for heterozygotes. It is possible to assign different values per contributor. In this case, |
| prDHom | probability of dropout for homozygotes. See description of argument |
| prC | probability of drop-in applied per locus |
| freq | vector of the corresponding allele frequencies of the analysed locus in the target population |
List with named elements for numerator likelihood (num), denominator likelihood (deno) and likelihood ratio (LR)
Hinda Haned [email protected]
Gill, P.; Kirkham, A. & Curran, J. LoComatioN: A software tool for the analysis of low copy number DNA profiles Forensic Science International, 2007, 166(2-3), 128-138
Curran, J. M.; Gill, P. & Bill, M. R. Interpretation of repeat measurement DNA evidence allowing for multiple contributors and population substructure Forensic Science International, 2005, 148, 47-53
#load allele frequencies
library(forensim)
data(ngm)
#create vector of allele frequencies
d10<-ngm$tab$D10
# heterozygote dropout probability (resp. homozygote) is set to 0.2 for all
# contributors (0.04 for homozygotes)
LR(Repliste=c(12,13,14),Tp=c(12,13),Td=0,Vp=0,Vd=0,xd=2,xp=1,theta=0,prDHet=c(0.2,0.2),
prDHom=c(0.04,0.04),prC=0,freq=d10)
User-friendly graphical user interface for the LR calculator LR.
LRmixTK(verbose)
| Argument | Description |
|---|---|
| verbose | if TRUE, progress is written to the console |
No return value, called to show GUI.
Hinda Haned [email protected]
The mastermix function launches a Tcl/Tk graphical user interface dedicated
to the resolution of two-person DNA mixtures using allele peak height or area information.
mastermix implements a method developed by Gill et al. (see the references section),
previously programmed as an Excel macro by Dr. Peter Gill.
mastermix()
mastermix is a Tcl/Tk graphical user interface implementing a method developed by Gill et al
(1998) for simple mixtures resolution, using allele peak heights or areas information.
This method uses simulation to search for the most likely combination(s) of the contributors' genotypes.
Having previously obtained an estimation for the mixture proportion,
it is possible to reduce the number of possible genotype combinations by keeping only
those supported by the observed data. This is achieved by computing the sum of square differences between the expected
allelic ratio and the observed allelic ratio, for all possible mixture combinations.
The likelihood of peak heights (or areas), conditional on the combination of genotypes, is
high if the residuals are low.
Genotype combinations are thus selected according to the peak heights with the highest (conditioned) likelihoods.
mastermix offers a graphical representation of the simulation for three models:
-The two allele model: at a given locus, two alleles are observed in the DNA stain.
-The three allele model: at a given locus, three alleles are observed in the DNA stain.
-The four allele model: at a given locus, four alleles are observed in the DNA stain.
A left-click on each button launches a simulation dialog window for the corresponding model, while a right-click opens the corresponding help page.
No return value, called to show GUI.
-Each implemented model can either be launched using the mastermix interface, or the
A2.simu, A3.simu and A4.simu functions, depending on the considered model.
-For the computational details, please see forensim tutorial at http://forensim.r-forge.r-project.org/misc/forensim-tutorial.pdf.
Hinda Haned [email protected]
Gill P, Sparkes P, Pinchin R, Clayton, Whitaker J, Buckleton J. Interpreting simple STR mixtures using allele peak areas. Forensic Sci Int 1998;91:41-53.
mastermix()
mincontri gives the minimum number of contributors required to explain a forensic DNA mixture. This method
is also known as the maximum allele count as it relies on the maximum number of alleles showed through all available
loci
mincontri(mix, loc = NULL)mincontri(mix, loc = NULL)
mix |
a |
loc |
the loci to consider when calculating the minimum number of contributors; the default (NULL) corresponds to all loci |
Integer giving the minimum number of contributors.
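The maximum allele count rule can be illustrated with a short base R sketch. Each contributor carries at most two alleles per locus, so the locus showing the most alleles sets the lower bound. The list mix_alleles is a hypothetical stand-in for the alleles observed in a mixture; it is not the internal representation used by mincontri.

```r
# Hypothetical input: alleles observed in the mixture, one vector per locus
mix_alleles <- list(
  FGA  = c("20", "21", "22", "24", "25"),
  TH01 = c("6", "7", "9.3"),
  TPOX = c("8", "11")
)

# Number of distinct alleles seen at each locus
n_alleles <- lengths(mix_alleles)

# Each contributor explains at most two alleles per locus
min_contributors <- ceiling(max(n_alleles) / 2)
min_contributors
```

Here the five alleles at FGA require at least three contributors.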
Hinda Haned [email protected]
likestim for the estimation of the number of contributors through likelihood maximization
data(strusa)
#simulation of 1000 genotypes from the African American allele frequencies
gen<-simugeno(strusa,n=c(1000,0,0))
#5-person mixture
mix5<-simumix(gen,ncontri=c(5,0,0))
#compare
likestim(mix5, strusa, refpop="Afri")
mincontri(mix5)
The maximum allele count principle leads to a wrong conclusion for two contributors if a maximum of only one or two alleles is seen. This probability of error is calculated.
N2error(dat)
dat |
a data frame; the first column gives the allele sizes, the remaining columns give their frequencies |
The probability of error is returned.
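A minimal sketch of this kind of calculation follows, under assumptions that are not confirmed by the documentation: the two contributors are unrelated, alleles are drawn independently (no theta correction), loci are independent, and the overall error probability is the product of the per-locus probabilities of seeing at most two alleles. The helper p_at_most_two and the small data frame freqs are hypothetical.

```r
# Probability that four independently drawn alleles (two contributors)
# show at most two distinct alleles at one locus with frequencies p
p_at_most_two <- function(p) {
  k <- length(p)
  if (k < 2) return(1)
  pairs <- combn(k, 2)
  # P(all four draws fall within a pair {i, j}), summed over pairs,
  # corrected for the single-allele outcomes counted in several pairs
  sum((p[pairs[1, ]] + p[pairs[2, ]])^4) - (k - 2) * sum(p^4)
}

# Hypothetical frequency table: alleles in the first column, one column per locus
freqs <- data.frame(
  allele = c(1, 2, 3),
  L1 = c(0.5, 0.5, NA),
  L2 = c(1/3, 1/3, 1/3)
)

# Per-locus probabilities, ignoring alleles that are not observed (NA)
per_locus <- sapply(freqs[-1], function(col) p_at_most_two(col[!is.na(col)]))

# Assumed: the count is misleading only if every locus shows at most two alleles
p_error <- prod(per_locus)
p_error
```

With only two alleles at L1 the count can never exceed two there, so the error probability is driven entirely by L2.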
Thore Egeland [email protected]
#Example based on 15 markers of Tu data
library(forensim)
data(Tu)
N2error(Tu)
The distribution of N, the number of alleles showing, is calculated exactly assuming 2 contributors. Theta-correction is not implemented. The function may be used to check the accuracy of simulations and to indicate the required number of simulations for one example.
N2Exact(p)
p |
vector of allele frequencies. Must sum to 1. Default: for uniformly distributed alleles. |
Returns P(N=i) for i=1,2,3,4
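One way to check such a distribution is brute-force enumeration, sketched below in base R. This is not the algorithm used by N2Exact (which is not described here); it simply lists every ordered draw of four alleles, assuming independent draws without theta correction, and sums the probabilities by number of distinct alleles. The function name n_alleles_distribution is hypothetical.

```r
# Hypothetical helper: distribution of the number of distinct alleles N
# among the four alleles of two contributors, by full enumeration
n_alleles_distribution <- function(p) {
  stopifnot(abs(sum(p) - 1) < 1e-8)
  k <- length(p)
  # All ordered draws of four allele indices (k^4 rows)
  draws <- as.matrix(expand.grid(rep(list(seq_len(k)), 4)))
  # Probability of each draw under independence
  prob <- apply(draws, 1, function(idx) prod(p[idx]))
  # Number of distinct alleles in each draw
  n_distinct <- apply(draws, 1, function(idx) length(unique(idx)))
  # P(N = i) for i = 1, ..., 4
  setNames(sapply(1:4, function(i) sum(prob[n_distinct == i])),
           paste0("N=", 1:4))
}

dist <- n_alleles_distribution(c(0.5, 0.3, 0.2))
dist
```

The enumeration grows as k^4, so it is only practical for a modest number of alleles per marker.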
Thore Egeland [email protected]
#Distribution for a marker with 20 alleles of equal frequency
N2Exact(p=rep(0.05,20))
naomitab handles missing values (NA) in a data frame: it returns a list of the columns where NAs have
been removed.
naomitab(tab)
tab |
a data frame |
Returns a list with one component per column of tab;
each component holds the values of that column from which the NAs have
been removed.
This function was designed to handle missing values in data frames in the format of the Journal of Forensic Sciences for population genetic data: allele names are given in the first column, and frequencies for a given allele are read in rows for different loci. When a given allele is not observed, the value is coded NA (originally coded "-" in the journal).
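The idea can be sketched in base R on a small table in the layout described above. The data frame tab below is made up for illustration, and the sketch only drops the NA entries of each column; the actual naomitab output may carry additional information (for example allele names).

```r
# Hypothetical frequency table: allele names first, then one column per locus
tab <- data.frame(
  Allele = c(6, 7, 8, 9),
  TH01 = c(0.2, 0.5, NA, 0.3),
  TPOX = c(NA, NA, 0.6, 0.4)
)

# One list component per column, keeping only the non-missing values
no_na <- lapply(tab, function(column) column[!is.na(column)])
no_na
```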
Hinda Haned <[email protected]>
data(Tu)
naomitab(Tu)
nball gives the number of alleles of a simumix object.
nball(mix, byloc = FALSE)
mix |
a |
byloc |
a logical indicating whether the number of alleles must be calculated by locus or for all loci (default) |
If byloc=TRUE, the number of alleles by locus; otherwise the sum.
Hinda Haned <[email protected]>
data(strusa)
#simulating 100 genotypes with allele frequencies from the African American population
gaa<-simugeno(strusa,n=c(100,0,0))
#simulating a 4-person mixture
maa4<-simumix(gaa,ncontri=c(4,0,0))
nball(maa4,byloc=TRUE)
Allele frequencies for 15 autosomal short tandem repeat loci in the American Caucasian population.
data(ngm)
Budowle, B.; Ge, J.; Chakraborty, R.; Eisenberg, A.; Green, R.; Mulero, J.; Lagace, R. & Hennessy, L. Population genetic analyses of the NGM STR loci. International Journal of Legal Medicine, 2011, 1-9
library(forensim)
data(ngm)
boxplot(ngm$tab)
Computes the random man exclusion probability of a mixture stored in a simumix object
PE(mix, freq, refpop = NULL, theta = 0, byloc = FALSE)
mix |
a |
freq |
a |
refpop |
character giving the reference population, used only if |
theta |
a float in [0,1) giving Wright's Fst coefficient. |
byloc |
logical; if TRUE, the exclusion probability is computed per locus; if FALSE (default), the calculations are done for all loci simultaneously |
PE gives the exclusion probability at a locus, or at several loci, when the conditions for Hardy-Weinberg equilibrium are
met. If these conditions are not met in the population,
then a value for theta must be supplied to take into account dependencies
between alleles. The formula for the exclusion probability that accounts for departure
from Hardy-Weinberg proportions due to population subdivision was provided by Bruce Weir; see the
references section.
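For the Hardy-Weinberg case (theta = 0), the calculation can be sketched in base R as follows. A random person is not excluded at a locus when both of their alleles are among the alleles seen in the mixture, which happens with probability equal to the squared sum of those allele frequencies. The lists mix_alleles and freq are hypothetical inputs, not forensim objects, and the theta-corrected formula is not reproduced here.

```r
# Hypothetical inputs: alleles seen in the mixture and population frequencies
mix_alleles <- list(L1 = c("12", "13"), L2 = c("8", "9", "10"))
freq <- list(
  L1 = c("12" = 0.2, "13" = 0.3, "14" = 0.5),
  L2 = c("8" = 0.1, "9" = 0.2, "10" = 0.3, "11" = 0.4)
)

# Per locus: 1 - P(both alleles of a random person are in the mixture)
pe_locus <- sapply(names(mix_alleles), function(loc) {
  1 - sum(freq[[loc]][mix_alleles[[loc]]])^2
})

# All loci, assumed independent: excluded unless included at every locus
pe_all <- 1 - prod(1 - pe_locus)

pe_locus
pe_all
```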
Numeric vector with exclusion probability (by locus if byloc = TRUE).
Hinda Haned <[email protected]>
Clayton T, Buckleton JS. Mixtures. In: Buckleton JS, Triggs CM, Walsh SJ, editors. Forensic DNA Interpretation. CRC Press 2005;217-74
data(strusa)
geno1<-simugeno(strusa,n=c(0,0,100))
mix2 <-simumix(geno1,ncontri=c(0,0,2))
PE(mix2,strusa,"Hisp",byloc=TRUE)
The PV function implements the predictive value of the maximum likelihood estimator of the number of contributors to a DNA
mixture
PV(mat, prior)
mat |
matrix giving the estimates of the conditional probabilities that the maximum likelihood estimator classifies a given stain as a mixture of i contributors given that there are k contributor(s) to the stain. Estimates i must be given in columns for each possible value of the number of contributors given in rows. |
prior |
numeric vector giving the prior probabilities of encountering a mixture of i contributors. |
Vector of the predictive values
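A plausible reading of the predictive value is the Bayes posterior probability that a stain really has i contributors given that the estimator returns i. The sketch below computes that quantity in base R under this assumption; it is not taken from the PV source code, and the 3 x 3 matrix cond and the vector prior are made-up numbers. Rows are the true numbers of contributors and columns are the estimates, as described for mat.

```r
# Hypothetical conditional probabilities P(estimate = i | true number = k)
# rows: true number of contributors k; columns: estimate i
cond <- matrix(c(0.9, 0.1, 0.0,
                 0.2, 0.7, 0.1,
                 0.0, 0.3, 0.7), nrow = 3, byrow = TRUE)

# Hypothetical prior probabilities of encountering 1, 2 or 3 contributors
prior <- c(0.5, 0.3, 0.2)

# joint[k, i] = P(estimate = i | k) * P(k); prior is recycled down the rows
joint <- cond * prior

# Assumed definition: P(true number = i | estimate = i)
pv <- diag(joint) / colSums(joint)
pv
```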
Hinda Haned <[email protected]>
Haned H., Pene L., Sauvage F., Pontier D., The predictive value of the maximum likelihood estimator of the number of contributors to a DNA mixture, submitted, 2010.
maximum likelihood estimator likestim
# the following examples reproduce some of the calculations appearing
# in the article cited above, for illustrative purpose, the maximum
#number of contributors is set here to 5
#matcondi: Table 2 in Haned et al. (2010)
matcondi<-matrix(c(1,rep(0,4),0,0.998,0.005,0,0,0,0.002,0.937,0.067,0,0,0,0.058,
0.805,0.131,rep(0,3),0.127,0.662,rep(0,3),0.001,0.207),ncol=6)
#prior defined by a forensic expert (Table 3 in Haned et al., 2010)
prior1<-c(0.45,0.04,0.30,0.15,0.06)
#uniform prior, for each mixture type, the probability of occurrence is 1/5,
#5 being the threshold for the number of contributors
prior2<-c(rep(1/5,5))
#predictive values for prior1
PV(matcondi,prior1)
#for prior2
PV(matcondi, prior2)
Allele frequencies for 10 autosomal short tandem repeat loci in the Norwegian population.
data(sgmNorway)
Andreassen, R., S. Jakobsen, and Mevaag, B., Norwegian population data for the 10 autosomal STR loci in the AMPFlSTR(R) SGM Plus(TM) system. Forensic Science International, 2007. 170(1): p. 59-61.
Simulates SNP mixtures and optionally writes a file suitable for the wrapdataL function, for estimation of the number of contributors
simMixSNP(nSNP , p , ncont, writeFile, outfile , id )
nSNP |
Integer number of SNPs>1 |
p |
Minor allele frequency |
ncont |
Number of contributors >= 1 |
writeFile |
If TRUE, output written to file |
outfile |
Name of output file |
id |
Column one of output file identifying run |
Returns a data frame with columns Id, marker, allele, frequency and height (=1 for now)
Thore Egeland <[email protected]>
simMixSNP()
simPCR2 implements a simulation model for the polymerase chain reaction (Gill et al., 2005).
Given several input parameters, simPCR2 outputs the number of amplified DNA molecules and their corresponding peak heights
(in RFUs).
simPCR2(ncells,probEx,probAlq, probPCR, cyc = 28, Tdrop = 2 * 10^7, probSperm = 0.5, dip = TRUE,KH=55)
ncells |
initial number of cells |
probEx |
probability that a DNA molecule is extracted (probability of surviving the extraction process) |
probAlq |
probability that a DNA molecule is selected for PCR amplification |
probPCR |
probability that a DNA molecule is amplified during a given round of PCR |
cyc |
number of PCR cycles, default is 28 cycles |
Tdrop |
threshold of detection: number of molecules (in the total PCR reaction mixture) that is needed to generate a signal, default is set to 2*10^7 molecules |
probSperm |
probability of observing alleles of type A in the initial sample of haploid cells (e.g. sperm cells). Probability
of observing allele B is given by 1- |
dip |
logical indicating the cell ploidy, default is diploid cells (TRUE), FALSE is for haploid cells |
KH |
positive constant used to scale the peak heights obtained from the number of amplified molecules (see reference section) |
A threshold of Tdrop molecules (must be a multiple of 10^7) is needed to generate a signal; a log-linear relationship is then used to
determine the intensity of the signal with respect to the number of successfully amplified DNA molecules. Dropout events occur
whenever fewer than Tdrop molecules are generated.
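The chain of sampling steps implied by the arguments can be sketched in base R for a single allele. This is an illustrative reading of the model and not the simPCR2 implementation: the helper simulate_allele is hypothetical, each step is assumed to be a binomial draw, and the peak-height scaling (KH and the log-linear relationship) is left out because its exact form is not given here.

```r
# Hypothetical helper (not the package code): molecules of one allele
simulate_allele <- function(ncells, probEx, probAlq, probPCR, cyc, Tdrop) {
  n <- rbinom(1, ncells, probEx)   # molecules surviving extraction
  n <- rbinom(1, n, probAlq)       # molecules selected into the PCR aliquot
  n <- as.numeric(n)               # use doubles to avoid integer overflow
  for (i in seq_len(cyc)) {
    # each molecule present is copied with probability probPCR in this cycle
    n <- n + rbinom(1, n, probPCR)
  }
  list(molecules = n, dropout = n < Tdrop)
}

set.seed(1)
simulate_allele(ncells = 5, probEx = 0.6, probAlq = 0.3,
                probPCR = 0.8, cyc = 10, Tdrop = 1000)

# Deterministic checks: perfect efficiency doubles the count at every cycle
out <- simulate_allele(5, 1, 1, 1, cyc = 3, Tdrop = 10)
out0 <- simulate_allele(5, 1, 1, 0, cyc = 3, Tdrop = 10)
```

With few starting cells, the extraction and aliquot steps can leave no template molecules at all, and the allele then drops out whatever the number of cycles.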
A matrix with the following components:
HeightA |
Peak height of allele A |
DropA |
Dropout variable for allele A |
HeightB |
Peak height of allele B |
DropB |
Dropout variable for allele B |
Hinda Haned [email protected]
Jeffreys AJ, Wilson V, Neumann R and Keyte J. Amplification of human minisatellites by the polymerase chain reaction: towards
DNA fingerprinting of single cells. Nucleic Acids Res 1988;16: 10953-10971.
Gill P, Curran J and Elliot K. A graphical simulation model of the entire DNA process associated with the analysis of short tandem repeat loci. Nucleic Acids Research 2005, 33(2): 632-643.
#simulation of a 28 cycles PCR, with the initial stain containing 5 cells
simPCR2(ncells=5,probEx=0.6,probAlq=0.30,probPCR=0.8,cyc=28, Tdrop=2*10^7,dip=TRUE,KH=55)
simPCR2TK is a user-friendly graphical interface for the simPCR2 function that implements a simulation model
for the polymerase chain reaction.
simPCR2TK()
No return value, called to show GUI.
Hinda Haned [email protected]
Gill P, Curran J and Elliot K. A graphical simulation model of the entire DNA process associated with the analysis of short tandem repeat loci. Nucleic Acids Research 2005, 33(2): 632-643.
#launch the graphical interface
simPCR2TK()
The simufreqD function simulates single population allele frequencies for independent loci.
Allele frequencies are generated as random deviates from a Dirichlet distribution, whose parameters control
the mean and the variance of the simulated allele frequencies.
simufreqD(nloc = 1, nal = 2, alpha = 1)
nloc |
the number of loci to simulate |
nal |
the number of alleles per locus. Either an integer, if all loci have the same number of alleles, or an integer vector, if the number of alleles differs between loci |
alpha |
the parameter used to simulate allele frequencies from the Dirichlet distribution. If the
When the number of alleles differ between loci, |
Allele frequencies for independent loci are simulated using a Dirichlet distribution with parameter
alpha. At a given locus L with n alleles, the allele frequencies are modeled as a vector of random
variables
p=(p1, ..., pn), following a Dirichlet distribution with parameters:
alpha = (alpha1, ..., alphan) where p1+...+pn=1 and alpha1,..., alphan > 0.
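A Dirichlet deviate can be generated by normalising independent gamma deviates, which is the standard construction. The base R sketch below shows this for one locus; the function name rdirichlet_one is hypothetical and the sketch is not the simufreqD code itself.

```r
# Hypothetical helper: one draw from a Dirichlet distribution with
# parameter vector alpha, via normalised gamma deviates
rdirichlet_one <- function(alpha) {
  x <- rgamma(length(alpha), shape = alpha, rate = 1)
  x / sum(x)
}

set.seed(42)
# Allele frequencies for a locus with 4 alleles and alpha = 1 for each allele
p <- rdirichlet_one(rep(1, 4))
p
sum(p)
```

With all components of alpha equal to 1 the frequencies are uniform over the simplex; larger equal values concentrate the draws around equal frequencies.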
A matrix containing the simulated allele frequencies. The data is presented in the format of the Journal of Forensic Sciences for genetic data: allele names are given in the first column, and frequencies for a given allele are read in rows for the different markers in columns. When an allele is not observed for a given locus, the value is coded NA (instead of "-" in the original format).
The code used here for the generation of random Dirichlet deviates was previously implemented in the gtools library.
Hinda Haned [email protected]
Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions, vol 2. John Wiley & Sons, 1995.
Wright S. The genetical structure of populations. Ann Eugen 1951;15:323-354.
#simulate alleles frequencies for 5 markers with respectively 2, 3, 4, 5, and 6 alleles
simufreqD(nloc=5,na=c(2,3,4,5,6) , alpha=1)
The S4 simugeno class is used to store existing or simulated genotypes.
tab.freq: a list giving allele frequencies for each locus.
If there are several populations,
tab.freq gives allele frequencies in each population
nind: integer vector giving the number of individuals. If there are several
populations, nind gives the numbers of individuals per population
pop.names: factor of populations names
popind: factor giving the population of each individual
which.loc: character vector giving the locus names
tab.geno: matrix giving the genotypes (in rows) for each locus (in columns). The genotype of a homozygous individual carrying the allele "12" is coded "12/12". A heterozygous individual carrying alleles "12" and "13" is coded "12/13" or "13/12".
indID: character vector giving the individuals ID
signature(x = "simugeno"): gives the names of the attributes of
a simugeno object
signature(object = "simugeno"): shows a simugeno object
signature(object = "simugeno"): prints a simugeno object
Hinda Haned [email protected]
as.simugeno for the simugeno class constructor,
is.simugeno, simumix and
tabfreq
showClass("simugeno")
Constructor for simugeno objects.
The function simugeno creates a simugeno object from
a tabfreq object.
The function as.simugeno is an alias for the simugeno function.
is.simugeno tests if an object is a valid simugeno object.
Note: to get the manpage about simugeno, please type 'class ? simugeno'.
simugeno(tab,which.loc=NULL,n=1)
as.simugeno(tab,which.loc=NULL,n=1)
is.simugeno(x)
tab |
a tabfreq object created with constructor |
which.loc |
a character vector giving the chosen loci for the genotypes simulation. The default is set to NULL,
which corresponds to all the loci of the |
n |
integer vector giving the number of individuals. If there are several
populations, |
x |
an object |
At a given locus, an individual's genotype is simulated by randomly drawing two alleles (with replacement) at their respective allele frequencies in the target population.
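The drawing step can be sketched in base R for one locus. The frequency vector freq and the helper simulate_genotype are hypothetical; the genotype is written in the "allele/allele" coding used for simugeno objects.

```r
# Hypothetical allele frequencies at one locus (names are the alleles)
freq <- c("6" = 0.2, "7" = 0.5, "9.3" = 0.3)

# Draw two alleles with replacement, each with its population frequency
simulate_genotype <- function(freq) {
  alleles <- sample(names(freq), size = 2, replace = TRUE, prob = freq)
  paste(alleles, collapse = "/")
}

set.seed(123)
genotypes <- replicate(5, simulate_genotype(freq))
genotypes
```

Drawing with replacement is what allows homozygous genotypes such as "7/7" to occur with the expected Hardy-Weinberg frequency.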
For simugeno and as.simugeno, a simugeno object. For is.simugeno, a logical.
Hinda Haned [email protected]
"simugeno", and tabfreq for creating a tabfreq object from a data file.
data(Tu)
tab<-tabfreq(Tu)
#simulation of 100 individual genotypes for the STR marker FGA
geno1 <- simugeno(tab,which.loc='FGA', n =100)
geno1@tab.geno
The S4 simumix class is used to store DNA mixtures of individual genotypes
along with information about the individuals' populations and the loci used to simulate the genotypes.
ncontri: integer vector giving the number of contributors to the DNA mixture. If there are
several populations, ncontri gives the number of contributors per population
mix.prof: matrix giving the contributors genotypes (in rows) for each locus (in columns). The genotype of a homozygous individual carrying the allele "12" is coded "12/12". A heterozygous individual carrying alleles "12" and "13" is coded "12/13" or "13/12".
mix.all: list giving the alleles present in the mixture for each locus
which.loc: character vector giving the locus names
popinfo: factor giving the population of each contributor
signature(x = "simumix"): gives the names of the attributes of a simumix object
signature(object = "simumix"): shows a simumix object
signature(object = "simumix"): prints a simumix object
Hinda Haned [email protected]
simugeno, as.simumix, is.simumix and tabfreq
showClass("simumix")
data(strusa)
Constructor for simumix objects.
The function simumix creates a simumix object from
a simugeno object.
The function as.simumix is an alias for the simumix function.
is.simumix tests if an object is a valid simumix object.
Note: to get the manpage about simumix, please type 'class ? simumix'.
simumix(tab,which.loc=NULL,ncontri=1)
as.simumix(tab,which.loc=NULL,ncontri=1)
is.simumix(x)
tab |
a simugeno object created with constructor simugeno |
which.loc |
a character vector giving the chosen loci for the genotypes simulation. The default is set to NULL,
which corresponds to all the loci of the |
ncontri |
integer vector giving the number of individuals. If there are several populations,
|
x |
an object |
DNA mixtures are created by randomly drawing individual genotypes
with a uniform probability.
If there are N individuals in the sample (the simugeno object), then
each individual has a probability of 1/N to be selected.
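The selection step can be sketched in base R on a small genotype matrix. The matrix genotypes is made up for illustration, and drawing without replacement (the default of sample) is an assumption, since the text only states that each individual is equally likely to be selected. The last step collects the alleles present in the mixture at each locus.

```r
# Hypothetical sample of N = 4 genotypes at two loci ("allele/allele" coding)
genotypes <- matrix(
  c("12/13", "6/7",
    "12/12", "7/9.3",
    "14/15", "6/6",
    "13/15", "9.3/9.3"),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), c("L1", "L2"))
)

set.seed(7)
# Each of the N individuals has the same probability 1/N of being selected
contributors <- sample(nrow(genotypes), size = 2)
profiles <- genotypes[contributors, , drop = FALSE]

# Alleles present in the mixture, locus by locus
mix_alleles <- setNames(
  lapply(colnames(profiles), function(loc) {
    unique(unlist(strsplit(profiles[, loc], "/", fixed = TRUE)))
  }),
  colnames(profiles)
)

profiles
mix_alleles
```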
For simumix and as.simumix, a simumix object. For is.simumix, a logical.
Hinda Haned [email protected]
"simumix", simugeno for creating a simugeno object.
data(Tu)
tab<-simugeno(tabfreq(Tu),n=1200)
#simulation of a 3-person mixture characterized with markers FGA, TH01 and TPOX
simumix(tab,which.loc=c('FGA','TH01', 'TPOX') , ncontri =3)
Simulate multi-population allele frequencies for independent loci, from a given reference population, following a Dirichlet model. Allele frequencies in the populations are generated as random deviates from a Dirichlet distribution, whose parameters control the deviation of allele frequencies from the values in the reference population.
simupopD(npop = 1, nloc = 1, na = 2, globalfreq = NULL, which.loc = NULL, alpha1, alpha2 = 1)
npop |
the number of populations |
nloc |
the number of loci |
na |
an integer vector giving the numbers of alleles per locus |
globalfreq |
matrix of allele frequencies in the reference population. Data must be given in the format of the Journal of Forensic
Sciences for genetic data. Default corresponds to allele frequencies generated from a Dirichlet distribution
with parameter |
which.loc |
which loci to simulate from the |
alpha1 |
a positive float vector of length |
alpha2 |
a positive float giving the parameter to be used in the Dirichlet distribution to generate allele frequencies for the reference population |
In the reference population, allele frequencies for independent loci are simulated using a Dirichlet distribution with
parameter alpha2.
At a given locus L with n alleles, the allele frequencies are modeled as a vector of random
variables p=(p1, ..., pn) following a Dirichlet distribution with a parameter vector of length n,
where each component is equal to alpha2, p1+...+pn=1 and alpha2 > 0.
Note that a more sophisticated generation of global allele frequencies is possible using the simufreqD function.
Similarly, allele frequencies in the independent populations are simulated using a Dirichlet Distribution.
For example, for the first population to simulate, at a given locus L with n alleles,
the allele frequencies are modeled as a vector
of random variables p=(p1, ..., pn) following a Dirichlet distribution with a parameter vector of length n:
(p1(1-alpha1[1])/alpha1[1], ..., pn(1-alpha1[1])/alpha1[1]), where p1+...+pn=1 and alpha1[1] > 0.
alpha1[1] is the variance parameter for population 1 and is equivalent to Wright's Fst. The closer this parameter is to one,
the more the population allele frequencies differ from the values of the reference population.
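The parameterisation above can be sketched in base R for one locus and one population. The reference frequencies global_p and the value fst are made-up numbers, and the Dirichlet draw uses normalised gamma deviates rather than the package's own generator. With these parameters the simulated frequencies have mean global_p and variance global_p * (1 - global_p) * fst, which is why alpha1 behaves like Fst.

```r
# Hypothetical reference allele frequencies at one locus
global_p <- c(0.1, 0.3, 0.6)
# Variance parameter for the population, playing the role of alpha1[1]
fst <- 0.2

# Dirichlet parameters p_i * (1 - F) / F
dirichlet_par <- global_p * (1 - fst) / fst

set.seed(2024)
# One Dirichlet draw via normalised gamma deviates
x <- rgamma(length(dirichlet_par), shape = dirichlet_par)
pop_p <- x / sum(x)

dirichlet_par
pop_p
```

As fst approaches one the Dirichlet parameters shrink towards zero, so the draws pile up near the corners of the simplex and move far from the reference frequencies.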
The result is stored in a list with two elements :
globfreq |
a |
popfreq |
a |
The code used here for the generation of random Dirichlet deviates was previously implemented in the gtools library.
Hinda Haned [email protected]
Nicholson G, Smith AV, Jonsson F, Gustafsson O, Stefansson K, Donnelly P.
Assessing population differentiation and isolation from single-nucleotide polymorphism data.
J Roy Stat Soc B 2002;64:695–715
Marchini J, Cardon LR. Discussion on the meeting on "Statistical modelling and analysis of genetic data"
J Roy Stat Soc B, 2002;64:740-741
Wright S. The genetical structure of populations. Ann Eugen 1951;15:323-354
# simulate allele frequencies for two populations
data(Tu)
simupopD(npop=2,globalfreq=Tu, which.loc=c("FGA","TH01","TPOX"), alpha1=c(0.2,0.3),alpha2=1)
Allele frequencies for 15 autosomal short tandem repeat loci in three American populations: Caucasians, African Americans and Hispanics. Among the 15 loci, 13 belong to the core Combined DNA Index System (CODIS) loci used by the Federal Bureau of Investigation (USA) in forensic DNA analysis, and two supplementary loci are more commonly used in Europe; see details.
data(strusa)
strusa is a tabfreq object giving allele frequencies of 15 loci in three American populations.
CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51 and D21S11, belong to the core CODIS loci used in the US, whereas D2S1338 and D19S433 belong to the European core loci.
Butler JM, Reeder DJ. http://www.cstl.nist.gov/strbase/index.htm, last visited: May 11th 2009
Butler JM, Schoske R, Vallone MP, Redman JW, Kline MC. Allele frequencies for 15 autosomal STR loci on U.S. Caucasian, African American, and Hispanic populations. J Forensic Sci 2003;48(8):908-911.
data(strusa)
strusa
#genotypes simulations from each population
geno<- simugeno(strusa,n=c(100,100,100))
geno
#3-person mixture simulation with the contributors from the 3 populations
mix3<- simumix(geno,ncontri=c(1,1,1))
mix3
Allele frequencies for three short tandem repeat loci, D10S1248, D2S441 and D22S1045, in a sample of 198 individuals born in Veneto, Italy. These loci are commonly used in forensic DNA characterization.
data(strveneto)
strveneto is a tabfreq object
Turrina S, Atzei R, De Leo D. Population study of three miniSTR loci in Veneto (Italy). Forensic Sci Int Genetics 2008; 1(1);378-379
data(strveneto)
#allele frequencies
strveneto@tab
The S4 tabfreq class is used to store allele frequencies, from either one or several populations.
tab: a list giving allele frequencies for each locus. If there are several populations,
tab gives allele frequencies in each population
which.loc: character vector giving the names of the loci
pop.names: factor of populations names (optional)
signature(x = "tabfreq"): gives the names of the attributes of a tabfreq object
signature(object = "tabfreq"): shows a tabfreq object
signature(object = "tabfreq"): prints a tabfreq object
Hinda Haned [email protected]
as.tabfreq, is.tabfreq and simugeno for genotypes simulation from allele frequencies stored in a tabfreq object
showClass("tabfreq")
Constructor for tabfreq objects.
The function tabfreq creates a tabfreq object from
a data frame or a matrix giving allele frequencies for a single population in the Journal of Forensic Sciences (JFS) format for population genetic data.
When multiple populations are considered, data must be given as a list, where each element is either a matrix or a data frame in the JFS format, and the
populations names must be specified.
The function as.tabfreq is an alias for the tabfreq function.
is.tabfreq tests if an object is a valid tabfreq object.
Note: to get the manpage about tabfreq, please type 'class ? tabfreq'.
tabfreq(tab,pop.names=NULL)
as.tabfreq(tab,pop.names=NULL)
is.tabfreq(x)
tab |
either a matrix or a data.frame of markers allele frequencies given in the Journal of Forensic Sciences format for population genetic data |
pop.names |
(optional) a factor giving the populations names. For a single population in |
x |
an object |
For tabfreq and as.tabfreq, a tabfreq object. For is.tabfreq, a logical.
Hinda Haned [email protected]
"tabfreq", simugeno for creating a simugeno object from a tabfreq object.
data(Tu)
tabfreq(Tu,pop.names=factor("Tu"))
Population genetic analysis of 15 STR loci of Chinese Tu ethnic minority group.
data(Tu)
a data frame presented in the format of the Journal of Forensic Sciences for genetic data: allele names are given in the first column, and frequencies for a given allele are read in rows for the different markers. When a given allele is not observed, value is coded NA (rather than "-" in the original format).
CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51 and D21S11, belong to the core CODIS loci used in the US, whereas D2S1338 and D19S433 belong to the European core loci.
Zhu B, Yan J, Shen C, Li T, Li Y, Yu X, Xiong X, Muf H, Huang Y, Deng Y. (2008). Population genetic analysis of 15 STR loci of Chinese Tu ethnic minority group. Forensic Sci Int; 174: 255-258.
data(Tu)
tabfreq(Tu)
Virtual classes that are only for internal use in forensim
A virtual Class: programming tool, not intended for objects creation.
Hinda Haned [email protected]
Wrapper for dataL in forensim. Given a file with the columns "No, Marker, Allele, Frequency and Height", the log likelihood for the requested numbers of contributors is calculated. For now only the "Frequency" column is used.
wrapdataL(fil , plotte , nInMixture , tit )
fil |
Input file |
plotte |
If T, plot |
nInMixture |
Alternatives for number of contributors, say 1:5 |
tit |
Title to be used in plot |
Plot (optional) and log likelihoods
Thore Egeland [email protected]
aa<-simMixSNP(nSNP=5,writeFile=FALSE,outfile="sim.txt",ncont=3) #Simulates data
# run with writeFile = TRUE for plot
# aa<-simMixSNP(nSNP=5,writeFile=TRUE,outfile="sim.txt",ncont=3)
# res<-wrapdataL(fil="sim.txt") # Calculates and plots