Title: | Design of QTL (Quantitative Trait Locus) Experiments |
---|---|
Description: | Design of QTL (quantitative trait locus) experiments involves choosing which strains to cross, the type of cross, genotyping strategies, phenotyping strategies, and the number of progeny to raise and phenotype. This package provides tools to help make such choices. Sen and others (2007) <doi:10.1007/s00335-006-0090-y>. |
Authors: | Saunak Sen [aut, cre], Jaya Satagopan [ctb], Karl Broman [ctb], Gary Churchill [ctb], Brian Yandell [ctb] |
Maintainer: | Saunak Sen |
License: | GPL-3 |
Version: | 0.953 |
Built: | 2025-03-13 03:15:09 UTC |
Source: | https://github.com/cran/qtlDesign |
Provides expected confidence interval widths for QTL location when we have dense markers.
ci.length(cross,n,effect,p=0.95,sigma2=1,env.var,gen.var,bio.reps=1)
cross |
String indicating the cross type: "bc" for backcross, "f2" for intercross, or "ri" for recombinant inbred lines. |
n |
Sample size |
p |
Confidence level for desired confidence interval |
effect |
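The QTL effect we want to detect. For backcross and RI lines it is a scalar; for the F2 intercross it is a two-component vector of additive and dominance effects (see gmeans2effect). |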
sigma2 |
Error variance; if this argument is absent, it is estimated from env.var and gen.var using error.var. |
env.var |
Environmental (within genotype) variance |
gen.var |
Genetic (between genotype) variance due to all loci segregating between the parental lines. |
bio.reps |
Number of biological replicates per unique genotype. This is usually 1 for backcross and intercross, but may be larger for RI lines. |
With dense markers, the log likelihood follows a compound process. Approximate expected confidence interval widths can be calculated by treating the log likelihood as decaying linearly away from the QTL location, with a drift rate that depends on the effect size and cross type.
Returns the expected confidence interval width (scalar) in cM assuming dense markers.
Saunak Sen
Dupuis J and Siegmund D (1999) Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 151:373-386.
Darvasi A (1998) Experimental strategies for the genetic dissection of complex traits in animal models. Nature Genetics 18:19-24.
Kong A and Wright FA (1994) Asymptotic theory for gene mapping. Proceedings of the National Academy of Sciences of the USA 91:9705-9709.
ci.length(cross="bc",n=400,effect=5,p=0.95,sigma2=1)
Functions to calculate the information under the null hypothesis of no effect. Functions for discount factors for incomplete genotyping.
info(sel.frac,theta=0,cross)
info.bc(sel.frac,theta=0)
info.f2(sel.frac,theta=0)
deflate(theta,cross)
deflate.bc(theta)
deflate.f2(theta)
nullinfo(sel.frac)
cross |
Cross type, either "bc" for backcross, or "f2" for intercross. |
sel.frac |
Selection fraction; proportion of extremes genotyped |
theta |
Recombination fraction between flanking markers |
The nullinfo function calculates the information content per observation for any contrast between genotype means when densely genotyping a fraction sel.frac of the individuals with the most extreme phenotypes. The information content is calculated under the null hypothesis of no difference between the genotype means. For small differences in genotype means, the information content will be approximately equal to its null value; in general, the information estimate under the null is a lower bound.
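Under the null, with a standard-normal phenotype and the genotyped individuals taken equally from the two tails, the information per observation reduces to the second moment of the phenotype over the genotyped region. That moment is sel.frac + 2 z φ(z), with z = Φ⁻¹(1 - sel.frac/2). This is a minimal sketch of that calculation. The symmetric two-tail selection and the standard-normal scaling are assumptions, and null_info_sketch is a hypothetical name, not the package's nullinfo.

```r
# Hypothetical sketch: null information per observation when the most extreme
# phenotypes (fraction sel.frac, split equally between both tails) are genotyped.
# Assumes a standard-normal phenotype; not the package's own nullinfo().
null_info_sketch <- function(sel.frac) {
  stopifnot(all(sel.frac > 0), all(sel.frac <= 1))
  z <- qnorm(1 - sel.frac / 2)   # cut-off defining each genotyped tail
  # For Y ~ N(0, 1): E[Y^2; |Y| > z] = sel.frac + 2 * z * dnorm(z)
  sel.frac + 2 * z * dnorm(z)
}

null_info_sketch(c(0.25, 0.5, 1))
```

Genotyping everyone (sel.frac = 1) recovers the full information of 1 per observation. Smaller fractions lose information more slowly than they save genotyping.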
The info function calculates the information per observation for the backcross and F2 intercross under the null hypothesis of equal genotype means. The information is calculated for a point in the middle of a marker interval whose flanking markers are separated by recombination fraction theta. The function deflate calculates a deflation factor for the information attenuation in the middle of a marker interval relative to a completely typed location.
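For a backcross we can derive the deflation at the interval midpoint under two assumptions: no crossover interference, and information taken as E[E[X | markers]²] for a ±1-coded QTL genotype X. The derivation gives (1 - 2θ)/(1 - θ). The sketch below computes the factor from the flanking-marker configurations and compares it with that closed form. Both the derivation and the function names are ours; deflate.bc may be computed differently.

```r
# Hypothetical sketch: information retained at the midpoint of a backcross
# marker interval relative to a fully typed locus (null, no interference).
deflate_bc_sketch <- function(theta) {
  stopifnot(theta >= 0, theta <= 0.5)
  # Midpoint-to-marker recombination r solves theta = 2 r (1 - r)
  r <- (1 - sqrt(1 - 2 * theta)) / 2
  # Markers agree with prob. 1 - theta; then E[X | markers] = +/-(1 - 2r)/(1 - theta).
  # Markers disagree with prob. theta; then E[X | markers] = 0 by symmetry.
  cond.mean.agree <- ((1 - r)^2 - r^2) / (1 - theta)
  (1 - theta) * cond.mean.agree^2   # E[E[X | markers]^2], with Var(X) = 1
}

# Closed form of the same quantity, for comparison
deflate_bc_closed <- function(theta) (1 - 2 * theta) / (1 - theta)

deflate_bc_sketch(c(0, 0.1, 0.3))
```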
Information per individual for information functions, and the discount factor for the discount functions.
Information is calculated under the equal-means assumption. This approximation is very good in practice and slightly conservative: if the difference between the means is large, these functions will underestimate the information. For power calculations this is acceptable, since it errs on the side of underestimating power.
Saunak Sen, Jaya Satagopan, Karl Broman, and Gary Churchill
Sen S, Satagopan JM, Churchill GA (2005) Quantitative trait locus study design from an information perspective. Genetics, 170:447-64.
nullinfo(0.5)
info(0.5,cross="bc")
info(0.5,cross="f2")
info(0.5,0.1,cross="bc")
info(0.5,0.1,cross="f2")
deflate(0.1,"bc")
deflate(0.1,"f2")
Functions to calculate information cost-ratios.
info2cost(sel.frac,cost,d,G=NULL,cross)
info2cost.bc(sel.frac,cost,d,G=NULL)
info2cost.f2(sel.frac,cost,d,G=NULL)
sel.frac |
Selection fraction; proportion of individuals genotyped |
cost |
Genotyping cost in units of raising an individual. When
|
d |
Marker spacing in centiMorgans |
G |
Genome size in centiMorgans |
cross |
Cross type, "bc" or "f2" |
The information calculations are done under the null hypothesis of no QTL effect. For d != 0, the function calculates the ratio of the information in the middle of a marker interval of length d cM to the cost of genotyping the cross. For d = 0, it calculates the ratio of the information at any locus to the cost of genotyping the cross.
Saunak Sen, Jaya Satagopan, Karl Broman, and Gary Churchill
Sen S, Satagopan JM, Churchill GA (2005) Quantitative trait locus study design from an information perspective. Genetics, 170:447-64.
info2cost(0.5,1,cross="bc")
info2cost(0.5,1,10,1450,cross="bc")
Calculate the minimum moment aberration (MMA) scores K1 and K12, and the standardized dissimilarity score (eff1).
Kstat(genomat, type = 1)
K1(genomat)
K12(genomat)
eff1(n, nmark, s1)
genomat |
Genotype matrix. |
n |
Desired sample size. |
type |
Type of dissimilarity measure desired (first or second moment). |
nmark |
Number of markers. |
s1 |
Dissimilarity score from |
Score or standardized score based on selected marker list.
K1 and K12 call Kstat with type = 1 and 2, respectively. Kstat computes the minimum moment aberration score; eff1 computes the standardized genetic dissimilarity.
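To make the "first or second moment" wording concrete, here is a sketch using a coincidence-based definition of the moments. The t-th moment is taken as the average, over all pairs of selected individuals, of the t-th power of the number of markers at which their genotypes coincide; smaller moments mean a more dissimilar subset. This definition is an assumption. Kstat may scale the score differently, and mma works with dissimilarity. The function names are hypothetical, and the input is assumed to be 0/1/2-coded with no missing values.

```r
# Hypothetical sketch of MMA-style moments for a set of individuals.
# geno: 0/1/2-coded matrix, one row per individual, one column per marker, no NAs.
coincidence_counts <- function(geno) {
  # S[i, j] = number of markers at which individuals i and j share a genotype
  S <- matrix(0, nrow(geno), nrow(geno))
  for (lev in 0:2) {
    ind <- (geno == lev) * 1
    S <- S + ind %*% t(ind)
  }
  S
}

mma_moment <- function(geno, t = 1) {
  S <- coincidence_counts(geno)
  pairs <- S[upper.tri(S)]   # each unordered pair counted once
  mean(pairs^t)
}

set.seed(1)
geno <- matrix(sample(0:2, 5 * 20, replace = TRUE), nrow = 5)
c(K1 = mma_moment(geno, 1), K2 = mma_moment(geno, 2))
```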
Brian S. Yandell
Jin C, Lan H, Attie AD, Churchill GA, Bulutoglu D, Yandell BS (2004) Selective phenotyping for increased efficiency in genetic mapping studies. Genetics 168: 2285-2293.
Selective phenotyping with similarity measure 2 to select the most dissimilar subset of individuals.
mma(genof, p, sequent = FALSE, exact = FALSE, dismat = FALSE)
genof |
Genotype matrix. |
p |
Sample size to select. |
sequent |
Perform sequential optimization if TRUE (see below). |
exact |
Count allele differences if TRUE. |
dismat |
Return dissimilarity matrix if TRUE. |
Sequentially minimize the 1st moment and then the 2nd moment, swapping one subject at a time. op finds all the samples with the same 1st-moment similarity as the mma results. op2 finds all the samples with the same 1st-moment similarity as every list from the op result. A combination of op and op2 comes very close to exhaustive search in practice. moment2 finds the best list with minimum 2nd moment from the output of op2. Note that some warnings occur with the return statement; the results are not affected.
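One-at-a-time swapping can be illustrated with a simple exchange heuristic. Start from any p individuals, replace one selected individual with an unselected one whenever the swap lowers the first moment, and stop when no single swap helps. This is a hypothetical sketch of the general idea using a coincidence-count first moment. It is not a reimplementation of mma, op, or op2, and like any local search it can stop at a local optimum.

```r
# Hypothetical exchange heuristic: choose p rows of a 0/1/2 genotype matrix that
# (locally) minimize the average pairwise number of coinciding markers.
swap_select <- function(geno, p, max.iter = 100) {
  n <- nrow(geno)
  stopifnot(p >= 2, p <= n)
  S <- matrix(0, n, n)
  for (lev in 0:2) {                 # pairwise coincidence counts
    ind <- (geno == lev) * 1
    S <- S + ind %*% t(ind)
  }
  k1 <- function(idx) {              # first moment of a candidate subset
    sub <- S[idx, idx]
    mean(sub[upper.tri(sub)])
  }
  sel <- seq_len(p)
  best <- k1(sel)
  for (iter in seq_len(max.iter)) {
    improved <- FALSE
    for (k in seq_len(p)) {
      for (cand in setdiff(seq_len(n), sel)) {
        trial <- sel
        trial[k] <- cand             # swap one selected subject for a candidate
        val <- k1(trial)
        if (val < best - 1e-12) {    # accept strictly improving swaps only
          sel <- trial
          best <- val
          improved <- TRUE
        }
      }
    }
    if (!improved) break             # no single swap helps: local optimum
  }
  list(selected = sort(sel), K1 = best)
}

set.seed(2)
swap_select(matrix(sample(0:2, 60, replace = TRUE), nrow = 10), p = 4)
```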
This function combines several functions in Jin's original code. mma(genof,p,sequent=TRUE) is identical to the deprecated mmasequent(genof,p). mma(genof,p,exact=TRUE) is identical to the deprecated mmaM1(genof,p) (actually, mma uses dissimilarity while mmaM1 used similarity = 1 - dissimilarity).
A list with the cList vector as the first item, plus dismat if that option is TRUE and further optimized lists (op, op2, moment2) if sequent is TRUE. The list of items includes:
cList |
vector of selected subjects by function mma |
op |
list containing vector of selection and update flag from function op |
op2 |
matrix of selection by function op2 |
moment2 |
vector of second moment calculations |
dismat |
dissimilarity matrix |
Brian S. Yandell
Jin C, Lan H, Attie AD, Churchill GA, Bulutoglu D, Yandell BS (2004) Selective phenotyping for increased efficiency in genetic mapping studies. Genetics 168: 2285-2293.
This routine is for internal use. It recodes a matrix with three genotype levels as 0, 1, 2.
mma.level(mat)
mat |
input matrix |
Converts matrix to levels between 0 and 2.
Matrix of genotype levels between 0 and 2.
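This sketch shows the recoding with match(), assuming the input holds at most three distinct numeric codes (for example 1/2/3) whose sorted order is the intended genotype order. The function name and the ordering rule are assumptions, not taken from the package.

```r
# Hypothetical recoding of a three-level genotype matrix to 0/1/2.
# Assumes the sorted order of the distinct values is the desired level order.
to_levels_012 <- function(mat) {
  vals <- sort(unique(as.vector(mat)))   # NA is dropped by sort()
  stopifnot(length(vals) <= 3)
  # match() gives 1, 2, 3 (NA stays NA); shift to 0, 1, 2 and keep the shape
  matrix(match(mat, vals) - 1L, nrow = nrow(mat), ncol = ncol(mat),
         dimnames = dimnames(mat))
}

to_levels_012(matrix(c(1, 2, 3, 3, 2, 1), nrow = 2))
```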
Brian S. Yandell
Jin C, Lan H, Attie AD, Churchill GA, Bulutoglu D, Yandell BS (2004) Selective phenotyping for increased efficiency in genetic mapping studies. Genetics 168: 2285-2293.
Functions to find optimal marker spacing given cost.
optspacing(cost,G=NULL,sel.frac,cross)
optspacing.bc(cost,G=NULL,sel.frac)
optspacing.f2(cost,G=NULL,sel.frac)
optspacing(cost,G=NULL,sel.frac=NULL,cross)
optspacing.bc(cost,G=NULL,sel.frac=NULL)
optspacing.f2(cost,G=NULL,sel.frac=NULL)
cost |
Cost of genotyping in units of raising an individual |
sel.frac |
Selection fraction; proportion of individuals genotyped |
G |
Genome size in centiMorgans |
cross |
Cross type, "bc" or "f2" |
The function optim is used to search for the optimum.
In the first form, with the selection fraction specified, the function returns the spacing in centiMorgans that maximizes the information-to-cost ratio in the middle of the marker interval. In the second form, with the selection fraction unspecified, it returns the pair (spacing, sel.frac) that maximizes the information-to-cost ratio in the middle of the marker interval.
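The trade-off behind the spacing optimization can be illustrated with a toy backcross model; every part of it is an assumption. With the selection fraction fixed, take the midpoint information to be proportional to (1 - 2θ)/(1 - θ), a no-interference derivation, where θ is the Haldane recombination fraction for spacing d cM. Take the cost per individual to be 1 for raising it plus sel.frac × cost × G/d for genotyping about G/d markers. The null-information factor for sel.frac is a constant here, so it is omitted; it does not move the optimum. The sketch uses optimize() for this one-dimensional search. The function names are hypothetical.

```r
# Toy model (our assumption): information-to-cost ratio at the midpoint of a
# backcross marker interval of d cM, for a fixed selection fraction.
ratio_spacing <- function(d, cost, G, sel.frac) {
  theta <- 0.5 * (1 - exp(-2 * d / 100))   # Haldane map function, d in cM
  info <- (1 - 2 * theta) / (1 - theta)    # midpoint deflation, no interference
  info / (1 + sel.frac * cost * G / d)     # raise everyone, genotype a fraction
}

opt_spacing_sketch <- function(cost, G, sel.frac, upper = 100) {
  fit <- optimize(ratio_spacing, interval = c(0.01, upper), maximum = TRUE,
                  cost = cost, G = G, sel.frac = sel.frac)
  c(spacing = fit$maximum, ratio = fit$objective)
}

opt_spacing_sketch(cost = 0.1, G = 1440, sel.frac = 0.5)
```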
Saunak Sen, Jaya Satagopan, Karl Broman, and Gary Churchill
Sen S, Satagopan JM, Churchill GA (2005) Quantitative trait locus study design from an information perspective. Genetics, 170:447-64.
optspacing(cost=0.1,G=1440,sel.frac=0.5,cross="bc")
optspacing(cost=30/3000,G=1440,sel.frac=NULL,cross="f2")
Functions to find optimal selection fractions given cost.
optselection(cost,d=0,G=NULL,cross)
optselection.bc(cost,d=0,G=NULL)
optselection.f2(cost,d=0,G=NULL)
cost |
Cost per genotype in units of raising an individual |
d |
Marker spacing in centiMorgans |
G |
Genome size in centiMorgans |
cross |
Cross type, "bc" or "f2" |
The function optimize is used to search for the optimum.
The optimal selection fraction.
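Here is a toy version of the same trade-off for the selection fraction with dense markers (d = 0). Take the information per raised individual to be the two-tailed null information s + 2 z φ(z), with z = Φ⁻¹(1 - s/2), and the cost per individual to be 1 + s × cost. Then maximize the ratio with optimize(). The cost model and the function names are assumptions for illustration, not the package's implementation.

```r
# Toy model (our assumption): info-to-cost ratio for genotyping the most extreme
# fraction s of individuals, each genotype costing `cost` units of raising one.
ratio_selection <- function(s, cost) {
  z <- qnorm(1 - s / 2)            # two-tailed selection cut-off
  info <- s + 2 * z * dnorm(z)     # null information, standard-normal trait
  info / (1 + s * cost)
}

opt_selection_sketch <- function(cost) {
  optimize(ratio_selection, interval = c(1e-6, 1), maximum = TRUE,
           cost = cost)$maximum
}

opt_selection_sketch(cost = 1)
opt_selection_sketch(cost = 0.01)
```

In this toy model, cheaper genotyping pushes the optimal fraction toward genotyping everyone.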
Saunak Sen
Sen S, Satagopan JM, Churchill GA (2005) Quantitative trait locus study design from an information perspective. Genetics, 170:447-64.
optselection(1,cross="bc")
optselection(0.001,10,1450,cross="bc")
optselection(0.001,10,1450,cross="f2")
Returns the version number for the qtlDesign package.
version.qtlDesign()
The version number.
Saunak Sen, Jaya Satagopan, Karl Broman, and Gary Churchill
Power, sample size, and minimum detectable effect size calculations are performed for backcross, F2 intercross, and recombinant inbred (RI) lines.
powercalc(cross,n,effect,sigma2,env.var,gen.var,thresh=3,sel.frac=1,theta=0,bio.reps=1)
detectable(cross,n,effect=NULL,sigma2,env.var,gen.var,power=0.8,thresh=3,sel.frac=1,theta=0,bio.reps=1)
samplesize(cross,effect,sigma2,env.var,gen.var,power=0.8,thresh=3,sel.frac=1,theta=0,bio.reps=1)
cross |
String indicating the cross type: "bc" for backcross, "f2" for intercross, or "ri" for recombinant inbred lines. |
n |
Sample size |
sigma2 |
Error variance; if this argument is absent, it is estimated from env.var and gen.var using error.var. |
env.var |
Environmental (within genotype) variance |
gen.var |
Genetic (between genotype) variance due to all loci segregating between the parental lines. |
effect |
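The QTL effect we want to detect. For backcross and RI lines it is a scalar; for the F2 intercross it is a two-component vector of additive and dominance effects (see gmeans2effect). |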
power |
Desired power, as a proportion |
thresh |
LOD threshold for declaring significance |
sel.frac |
Selection fraction |
theta |
Recombination fraction corresponding to a marker interval |
bio.reps |
Number of biological replicates per unique genotype. This is usually 1 for backcross and intercross, but may be larger for RI lines. |
These calculations assume that the asymptotic chi-square regime applies. A warning message is printed if the effective sample size is less than 30 and either sel.frac is less than 1 or theta is greater than 0. First, the effective sample size is calculated from the width of the marker interval and the selection fraction; the QTL is assumed to be in the middle of the marker interval. The non-centrality parameter of the likelihood ratio test is then $\lambda = n_e \delta^2 / \sigma^2$, where $n_e$ is the effective sample size, $\delta$ is the QTL effect measured as the deviation of the genotype means from the overall mean, and $\sigma^2$ is the error variance (sigma2). The chi-squared approximation is used to calculate the power. The minimum detectable effect size is obtained by solving the power equation numerically using uniroot. The theory behind the information calculations is described by Sen et al. (2005).
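The chi-square step, followed by the uniroot inversion, can be sketched as follows. The sketch assumes the LOD threshold is converted to the likelihood-ratio scale as 2 ln(10) × LOD, a one-degree-of-freedom test (backcross or RI), and a noncentrality of n_e δ²/σ². The function names are hypothetical, and the package's internal calculations may differ in detail; the F2 case, with two effect components, is not covered.

```r
# Hypothetical sketch of the power calculation for a 1-df test (backcross or RI).
# Assumptions: LRT = 2 * log(10) * LOD; noncentrality = n.eff * delta^2 / sigma2,
# where delta is the deviation of the genotype means from the overall mean.
lod_power <- function(n.eff, delta, sigma2 = 1, thresh = 3) {
  crit <- 2 * log(10) * thresh         # LOD threshold on the chi-square scale
  ncp <- n.eff * delta^2 / sigma2      # noncentrality parameter
  pchisq(crit, df = 1, ncp = ncp, lower.tail = FALSE)
}

# Smallest delta reaching the requested power, solved numerically with uniroot();
# extendInt = "upX" widens the search interval since power increases with delta.
detectable_delta <- function(n.eff, power = 0.8, sigma2 = 1, thresh = 3) {
  f <- function(delta) lod_power(n.eff, delta, sigma2, thresh) - power
  uniroot(f, interval = c(0, sqrt(sigma2)), extendInt = "upX", tol = 1e-10)$root
}

lod_power(n.eff = 100, delta = 0.4)
detectable_delta(n.eff = 100)
```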
A key input is the error variance, sigma2, which is generally unknown. The user can enter the error variance directly or estimate it using env.var and gen.var; the function error.var is used to estimate the error variance from the environmental and genetic variances. Another key input is the effect segregating in a cross, which can be calculated using gmeans2effect.
For powercalc, the power is returned, along with the proportion of variance explained. For detectable, the detectable effect size is returned, along with the proportion of variance explained; for backcross and RI lines this is the effect of an allelic substitution, and for the F2 intercross the additive and dominance components are returned. For samplesize, the sample size (rounded up to the nearest integer) is returned along with the proportion of variance explained.
Saunak Sen, Jaya Satagopan, Karl Broman, and Gary Churchill
Sen S, Satagopan JM, Churchill GA (2005) Quantitative trait locus study design from an information perspective. Genetics, 170:447-64.
See also uniroot, error.var, and gmeans2effect.
powercalc("bc",100,5,sigma2=1,sel.frac=1,theta=0)
powercalc(cross="ri",n=30,effect=5,env.var=64,gen.var=25,bio.reps=6)
detectable("bc",100,sigma2=1)
detectable(cross="ri",n=30,env.var=64,gen.var=25,bio.reps=8)
samplesize(cross="f2",effect=c(5,0),env.var=64,gen.var=25)
Provides genome-wide thresholds and tail probabilities for the maxima of genome scans using Poisson approximations.
tailprob(t,G,cross,type="1",d=0.01,cov.dim=0)
thresh(G,cross,type="1",p=c(0.10,0.05,0.01),d=0.01,cov.dim=0,interval=c(1,40))
G |
Genome size in centiMorgans. |
t |
LOD value for which tail probability is desired. |
p |
Vector giving the genome-wide Type I error for which thresholds are desired. |
cross |
String indicating the cross type: "bc" for backcross or "f2" for intercross. |
type |
Type of LOD score for which threshold is desired. Right now the only option is "1", but more options will be added in the future. |
d |
Marker spacing in centiMorgans. |
cov.dim |
Dimension of interacting covariate. Set to 0 right now. |
interval |
Interval over which to search for LOD threshold. |
The tail probabilities are calculated using the method of Dupuis and Siegmund (1999). The thresholds are calculated by solving the tail probability equation using uniroot. At this time only one-dimensional thresholds are calculated, but this function will be extended in the future.
The function tailprob returns the probability that the genome-wide maximum LOD score exceeds a particular value. The function thresh returns genome-wide LOD thresholds corresponding to a particular Type I error rate.
Saunak Sen, Jaya Satagopan, Karl Broman, and Gary Churchill
Dupuis J and Siegmund D (1999) Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 151:373-386.
tailprob(t=3,G=1440,cross="f2",d=10)
thresh(G=1440,cross="bc",d=10)
Utility functions
recomb(d)
genetic.dist(theta)
d |
Genetic distance in Morgans |
theta |
Recombination fraction |
recomb returns the recombination fraction corresponding to a genetic distance in Morgans; genetic.dist returns the genetic distance in Morgans for a recombination fraction. Both assume the Haldane mapping function.
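The Haldane mapping function has a closed form in both directions: θ = (1 - e^(-2d))/2 and d = -ln(1 - 2θ)/2, with d in Morgans. Below is a minimal sketch. The function names are hypothetical, and we have not checked that recomb and genetic.dist are written exactly this way.

```r
# Haldane map function and its inverse (genetic distance d in Morgans)
haldane_recomb <- function(d) 0.5 * (1 - exp(-2 * d))
haldane_dist   <- function(theta) -0.5 * log(1 - 2 * theta)

haldane_recomb(0.1)  # approx 0.0906 for 10 cM
haldane_dist(0.1)    # approx 0.1116 Morgans for theta = 0.1
```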
Saunak Sen, Jaya Satagopan, Karl Broman, and Gary Churchill
Sen S, Satagopan JM, Churchill GA (2005) Quantitative trait locus study design from an information perspective. Genetics, 170:447-64.
recomb(0.1)
genetic.dist(0.1)
The function error.var estimates the error variance using estimates of the environmental and genetic variances. The effect segregating at a locus can be calculated using gmeans2effect. These are key inputs for power calculations. The function prop.var calculates the proportion of variance explained by a locus given the effect size and error variance.
error.var(cross,env.var=1,gen.var=0,bio.reps=1)
gmeans2effect(cross,means)
prop.var(cross,effect,sigma2)
cross |
String indicating the cross type: "bc" for backcross, "f2" for intercross, or "ri" for recombinant inbred lines. |
env.var |
Environmental (within genotype) variance |
gen.var |
Genetic (between genotype) variance due to all loci segregating between the parental lines. |
bio.reps |
Number of biological replicates per unique genotype. This is usually 1 for backcross and intercross, but may be larger for RI lines. |
means |
Vector of genotype means in the form c(a,h,b), where a and b are the two homozygote means and h is the heterozygote mean |
effect |
The QTL effect, which depends on the cross. For the backcross, it is the difference in means between the heterozygote and the homozygote. For RI lines, it is half the difference in means of the homozygotes. For the intercross, it is a two-component vector of the form c(additive, dominance). |
sigma2 |
Error variance. |
The function error.var estimates the error variance in a cross using estimates of the environmental (within-genotype) variance and the genetic (between-genotype) variance. The environmental variance is assumed to be invariant between cross types. The genetic variance segregating in RI lines is assumed to be double that in the F2 intercross and four times that in the backcross; this assumption holds if all loci are additive. The error variance at a locus of interest is approximately

$$\sigma^2 \approx c\,\sigma_g^2 + \frac{\sigma_e^2}{m},$$

where $\sigma_g^2$ is the genetic variance (gen.var), $c$ is a constant depending on the cross type (1 for RI lines, 1/2 for the F2 intercross, and 1/4 for the backcross), $\sigma_e^2$ is the environmental variance (env.var), and $m$ is the number of biological replicates per unique genotype (bio.reps).
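A minimal sketch of the approximation above, using the stated cross constants. The function name is hypothetical, and error.var itself may handle edge cases differently.

```r
# Sketch of sigma^2 ~ c * gen.var + env.var / bio.reps,
# with c = 1/4 (backcross), 1/2 (F2), 1 (RI), as stated above.
error_var_sketch <- function(cross = c("bc", "f2", "ri"), env.var = 1,
                             gen.var = 0, bio.reps = 1) {
  cross <- match.arg(cross)
  c.cross <- switch(cross, bc = 1/4, f2 = 1/2, ri = 1)
  c.cross * gen.var + env.var / bio.reps   # replicates average out env. noise
}

sapply(c("bc", "f2", "ri"), error_var_sketch, env.var = 1, gen.var = 1)
# bc = 1.25, f2 = 1.5, ri = 2
error_var_sketch("ri", env.var = 1, gen.var = 1, bio.reps = 10)  # 1.1
```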
The function gmeans2effect calculates the QTL effects from genotype means, depending on the cross. The function prop.var calculates the proportion of variance attributable to a locus given the effect size(s) and the error variance; the definition of effect size follows the output of gmeans2effect (see below).
For error.var, the value is the estimated error variance based on the assumptions mentioned above. For gmeans2effect, the value depends on the type of cross. For RI lines it is simply the additive effect of the QTL, which is half the difference between the homozygote means. For the intercross, it is a vector giving the additive and dominance components: the additive component is half the difference between the homozygote means, and the dominance component is the difference between the heterozygote mean and the average of the homozygote means. For the backcross, it is a vector of length 2, c(a-h,h-b), which is the effect of an allelic substitution of an "A" allele in the backcrosses to each parental strain.
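The rules above can be sketched directly from genotype means c(a, h, b). The sign convention is our assumption: effects are for substituting an A allele, matching the backcross form c(a-h,h-b), so the additive effect is taken as (a - b)/2. gmeans2effect may use the opposite sign for the additive component. The function name is hypothetical.

```r
# Hypothetical sketch: QTL effects from genotype means c(a, h, b)
# (homozygote, heterozygote, other homozygote). Sign convention (ours):
# effects are for substituting an "A" allele.
effects_from_means <- function(cross = c("bc", "f2", "ri"), means) {
  cross <- match.arg(cross)
  stopifnot(length(means) == 3)
  a <- means[1]; h <- means[2]; b <- means[3]
  switch(cross,
         bc = c(a - h, h - b),                  # one substitution per backcross
         ri = (a - b) / 2,                      # additive effect only
         f2 = c(additive = (a - b) / 2,         # half the homozygote difference
                dominance = h - (a + b) / 2))   # het. minus homozygote average
}

effects_from_means("f2", c(0, 1, 2))
effects_from_means("bc", c(0, 1, 1))
```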
Saunak Sen, Jaya Satagopan, Karl Broman, and Gary Churchill
Sen S, Satagopan JM, Churchill GA (2005) Quantitative trait locus study design from an information perspective. Genetics, 170:447-64.
error.var(cross="bc",env.var=1,gen.var=1,bio.reps=1)
error.var(cross="f2",env.var=1,gen.var=1,bio.reps=1)
error.var(cross="ri",env.var=1,gen.var=1,bio.reps=1)
error.var(cross="ri",env.var=1,gen.var=1,bio.reps=10)
gmeans2effect(cross="f2",means=c(0,1,2))
gmeans2effect(cross="f2",means=c(0,1,1))
gmeans2effect(cross="bc",means=c(0,1,1))
gmeans2effect(cross="ri",means=c(0,1,1))
prop.var(cross="bc",effect=5,sigma2=1)
prop.var(cross="f2",effect=c(5,0),sigma2=1)
prop.var(cross="ri",effect=5,sigma2=1)