ILAR Journal V38(2) 1997
The Role of Computational Models in Animal Research
Bennett Dyke, Ph.D., and Michael C. Mahaney, Ph.D., are in the Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, Texas.
We again use the example of HDL-C in baboons to show the application of segregation analysis. In this analysis, MacCluer and others (1988) tested a series of models using PAP (Hasstedt 1989). Underlying the method is the assumption that the distribution of the trait in the population can be decomposed into the sum of 3 separate phenotypic distributions, each with its own mean (m1, m2, and m3). Associated with each of these phenotypic distributions is a putative genotype for a major locus (A1A1, A1A2, and A2A2). Likewise, associated with each genotype is a transmission parameter tau (t1, t2, and t3), which is the probability that allele A1 is passed from parent to offspring. The phenotypic distributions share a common variance and a common polygenic heritability, or the proportion of the variability due to polygenes for each distribution. Transmission models to be tested are constructed by defining various combinations of parameters, some of which may be set at fixed values, while others are estimated, that is, they are free to change during the operation of the program until they reach values that make the model fit most closely to the observed phenotypic distribution.
In this study, 9 models were tested, of which we show 5 that best illustrate the method. The fullest, or general, transmission model used here incorporates 9 parameters. These are the means of the 3 distributions, their single common variance and polygenic heritability, the 3 transmission probabilities of the underlying putative genotypes, and the frequency of allele A1 (which is all that is needed to define the 3 genotype frequencies under assumptions of random mating). Because it incorporates the greatest number of parameters, we expect this model to fit the observed phenotypic distribution better than models based on fewer parameters.
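The decomposition described above can be sketched in a few lines. This is only an illustration, not the PAP implementation: it assumes that each genotype-specific distribution is normal, and the function names are hypothetical. The numerical inputs are the codominant mixed model estimates from Table 1.

```python
import math

def hardy_weinberg(p_a1):
    """Genotype frequencies (A1A1, A1A2, A2A2) under random mating."""
    q = 1.0 - p_a1
    return (p_a1 ** 2, 2.0 * p_a1 * q, q ** 2)

def normal_pdf(x, mean, variance):
    """Density of a normal distribution (assumed form of each component)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def mixture_pdf(x, means, variance, p_a1):
    """Trait density as the frequency-weighted sum of 3 genotype distributions."""
    freqs = hardy_weinberg(p_a1)
    return sum(f * normal_pdf(x, m, variance) for f, m in zip(freqs, means))

# Codominant mixed model estimates from Table 1.
freqs = hardy_weinberg(0.80)
print([round(f, 2) for f in freqs])
print(mixture_pdf(60.0, (59.3, 67.9, 104.5), 159.52, 0.80))
```

The single allele frequency determines all 3 mixing proportions, which is why the general model needs only one frequency parameter.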
Although the general transmission model is the one against which all others are evaluated, it is usually easier in practice to begin estimating parameters in simpler models, and to work up to more complex models with more parameters. Following this practice, a sporadic model was constructed by constraining the 3 distribution means to be equal to each other (but otherwise free to vary), the transmission probabilities to 1.0, the polygenic heritability to 0, and the frequency of A1 to 1.0. This means that parameters of a single distribution are estimated without the influence of any genetic factors whatsoever, making all phenotypic variation a function of unmeasured environmental factors alone. In this study, the estimated values shown in Table 1 (mean 66.4, variance 323.28) recovered the observed phenotypic mean and variance. Also shown are the natural log likelihood (-2738.9) by which the model is scored, and the χ² value resulting from comparison with the general model.
A polygenic model was tested next. Parameters were set in the same way as in the sporadic model, except that in addition to the mean and variance of the single distribution, the polygenic heritability was estimated (that is, unfixed and allowed to vary). The log likelihood of this model (-2660.0) is greater than that for the sporadic model, and a chi-square test (χ² = 157.8, 1 df, P < .0001) indicates a significantly better fit to the observed distribution when genetics is taken into account. This analysis yielded the estimate of heritability (h2 = 0.48) reported above. This is only an approximate estimate of the total heritability, since it is based on the assumption that the genetic contribution to the trait is entirely polygenic, which may not be the case if evidence for a major gene emerges from subsequent stages of the segregation analysis.
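The comparison of nested models used here is a likelihood ratio test: twice the difference in natural log likelihoods is referred to a chi-square distribution with degrees of freedom equal to the number of additional parameters estimated. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the tail probability is computed from standard closed forms for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_k (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= half / (k - 0.5)
        total += math.exp(-half) * term
    return total

def likelihood_ratio_test(lnl_larger, lnl_smaller, df):
    """Return (chi-square statistic, P value) for two nested models."""
    stat = 2.0 * (lnl_larger - lnl_smaller)
    return stat, chi2_sf(stat, df)

# Polygenic model versus sporadic model (log likelihoods from Table 1).
stat, p = likelihood_ratio_test(-2660.0, -2738.9, 1)
print(round(stat, 1), p < 0.0001)
```

The same function applies to any pair of nested models in Table 1, provided the degrees of freedom match the number of parameters freed.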
A codominant mixed model, tested next, adds a codominant major gene to the polygenic model (a codominant gene locus is one in which the heterozygous genotype has an expression intermediate between the homozygotes). Heritability is estimated as in the polygenic case, but now phenotypic values are no longer constrained to a single distribution, and 3 separate means corresponding to genotypes A1A1, A1A2, and A2A2 are estimated, as is the frequency of allele A1. Transmission probabilities for this allele are set to the Mendelian ratios 1, 1/2, and 0. Respectively, these values represent the probabilities of transmitting the A1 allele to an offspring when the parental genotype is A1A1 (t1 = 1), A1A2 (t2 = 1/2), or A2A2 (t3 = 0). The heritability shown here (h2 = 0.18) is an estimate of the effect of polygenes on each of the three separate distributions, rather than on the phenotypic distribution as a whole. Comparison of log likelihoods indicates a significantly better fit of this model than the polygenic model alone (χ² = 81.2, 1 df, P < .0001). Although this result is suggestive of the presence of a major gene, convincing evidence requires 2 more steps.
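The role of the transmission probabilities can be made concrete with a short sketch. Given the probability that each parent passes on allele A1, the offspring genotype distribution follows directly, assuming the two parents transmit independently. The function and dictionary names are hypothetical.

```python
# Mendelian transmission probabilities for A1, by parental genotype.
MENDELIAN_TAU = {"A1A1": 1.0, "A1A2": 0.5, "A2A2": 0.0}

def offspring_distribution(tau_father, tau_mother):
    """Offspring genotype probabilities given each parent's chance of passing A1."""
    return {
        "A1A1": tau_father * tau_mother,
        "A1A2": tau_father * (1.0 - tau_mother) + (1.0 - tau_father) * tau_mother,
        "A2A2": (1.0 - tau_father) * (1.0 - tau_mother),
    }

# Two heterozygous parents under Mendelian transmission.
print(offspring_distribution(MENDELIAN_TAU["A1A2"], MENDELIAN_TAU["A1A2"]))
# → {'A1A1': 0.25, 'A1A2': 0.5, 'A2A2': 0.25}
```

In the general transmission model the same calculation is used, but the 3 values of tau are estimated from the data instead of being fixed at 1, 1/2, and 0.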
First, parameter estimates of the candidate model must be close to those of the general transmission model. The general model differs from the codominant mixed model in that it estimates all parameters, including transmission probabilities, which are no longer constrained to the Mendelian ratios. Comparison of the entries for the 2 models in Table 1 shows parameters to be in remarkably good agreement. A formal statistical requirement is that the fit of the candidate model must not be significantly worse than the general transmission model. That this is so can be seen from the χ² value of 2.99 (2 df, P = .2238) for the codominant mixed model.
Second, to be certain that the better fit was not simply the result of changing the number of phenotypic distributions from 1 to 3, a so-called environmental mixed model is run. This model is parameterized like the codominant mixed model, except that transmission probabilities are estimated, although they are all constrained to be equal to the frequency of allele A1. This allows 3 phenotypic distributions, but assumes that they are the result of some major non-Mendelian ("environmental") factor rather than a major gene. As in the codominant mixed model, variation around the means of each of the distributions can be influenced by polygenes. The fit of the environmental model must be significantly worse than the general transmission model; the χ² value of 31.03 (1 df, P < .0001) indicates that this is the case.
Thus, we have evidence for the existence of a major codominant gene locus (not dominant/recessive) plus polygenes contributing to HDL-C levels on a chow diet in baboons. Table 2 gives parameter estimates derived from the model. Genotype frequencies are Hardy-Weinberg proportions computed from the frequency of allele A1, and the number of animals is based on the 373 members of the study population. Although the calculations are not given here, from the information in Table 1 it is quite simple to estimate the relative contributions of the major gene, polygenes, and environment to the total phenotypic variance. In this case, the major gene accounts for 34.9% of the total variance in HDL-C, polygenes for 11.7%, and the remaining 53.4% is attributed to environmental factors that are random with respect to genotype.
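One way to carry out the variance partition mentioned above is sketched here. It assumes that the major gene variance is the frequency-weighted variance of the 3 genotype means, that the polygenic variance is h2 times the common within-genotype variance, and that the rest of the within-genotype variance is environmental. The function name is hypothetical, and the inputs are the rounded codominant mixed model estimates, so the major gene and polygenic shares agree with the percentages quoted in the text only to within rounding.

```python
def variance_shares(means, within_variance, h2, p_a1):
    """Shares of total phenotypic variance: major gene, polygenes, environment."""
    q = 1.0 - p_a1
    freqs = (p_a1 ** 2, 2.0 * p_a1 * q, q ** 2)  # Hardy-Weinberg proportions
    grand_mean = sum(f * m for f, m in zip(freqs, means))
    # Variance among genotype means, weighted by genotype frequency.
    major = sum(f * (m - grand_mean) ** 2 for f, m in zip(freqs, means))
    polygenic = h2 * within_variance
    environmental = (1.0 - h2) * within_variance
    total = major + polygenic + environmental
    return {
        "major_gene": major / total,
        "polygenic": polygenic / total,
        "environmental": environmental / total,
    }

shares = variance_shares((59.3, 67.9, 104.5), 159.52, 0.18, 0.80)
for name, value in shares.items():
    print(name, round(100.0 * value, 1))
```

Most of the major gene variance comes from the rare high-mean genotype, whose mean lies far from the population mean.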
A useful byproduct of this analysis is that genotypes of the major gene can be predicted on a probabilistic basis for each individual in the pedigree. In animal models, this makes it possible to choose individuals for purposes of experiment or breeding that are likely to be carrying genes that influence the trait in question, even though the function of the actual gene has not been identified. For example, in comparative physiologic and molecular studies of high and low HDL-C, choosing matched samples of animals from the 15 presumptive A2A2 (high HDL-C) and the 239 presumptive A1A1 (low HDL-C) animals shown in Table 2 increases the probability that metabolic differences are endogenous, rather than due to exogenous environmental causes.
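A simplified version of this genotype prediction can be sketched with Bayes' rule. The sketch below is an assumption-laden illustration, not the procedure used in the study: it uses only an animal's own phenotype, ignores the information contributed by relatives (which a full pedigree analysis would include), and assumes normal genotype-specific distributions with a common variance. The function name is hypothetical.

```python
import math

def genotype_posteriors(x, means, variance, p_a1):
    """Posterior probabilities of (A1A1, A1A2, A2A2) given one trait value."""
    q = 1.0 - p_a1
    priors = (p_a1 ** 2, 2.0 * p_a1 * q, q ** 2)
    # The normalizing constant of the normal density cancels because
    # the variance is common to all 3 distributions.
    weights = [pr * math.exp(-(x - m) ** 2 / (2.0 * variance))
               for pr, m in zip(priors, means)]
    total = sum(weights)
    return tuple(w / total for w in weights)

# Codominant mixed model estimates; trait values chosen for illustration.
for hdl in (55.0, 105.0):
    print(hdl, [round(p, 3) for p in genotype_posteriors(hdl, (59.3, 67.9, 104.5), 159.52, 0.80)])
```

Even a trait value near the highest genotype mean does not give certainty, because the corresponding genotype is rare in the population.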
This example is a deliberately simplified one. As pointed out above, for purposes of illustration we have abbreviated our description of the full sequence of models actually tested in the search for an HDL-C major gene. Furthermore, the models developed in this study were relatively simple compared to many other segregation analyses, which may include more than one trait or major locus, interaction effects, and correlates such as age and sex. The ability to model such biological complexity makes the method a powerful and flexible tool for analysis of any biological or behavioral trait that can be expressed as a quantitative measure. Segregation analysis can detect the presence of a major gene, but cannot identify its function or chromosomal location in the genome. Nonetheless, the method has important applications in noninbred species for which a detailed gene map has not yet been developed.
Linkage Analysis
Quantitative trait linkage analysis is the next logical step in characterizing genetic influences on continuously distributed traits. Classical genetic linkage analysis uses statistical methods to determine the likelihood that 2 or more genetic loci exhibiting simple Mendelian inheritance are located on the same chromosome. These may be either genes with known function, or "anonymous" markers whose function is unknown. Quantitative trait linkage analysis extends these methods, making it possible to determine linkage relationships between discrete loci and a quantitative trait locus (or QTL).
There are 2 general approaches to quantitative trait linkage analysis: penetrance model-based and nonpenetrance model-based methods. Penetrance model-based linkage analysis is an extension of segregation analysis, and usually begins with detection of a major gene in the manner described above. The best estimates of the segregation model parameters (such as genotypic means and variances, polygenic heritability, major gene frequency, and covariate effects) are then used as starting values for a new combined segregation and linkage analysis that includes parameters such as the marker gene frequency and the recombination frequency (a measure of the relative distance between the marker locus and the major locus). In the simplest test for evidence of linkage, the likelihood of a model in which the recombination fraction is constrained to 0.50 (indicative of no linkage) is compared to that of a model in which the recombination fraction is estimated. A significant difference between the likelihoods of these models is interpreted as evidence for linkage between the 2 loci.
A combined segregation and linkage analysis was used in a recent study of the sources of variation in low density lipoprotein cholesterol (LDL-C) in 1183 pedigreed baboons at SFBR (C.M. Kammerer, personal communication). Segregation analysis found evidence for a major gene for this phenotype (Konigsberg and others 1991), and the combined analysis tested for linkage between the major locus and a DNA marker in the LDL receptor locus. A likelihood ratio test comparing a model with tight linkage against one with no linkage (recombination fraction equal to 0.50) rejected the no linkage model at p = 0.0009, equivalent to a log10 odds ratio (lod score) of 2.4, which is highly suggestive of linkage in this chromosomal region.
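The correspondence between a lod score and a P value can be checked with a short calculation. Because the lod score is a base-10 log likelihood ratio, multiplying it by 2 ln(10) gives the usual likelihood ratio chi-square statistic. The sketch below assumes a 1 df chi-square reference distribution without any adjustment for the recombination fraction being bounded at 0.50 (some authors halve the resulting P value for that reason); the function names are hypothetical.

```python
import math

def lod_to_chi2(lod):
    """Convert a base-10 lod score to a likelihood ratio chi-square statistic."""
    return 2.0 * math.log(10.0) * lod

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

stat = lod_to_chi2(2.4)
print(round(stat, 2), round(chi2_1df_pvalue(stat), 4))
```

Under these assumptions a lod score of 2.4 corresponds to a chi-square of about 11 and a P value a little below 0.001, in line with the value reported above.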
Nonpenetrance model-based linkage analysis methods have been developed to test for linkage between marker loci and QTLs in the absence of prior statistical evidence for a major gene. These approaches are often referred to as "allele-sharing" methods because they rely on the probability that any 2 individuals will share alleles that come from a common ancestor (that is, alleles that are identical by descent or IBD), which is a function of their kinship. For example, full sibling pairs have a higher probability of sharing alleles that are IBD than do half siblings. The best known application of allele-sharing methods to quantitative trait linkage analysis is the sib-pair test (Haseman and Elston 1972). This test estimates the correlation between the squared difference in trait values for each pair of sibs and the proportion of alleles that the pair shares IBD at a marker locus. A correlation significantly less than zero implies linkage of the marker and a QTL influencing the trait.
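The logic of the sib-pair test can be illustrated with an ordinary least-squares fit of squared trait differences on the proportion of marker alleles shared IBD. This is a bare-bones sketch: the data are invented for illustration, the function name is hypothetical, and a real analysis would also estimate IBD sharing from marker genotypes and test the slope for significance.

```python
def ols_slope_intercept(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented data: proportion of alleles shared IBD at the marker (0, 1/2, or 1)
# and the squared difference in trait value for each of 6 sib pairs.
ibd_proportion = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
squared_difference = [9.0, 4.0, 1.0, 7.0, 5.0, 2.0]

slope, intercept = ols_slope_intercept(ibd_proportion, squared_difference)
print(slope, round(intercept, 4))
```

A negative slope means that sibs sharing more marker alleles IBD are more alike in trait value, which is the pattern expected when the marker is linked to a QTL.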
The sib-pair method has been popular in studies of human linkage because it requires collection of data only from pairs of sibs and their parents, rather than from extended pedigrees. The power of the method depends on the availability of large numbers of sib pairs, however, and the savings in data collection may be offset by loss of statistical power to detect linkage. When extended pedigrees are readily available for study (as is the case in many laboratory animal colonies), limiting an allele-sharing analysis to sibs alone wastes the allele-sharing information inherent in all other classes of relatives. A more promising allele-sharing approach to quantitative trait linkage analysis is an extension of the variance component methods used in quantitative genetic analysis. Unlike combined linkage and segregation analysis, these methods (Goldgar 1990; Amos 1994; Blangero and Almasy 1996) do not estimate the parameters of a major gene (allele frequencies and transmission probabilities), but they do require calculation of IBD probabilities for relationship pairs beyond the nuclear family (such as grandparent-grandchild; uncle-nephew; and 1st, 2nd, 3rd cousins). Recently developed algorithms (Duggirala and others 1996) have reduced the computational burden of applying this approach to complex pedigrees that typify animal colonies. At SFBR we have begun to map and screen the baboon genome in search of genes influencing bone mineral density and other phenotypes related to osteoporosis using variance component linkage methods.
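In the usual formulation of variance component linkage analysis, the expected trait covariance between 2 relatives is the sum of a QTL component weighted by their IBD sharing at the marker and a polygenic component weighted by twice their kinship coefficient. The sketch below follows that standard formulation, which may differ in detail from the cited implementations; the function name and the variance values are hypothetical.

```python
def trait_covariance(pi_ibd, kinship, var_qtl, var_polygenic, var_env,
                     same_individual=False):
    """Expected trait covariance between two pedigree members.

    pi_ibd   proportion of alleles shared IBD at the marker locus
    kinship  kinship coefficient of the pair (1/4 for noninbred full sibs)
    """
    cov = pi_ibd * var_qtl + 2.0 * kinship * var_polygenic
    if same_individual:
        cov += var_env  # environmental variance appears only on the diagonal
    return cov

# Hypothetical variance components.
VAR_QTL, VAR_POLY, VAR_ENV = 30.0, 20.0, 50.0

# Full sibs sharing half versus all of their marker alleles IBD.
print(trait_covariance(0.5, 0.25, VAR_QTL, VAR_POLY, VAR_ENV))  # → 25.0
print(trait_covariance(1.0, 0.25, VAR_QTL, VAR_POLY, VAR_ENV))  # → 40.0
```

Because the kinship coefficient is defined for any pair of relatives, the same expression covers grandparent-grandchild, avuncular, and cousin pairs, which is what lets these methods use extended pedigrees.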
Data requirements for quantitative trait linkage analysis are greater than those of segregation or quantitative genetic analysis, since it requires knowledge of marker loci that are mapped to specific chromosomes, as well as measurements on phenotypes of interest. As of October 1996, slightly more than one sixth of the approximately 100,000 loci in the human genome had been mapped, and it is anticipated that a complete map will be done by the year 2005 (Schuler and others 1996). Clearly, gene maps of this resolution are not likely to be available for most animal models of human disease in the near future. Nonetheless, candidate gene libraries have already been established for at least 30 nonhuman animal species, and concerted gene mapping efforts are under way for a variety of species (including mice, pigs, rats, chickens, baboons and laboratory opossums). Moreover, chromosomal homologies and the fact that many marker typing reagents developed for humans work well with closely related species make it possible to map interesting genes to small chromosomal regions without the need for full-scale gene maps, as we have been doing with bone density genes in baboons at SFBR.
STATISTICAL GENETIC MODELING AND THE USE OF ANIMALS
Managing the production of animals used in genetic research may require more attention to colony breeding and demographic structure than is common for animals used in other kinds of biomedical research. Relatively large samples of animals (N ≥ 200) with both parents known are often required for analysis, with phenotypic measurements made on as many of these individuals as possible. Nuclear families should be structured so that relatively large full sibships (offspring sharing both parents) are available (4-6 individuals per sibship). Large half-sibships (offspring sharing one parent) are acceptable, but to simplify analyses, an animal that has bred with more than one mate should not be mated to an animal that also has bred with more than one mate. Since for many species there are usually more breeding age females than males in a colony, in practice it may be simpler to avoid producing maternal half-sibships. Pedigrees of 3 to 4 generations in depth are desirable, although extremely deep pedigrees in most colonies tend to become overly complex and may require simplification by ignoring links between families prior to analysis.
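Checks of this kind are easy to automate from colony records. The sketch below assumes a hypothetical record layout of (animal, sire, dam) identifiers, with None for an unknown parent and with sire and dam identifiers drawn from separate sets; it tallies full sibships and flags matings in which both partners have bred with more than one mate.

```python
from collections import defaultdict

def full_sibships(records):
    """Group offspring by (sire, dam); records are (animal, sire, dam) tuples."""
    groups = defaultdict(list)
    for animal, sire, dam in records:
        if sire is not None and dam is not None:
            groups[(sire, dam)].append(animal)
    return dict(groups)

def multi_mate_pairings(records):
    """Matings in which both sire and dam have bred with more than one mate."""
    mates = defaultdict(set)
    pairs = set()
    for _, sire, dam in records:
        if sire is not None and dam is not None:
            mates[sire].add(dam)
            mates[dam].add(sire)
            pairs.add((sire, dam))
    return sorted(p for p in pairs if len(mates[p[0]]) > 1 and len(mates[p[1]]) > 1)

# Invented pedigree for illustration.
pedigree = [
    ("a", "S1", "D1"), ("b", "S1", "D1"), ("c", "S1", "D1"), ("d", "S1", "D1"),
    ("e", "S1", "D2"), ("f", "S2", "D2"), ("g", None, None),
]
print({pair: len(sibs) for pair, sibs in full_sibships(pedigree).items()})
print(multi_mate_pairings(pedigree))  # → [('S1', 'D2')]
```

Here the S1 x D2 mating is flagged because S1 has also bred with D1 and D2 has also bred with S2.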
Other aspects of breeding structure may depend on the kind of analysis being done. Segregation analysis requires a normal range of variability in the population. This means that animals should not be bred selectively for affected or extreme phenotypes, particularly if these are the phenotypes to be analyzed. For segregation analysis it is best that inbreeding be avoided, although mating between distant relatives can be handled by breaking pedigree links. On the other hand, the power of linkage analysis can be improved by the appropriate use of inbreeding, as well as by selecting for extreme phenotypes.
On balance, a breeding plan that minimizes inbreeding and avoids selection for extreme phenotypes is probably the safest strategy for the moment. This is because segregation analysis is likely to be an important analytic method of genetic analysis in the absence of detailed gene maps, and because such a strategy increases the probability that a given breeding colony will be useful for the analysis of a broad variety of phenotypes. The situation may change in the future, however, as gene maps for experimental animals become more extensive, and linkage analysis becomes the method of choice for identifying genes underlying disease-related phenotypes.
The way in which pedigreed animals are used as research subjects also depends to some extent on the life history of the species in question. With rapid and prolific breeders having large litters and short generation times (such as rodents) it may be possible to replace entire pedigrees quickly enough that sacrifice of key family members is not a serious concern. In contrast, with species (such as nonhuman primates) that reproduce relatively slowly, key family members often must be held for breeding purposes, making it difficult to select subjects for invasive or life-threatening experiments, or for long periods of experimental isolation, which can seriously disrupt a breeding program. In the latter case especially, a premium is usually placed on maintaining animals in the pedigree throughout their lives in a healthy condition with minimal experimental intervention. Phenotypes are preferably derived from venipuncture or other innocuous tissue sampling, morphometrics, radiation and sonic imaging, and other such methods. It is important to note, however, that virtually all breeding colonies can produce more offspring than are required to replace those lost through natural mortality. Pedigreed colonies frequently serve as an excellent source of production for animals used in non-genetic research, with a core of infants carefully selected on the basis of family relationships and reserved for breeding and genetic research, and the remainder (often a substantial majority) supplied to the biomedical community at large.
REFERENCES
Amos CI. 1994. Robust variance components approach for assessing genetic linkage in pedigrees. Am J Hum Genet 54:535-543.
Blangero J, Almasy LA. 1996. Sequential Oligogenic Linkage Analysis Routines. Population Genetics Laboratory Technical Report No. 6, Southwest Foundation for Biomedical Research, San Antonio, TX 78228.
Duggirala R, Stern MP, Mitchell BD, Reinhart LJ, Shipman PA, Uresandi OC, Leibel RL, Hales CN, O'Connell P, Blangero J. 1996. Quantitative variation in obesity-related traits and insulin precursors linked to the ob gene region on human chromosome 7. Am J Hum Genet 59:694-703.
Elston RC, Bailey-Wilson JE, Bonney GE, Keats BJ, Wilson AF. 1986. S.A.G.E.--A package of computer programs to perform Statistical Analysis for Genetic Epidemiology. Berlin: 7th Congr Hum Genet.
Goldgar DE. 1990. Multipoint analysis of human quantitative genetic variation. Am J Hum Genet 47:957-967.
Haseman JK, Elston RC. 1972. The estimation of linkage between a quantitative trait and a marker. Behav Genet 2:3-19.
Hasstedt SJ. 1989. Pedigree Analysis Package. Rev 3.0 Department of Human Genetics, University of Utah Medical Center.
Konigsberg LW, Blangero J, Kammerer CM, Mott GE. 1991. Mixed model segregation analysis of LDL-C concentration with genotype-covariate interaction. Genet Epidemiol 8:69-80.
Lange K, Weeks DE, Boehnke ML. 1988. Programs for pedigree analysis: MENDEL, FISHER and dGENE. Genet Epidemiol 5:471-472.
MacCluer JW. 1993. Applications of pedigree analysis to animal models for complex diseases. In: Sing CF, Hanis C, editors. Genetics of Individual, Family and Population Variability. New York: Oxford University Press. p 122-139.
MacCluer JW, Kammerer CM, Blangero J, Dyke B, Mott GE, VandeBerg JL, McGill HC Jr. 1988. Pedigree analysis of HDL cholesterol concentration in baboons on two diets. Am J Hum Genet 43:401-413.
Schuler GD, Boguski MS, Stewart EA, Stein LD, Gyapay G, Rice K, White RE, Rodriguez-Tome P, Aggarwal A, Bajorek E and others. 1996. A gene map of the human genome. Science 274:540-546.
ACKNOWLEDGMENTS
Supported by NIH Grants HL28972, RR09950 (BD) and HL54141 (MCM). Comments by Drs. J. Blangero, S. Williams-Blangero, J. MacCluer, and B. Mitchell of the Southwest Foundation are gratefully acknowledged.
TABLE 1 Segregation analysis parameters for 5 transmission models
| Model | m1 | m2 | m3 | s2 | h2 | f(A1) | t1 | t2 | t3 | lnL | χ² | df |
| Sporadic | 66.4 | 66.4 | 66.4 | 323.28 | [0] | [0] | [0] | [0] | [0] | -2738.9 | 243.99 | 6 |
| Polygenic | 64.0 | 64.0 | 64.0 | 277.22 | .48 | [0] | [0] | [0] | [0] | -2660.0 | 86.16 | 5 |
| Codominant mixed | 59.3 | 67.9 | 104.5 | 159.52 | .18 | .80 | [1] | [.5] | [0] | -2619.4 | 2.99 | 2 |
| General | 59.3 | 68.1 | 105.4 | 156.00 | .20 | .79 | 1.0 | .53 | .08 | -2617.9 | 0 | |
| Environmental | 59.3 | 74.7 | 109.5 | 136.89 | .84 | .81 | .81 | .81 | .81 | -2633.4 | 31.03 | 1 |
Values in brackets were fixed rather than estimated. A value repeated across m1 to m3 or t1 to t3 within a row is a single estimate shared by those parameters.
TABLE 2 Parameter estimates from the codominant mixed model
| Genotype | A1A1 | A1A2 | A2A2 |
| Frequency | .64 | .32 | .04 |
| Number of animals | 239 | 119 | 15 |
| Mean HDL-C levels | 59.3 | 67.9 | 104.5 |
Variance = 159.52; h2 = 0.18
Copyright © 2008. National Academy of Sciences.
All rights reserved.