Jurg Ott                                         30 June 1994
Columbia University, Unit 58
722 West 168th Street
New York, NY 10032
Tel. (212) 960-2507
FAX: (212) 568-2750
E-mail: jurg.ott@columbia.edu

Documentation to homogeneity programs
Copyright (C) Jurg Ott 1990

Contents
-----------------------------------------------------------------
INTRODUCTION
CONSTRUCTION OF SUPPORT INTERVALS
HOMOG program
HOMOG1a program
HOMOG1b program
HOMOG2 program
HOMOG3 and HOMOG4 programs
HOMOG3R program
POINT4 program
MTEST program
REFERENCES
-----------------------------------------------------------------

INTRODUCTION

This documentation and the programs described herein are
copyrighted. This means that the programs may be copied freely
for nonprofit scientific use but they must not be used for
commercial purposes unless a specific license for commercial use
is obtained from the author. Also, anyone modifying these
programs must display a note indicating this fact, and the note
must appear both in the source code and in the output file
produced by the programs.

Two test situations may be distinguished: a mixture of families
in which families cannot unequivocally be assigned to one or the
other type or group (HOMOG programs), or known groups of families
(fixed classifications; MTEST program, see separate section
below), for example, as distinguished by their origin.

The HOMOG programs are written as close to standard Pascal as
possible.
They should with little modification be compilable with almost
any Pascal compiler. For DOS, they were compiled with Turbo
Pascal 7.0 and will sense a coprocessor if one is installed, and
will emulate it if none is present. In case of problems with the
coprocessor, the sensing mechanism may be turned off by telling
Turbo Pascal whether it should use the coprocessor or not. This
is achieved by issuing the DOS command SET 87=YES or SET 87=NO,
respectively. In Turbo Pascal, SEEKEOLN is a standard function;
for possible use in other Pascals, the source code of a function
with the same effect as SEEKEOLN is included. For OS/2, the
programs are compiled with NDP Pascal from Microway Inc. For ease
of recompiling the programs, two command files, SETNDP.CMD and
COMPILE.CMD, are included in the OS/2 package. NDP Pascal is
close to Pascal on Unix machines.

This documentation describes different forms of the homogeneity
(admixture) test. All HOMOG programs analyze heterogeneity (two
or more disease loci) with respect to either a single marker
locus or a known map of markers. In the first case, the programs
expect lod scores between disease phenotype and the marker, and
in the second case, they expect multipoint lod scores for disease
versus the known map of markers. Such multipoint lod scores can
be obtained from the LINKMAP program by letting the disease locus
walk in steps along the map. The multipoint lod scores for each
family will have to be calculated from the LINKMAP output or may
be obtained from LINKMAP output using the LINKLODS program of the
LINKAGE package.

HOMOG carries out a homogeneity test (A-test) under the following
alternative hypothesis: two family types, one with linkage
between a trait (or any gene locus for that matter) and a marker
or map of markers, the other without linkage. For more
information see Ott (1991).
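
As a rough illustration of what the A-test evaluates, the
following Python sketch computes the admixture log likelihood for
two family types (linked with recombination fraction theta, and
unlinked) from per-family lod scores, and searches a grid of
alpha values. It assumes the usual mixture form
lnL(alpha, theta) = sum over families of
ln[alpha x 10^Z(theta) + 1 - alpha]. The function names, the
simple grid search, and the example lod scores are hypothetical
and are not taken from the HOMOG source code.

```python
import math

def admixture_lnl(alpha, lods_at_theta):
    """Log likelihood (relative to no linkage) at one theta value.

    lods_at_theta holds one lod score per family, all at the same theta.
    Lods below -80 are treated as minus infinity (likelihood ratio 0).
    """
    total = 0.0
    for z in lods_at_theta:
        ratio = 0.0 if z < -80 else 10.0 ** z
        mix = alpha * ratio + (1.0 - alpha)
        if mix <= 0.0:
            return -math.inf
        total += math.log(mix)
    return total

def grid_search(thetas, lods, stepsize=0.05):
    """Return (lnL, alpha, theta) at the maximum over the grid.

    lods[i][j] is the lod score of family i at thetas[j].
    """
    n_steps = round(1.0 / stepsize)
    best = (0.0, 0.0, None)  # alpha = 0: all families unlinked, lnL = 0
    for k in range(1, n_steps + 1):
        alpha = k * stepsize
        for j, theta in enumerate(thetas):
            lnl = admixture_lnl(alpha, [fam[j] for fam in lods])
            if lnl > best[0]:
                best = (lnl, alpha, theta)
    return best

# Hypothetical lod scores for three families at two theta values.
thetas = [0.05, 0.20]
lods = [[1.2, 0.8], [-2.5, -0.6], [0.9, 0.7]]
best_lnl, best_alpha, best_theta = grid_search(thetas, lods)
print(best_lnl, best_alpha, best_theta)
```

Setting alpha = 1 in admixture_lnl gives the log likelihood under
homogeneity at that theta, so the difference between the grid
maximum and the best alpha = 1 value is the log likelihood ratio
for heterogeneity in this simplified setting.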

HOMOG1 is an extension of the homogeneity test, with the
following alternative hypothesis: two family types, one linked,
the other unlinked, plus a possible sex difference in the
recombination fraction. This program comes in two versions,
depending on whether the lods for the two sexes are independent
or not: HOMOG1a reads independent lods, HOMOG1b reads dependent
lods. For the same problem, with independent lods, HOMOG1a is
more efficient in terms of memory space.

HOMOG2 is also an extension of the homogeneity test, with the
alternative hypothesis of two family types, both with linkage but
to two different markers on the same chromosome. The
recombination fraction (or map distance) between trait and
marker 1 is theta1, that between trait and marker 2 is theta2,
where theta1 < theta2 < 0.5 (male and female recombination
fractions the same).

HOMOG3 and HOMOG4 are analogous to HOMOG2 but specify 3 or 4
family types (marker loci). They only calculate the maximum log
likelihood and the ML estimates.

HOMOG3R is a specialized version of the HOMOG3 program. It
calculates log likelihoods under the assumption that a trait is
linked in some families to marker 1 on some chromosome, and in
other families it is linked to a marker 2 on another chromosome.

POINT4 is interactive and calculates the log likelihood at
specific parameter values for a mixture of up to 4 family types.

Before running one of the programs, an input file must be
constructed according to the rules given below. Input and output
files have fixed names; for example, for the HOMOG program, the
input file is HOMOG.DAT and the output file is HOMOG.OUT. Output
of lod scores and log likelihoods is preset to a width of 80
columns unless the input quantity LL is read.

In each of the homogeneity tests, groups or types of families are
assumed where any given family cannot unequivocally be assigned
to either of these types.
The family types differ from each other with respect to the
recombination fraction between two loci (lod scores, two-point
situation) or with respect to the map distance between a locus
and a fixed point on a map of marker loci (location scores,
multi-point situation). In the latter situation, the programs
still refer to the map distances as "theta values".

The statistical hypotheses referred to in the programs are
defined as follows:

H0 is the very basic hypothesis of both homogeneity and absence
of linkage.

H1 is the usual null hypothesis of homogeneity, i.e., all
families belong to a single family type with linkage between the
main locus and the marker locus.

H2 refers to the hypothesis of heterogeneity, with two family
types, type 1 and type 2, where alpha is the proportion of
families of type 1 or, equivalently, the probability that a
family belongs to type 1. Family type 1 is characterized by a
recombination fraction theta (programs HOMOG, HOMOG1a, and
HOMOG1b) or theta1 (program HOMOG2), while in families of type 2
the recombination fraction is assumed to be equal to 1/2
(programs HOMOG, HOMOG1a, and HOMOG1b) or theta2 (program HOMOG2,
theta1 < theta2 < 0.5).

H3 refers to a particular type of "homogeneity": there is only
one family type with recombination fraction theta, but allowance
is made for a difference in the recombination fraction between
the sexes.

H4 is the heterogeneity alternative to H3, i.e., there are two
family types with recombination fractions of theta and 1/2 and,
in addition, there may be a difference in theta between the
sexes.

The relationship between the hypotheses 1 through 4 can be
displayed as follows:

Recombination fraction      Alpha=1           Alpha<1
in the two sexes            (Homogeneity)     (Heterogeneity)
----------------------------------------------------------
equal                       H1                H2
unequal                     H3                H4
----------------------------------------------------------

In the programs and on output, genetic distance is labelled in
terms of the recombination fraction, theta. However, the programs
may also be used when the genetic distances are in centimorgans,
x. To accommodate both types of applications, free recombination
(infinite map distance) is on output designated as theta = 99 or
-99.

Tests of one hypothesis against another are carried out as
likelihood ratio tests, where the likelihood ratio with respect
to the two hypotheses is calculated. Asymptotic p-values are no
longer reported because in many applications they may be
unreliable.

CONSTRUCTION OF SUPPORT INTERVALS

Some of the programs described below will calculate support
intervals for parameters estimated (support "regions" for more
than two parameters) and, for each family, the conditional
probability of belonging to each of the family types considered.
Such calculations are carried out only when a value for LDIFF is
specified on input. Otherwise, no support interval calculations
will be performed, which will result in faster program execution.

Support intervals may be interpreted as approximate confidence
intervals. However, such approximate confidence intervals will be
very crude and practically useless if only a few theta values
with lod scores are present so that the theta values are far
apart from each other, or when the step size for alpha is too
large, say, larger than 0.10. It is important for good support
intervals that lod scores are available at many theta values.
When only few lod scores are available, a possible solution is to
approximate lod scores by interpolation between calculated lod
scores before inputting them to the HOMOG programs.
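
The interpolation mentioned above can be sketched as follows.
This is a minimal Python illustration using plain linear
interpolation between neighbouring theta values; the choice of
linear interpolation, the function name, and the example numbers
are assumptions, and other interpolation schemes may suit
particular data better.

```python
def interpolate_lods(thetas, lods, new_thetas):
    """Linearly interpolate lod scores onto a denser grid of thetas.

    thetas must be strictly increasing and contain at least two
    values; every new theta must lie within [thetas[0], thetas[-1]]
    (no extrapolation is attempted).
    """
    if len(thetas) < 2 or len(thetas) != len(lods):
        raise ValueError("need at least two thetas, one lod per theta")
    result = []
    for t in new_thetas:
        if not thetas[0] <= t <= thetas[-1]:
            raise ValueError(f"theta {t} outside range of calculated lods")
        for j in range(len(thetas) - 1):
            lo, hi = thetas[j], thetas[j + 1]
            if lo <= t <= hi:
                w = (t - lo) / (hi - lo)  # weight of the upper neighbour
                result.append((1.0 - w) * lods[j] + w * lods[j + 1])
                break
    return result

# Hypothetical lods for one family at four theta values.
thetas = [0.01, 0.10, 0.30, 0.40]
lods = [-1.50, 0.60, 0.40, 0.10]
dense = [0.01, 0.05, 0.10, 0.20, 0.30, 0.35, 0.40]
print(interpolate_lods(thetas, lods, dense))
```

Lod scores that stand for minus infinity (values below -80)
should not be interpolated in this way; such intervals are better
left at the original grid points.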

In the HOMOG programs, support regions/intervals are computed as
follows. First, the program determines the highest Ln likelihood,
Lmax, under the most general alternative hypothesis, i.e., that
with the largest number of parameters estimated. Then, the
program recalculates likelihoods and marks as belonging to the
support interval all those parameter values which have an Ln
likelihood larger than Lmax-Ldiff (Ln likelihood within Ldiff of
the maximum). Such a support interval is called an Ldiff-unit
support interval. The table below gives examples for the
correspondence between Ldiff and the associated likelihood ratio.

Under regular conditions, support intervals may be interpreted as
approximate confidence intervals. For example, with two-point
analysis and two family types (one linked and the other
unlinked), 2 x Ldiff approximately follows a chi-square
distribution on 1 df when no heterogeneity is present. In
multipoint situations, however, the approximation by chi-square
is unreliable because the distribution of the test statistic is
unknown.

               Difference in units of     Approx.
Likelihood     -----------------------    p-value
ratio (LR)     ln(LR)=Ldiff  lod score    (1 df)
----------------------------------------------
    7.39           2.00        0.87        .046
   10              2.30        1           .032
   20              3.00        1.30        .014
   50              3.91        1.70        .005
  100              4.60        2           .002
 1000              6.91        3           .0002
----------------------------------------------

HOMOG program

Input is as described below. The default file names are HOMOG.DAT
for input and HOMOG.OUT for output.

Line 1: Title line

Line 2: N STEPSIZE LDIFF where
   N = no. of theta values at which lod scores are available or
   should be computed (ISW=R). Omit lod=0 at theta=0.5.
   STEPSIZE = step size at which the alpha values are incremented
   in the search over the likelihood surface (for example, 0.05).
   LDIFF (optional) = difference in log likelihood, used in the
   construction of support intervals. If it is not given (or when
   LDIFF=0), no support intervals will be computed.
   In regular situations, the joint support interval for alpha
   and theta corresponds to an approximate 95% confidence region
   when Ldiff = 3.00.

Line 3: OUT ALOW LL where the OUTput option is set as follows:

   OUT   Table of lnL(alpha,theta)   Lods for families
    0              no                       no
    1              no                       yes
    2              yes                      no
    3              yes                      yes

   ALOW = lowest value of alpha analyzed (e.g., ALOW=0)
   LL = line length of output (optional; if missing: LL=80)

Line 4: N recombination fraction (theta) values, e.g., 0.01,
   0.05, 0.1, etc. At these points, lod scores will be computed.
   A large number N of thetas (e.g., 10) will yield more accurate
   results than a small number. If an additional number follows
   the last theta value on the same line, that number indicates
   the theta value against which heterogeneity will be tested.
   Without any such additional number, the test will be against
   theta=0.5 (infinite map distance).

Line 5: NFAM = number of families for which lods are provided

Line 6: Lod scores for family 1. Lods smaller than -80 are taken
   to represent minus infinity, and a log likelihood of minus
   infinity will appear as -99 on output.

Repeat line 6 for families 2, 3, etc.
----------------------------------------------

Sample data: the file HOMOG.DAT shows a specific example based on
the analysis of Morton (1956) on Elliptocytosis vs. Rh.

HOMOG1a program

The possible hypotheses under which likelihoods are calculated by
the HOMOG1 program can be displayed as follows, where df stands
for degrees of freedom (see also introduction).

Male and female      Homogeneity (one     Heterogeneity (two
rec. fractions       family type)         family types)
----------------------------------------------------------
equal                H1 (1 df)            H2 (2 df)
unequal              H3 (2 df)            H4 (3 df)
----------------------------------------------------------

The test of H1 against H4 leads to a chi-square value with 2 df
that may be partitioned into two components according to the
manner in which H4 is reached from H1.
Note that there are two possible paths leading from H1 to H4
(through H2 or through H3).

Input to the HOMOG1a program is similar to that for the HOMOG
program and is as given in the following table, but refer to the
notes below the table. File names are HOMOG1A.DAT for input and
HOMOG1A.OUT for output.

Line 1: Title line

Line 2: NM NF STEPSIZE LDIFF where
   NM = no. of male theta values, tm, at which lod scores are
   available. Do not count theta=0.5.
   NF = no. of female theta values, tf, at which lod scores are
   available. Do not count theta=0.5.
   STEPSIZE = step size at which the alpha values are incremented
   in the search over the likelihood surface (e.g., 0.05).
   LDIFF (optional) = difference in log likelihood, used in the
   construction of support intervals (see CONSTRUCTION OF SUPPORT
   INTERVALS, above). In regular situations, the joint support
   interval for alpha and theta corresponds to an approximate 95%
   confidence region when Ldiff = 3.91 (LR approximately 50).

Line 3: OUT ALOW LL where the OUTput option is set as follows
   (Warning: the table of lnL(alpha,theta) contains
   [NMxNF-1]/STEPSIZE lines):

   OUT   Table of lnL(alpha,theta)   Lods for families
    0              no                       no
    1              no                       yes
    2              yes                      no
    3              yes                      yes

   ALOW = lowest value of alpha analyzed (e.g., ALOW=0)
   LL (optional) = line length of output (may be missing)

Line 4: NM male theta values, tm. They may be entered on a single
   line, or distributed over several lines. The order is
   irrelevant.

Line 5: NF female theta values, tf.

Line 6: NFAM = number of families

Line 7: NM male lod scores for family 1. Lods smaller than -80
   are taken to represent minus infinity, and a log likelihood of
   minus infinity will appear as -99 on output.

Line 8: NF female lod scores for family 1.

Repeat lines 7 and 8 for families 2, 3, etc.

As to the theta values at which lod scores are available in each
family, the user is essentially free to choose which theta values
to use. However, he or she should make sure that there is a
sufficiently large number of pairs both with tm=tf and with
tm<>tf (<> stands for "not equal to").

Sample data: the file HOMOG1A.DAT provides an example of data
that may be analyzed for heterogeneity as well as for a sex
difference in the recombination fraction. The data are quoted in
Ott (1986).

HOMOG1b program

The calculations performed by the HOMOG1b program are basically
the same as those by the HOMOG1a program. HOMOG1b allows input of
nonindependent lod scores for the two sexes and, everything else
being equal, requires more memory to run. The input is slightly
different from that to the HOMOG1a program and is as follows.
File names are HOMOG1B.DAT for input and HOMOG1B.OUT for output.

Line 1: Title line

Line 2: N STEPSIZE LDIFF where
   N = no. of pairs of theta values, tm and tf (male and female
   recombination fractions), at which lod scores are available.
   Do not count theta=0.5.
   STEPSIZE = step size at which the alpha values are incremented
   in the search over the likelihood surface (e.g., 0.05).
   LDIFF (optional) = difference in log likelihood, used in the
   construction of support intervals (see CONSTRUCTION OF SUPPORT
   INTERVALS, above). In regular situations, the joint support
   interval for alpha and theta corresponds to an approximate 95%
   confidence region when Ldiff = 3.91 (LR approximately 50).

Line 3: OUT ALOW LL where the OUTput option is set as follows:

   OUT   Table of lnL(alpha,theta)   Lods for families
   ---------------------------------------------------
    0              no                       no
    1              no                       yes
    2              yes                      no
    3              yes                      yes

   ALOW = lowest value of alpha analyzed (e.g., ALOW=0)
   LL (optional) = line length of output (may be missing)

Line 4: N pairs of theta values, tm and tf, where in each pair
   the first value is the male and the second value is the female
   recombination fraction. Each pair may be entered on a single
   line, or several pairs may be entered on one line, e.g., 0.01,
   0.01, 0.05, 0.05, 0.01, 0.05,... There must be exactly as many
   pairs (N of them) as there are lod scores for each family as
   provided on line 6, below. Omit lod=0 at tm=tf=0.5. The order
   in which these pairs are provided is irrelevant.

Line 5: NFAM = number of families

Line 6: Lod scores (N of them) for family 1. Lods smaller than
   -80 are taken to represent minus infinity, and a log
   likelihood of minus infinity will appear as -99 on output.

Repeat line 6 for families 2, 3, etc.
------------------------------------------------

As to the pairs of theta values at which lod scores are available
in each family, the user is essentially free to choose which
theta values to use. However, he or she should make sure that
there is a sufficiently large number of pairs both with tm=tf and
with tm<>tf (<> stands for "not equal to"). A minimum set of
pairs would be all possible combinations of tm and tf with
tm = 0, 0.05, ..., 0.5, and with tf = 0, 0.05, ..., 0.5. When the
highest lod score occurs at tf>tm, then it might be sufficient to
provide lods in one triangle of the plane of (tm,tf)-values only,
e.g., at tm = 0, 0.05, ..., 0.5, and tm <= tf <= 0.5. Such a set
of theta values may graphically be represented as follows (marked
with crosses):

tm = 0.5
     0.3                      x     x
     0.1                x     x     x
     0.05         x     x     x     x
     0      x     x     x     x     x
     --------------------------------
tf =        0   0.05   0.1   0.3   0.5

On input, for example, the following theta values would have to
be given on line(s) 4:

   0    0      0    0.05   0    0.1    0    0.3    0    0.5
   0.05 0.05   0.05 0.1    0.05 0.3    0.05 0.5
   0.1  0.1    ...  0.3  0.5

Sample data: the file HOMOG1B.DAT contains the same data as the
file HOMOG1A.DAT referenced in the previous section except that
the joint lod scores, Z(thm,thf), have been reconstructed from
the independent sex-specific lod scores, Z(thm) and Z(thf), as
Z(thm,thf) = Z(thm) + Z(thf).

HOMOG2 program

As mentioned in the introduction, in this extension to the
A-test, the alternative hypothesis H2 of heterogeneity specifies
two family types, both with linkage, one with recombination
fraction theta1 between trait and marker 1, the other with
recombination fraction theta2 between trait and marker 2
(theta1 < theta2 < 0.5), where alpha denotes the probability of
belonging to type 1 (with theta1).
The two markers are on the same chromosome so that only one set
of lod scores of the trait versus the "map" of two markers is
provided.

Input format is the same as for the HOMOG program and is as given
in the following table, but refer to the notes after the table.
Notice that there are two possible modes of indicating the number
of families: either one precedes each family with a code, R or L,
and provides as many families as desired, or one indicates at the
beginning of the family data the total number of families for
which data will follow. File names are HOMOG2.DAT for input and
HOMOG2.OUT for output.

Line 1: Title line

Line 2: NT STEPSIZE LDIFF where
   NT = no. of theta values (or map distances) at which lod
   scores are available or should be computed (ISW=R). Omit lod=0
   at theta=0.5.
   STEPSIZE = step size at which the alpha values are incremented
   in the search over the likelihood surface (e.g., 0.05).
   LDIFF (optional) = difference in log likelihood, used in the
   construction of support intervals. When it is missing, no
   support intervals will be calculated.

Line 3: OUT ALOW LL where the OUTput option is set as follows
   (Warning: table of lnL(alpha,theta) contains
   0.5xN(N+3)/STEPSIZE lines):

   OUT   Table of lnL(alpha,theta)   Lods for families
   ---------------------------------------------------
    0              no                       no
    1              no                       yes
    2              yes                      no
    3              yes                      yes
   ---------------------------------------------------

   ALOW = lowest value of alpha analyzed (e.g., ALOW=0)
   LL = line length of output (optional; if missing: LL=80)

Line 4: Recombination fraction (theta) values, e.g., 0.01, 0.05,
   etc. At these points, lod scores will be computed. A rather
   large number NT of recombination fractions (e.g., 10) will
   yield more accurate results than a small number.

Line 5: NFAM = number of families for which lod scores are
   provided.

Line 6: Lod scores for family 1. Lods smaller than -80 are taken
   to represent minus infinity, and a log likelihood of minus
   infinity will appear as -99 on output.

Repeat line 6 for each family.
--------------------------------------------

The null hypothesis (H1) is specified by alpha=1 or,
equivalently, by theta1=theta2, and has one degree of freedom
(df), i.e., theta1. The alternative hypothesis (H2) is
characterized by three df, i.e., alpha, theta1, and theta2.
However, setting alpha=1 forces theta2 to be equal to theta1 so
that the asymptotic chi-square distribution may not apply.
Instead of p-values, the current version of the HOMOG2 program
outputs the likelihood ratios (odds ratios) for the hypotheses
considered.

HOMOG3 and HOMOG4 programs

These programs are straightforward extensions of the HOMOG2
program to 3 and 4 family types. They use the same input format
as the HOMOG and HOMOG2 programs, and the input files are
HOMOG3.DAT and HOMOG4.DAT, respectively. Output files will be
called HOMOG3.OUT and HOMOG4.OUT.

The HOMOG3 and HOMOG4 programs simply calculate the maximum Ln
likelihood under the most general hypothesis of heterogeneity.
Appropriate significance tests will have to be carried out
manually by the user by comparing output from these programs with
output from the HOMOG or HOMOG2 programs.

Notice that HOMOG3 and HOMOG4 carry out an exhaustive search of
the parameter space and may require a large amount of computer
time. While they are running, they display the current alpha
values so that they may be interrupted by the user.

Interpreting results of HOMOG3 or HOMOG4 is not straightforward.
For example, whenever one of the components (alphas) is equal to
zero, the associated theta value is irrelevant. Also, there may
be more than one parameter constellation with the same maximum
likelihood.

The HOMOG3 and HOMOG4 programs differ in their output as follows.
In the HOMOG3 program, if the OUTput option (line 3) is set to a
value larger than 1, all possible sets of alpha values will be
printed (one set per line), and for each set the maximum
likelihood over the theta values will be given along with those
theta values at which the maximum occurred. In the HOMOG4
program, if the OUTput option (line 3) is set to a value larger
than 1, a table containing the Ln likelihood for each possible
set of parameter values will be written to the output file.
WARNING: THIS FILE COULD BE VERY LARGE! For example, when the
sample HOMOG.DAT file is analyzed by the HOMOG4 program, the
output file will be 1.5 MB long. For most practical situations,
one should set OUT=0 on line 3.

Notice that each alpha component cannot take on the whole range
of values from 0 through 1. For computational efficiency, only
alpha3 is allowed to have a value of 1; if in the course of the
calculations, alpha2 or alpha1 were also allowed to be equal to
one, the resulting likelihoods and theta estimates would be
exactly the same as with alpha3=1. Similarly, alpha3 cannot be
equal to zero -- if a single alpha is zero, it must be alpha1; if
two alphas are zero, these must be alpha1 and alpha2.

HOMOG3R program

This is a specialized version of the HOMOG3 program. It
calculates log likelihoods under the assumption that in a
proportion a1 of families a trait is linked to marker 1 and in a
proportion a2 of families it is linked to marker 2, where the two
markers (or maps of markers) are located in different regions of
the genome such that the trait is never truly linked to both
markers. There may be a third proportion, a3=1-a1-a2, of families
without linkage to markers 1 and 2.

The two regions of the genome usually correspond to different
chromosomes and are identified in the following table by the
respective sets of theta values, NT1 and NT2. Default file names
are HOMOG3R.DAT for input and HOMOG3R.OUT for output.

Line 1: Title line

Line 2: NT1 NT2 STEPSIZE where
   NT1 = no. of theta values (or map locations) at which lod
   scores are available for trait versus marker 1. Omit lod = 0
   at theta = 0.5.
   NT2 analogous for marker 2.
   STEPSIZE = step size at which the alpha values are incremented
   in the search over the likelihood surface (e.g., 0.05).

Line 3: OUT ALOW where the OUTput option is set as follows:

   OUT   Table of lnL(alpha,theta)   Lods for families
   ---------------------------------------------------
    0              no                       no
    1              no                       yes
    2              yes                      no
    3              yes                      yes
   ---------------------------------------------------

   The table of lnL(alpha,theta) will print one line for each
   pair of alpha1 and alpha2. In each line, the log likelihood,
   maximized over the thetas, is printed.
   ALOW = lowest value of alpha analyzed (e.g., ALOW=0).

Line 4: All NT1+NT2 theta values, e.g., 0.01, 0.05, etc., that
   is, the theta values for marker 1 immediately followed by the
   theta values for marker 2. These values are for identification
   purposes only and are not used in the calculations. It may
   thus be useful to distinguish theta values for marker 1 (e.g.,
   -0.10 or 0.11) from those for marker 2 (e.g., 0.10). A large
   number of recombination fractions will yield more accurate
   results than a small number.

Line 5: NFAM = number of families for which lod scores are
   provided.

Line 6: The NT1+NT2 lod scores for family 1. Lods smaller than
   -80 are taken to represent minus infinity, and a log
   likelihood of minus infinity will appear as -99 on output.

Repeat line 6 for each family.
------------------------------------------------

A special situation is given when the two markers are taken to be
candidate genes and lod scores are evaluated at theta=0 only. In
this case, the HOMOG3R program will maximize the likelihoods over
theta=0 and theta=0.5.
Consider the following input file (another sample data set is
provided in the file HOMOG3R.DAT):

   Linkage to two candidate genes on different chromosomes
   1 1 .05
   1 0
   -0.01 0.01
   4
   0.903 -99
   2.007 -99
   0.601 0.601
   -99 1.204

For each of four families, at each of two chromosomes, the file
contains lod scores at theta=0 (identified as -0.01 for marker 1
on chromosome 1 and 0.01 for marker 2 on chromosome 2). There are
three possible hypotheses of homogeneity: 1) all families are
linked with marker 1 but unlinked with marker 2; 2) all families
are linked with marker 2 but not with marker 1; 3) all families
are unlinked with markers 1 and 2. The first two hypotheses
clearly have zero likelihood, because there is always at least
one family with one or more known recombinations. The HOMOG3R
program furnishes the following output:

Program HOMOG3R version 1.70                            J. Ott
Heterogeneity -- Three family types, type 1 with linkage to first
set of theta values, type 2 with linkage to second set of theta
values (usually two different chromosomes), type 3 unlinked.

>> Linkage to two candidate genes on different chromosomes <<

Fam.    Lod scores
   1      0.9030  -99.0000
   2      2.0070  -99.0000
   3      0.6010    0.6010
   4    -99.0000    1.2040
Theta    -0.0100    0.0100

Results for different hypotheses (fixed values in parentheses)

Hypothesis               a1     a2     a3      t1       t2       lnL
----------------------------------------------------------------------
H1 Heterogeneity        0.65   0.35   0.00   -0.010    0.010   8.9453
H2 Het, a3=0            0.65   0.35   (0)    -0.010    0.010   8.9453
H3 Het, a2=0            0.70   (0)    0.30   -0.010   (-99)    5.9688
H4 Het, a1=0            (0)    0.40   0.60   (-99)    -0.010   1.7107
H5 Homogeneity, a1=1    (1)    (0)    (0)   -99.000   (-99)    0.0000
H6 Homogeneity, a2=1    (0)    (1)    (0)    (-99)   -99.000   0.0000
H7 Homogeneity, a3=1    (0)    (0)    (1)    (-99)    (-99)     (0)

Evidence for heterogeneity (H1 vs. H5/6/7):
   Difference in Ln(L)     =    8.9453
   Lik. ratio for heterog. = 7671.7558

Evidence for heterogeneity (H1 vs. H3/4):
   Difference in Ln(L)     =    2.9765
   Lik. ratio for heterog. =   19.6190

Family   Conditional prob. of being
   no.   type 1   type 2   type 3   (under heterogeneity, H1)
     1   1.0000   0.0000   0.0000
     2   1.0000   0.0000   0.0000
     3   0.6500   0.3500   0.0000
     4   0.0000   1.0000   0.0000

The program output shows positive log likelihoods for hypotheses
H1 through H4. It may come as a surprise that formally the
program also indicates non-null likelihoods (log likelihood not
equal to -99, i.e., not equal to negative infinity) for
hypotheses H5 and H6, that is, the data are compatible with
homogeneity (locus in all families on chromosome 1 or
chromosome 2). This is so because the likelihood is maximized
over theta = 0 and theta = 0.5; the estimated theta values are
then 0.5 (indicated by -99.000 in the output) when theta=0 is
incompatible with the data. Of course, for fixed values, theta=0,
the data are incompatible with hypotheses a1=1 and a2=1, but the
HOMOG3R program does not work with fixed theta values.

Two ways of measuring evidence for heterogeneity are
distinguished:

1.) Any heterogeneity (H1, 2 alpha parameters estimated) versus
    strict homogeneity, either on chromosome 1 or chromosome 2 or
    elsewhere (no alpha parameters estimated), and

2.) any heterogeneity versus homogeneity on chromosomes 1 or 2,
    where under each hypothesis a proportion of unlinked families
    is allowed for (that proportion, a3, is treated as a nuisance
    parameter).

The latter test specifically addresses the question of
heterogeneity between chromosomes 1 and 2, irrespective of
heterogeneity between known chromosomes and locations elsewhere.

POINT4 program

The POINT4 program is interactive and calculates the log
likelihood at specific parameter values for a mixture of up to 4
family types. It reads input files in the regular format for the
HOMOG program. When files in the format for HOMOG3R are to be
used by POINT4, the two numbers on the second line indicating
numbers of recombination fractions must be replaced by a single
number, which is the sum of the previous two numbers.
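
The kind of calculation POINT4 performs can be illustrated with a
short Python sketch that evaluates the mixture log likelihood at
chosen alpha values and theta positions. It assumes the mixture
form lnL = sum over families of ln[sum over types k of
alpha_k x 10^Z(theta_k)], with an unlinked type contributing a
likelihood ratio of 1. The function name and calling convention
are hypothetical and are not taken from the POINT4 source code.
The data are the four families of the HOMOG3R example above.

```python
import math

def mixture_lnl(alphas, theta_idx, lods):
    """Log likelihood of a mixture of family types at fixed parameters.

    alphas: proportions of the family types (should sum to 1).
    theta_idx: for each type, the 1-based position of its theta
               value in the input file; any position outside the
               range (e.g. 0) stands for theta = 0.5 (unlinked).
    lods: lods[i][j] is the lod of family i at the (j+1)-th theta.
    """
    total = 0.0
    for fam in lods:
        lik = 0.0
        for a, idx in zip(alphas, theta_idx):
            if a == 0.0:
                continue  # theta of an empty component is irrelevant
            if 1 <= idx <= len(fam):
                z = fam[idx - 1]
                ratio = 0.0 if z < -80 else 10.0 ** z  # lod < -80: minus infinity
            else:
                ratio = 1.0  # unlinked type: lod = 0
            lik += a * ratio
        if lik <= 0.0:
            return -math.inf
        total += math.log(lik)
    return total

# Lod scores from the HOMOG3R example (position 1 = marker 1, 2 = marker 2).
lods = [[0.903, -99], [2.007, -99], [0.601, 0.601], [-99, 1.204]]

h1 = mixture_lnl([0.65, 0.35, 0.0, 0.0], [1, 2, 0, 0], lods)
h3 = mixture_lnl([0.70, 0.0, 0.30, 0.0], [1, 0, 0, 0], lods)
print(round(h1, 3), round(h3, 3))
```

With these inputs the sketch gives values close to the lnL
reported for H1 and H3 in the HOMOG3R output above (8.9453 and
5.9688), which suggests that the assumed mixture form matches
what the programs compute for this example.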

To use the program, you will have to furnish 4 values of alpha
(proportions of family types), e.g., 0.23 0.77 0 0 for two
components. Also, you need to specify "theta" values. However,
rather than the actual recombination fractions, the program
expects the consecutive (integer) numbers corresponding to the
theta values given in the input file, for example, 3 for the
third theta value. To specify a recombination fraction of 50%,
enter a number outside the range of numbers of theta values,
e.g., 0. The theta numbers corresponding to an alpha=0 are
irrelevant.

MTEST program

The MTEST program implements Morton's likelihood ratio test for
heterogeneity of the recombination fraction among different
groups of families (Morton 1956). Each group consists of a
certain number of families, e.g., the groups may correspond to
investigators, or to countries of origin. Also, each family may
be regarded as forming a group of its own (Morton's original
usage of the test). The test assumes homogeneity within each
group (same theta). The null hypothesis specifies overall
homogeneity while under the alternative hypothesis of
heterogeneity, a potentially different theta value exists for
each group.

Files used by the program have the following fixed names:

MTEST.DAT is the input file. It has the same structure as the
input file to the HOMOG program. A sample MTEST.DAT file is
provided.

MTEST.OUT is the output file.

MTEST.GRP is an input file holding the family group definitions.
The first line contains the number, NGR, of groups to follow on
subsequent lines. On each of the following NGR lines, family
numbers are given that form one group, e.g., 3 11 12 15.
Contiguous family numbers may be given in abbreviated form, e.g.,
numbers 7 through 11 may be given as -7 11. The first line and
the following NGR lines define one set of groups. As many such
sets may be given as desired. An example MTEST.GRP file is
provided.
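
The abbreviated notation for contiguous family numbers can be
illustrated with a small Python sketch that expands one group
line into an explicit list of family numbers. It assumes that a
negative number marks the start of a range whose end is the next
number on the line, as in the example -7 11; the function name is
hypothetical, and the MTEST source may handle edge cases
differently.

```python
def expand_group_line(line):
    """Expand one MTEST.GRP group line into a list of family numbers."""
    tokens = [int(tok) for tok in line.split()]
    families = []
    i = 0
    while i < len(tokens):
        if tokens[i] < 0:
            # Negative number: start of a range ending at the next number.
            if i + 1 >= len(tokens):
                raise ValueError("range start without an end: " + line)
            start, end = -tokens[i], tokens[i + 1]
            families.extend(range(start, end + 1))
            i += 2
        else:
            families.append(tokens[i])
            i += 1
    return families

print(expand_group_line("3 11 12 15"))  # [3, 11, 12, 15]
print(expand_group_line("-7 11"))       # [7, 8, 9, 10, 11]
```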

Note that after the number NGR of groups, a title may follow on
the same line, but there must be at least one space between NGR
and the title. If NGR=0 is given as the number of groups, this is
taken to indicate that each family should form one group of its
own (original usage of Morton's test). In that case, no family
numbers are to be provided. The last line of the MTEST.GRP file
should contain the number -1 to indicate the end of input.

REFERENCES

Morton NE (1956) The detection and estimation of linkage between
   the genes for elliptocytosis and the Rh blood type. Am J Hum
   Genet 8, 80-96

Ott J (1986) Linkage probability and its approximate confidence
   interval under possible heterogeneity. Genet Epidemiol Suppl
   1, 251-257

Ott J (1991) Analysis of Human Genetic Linkage, revised edition.
   Johns Hopkins University Press, Baltimore

Terwilliger JD, Ott J (1994) Handbook of Human Genetic Linkage.
   Johns Hopkins University Press, Baltimore