Jurg Ott                 30 June 1994         Tel. (212) 960-2507
Columbia University, Unit 58                  FAX: (212) 568-2750
722 West 168th Street
New York, NY 10032                 E-mail:  jurg.ott@columbia.edu


          Documentation to homogeneity programs
               
               Copyright (C) Jurg Ott 1990


                         Contents
-----------------------------------------------------------------

INTRODUCTION

CONSTRUCTION OF SUPPORT INTERVALS

HOMOG program

HOMOG1a program

HOMOG1b program

HOMOG2 program

HOMOG3 and HOMOG4 programs

HOMOG3R program

POINT4 program

MTEST program

REFERENCES
-----------------------------------------------------------------


INTRODUCTION

     This documentation and the programs described herein are
copyrighted.  This means that the programs may be copied freely
for nonprofit scientific use but they must not be used for
commercial purposes unless a specific license for commercial use
is obtained from the author.  Also, anyone modifying these
programs must display a note indicating this fact, and the note
must appear both in the source code and in the output file
produced by the programs.
     Two test situations may be distinguished:  a mixture of
families in which families cannot unequivocally be assigned to
one or the other type or group (HOMOG programs), or known groups
of families (fixed classifications, MTEST program, see separate
section below), for example, as distinguished by their origin.
     The HOMOG programs are written as close to standard Pascal
as possible.  They should with little modification be compilable
with almost any Pascal compiler.  For DOS, they were compiled
with Turbo Pascal 7.0 and will sense a coprocessor if one is in-
stalled, and will emulate it if none is present.  In case of
problems with the coprocessor, the sensing mechanism may be
turned off by telling Turbo Pascal whether it should use the
coprocessor or not.  This is achieved by issuing the DOS command
SET 87=YES or SET 87=NO, respectively.  In Turbo Pascal, SEEKEOLN
is a standard function;  for possible use in other Pascals, the
source code of a function with the same effect as SEEKEOLN is
included.  For OS/2, the programs are compiled with NDP Pascal
from Microway Inc.  For ease of recompiling the programs, two
command files, SETNDP.CMD and COMPILE.CMD, are included in the
OS/2 package.  NDP Pascal is close to Pascal on Unix machines.
     This documentation describes different forms of the homoge-
neity (admixture) test.  All HOMOG programs analyze heterogeneity
(two or more disease loci) with respect to either a single marker
locus or to a known map of markers.  In the first case, the
programs expect lod scores between disease phenotype and the
marker, and in the second case, they expect multipoint lod scores
for disease versus the known map of markers.  Such multipoint lod
scores can be obtained from the LINKMAP program by letting the
disease locus walk in steps along the map.  The multipoint lod
scores for each family will have to be calculated from the
LINKMAP output or may be obtained from it using the LINKLODS
program of the LINKAGE package.

     HOMOG carries out a homogeneity test (A-test) under the
following alternative hypothesis: two family types, one with
linkage of a trait (or any gene locus for that matter) to a
marker or map of markers, the other without linkage.  For more
information see Ott (1991).
     HOMOG1 is an extension of the homogeneity test, with the
following alternative hypothesis:  two family types, one linked,
the other unlinked, plus a possible sex difference in the recombination
fraction.  This program comes in two versions, depending
on whether the lods for the two sexes are independent or not: 
HOMOG1a reads independent lods, HOMOG1b reads dependent lods. 
For the same problem, with independent lods, HOMOG1a is more
efficient in terms of memory space.
     HOMOG2 is also an extension of the homogeneity test, with
the alternative hypothesis of two family types, both with linkage
but to two different markers on the same chromosome.  The recom-
bination fraction (or map distance) between trait and marker 1 is
theta1, that between trait and marker 2 is theta2, where theta1 <
theta2 < 0.5 (male and female recombination fractions the same).
     HOMOG3 and HOMOG4 are analogous to HOMOG2 but specify 3 or 4
family types (marker loci).  They only calculate the max. log
likelihood and the ML estimates.
     HOMOG3R is a specialized version of the HOMOG3 program.  It
calculates log likelihoods under the assumption that a trait is
linked in some families to marker 1 on some chromosome, and in
other families it is linked to a marker 2 on another chromosome.
     POINT4 is interactive and calculates the log likelihood at
specific parameter values for a mixture of up to 4 family types.

     Before running one of the programs, an input file must be
constructed according to the rules given below.  Input and output
files have fixed names;  for example, for the HOMOG program, the
input file is HOMOG.DAT and the output file is HOMOG.OUT.  Output
of lod scores and log likelihoods is preset to a width of 80
columns unless the input quantity LL is given.
     In each of the homogeneity tests, groups or types of fami-
lies are assumed where any given family cannot unequivocally be
assigned to either of these types.  The family types differ from
each other with respect to the recombination fraction between two
loci (lod scores, two-point situation) or with respect to the map
distance between a locus and a fixed point on a map of marker
loci (location scores, multi-point situation).  In the latter
situation, the programs still refer to the map distances as
"theta values".  The statistical hypotheses referred to in the
programs are defined as follows:
     H0 is the very basic hypothesis of both homogeneity and
absence of linkage.
     H1 is the usual null hypothesis of homogeneity, ie, all
families belong to a single family type with linkage between the
main locus and the marker locus.
     H2 refers to the hypothesis of heterogeneity, with two
family types, type 1 and type 2, where alpha is the proportion of
families of type 1 or, equivalently, the probability of a family
belonging to type 1.  The family type 1 is characterized by a
recombination fraction theta (programs HOMOG, HOMOG1a, and
HOMOG1b) or theta1 (program HOMOG2) while in families of type 2,
the recombination fraction is assumed to be equal to 1/2 (pro-
grams HOMOG, HOMOG1a, and HOMOG1b) or theta2 (program HOMOG2,
theta1 < theta2 < 0.5).
     H3 refers to a particular type of "homogeneity": there is
only one family type with recombination fraction theta, but
allowance is made for a difference in the recombination fraction
between the sexes.
     H4 is the heterogeneity alternative to H3, ie, there are two
family types with recombination fractions of theta and 1/2 and,
in addition, there might also be a difference in theta between
the sexes.
     The relationship between the hypotheses 1 through 4 can be
displayed as follows:

Recombination fraction          Alpha=1        Alpha<1
in the two sexes            (Homogeneity)  (Heterogeneity)
----------------------------------------------------------
           equal                  H1              H2
         unequal                  H3              H4
----------------------------------------------------------

     In the programs and on output, genetic distance is labelled
in terms of the recombination fraction, theta.  However, the
programs may also be used when the genetic distances are in
centimorgans, x.  To accommodate both types of applications, free
recombination (infinite map distance) is on output designated as
theta = 99 or -99.
     Tests of one hypothesis against another are carried out as
likelihood ratio tests, where the likelihood ratio with respect
to the two hypotheses is calculated.  Asymptotic p-values are no
longer reported because in many applications they may be unreli-
able.
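
     As a rough illustration of the likelihood ratio between H2
and H1, the following sketch evaluates the Ln likelihood on a
grid of alpha and theta values.  It assumes the usual admixture
form, in which each family contributes alpha x 10^lod + (1 -
alpha) to the likelihood relative to H0.  The function names and
the lod scores are hypothetical, and the sketch is not the HOMOG
source code.

```python
import math

def family_lr(lod):
    """Likelihood ratio for one family; lods below -80 count as minus infinity."""
    return 0.0 if lod < -80 else 10.0 ** lod

def ln_lik(alpha, lods_at_theta):
    """Ln likelihood relative to H0 at one (alpha, theta) point."""
    total = 0.0
    for z in lods_at_theta:
        term = alpha * family_lr(z) + (1.0 - alpha)
        if term <= 0.0:          # only when alpha = 1 and the lod is minus infinity
            return -math.inf
        total += math.log(term)
    return total

def grid_max(lods, stepsize=0.05, alow=0.0):
    """Search alpha = alow, alow + stepsize, ..., 1 and every theta column.

    lods[i][j] is the lod score of family i at theta number j.
    Returns (max Ln likelihood, alpha, theta index).
    """
    n_steps = round((1.0 - alow) / stepsize)
    best = (-math.inf, None, None)
    for k in range(n_steps + 1):
        alpha = min(1.0, alow + k * stepsize)
        for j in range(len(lods[0])):
            value = ln_lik(alpha, [fam[j] for fam in lods])
            if value > best[0]:
                best = (value, alpha, j)
    return best

# Hypothetical lod scores: 3 families at 2 theta values.
lods = [[1.2, 0.9], [-2.0, -0.5], [0.8, 0.6]]
h2 = grid_max(lods)                                            # heterogeneity
h1 = max(ln_lik(1.0, [f[j] for f in lods]) for j in range(2))  # alpha fixed at 1
print("Ln(L) H2 =", h2[0], " Ln(L) H1 =", h1, " LR =", math.exp(h2[0] - h1))
```

     Because alpha=1 is part of the grid, the maximum under H2
can never be smaller than that under H1, so the likelihood ratio
printed is at least 1.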

CONSTRUCTION OF SUPPORT INTERVALS

     Some of the programs described below will calculate support
intervals for parameters estimated (support "regions" for more
than one parameter) and, for each family, the conditional
probability of belonging to each of the family types considered.
Such calculations are carried out only when a value for LDIFF is
specified on input.  Otherwise, no support interval calculations
are performed, which results in faster program execution.
     Support intervals may be interpreted as approximate confi-
dence intervals.  However, such approximate confidence intervals
will be very crude and practically useless if only a few theta
values with lod scores are present so that the theta values are
far apart from each other, or when the step size for alpha is too
large, say, larger than 0.10.  It is important for good support
intervals that lod scores are available at many theta values. 
When only few lod scores are available, a possible solution is to
approximate lod scores by interpolation between calculated lod
scores before inputting them to the HOMOG programs.
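
     One simple way of filling in lod scores is linear
interpolation, sketched below.  The function name is hypothetical
and the choice of linear interpolation is an assumption;  lod
curves are not linear, so this is only a crude approximation, and
lods of minus infinity cannot be interpolated this way.  The
known point lod=0 at theta=0.5 is added automatically.

```python
def interpolate_lods(thetas, lods, new_thetas):
    """Linearly interpolate one family's lod scores at new theta values.

    thetas must be distinct; lod = 0 at theta = 0.5 is appended if missing.
    """
    points = sorted(zip(thetas, lods))
    if points[-1][0] < 0.5:
        points.append((0.5, 0.0))
    result = []
    for t in new_thetas:
        if not points[0][0] <= t <= points[-1][0]:
            raise ValueError("theta outside the range of available lod scores")
        # Find the segment that brackets t and interpolate within it.
        for (t0, z0), (t1, z1) in zip(points, points[1:]):
            if t0 <= t <= t1:
                result.append(z0 + (z1 - z0) * (t - t0) / (t1 - t0))
                break
    return result

# Hypothetical lods at theta = 0, 0.1, 0.3; request three extra points.
print(interpolate_lods([0.0, 0.1, 0.3], [1.0, 2.0, 1.0], [0.05, 0.2, 0.4]))
```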
     In the HOMOG programs, support regions/intervals are comput-
ed as follows.  First, the program determines the highest Ln
likelihood, Lmax, under the most general alternative hypothesis,
ie, that with the largest number of parameters estimated.  Then,
the program recalculates likelihoods and marks as belonging to
the support interval all those parameter values which have an Ln
likelihood larger than Lmax-Ldiff (Ln likelihood within Ldiff of
the maximum).  Such a support interval is called an Ldiff-unit
support interval.  The table below gives examples for the corre-
spondence between Ldiff and the associated likelihood ratio.
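
     The marking step just described can be sketched as follows.
The table of Ln likelihoods is hypothetical, as are the function
names;  the sketch only shows the rule "Ln likelihood larger than
Lmax-Ldiff" applied to a grid of (alpha, theta) points.

```python
def support_region(ln_table, ldiff):
    """Return Lmax and all grid points whose Ln likelihood exceeds Lmax - ldiff.

    ln_table maps (alpha, theta) to the Ln likelihood at that point.
    """
    lmax = max(ln_table.values())
    inside = sorted(p for p, v in ln_table.items() if v > lmax - ldiff)
    return lmax, inside

def bounds(points, axis):
    """Smallest and largest value of one parameter over the support region."""
    values = [p[axis] for p in points]
    return min(values), max(values)

# Hypothetical Ln likelihoods on a small (alpha, theta) grid.
table = {
    (0.5, 0.05): 1.0, (0.6, 0.05): 4.2, (0.7, 0.05): 5.0,
    (0.8, 0.05): 3.5, (0.9, 0.05): 0.5, (0.7, 0.10): 3.2,
}
lmax, region = support_region(table, ldiff=2.0)
print("Lmax =", lmax)
print("alpha interval:", bounds(region, 0))
print("theta interval:", bounds(region, 1))
```

     With a coarse grid the interval endpoints can only fall on
grid points, which is why many theta values and a small step size
for alpha are needed.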
     Under regular conditions, support intervals may be inter-
preted as approximate confidence intervals.  For example, with
two-point analysis and two family types (one linked and the other
unlinked), 2xLdiff approximately follows a chi-square distribu-
tion on 1 df when no heterogeneity is present.  In multipoint
situations, however, the approximation by chi-square is unreli-
able because the distribution of the test statistic is unknown.

                    Difference in units of  Approx.
     Likelihood    -----------------------  p-value
     ratio (LR)    ln(LR)=Ldiff  lod score  (1 df)
     ----------------------------------------------
          7.39         2.00         0.87     .046
         10            2.30         1        .032
         20            3.00        1.30      .014
         50            3.91        1.70      .005
        100            4.60        2         .002
       1000            6.91        3         .0002
     ----------------------------------------------
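
     The entries of this table can be recomputed with the short
sketch below:  Ldiff = ln(LR), lod = Ldiff/ln(10), and the
approximate p-value is the upper tail of chi-square at 2xLdiff.
The function names are hypothetical.  Closed forms for 2 and 3 df
are included because the program sections quote Ldiff = 3.00 and
Ldiff = 3.91 as approximate 95% regions for two and three
parameters.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of chi-square for df = 1, 2 or 3 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only df = 1, 2, 3 are implemented in this sketch")

def table_row(lr, df=1):
    """Return (Ldiff, lod score, approximate p-value) for a likelihood ratio."""
    ldiff = math.log(lr)
    return ldiff, ldiff / math.log(10.0), chi2_sf(2.0 * ldiff, df)

for lr in (math.exp(2.0), 10, 20, 50, 100, 1000):
    ldiff, lod, p = table_row(lr)
    print(f"{lr:8.2f} {ldiff:6.2f} {lod:6.2f} {p:8.4f}")

print("2 df, Ldiff = 3.00:", chi2_sf(2 * 3.00, 2))
print("3 df, Ldiff = 3.91:", chi2_sf(2 * 3.91, 3))
```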


HOMOG program

     Input is as described below.  The default file names are
HOMOG.DAT for input and HOMOG.OUT for output.

Line 1:  Title line

Line 2:  N  STEPSIZE  LDIFF    where
     N = no. of theta values at which lod scores are available or
should be computed (ISW=R).  Omit lod=0 at theta=0.5.
     STEPSIZE = step size at which the alpha values are incre-
mented in the search over the likelihood surface (for example,
0.05).
     LDIFF (optional) = difference in log likelihood, used in the
construction of support intervals.  LDIFF is optional;  if it is
not given (or when LDIFF=0) no support intervals will be comput-
ed.  In regular situations, the joint support interval for alpha
and theta corresponds to an approximate 95% confidence region
when Ldiff = 3.00.

Line 3:  OUT  ALOW  LL     where the OUTput option is set as
follows:

  OUT   Table of lnL(alpha,theta)   Lods for families
   0               no                     no
   1               no                    yes
   2              yes                     no
   3              yes                    yes

     ALOW = lowest value of alpha analyzed (eg, ALOW=0)
     LL = line length of output (optional; if missing: LL=80)

Line 4:  N recombination fraction (theta) values, e.g., 0.01,
0.05, 0.1, etc.  At these points, lod scores will be computed.  A
large number N of thetas (e.g., 10) will yield more accurate
results than a small number.  If an additional number follows
the last theta value on the same line, that number indicates the
theta value against which heterogeneity will be tested.  Without
any such additional number, the test will be against theta=0.5
(infinite map distance).

Line 5:  NFAM = number of families for which lods are provided

Line 6:  Lod scores for family 1.  Lods smaller than -80 are
taken to represent minus infinity and a log likelihood of minus
infinity will appear as -99 on output.

Repeat line 6 for families 2, 3, etc.
----------------------------------------------

Sample data: the file HOMOG.DAT shows a specific example based on
the analysis of Morton (1956) on Elliptocytosis vs. Rh.
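
     For users who prepare input files by program, the following
sketch assembles a file in the line layout described above.  The
function name and the numbers are hypothetical (they are not the
Morton data), and it is an assumption that blank-separated, free-
format numbers are acceptable to the program.

```python
def homog_dat(title, thetas, lods, stepsize=0.05, ldiff=None,
              out=0, alow=0.0, ll=None):
    """Return the text of a HOMOG input file (lines 1 to 6 as described above).

    lods[i] holds the lod scores of family i, one per theta value.
    """
    for fam in lods:
        if len(fam) != len(thetas):
            raise ValueError("each family needs one lod score per theta value")
    line2 = f"{len(thetas)} {stepsize}"
    if ldiff:                      # omitted (or 0) means no support intervals
        line2 += f" {ldiff}"
    line3 = f"{out} {alow}"
    if ll is not None:             # LL is optional; the default is 80
        line3 += f" {ll}"
    lines = [title, line2, line3,
             " ".join(str(t) for t in thetas),
             str(len(lods))]
    lines += [" ".join(f"{z:.3f}" for z in fam) for fam in lods]
    return "\n".join(lines) + "\n"

text = homog_dat("Hypothetical example", [0.01, 0.1, 0.3],
                 [[0.5, 1.2, 0.4], [-99, -0.3, 0.1]],
                 stepsize=0.05, ldiff=3.0, out=1)
print(text, end="")
```

     Writing the returned text to a file named HOMOG.DAT would
give the program its expected input name.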


HOMOG1a program

     The possible hypotheses under which likelihoods are calcu-
lated by the HOMOG1 program can be displayed as follows, where df
stands for degrees of freedom (see also introduction).

Male and female    Homogeneity (one     Heterogeneity (two
rec. fractions     family type)         family types)
----------------------------------------------------------
        equal          H1 (1 df)            H2 (2 df)
      unequal          H3 (2 df)            H4 (3 df)
----------------------------------------------------------

     The test of H1 against H4 leads to a chi-square value with 2
df that may be partitioned into two components according to the
manner in which H4 is reached from H1.  Note that there are two
possible paths leading from H1 to H4.
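
     The partitioning can be checked numerically as in the
following sketch.  The maximum Ln likelihoods are hypothetical
values chosen for illustration;  each chi-square component is
taken as twice the difference in Ln likelihood, and the two
components along either path add up to the overall value for H1
versus H4.

```python
# Hypothetical maximum Ln likelihoods under the four hypotheses.
ln_l = {"H1": 3.10, "H2": 4.25, "H3": 3.60, "H4": 5.05}

def chi2(h_null, h_alt):
    """Twice the difference in maximum Ln likelihood between two hypotheses."""
    return 2.0 * (ln_l[h_alt] - ln_l[h_null])

total = chi2("H1", "H4")                       # 2 df overall
via_h2 = (chi2("H1", "H2"), chi2("H2", "H4"))  # heterogeneity first, then sexes
via_h3 = (chi2("H1", "H3"), chi2("H3", "H4"))  # sex difference first
print("total:", total)
print("path via H2:", via_h2, "sum =", sum(via_h2))
print("path via H3:", via_h3, "sum =", sum(via_h3))
```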
     Input to the HOMOG1a program is similar to that for the
HOMOG program and is as given in the following table.  But refer
to the notes below this table.  File names are HOMOG1A.DAT for
input and HOMOG1A.OUT for output.


Line 1:   Title line

Line 2:   NM  NF  STEPSIZE  LDIFF     where
     NM = no. of male theta values, tm, at which lod scores are
available.  Do not count theta=0.5.
     NF = no. of female theta values, tf, at which lod scores are
available.  Do not count theta=0.5.
     STEPSIZE = step size at which the alpha values are incre-
mented in the search over the likelihood surface (eg, 0.05).
     LDIFF (optional) = difference in log likelihood, used in the
construction of support intervals (see section 1, above).  In
regular situations, the joint support interval for alpha and
theta corresponds to an approximate 95% confidence region when
Ldiff = 3.91 (LR ≈ 50).

Line 3:  OUT  ALOW  LL    where the OUTput option is set as fol-
lows (Warning: the table of lnL(alpha,theta) contains [NMxNF-1]/
STEPSIZE lines):

  OUT   Table of lnL(alpha,theta)   Lods for families
   0               no                     no
   1               no                    yes
   2              yes                     no
   3              yes                    yes

     ALOW = lowest value of alpha analyzed (eg, ALOW = 0)
     LL (optional) = line length of output (may be missing)

Line 4:  NM male theta values, tm.  They may be entered on a
single line, or distributed over several lines.  The order is
irrelevant.

Line 5:  NF female theta values, tf.

Line 6:  NFAM = number of families

Line 7:  NM male lod scores for family 1.  Lods smaller than -80
are taken to represent minus infinity, and a log likelihood of
minus infinity will appear as -99 on output.

Line 8:  NF female lod scores for family 1.

Repeat lines 7 and 8 for families 2, 3, etc.

     As to the theta values at which lod scores are available in
each family, the user is essentially free to choose which theta
values to use.  However, he or she should make sure that there is a
sufficiently large number of pairs both with tm=tf and with
tm<>tf (<> stands for "not equal to").
     Sample data: the file HOMOG1A.DAT provides an example of
data that may be analyzed for heterogeneity as well as for a sex
difference in the recombination fraction.  Data quoted in Ott
(1986).

HOMOG1b program

     The calculations performed by the HOMOG1b program are
basically the same as those by the HOMOG1a program.  HOMOG1b
allows input of nonindependent lod scores for the two sexes and,
everything else being equal, requires more memory to run.  The
input is slightly different from that to the HOMOG1a program and
is as follows.  File names are HOMOG1B.DAT for input and HOMOG-
1B.OUT for output.

Line 1:  Title line

Line 2:  N  STEPSIZE  LDIFF     where
     N = no. of pairs of theta values, tm and tf (male and female
recombination fractions), at which lod scores are available.  Do
not count theta=0.5.
     STEPSIZE = step size at which the alpha values are incre-
mented in the search over the likelihood surface (eg, 0.05).
     LDIFF (optional) = difference in log likelihood, used in the
construction of support intervals (see section 1, above).  In
regular situations, the joint support interval for alpha and
theta corresponds to an approximate 95% confidence region when
Ldiff = 3.91 (LR ≈ 50).

Line 3:  OUT  ALOW  LL    where the OUTput option is set as
follows:

   OUT   Table of lnL(alpha,theta)   Lods for families
   ---------------------------------------------------
    0               no                     no
    1               no                    yes
    2              yes                     no
    3              yes                    yes

     ALOW = lowest value of alpha analyzed (eg, ALOW=0)
     LL (optional) = line length of output (may be missing)

Line 4:  N pairs of theta values, tm and tf, where in each pair
the first value is the male and the second value is the female
recombination fraction.  Each pair may be entered on a single
line, or several pairs may be entered on one line, e.g., 0.01,
0.01, 0.05, 0.05, 0.01, 0.05,...  There must be exactly as many
pairs (N of them) as there are lod scores for each family as
provided on line 6, below.  Omit lod=0 at tm=tf=0.5.  The
order in which these pairs are provided is irrelevant.

Line 5:  NFAM = number of families

Line 6:  Lod scores (N of them) for family 1.  Lods smaller
than -80 are taken to represent minus infinity, and a log likeli-
hood of minus infinity will appear as -99 on output.

Repeat line 6 for families 2, 3, etc.
------------------------------------------------

As to the pairs of theta values at which lod scores are available
in each family, the user is essentially free to choose which theta
values to use.  However, he or she should make sure that there is a
sufficiently large number of pairs both with tm=tf and with
tm<>tf (<> stands for "not equal to").  A minimum set of pairs
would be all possible combinations of tm and tf with tm = 0,
0.05, ..., 0.5, and with tf = 0, 0.05, ..., 0.5.  When the
highest lod score occurs at tf>tm, then it might be sufficient to
provide lods in one triangle of the plane of (tm,tf)-values only,
eg, at tm = 0, 0.05, ..., 0.5, and tm <= tf <= 0.5.  Such a set
of theta values may graphically be represented as follows (marked
with crosses):

             tm = 0.5
                  0.3                 x    x
                  0.1            x    x    x
                  0.05      x    x    x    x
                   0   x    x    x    x    x
             --------------------------------
             tf =      0  0.05  0.1  0.3  0.5

On input, for example, the following theta values would have to
be given on line(s) 4:

0 0   0 0.05   0 0.1   0 0.3   0 0.5   0.05 0.05
0.05 0.1  0.05 0.3   0.05 0.5   0.1 0.1 ... 0.3 0.5
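
     A list of pairs for one triangle can be generated rather
than typed, as in this sketch (the function name is hypothetical).
It enumerates all pairs with tm <= tf from a set of theta values
and leaves out tm=tf=0.5, in the same order as the example above.

```python
def triangle_pairs(thetas):
    """All (tm, tf) pairs with tm <= tf, omitting the pair tm = tf = 0.5."""
    ts = sorted(thetas)
    return [(tm, tf)
            for i, tm in enumerate(ts)
            for tf in ts[i:]
            if not (tm == 0.5 and tf == 0.5)]

pairs = triangle_pairs([0, 0.05, 0.1, 0.3, 0.5])
print(len(pairs), "pairs")
print("  ".join(f"{tm} {tf}" for tm, tf in pairs))
```

     For the five theta values shown this yields 14 pairs, so N
on line 2 would be 14.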

     Sample data: the file HOMOG1B.DAT contains the same data as
the file HOMOG1A.DAT referenced in the previous section except
that the joint lod scores, Z(thm,thf), have been reconstructed
from the independent sex-specific lod scores, Z(thm) and Z(thf),
as Z(thm,thf) = Z(thm) + Z(thf).
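
     The reconstruction Z(thm,thf) = Z(thm) + Z(thf) can be
sketched as follows for one family.  The function name and the
numbers are hypothetical.  Two assumptions are made here:  pairs
in which one sex has theta=0.5 are formed by adding a lod of
zero for that sex, and a lod of minus infinity (below -80) in
either sex makes the joint lod minus infinity, written as -99.

```python
def joint_pairs_and_lods(tms, male_lods, tfs, female_lods):
    """Build (tm, tf) pairs and joint lods from independent sex-specific lods."""
    male = list(zip(tms, male_lods)) + [(0.5, 0.0)]      # lod = 0 at theta = 0.5
    female = list(zip(tfs, female_lods)) + [(0.5, 0.0)]
    pairs, lods = [], []
    for tm, zm in male:
        for tf, zf in female:
            if tm == 0.5 and tf == 0.5:
                continue                                 # omit lod = 0 at tm = tf = 0.5
            pairs.append((tm, tf))
            lods.append(-99.0 if zm < -80 or zf < -80 else zm + zf)
    return pairs, lods

# Hypothetical family: two male theta values, one female theta value.
pairs, lods = joint_pairs_and_lods([0.0, 0.1], [1.0, 0.5], [0.1], [0.2])
for (tm, tf), z in zip(pairs, lods):
    print(tm, tf, round(z, 3))
```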

HOMOG2 program

     As mentioned in the introduction, in this extension to the
A-test, the alternative hypothesis H2 of heterogeneity specifies
two family types, both with linkage, one with recombination
fraction theta1 between trait and marker 1, the other with
recombination fraction theta2 between trait and marker 2 (theta1
< theta2 < 0.5), where alpha denotes the probability of belonging
to type 1 (with theta1).  The two markers are on the same chromo-
some so that only one set of lod scores of the trait versus the
"map" of two markers is provided.
     Input format is the same as for the HOMOG program and is as
given in the following table, but refer to the notes after the
table.  Notice that there are two possible modes of indicating
the number of families:  Either one precedes each family with a
code, R or L, and provides as many families as desired, or one
indicates at the beginning of the family data the total number of
families for whom data will follow.  File names are HOMOG2.DAT
for input and HOMOG2.OUT for output.

Line 1:  Title line

Line 2:  NT  STEPSIZE  LDIFF     where

     NT = no. of theta values (or map distances) at which lod
scores are available or should be computed (ISW=R).  Omit lod=0
at theta=0.5.
     STEPSIZE = step size at which the alpha values are incre-
mented in the search over the likelihood surface (eg, 0.05).
     LDIFF (optional) = difference in log likelihood, used in the
construction of support intervals.  LDIFF is optional.  When it
is missing, no support intervals will be calculated.

Line 3:  OUT  ALOW  LL     where the OUTput option is set as
follows (Warning: table of lnL(alpha,theta) contains 0.5xN(N+3)/
STEPSIZE lines):

  OUT   Table of lnL(alpha,theta)   Lods for families
  ---------------------------------------------------
   0               no                     no
   1               no                    yes
   2              yes                     no
   3              yes                    yes
  ---------------------------------------------------
     ALOW = lowest value of alpha analyzed (eg, ALOW=0)
     LL = line length of output (optional; if missing: LL=80)

Line 4:  Recombination fraction (theta) values, e.g., 0.01, 0.05,
etc.  At these points, lod scores will be computed.  A rather
large number NT of recombination fractions (e.g., 10) will yield
more accurate results than a small number.

Line 5:  NFAM = number of families for which lod scores are
provided.

Line 6:  Lod scores for family 1.  Lods smaller than -80 are
taken to represent minus infinity, and a log likelihood of minus
infinity will appear as -99 on output.

Repeat line 6 for each family.
--------------------------------------------

     The null hypothesis (H1) is specified by alpha=1 or, equiva-
lently, by theta1=theta2, and has one degree of freedom (df), ie,
theta1.  The alternative hypothesis (H2) is characterized by
three df, ie, alpha, theta1, and theta2.  However, setting
alpha=1 forces theta2 to be equal to theta1 so that the asymptot-
ic chi-square distribution may not apply.  Instead of p-values,
the current version of the HOMOG2 program outputs the likelihood
ratios (odds ratios) for the hypotheses considered.
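
     The comparison of H2 with H1 for two linked family types
can be sketched as below.  It assumes a mixture likelihood in
which each family contributes alpha x 10^Z(theta1) + (1 - alpha)
x 10^Z(theta2), with theta1 and theta2 restricted to the theta
values supplied in ascending order.  Function names and lod
scores are hypothetical, and this is not the HOMOG2 source code.

```python
import math

def lr(lod):
    """Likelihood ratio for one lod; lods below -80 count as minus infinity."""
    return 0.0 if lod < -80 else 10.0 ** lod

def ln_lik2(alpha, lods, j1, j2):
    """Ln likelihood with type 1 at theta number j1 and type 2 at j2."""
    total = 0.0
    for fam in lods:
        term = alpha * lr(fam[j1]) + (1.0 - alpha) * lr(fam[j2])
        if term <= 0.0:
            return -math.inf
        total += math.log(term)
    return total

def search(lods, stepsize=0.05):
    """Return (max lnL under H1, (max lnL, alpha, j1, j2) under H2)."""
    n = len(lods[0])
    steps = round(1.0 / stepsize)
    h1 = max(ln_lik2(1.0, lods, j, j) for j in range(n))
    h2 = (-math.inf, None, None, None)
    for k in range(steps + 1):
        alpha = k / steps
        for j1 in range(n):
            for j2 in range(j1 + 1, n):     # theta1 < theta2
                value = ln_lik2(alpha, lods, j1, j2)
                if value > h2[0]:
                    h2 = (value, alpha, j1, j2)
    return h1, h2

# Hypothetical lods at 3 map positions: families 1-2 favor the first
# position, families 3-4 the last.
lods = [[2.0, 0.5, -1.0], [1.5, 0.3, -0.8],
        [-1.2, 0.4, 1.8], [-0.9, 0.2, 1.1]]
h1, h2 = search(lods)
print("Ln(L) H1 =", h1)
print("Ln(L) H2 =", h2[0], "at alpha =", h2[1], "theta numbers", h2[2:], )
print("Likelihood ratio H2:H1 =", math.exp(h2[0] - h1))
```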

HOMOG3 and HOMOG4 programs

     These programs are straightforward extensions of the HOMOG2
program to 3 and 4 family types.  They use the same input format
as the HOMOG and HOMOG2 programs, and the input files are HOM-
OG3.DAT and HOMOG4.DAT, respectively.  Output files will be
called HOMOG3.OUT and HOMOG4.OUT.
     The HOMOG3 and HOMOG4 programs simply calculate the max. Ln
likelihood under the most general hypothesis of heterogeneity.
Appropriate significance tests will have to be carried out
manually by the user by comparing output from these programs with
output from the HOMOG or HOMOG2 programs.  Notice that HOMOG3 and
HOMOG4 carry out an exhaustive search of the parameter space and
may require a large amount of computer time.  While they are
running, they display the current alpha values so that they may
be interrupted by the user.
     Interpreting results of HOMOG3 or HOMOG4 is not straightfor-
ward.  For example, whenever one of the components (alpha's) is
equal to zero, the associated theta value is irrelevant.  Also,
there may be more than one parameter constellation with the same
maximum likelihood.  The HOMOG3 and HOMOG4 programs differ in
their output as follows.
     In the HOMOG3 program, if the OUTput option (line 3) is set
to a value larger than 1, all possible sets of alpha values will
be printed (one set per line), and for each set the maximum
likelihood over the theta values will be given along with those
theta values at which the maximum occurred.
     In the HOMOG4 program, if the OUTput option (line 3) is set
to a value larger than 1, a table containing the Ln likelihood
for each possible set of parameter values will be written to the
output file.  WARNING:  THIS FILE COULD BE VERY LARGE!  For
example, when the sample HOMOG.DAT file is analyzed by the HOMOG4
program, the output file will be 1.5MB long.  For most practical
situations, one should set OUT=0 on line 3.
     Notice that each alpha component cannot take on the whole
range of values from 0 through 1.  For computational efficiency,
only alpha3 is allowed to have a value of 1;  if in the course of
the calculations, alpha2 or alpha1 were also allowed to be equal
to one, the resulting likelihoods and theta estimates would be
exactly the same as with alpha3=1.  Similarly, alpha3 cannot be
equal to zero -- if a single alpha is zero, it must be alpha1; if
two alpha's are zero, these must be alpha1 and alpha2.

HOMOG3R program

     This is a specialized version of the HOMOG3 program.  It
calculates log likelihoods under the assumption that in a propor-
tion a1 of families a trait is linked to marker 1 and in a pro-
portion a2 of families it is linked to marker 2, where the two
markers (or maps of markers) are located in different regions of
the genome such that the trait is never truly linked to both
markers.  There may be a third proportion, a3=1-a1-a2, of fami-
lies without linkage to markers 1 and 2.
     The two regions of the genome usually correspond to differ-
ent chromosomes and are identified in the following table by the
respective sets of theta values, NT1 and NT2.
     Default file names are HOMOG3R.DAT for input and HOMOG3R.OUT
for output.

Line 1:  Title line

Line 2:  NT1  NT2  STEPSIZE    where

     NT1 = no. of theta values (or map locations) at which lod
scores are available for trait versus marker 1.  Omit lod = 0 at
theta = 0.5.
     NT2 analogous for marker 2.
     STEPSIZE = step size at which the alpha values are incre-
mented in the search over the likelihood surface (eg, 0.05).

Line 3:  OUT  ALOW    where the OUTput option is set as follows:

  OUT   Table of lnL(alpha,theta)   Lods for families
  ---------------------------------------------------
   0               no                     no
   1               no                    yes
   2              yes                     no
   3              yes                    yes
  ---------------------------------------------------
     The table of lnL(alpha,theta) will print one line for each
pair of alpha1 and alpha2. In each line, the log likelihood,
maximized over the thetas, is printed.
     ALOW = lowest value of alpha analyzed (eg, ALOW=0).

Line 4:  All NT1+NT2 theta values, e.g., 0.01, 0.05, etc., that
is, the theta values for marker 1 immediately followed by the
theta values for marker 2.  These values are for identification
purposes only and not used in the calculations.  It may thus be
useful to distinguish theta values for marker 1 (eg. -0.10 or
0.11) from those for marker 2 (eg. 0.10).  A large number of
recombination fractions will yield more accurate results than a
small number.

Line 5:  NFAM = number of families for which lod scores are
provided.

Line 6:  The NT1+NT2 lod scores for family 1.  Lods smaller than
-80 are taken to represent minus infinity, and a log likelihood of
minus infinity will appear as -99 on output.

Repeat line 6 for each family.
------------------------------------------------

     A special situation is given when the two markers are taken
to be candidate genes and lod scores are evaluated at theta=0
only.  In this case, the HOMOG3R program will maximize the
likelihoods over theta=0 and theta=0.5.  Consider the following
input file (another sample data set is provided in the file
HOMOG3R.DAT):

Linkage to two candidate genes on different chromosomes
1 1 .05
1 0
-0.01 0.01
4
 0.903  -99
 2.007  -99
 0.601  0.601
 -99    1.204

     For each of four families, at each of two chromosomes, the
file contains lod scores at theta=0 (identified as -0.01 for
marker 1 on chromosome 1 and 0.01 for marker 2 on chromosome 2).
     There are three possible hypotheses of homogeneity:  1) all
families are linked with marker 1 but unlinked with marker 2;  2)
all families are linked with marker 2 but not with marker 1;  3)
all families are unlinked with markers 1 and 2.  The first two
hypotheses clearly have zero likelihood, because there is always
at least one family with one or more known recombinations.  The
HOMOG3R program furnishes the following output:


Program  HOMOG3R  version 1.70   J. Ott

Heterogeneity -- Three family types, type 1 with linkage to first set of
theta values, type 2 with linkage to second set of theta values (usually
two different chromosomes), type 3 unlinked.

>> Linkage to two candidate genes on different chromosomes <<

      Fam.  Lod scores
         1    0.9030  -99.0000
         2    2.0070  -99.0000
         3    0.6010    0.6010
         4  -99.0000    1.2040
     Theta   -0.0100    0.0100

Results for different hypotheses (fixed values in parentheses)

            Hypothesis    a1    a2    a3        t1      t2         lnL
----------------------------------------------------------------------
H1       Heterogeneity  0.65  0.35  0.00    -0.010   0.010      8.9453
H2           Het, a3=0  0.65  0.35   (0)    -0.010   0.010      8.9453
H3           Het, a2=0  0.70   (0)  0.30    -0.010   (-99)      5.9688
H4           Het, a1=0   (0)  0.40  0.60     (-99)  -0.010      1.7107
H5   Homogeneity, a1=1   (1)   (0)   (0)   -99.000   (-99)      0.0000
H6   Homogeneity, a2=1   (0)   (1)   (0)     (-99) -99.000      0.0000
H7   Homogeneity, a3=1   (0)   (0)   (1)     (-99)   (-99)         (0)

  Evidence for heterogeneity (H1 vs. H5/6/7):
    Difference in Ln(L)     =     8.9453
    Lik. ratio for heterog. =  7671.7558

  Evidence for heterogeneity (H1 vs. H3/4):
    Difference in Ln(L)     =     2.9765
    Lik. ratio for heterog. =    19.6190

Family   Conditional prob. of being
  no.     type 1   type 2   type 3  (under heterogeneity, H1)
   1      1.0000   0.0000   0.0000
   2      1.0000   0.0000   0.0000
   3      0.6500   0.3500   0.0000
   4      0.0000   1.0000   0.0000

     The program output shows positive log likelihoods for
hypotheses H1 through H4.  It may come as a surprise that the
program formally also reports non-null likelihoods (log
likelihood not equal to -99, i.e., not equal to negative
infinity) for hypotheses H5 and H6, that is, the data appear
compatible with homogeneity (locus in all families on chromosome
1 or on chromosome 2).  This is so because the likelihood is
maximized over theta = 0 and theta = 0.5;  the estimated theta
value is 0.5 (indicated by -99.000 in the output) whenever
theta = 0 is incompatible with the data.  Of course, for a fixed
value of theta = 0, the data are incompatible with the
hypotheses a1=1 and a2=1, but the HOMOG3R program does not work
with fixed theta values.
     Two ways of measuring evidence for heterogeneity are
distinguished:  1.) any heterogeneity (H1, 2 alpha parameters
estimated) versus strict homogeneity, either on chromosome 1 or
chromosome 2 or elsewhere (no alpha parameters estimated), and
2.) any heterogeneity versus homogeneity on chromosomes 1 or 2,
where under each hypothesis a proportion of unlinked families is
allowed for (that proportion, a3, is treated as a nuisance
parameter).  The latter test specifically addresses the question
of heterogeneity between chromosomes 1 and 2, irrespective of
heterogeneity between known chromosomes and locations elsewhere.
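
     The two measures of evidence in the sample output can be
checked by hand.  The Python sketch below copies the maximized
log likelihoods from the table of hypotheses and assumes that H1
is compared with the best-supported of the competing hypotheses
(the largest lnL among them); this assumption is consistent with
the printed differences.  The function name evidence is
hypothetical.

```python
import math

# Maximized log likelihoods (natural logs) copied from the sample output.
lnl = {"H1": 8.9453, "H3": 5.9688, "H4": 1.7107,
       "H5": 0.0, "H6": 0.0, "H7": 0.0}

def evidence(lnl_alt, lnl_nulls):
    """Return (difference in ln L, likelihood ratio) against the best null."""
    diff = lnl_alt - max(lnl_nulls)
    return diff, math.exp(diff)

d1, lr1 = evidence(lnl["H1"], [lnl["H5"], lnl["H6"], lnl["H7"]])
d2, lr2 = evidence(lnl["H1"], [lnl["H3"], lnl["H4"]])
print(f"H1 vs. H5/6/7: {d1:.4f}  {lr1:.1f}")
print(f"H1 vs. H3/4:   {d2:.4f}  {lr2:.2f}")
```

     The likelihood ratios agree with the printed values up to
the rounding of the log likelihoods to four decimal places.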

POINT4 program

     The POINT4 program is interactive and calculates the log
likelihood at specific parameter values for a mixture of up to 4
family types.  It reads input files in the regular format for the
HOMOG program.  When files in the format for HOMOG3R are to be
used by POINT4, the two numbers on the second line indicating
numbers of recombination fractions must be replaced by a single
number, which is the sum of the previous two numbers.
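
     The conversion of a HOMOG3R input file for use with POINT4
might be sketched in Python as follows.  This is an illustration
only:  it assumes that the second line begins with the two
integer counts and that any further text on that line should be
kept.  The function name homog3r_to_point4 and the sample lines
are hypothetical.

```python
def homog3r_to_point4(lines):
    """Return a copy of the lines with the two counts on line 2 summed.

    Assumes line 2 starts with two integers (numbers of recombination
    fractions); any remaining text on that line is kept unchanged.
    """
    out = list(lines)
    fields = out[1].split()
    n1, n2 = int(fields[0]), int(fields[1])
    out[1] = " ".join([str(n1 + n2)] + fields[2:])
    return out

# Hypothetical file content; only the second line matters here.
sample = ["Example title", "1 1", "(remaining lines unchanged)"]
print(homog3r_to_point4(sample)[1])
```
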
     To use the program, you must furnish 4 values of alpha
(proportions of family types), e.g., 0.23  0.77  0  0  for two
components.  You also need to specify "theta" values.  However,
rather than the actual recombination fractions, the program
expects the consecutive (integer) numbers corresponding to the
theta values given in the input file, for example, 3 for the
third theta value.  To specify a recombination fraction of 50%,
enter a number outside the range of numbers of theta values,
e.g., 0.  The theta number corresponding to an alpha=0 is
irrelevant.
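
     The numbering convention for theta values can be
illustrated with a short Python sketch.  It is not taken from
the POINT4 source;  the function name theta_for_number and the
list of theta values are hypothetical.

```python
def theta_for_number(number, thetas):
    """Map a 1-based theta number to a recombination fraction.

    Numbers outside 1..len(thetas) are taken to mean theta = 0.5.
    """
    if 1 <= number <= len(thetas):
        return thetas[number - 1]
    return 0.5

thetas = [0.0, 0.05, 0.1, 0.2]      # hypothetical values from an input file
print(theta_for_number(3, thetas))  # third theta value: 0.1
print(theta_for_number(0, thetas))  # outside the range: 0.5
```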

MTEST program

     The MTEST program implements Morton's likelihood ratio test
for heterogeneity of the recombination fraction among different
groups of families (Morton 1956).  Each group consists of a
certain number of families, e.g., the groups may correspond to
investigators, or to countries of origin.  Also, each family may
be regarded as forming a group of its own (Morton's original
usage of the test).  The test assumes homogeneity within each
group (same theta).  The null hypothesis specifies overall
homogeneity, while under the alternative hypothesis of
heterogeneity a potentially different theta value exists for
each group.
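
     In its usual form, the test statistic is 2 ln(10) times the
difference between the sum of the group-wise maximum lod scores
and the maximum of the total lod score;  under homogeneity it is
approximately chi-square distributed with (number of groups - 1)
degrees of freedom.  The Python sketch below evaluates this on a
common grid of theta values.  It is a simplified illustration
and not the MTEST code;  the function name morton_statistic and
the lod scores are hypothetical, and maximizing over a grid is
an assumption.

```python
import math

def morton_statistic(group_lods):
    """Chi-square statistic of Morton's test from lod scores on a theta grid.

    group_lods: one list per group, each holding that group's total lod
    score (log10) at the same grid of theta values.
    Returns (chi_square, degrees_of_freedom).
    """
    # Alternative hypothesis: each group has its own maximizing theta.
    separate = sum(max(lods) for lods in group_lods)
    # Null hypothesis: one common theta maximizes the summed lod score.
    pooled = max(sum(column) for column in zip(*group_lods))
    chi_square = 2.0 * math.log(10.0) * (separate - pooled)
    return chi_square, len(group_lods) - 1

# Hypothetical lod scores for two groups at theta = 0.0, 0.1, 0.3
groups = [[3.0, 2.5, 1.0],
          [-2.0, 0.5, 0.8]]
chi2, df = morton_statistic(groups)
print(f"chi-square = {chi2:.3f} on {df} df")
```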

Files used by the program have the following fixed names:
     MTEST.DAT is the input file.  It has the same structure as
the input file to the HOMOG program.  A sample MTEST.DAT file is
provided.
     MTEST.OUT is the output file.
     MTEST.GRP is an input file holding the family group
definitions.  The first line contains the number, NGR, of groups
to follow on subsequent lines.  Each of the following NGR lines
lists the family numbers that form one group, e.g., 3 11 12 15.
Contiguous family numbers may be given in abbreviated form, e.g.,
numbers 7 through 11 may be given as -7 11.
     The first line and the following NGR lines define one set of
groups.  As many such sets may be given as desired.  An example
MTEST.GRP file is provided.  Note that after the number NGR of
groups, a title may follow on the same line, but there must be at
least one space between NGR and the title.
     If NGR=0 is given as the number of groups, this is taken to
indicate that each family should form one group of its own
(original usage of Morton's test).  In that case, no family
numbers are to be provided.
     The last line of the MTEST.GRP file should contain the
number -1 to indicate the end of input.
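
     The rules for the MTEST.GRP file can be summarized in a
small Python reader.  This sketch is not the parser used by
MTEST;  the function names, the sample file content, and the
skipping of blank lines are assumptions made for illustration.

```python
def expand_families(tokens):
    """Expand family numbers; a negative number -a followed by b means a..b."""
    families, i = [], 0
    while i < len(tokens):
        value = int(tokens[i])
        if value < 0:
            families.extend(range(-value, int(tokens[i + 1]) + 1))
            i += 2
        else:
            families.append(value)
            i += 1
    return families

def parse_grp(text):
    """Return a list of (title, groups) sets; groups is None when NGR=0."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    sets, pos = [], 0
    while pos < len(lines):
        head = lines[pos].split(None, 1)
        ngr = int(head[0])
        if ngr == -1:                      # end-of-input marker
            break
        title = head[1].strip() if len(head) > 1 else ""
        pos += 1
        if ngr == 0:                       # each family forms its own group
            sets.append((title, None))
            continue
        groups = [expand_families(lines[pos + k].split()) for k in range(ngr)]
        sets.append((title, groups))
        pos += ngr
    return sets

# Hypothetical file with two sets of groups
sample = """2 By country
3 11 12 15
-7 10
0 Each family separately
-1
"""
for title, groups in parse_grp(sample):
    print(title, groups)
```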

REFERENCES

     Morton NE (1956) The detection and estimation of linkage
between the genes for elliptocytosis and the Rh blood type.  Am J
Hum Genet 8, 80-96
     Ott J (1986) Linkage probability and its approximate confi-
dence interval under possible heterogeneity.  Genet Epidemiol
Suppl 1, 251-257
     Ott J (1991) Analysis of Human Genetic Linkage, revised
edition.  Johns Hopkins University Press, Baltimore
     Terwilliger JD, Ott J (1994) Handbook of Human Genetic
Linkage.  Johns Hopkins University Press, Baltimore