Likelihood - Based Disequilibrium Analysis Programs  (2/21/96)

----------------------------------------------------------------------

UPDATE - 21 February 1996:

Support interval calculations added for lambda and theta in dismult.
Maximization routines improved in DISMULT and DISLAMB 
Haplotype Relative Risk extensions included to read LINKAGE files.

I wish to acknowledge some assistance in detecting a bug in the 
maximization routine discovered by Michael Knapp, and some programming
advice from Alejandro Shaffer.

----------------------------------------------------------------------

The algorithms used in these programs are described in detail
in the following paper (which should be cited whenever these
programs are used):

Terwilliger, JD "A Powerful Likelihood Method for the Analysis of
Linkage Disequilibrium Between Trait Loci and One or More 
Polymorphic Marker Loci" American Journal of Human Genetics 
56:777-787 (1995).

Terwilliger, JD "Response to Sham et al." Am J Hum Genet (1996) 
(In Press).

In this document I will only discuss technical issues regarding
their use.   Versions are included for either DEC machines or SUN
machines.  Where other UNIX versions differ, you may need to modify
some of the file handling routines on your own.  Compiled code is
provided with statically linked libraries in case you have no Pascal
compiler on yout system for OSF, SUNOS, and SOLARIS.

In SUN Pascal, you should compile as follows:

pc -L -O4 -o dislamb dislamb.p
pc -L -O4 -o dismult dismult.p

and for DEC Pascal versions, under OSF it is recommended to compile with

pc -O4 -C all -o dislamb dislamb.p
pc -O4 -C all -o dismult dismult.p

And under VMS, just compile and link with

pas dislamb/nocheck
link dislamb/notraceback
pas dismult/nocheck
link dismult/notraceback

Then everything should work.  If you still have trouble, contact me
by sending EMAIL to joe@well.ox.ac.uk


DISLAMB is to be used for single locus association analysis.  Just
enter the data interactively - if the instructions are confusing
let me know and I will try to make it more clear.  


Consider the following dataset, using an assumption that the disease allele
frequency is 0.001 (so that case sample provides little information for
estimation of population allele frequencies), and using the nonparametric
model (this is in general recommended...).  This locus has 13 alleles, as follows:

CASE    10 48 12 1 3 17 2 5 1 2 3 4 1
CONTROL  8  8 10 3 3 15 2 7 3 4 2 5 2

Since the result was significant, I allowed it to estimate the recombination
fraction between marker and disease under different assumptions about alpha
and n - note that these estimates are very much dependent on your assumptions,
and that your population is in equilibrium, so do not place to much faith in
these estimates of map distance, as mentioned in the paper - support intervals 
are given (normally I would use the same stringency as the test in constructing 
support intervals for lambda - typically 3-lod-units. It is important to note 
that the support intervals for theta given do not take into account the user-
input age of mutation in this population.  To construct a support interval 
allowing for this, you must separately compute support intervals for each 
combination of age and heterogeneity parameter you wish to admit.


*************************************************************
*                                                           *
*       Program DISLAMB - Version 2.1   (2/3/96)            *
*                                                           *
*       AJHG 56:777-787 (1995)                              *
*                                                           *
*************************************************************


Disease allele frequency = 0.00100000

=========================================================================================
CASE      | 10. | 48. | 12. |  1. |  3. | 17. |  2. |  5. |  1. |  2. |  3. |  4. |  1. |
CONTROL   |  8. |  8. | 10. |  3. |  3. | 15. |  2. |  7. |  3. |  4. |  2. |  5. |  2. |
=========================================================================================

Estimated parameters for likelihood ratio test:
 Allele frequencies: 
    Allele            H0:             H1
         1     0.09944751     0.12594046
         2     0.30939227     0.12355539
         3     0.12154696     0.15425412
         4     0.02209945     0.02805155
         5     0.03314917     0.04207248
         6     0.17679558     0.22436222
         7     0.02209945     0.02804928
         8     0.06629834     0.08413984
         9     0.02209945     0.02804972
        10     0.03314917     0.04207317
        11     0.02762431     0.03530303
        12     0.04972376     0.06311307
        13     0.01657459     0.02103565
    Lambda     0.00000000       0.361496

LRT Chi-Square =        19.87098 p-value =    0.000004200154388 Lambda =   0.361496
SIGNIFICANT EVIDENCE OF LINKAGE DISEQUILIBRIUM BY LRT TEST
2 x n table Chi-square =   26.27953 P-value =    0.009797348469226
NO SIGNIFICANT EVIDENCE OF LINKAGE DISEQUILIBRIUM BY 2 x N TABLE CHI-SQUARE TEST

3-lod-unit support interval for lambda-hat : (0.09,0.58)

Alpha = 0.70   N = 100.   Theta-hat = 0.006587   Support Interval = (0.001853,0.020142)


In this output file, you first get the results for the likelihood-based association
test, which in this case is significant, with a p-value of 0.0000042 (significant
because p < 0.0001).  The estimate of lambda is 0.3615, which is evident of a not
very strong association, even though the p-value is quite low.  This is an important
distinction, since the p-value tells you whether or not there is an association,
while lambda serves as a measure of how strong the association is!  The program also
performd a more standard chi-square test of independence on the 2 x 13 table one
could use to compare case and control allele frequency distributions.  Here, that is
not significant, having a p-value of only 0.0098 (it is assumed here that you have not
proven the existence of a gene through linkage, which normally requires a p-value
of 0.0001 (= Lod Score of 3), so you should have the same stringency in your
association test - if not more, since there is more potential to have a multiple
testing problem with association studies...).  Afterwards, you can see how the value
of lambda = 0.3615 can be translated into recombination fraction under certain
assumptions about the age of this mutation in this population, and its degree of
homogeneity.

DISMULT does the multipoint analysis of association.  The algorithms are
explained in the aforementioned paper, and to use it requires an input
file called diseqin.dat, which is described below.  Again it is not 
recommended to use the genotype data, but rather allele data, since it
is more robust and straightforward.  If you have detailed questions,
send me a message.

Program DISMULT does an association analysis for multiple loci
jointly, assuming a fixed map of the marker loci in either cM or kb
(with some assumption about the relationship between cM and kb, since
disequilibrium is a function of genetic, not physical distance.

The input file should be called DISEQIN.DAT

1) On the first line, enter the Disease Allele frequency (if
this is smaller than 0.01, it makes little difference what it is
as it is used simply to estimate marker allele frequencies).


2) On the next line enter the number of markers

3) on the next line, enter your assumption about the number of 
   centiMorgans/Megabase (assuming the distances are 
   to be entered in kilobases.  If you are entering the data in
   centiMorgans, then the value of this variable should be
   1000 (since the distances are assumed to be entered in kb...)
   If this number is too small, n will be very large, and if this is 
   too large, the estimates of n will be very small.

4) On one line Enter the distances between markers in kb (or in cM
   if you have entered 1000 as the conversion factor in the preceding line).
   D(1,2)  D(2,3)  D(3,4) ... D(n-1,n)

5) On the next line, enter the number of analysis points per
   intermarker interval (must be <= constant maxpos), and usually 5 is
   sufficient.  This is analogous to number of evaluations in LINKMAP
   program of LINKAGE package. (It should be noted that in the outer intervals,
   10 steps are taken over 10cM out from the map flanking markers)

6) Minimum Value for n - the decay parameter
   - usually 10 or 20 is best.  If you find the
   maximum likelihood at a boundary value of n, try either lowering it, or
   making the markers more tightly linked (i.e. alter the assumption about
   cM/Mb). Note that for points outside the map, this will typically be at
   or near the boundary, since disequilibrium is found a large distance away,
   i.e. at (presumably) the first marker locus or later...

7) Maximum Value for n - the decay parameter
   - usually 250 - 500 is fine.  If you hit a boundary value, 
   raise this no. or make the markers more widely spaced...  When this
   value is high, there is a rapid decay in the association as you move
   small distances from the point you are evaluating the likelihood at.

8) For each marker enter the data in the following format for
allele data

  NUMBER OF ALLELES AT MARKER
  CASE(1) CONTROL(1)
  CASE(2) CONTROL(2)
  CASE(3) CONTROL(3)
  ...

9) Name the file DISEQIN.DAT, save it, and run the program
DISMULT,  Output is in a file called diseq.out

Example DISEQIN.DAT

0.022    Disease allele frequency
3        3 Markers
1        cM/Mb
500  120 Physical Distance between Markers in Kb
5        5 Points per Interval
20       Minimum n
800      Maximum n
2        Number of alleles at locus 1
48 28    Case(1) Control(1)
58 59
3        Locus 2 
70 10
6 44
10 40
5        Locus 3 
73 53
69 35
1 10
2 14
3 31


Output file DISEQ.OUT from the above problem:

To make a graph just plot the Map position against the "Lod Score"
or -2ln(LR) which is interpreted conservatively as 0.5* Chi-square(1df).  
Quantities like alpha and n are the vertical and horizontal control parameters 
described in the paper, and the interval and position are given so that 
you can see where each marker is (i.e. everytime you have position 0, 
you have a new marker locus, the number of which is given in the Interval
column).


*************************************************************
*       Program DISMULT - Version 2.1   (2/5/96)            *
*       AJHG 56:777-787 (1995)                              *
*       Multilocus Linkage Disequilibrium Analysis          *
*************************************************************

 
LOCUS 1

==============|=====|=====|
    ALLELE:   |  1  |  2  |
--------------|-----|-----|
    CASE:     | 48  | 58  |
    CONTROL:  | 28  | 59  |
==============|=====|=====|


LOCUS 2

==============|=====|=====|=====|
    ALLELE:   |  1  |  2  |  3  |
--------------|-----|-----|-----|
    CASE:     | 70  |  6  | 10  |
    CONTROL:  | 10  | 44  | 40  |
==============|=====|=====|=====|


LOCUS 3

==============|=====|=====|=====|=====|=====|
    ALLELE:   |  1  |  2  |  3  |  4  |  5  |
--------------|-----|-----|-----|-----|-----|
    CASE:     | 73  | 69  |  1  |  2  |  3  |
    CONTROL:  | 53  | 35  | 10  | 14  | 31  |
==============|=====|=====|=====|=====|=====|


     Alpha     N   Inter.  Position   Map Position      Lod Score       -2ln(LR)
  1.000000    390.      0         0       -4.25866        0.00000        0.00000
  1.000000     20.      0         1       -0.10034        6.07393       27.97147
  1.000000     20.      0         2       -0.08935        7.45386       34.32631
  1.000000     20.      0         3       -0.07859        8.97969       41.35302
  1.000000     20.      0         4       -0.06807       10.57762       48.71173
  1.000000     20.      0         5       -0.05776       12.08476       55.65239
  1.000000     20.      0         6       -0.04766       13.16856       60.64346
  0.916548     20.      0         7       -0.03775       13.36366       61.54191
  0.760124     20.      0         8       -0.02804       13.35679       61.51031
  0.631592     20.      0         9       -0.01852       13.35003       61.47918
  0.525249     20.      0        10       -0.00917       13.34339       61.44857
  0.437886     20.      1         0        0.00000       13.33684       61.41844
  0.439703     20.      1         1        0.00100       13.65798       62.89733
  0.445691     28.      1         2        0.00199       13.95389       64.26002
  0.999995    314.      1         3        0.00299       18.45487       84.98783
  0.999998    426.      1         4        0.00400       22.16551      102.07595
  0.782908    800.      1         5        0.00500       23.83323      109.75606
  0.782976    800.      2         0        0.00500       23.83322      109.75605
  0.883834    800.      2         1        0.00524       22.65050      104.30941
  0.959204    800.      2         2        0.00548       18.32883       84.40738
  0.521664    142.      2         3        0.00572       15.80475       72.78358
  0.485274     89.      2         4        0.00596       15.11418       69.60335
  0.452230     59.      2         5        0.00620       14.68567       67.63001
  0.452230     59.      3         0        0.00620       14.68567       67.63001
  0.821114     62.      3         1        0.01537       14.69946       67.69351
  0.999994     44.      3         2        0.02472       14.63657       67.40389
  0.999995     30.      3         3        0.03424       14.50510       66.79845
  0.999998     23.      3         4        0.04395       14.40457       66.33549
  1.000000     20.      3         5        0.05386       14.17518       65.27912
  1.000000     20.      3         6        0.06396       13.02926       60.00196
  1.000000     20.      3         7        0.07427       11.41072       52.54832
  1.000000     20.      3         8        0.08479        9.69005       44.62431
  1.000000     20.      3         9        0.09555        8.04766       37.06086
  1.000000     20.      3        10        0.10654        6.56494       30.23266