Documentation for the MULTI-ILINK program:

This program allows multipoint lod scores to be computed with the marker
map as a nuisance parameter (Terwilliger, 1994).  It is assumed, for computational
simplicity, that the marker order is well known, but that the intermarker genetic
distances cannot be accurately measured - note that this is normally the case,
since typically genetic distances are estimated based on about 100 informative meioses,
and if the estimated recombination fraction is 5%, the 95% confidence interval for
the true recombination fraction would range from 1.5% to 11.3%, which can lead to
substantial errors in multipoint analysis.  To see this, consider that what matters
in linkage analysis is the multiplicative factor by which your estimate differs from
the true value - if the true recombination fraction between two markers is theta, and
you have incorrectly decided to use k*theta, then the probability of a recombinant
meiosis is computed as k*theta instead of theta, which is off by a factor of k.
In multipoint analysis, the effect is more dramatic - if you put the disease locus
midway between two flanking markers separated by a true recombination fraction theta,
and theta is small, the probability of a double recombinant, assuming absence of
interference, is (theta/2)(theta/2) = (theta)(theta)/4.  However, if your estimate of
theta is off by a factor of k, the probability of said double recombinant would be
computed as (k*theta/2)(k*theta/2) = k^2(theta)(theta)/4, so the probability is off by
a factor of k^2.  If the estimate is 1/2 the true theta, then this probability is 4
times too small!  So, you can see that the effect of these errors can be quite dramatic!
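
The k^2 effect described above can be checked with a few lines of arithmetic.  The
following is only a minimal sketch in Python, not part of MULTI-ILINK; the function
names are made up for illustration, and it assumes the disease locus sits midway
between the flanking markers, that theta is small, and that there is no interference.

```python
def double_recombinant_prob(theta):
    """Approximate probability of a double recombinant when the disease locus
    lies midway between two markers separated by recombination fraction theta
    (small theta, no interference)."""
    half = theta / 2.0
    return half * half


def misspecification_factor(true_theta, k):
    """Ratio of the double-recombinant probability computed with k*theta
    to the one computed with the true theta."""
    return double_recombinant_prob(k * true_theta) / double_recombinant_prob(true_theta)


if __name__ == "__main__":
    true_theta = 0.05  # illustrative value only
    for k in (0.5, 1.0, 2.0):
        factor = misspecification_factor(true_theta, k)
        print(f"k = {k:3.1f}  ->  probability off by a factor of {factor:.2f}")
```

With k = 0.5 the computed probability is one quarter of the true one, and with k = 2
it is four times too large, regardless of the illustrative theta chosen.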

One way to compensate for this is to compute the multipoint lod scores without making
assumptions about the intermarker distances by using the ILINK program.  To compute the 
likelihood under the hypothesis of no linkage, you would have the disease at 50%
recombination fraction from the first marker, and then maximize the likelihood in your
pedigrees over the distances between each of the marker loci - if there are 3 markers,
that would look like this (where 1 = disease (locus 1), 2 = marker 1 (locus 2), etc.):

1-(0.50)-2-(theta1)-3-(theta2)-4     (call this likelihood L0)

Maximize the likelihood over theta1 and theta2 to get the null hypothesis likelihood.

Then, there are four possible positions of the disease locus relative to this ordered
set of marker loci:

1-2-3-4      (Likelihood = L1)
2-1-3-4      (Likelihood = L2)
2-3-1-4      (Likelihood = L3)
2-3-4-1      (Likelihood = L4)
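
For a map with a different number of markers, the list of orders to test can be
generated mechanically: the disease locus (locus 1) is placed in each of the n+1
slots around the n ordered markers.  The sketch below is an illustration only (the
function name is hypothetical and is not part of MULTI-ILINK), assuming the locus
numbering used above.

```python
def disease_positions(n_markers):
    """Return every placement of the disease locus (locus 1) relative to an
    ordered marker map whose markers are numbered 2 .. n_markers + 1."""
    markers = list(range(2, n_markers + 2))
    orders = []
    for slot in range(n_markers + 1):
        # Insert the disease locus before the marker at index `slot`
        # (slot == n_markers puts it after the last marker).
        order = markers[:slot] + [1] + markers[slot:]
        orders.append("-".join(str(locus) for locus in order))
    return orders


if __name__ == "__main__":
    for order in disease_positions(3):
        print(order)
```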

Under each order, the likelihood is maximized over all three recombination fractions.
The multipoint lod score for each order is then just the common log of the ratio of the
maximum likelihood under that order to L0; for example, under order 1, the lod score
would be just log10(L1/L0).  The null hypothesis distribution of these lod scores is the
same as for normal multipoint lod scores generated with LINKMAP, and exclusion mapping
can be performed more justifiably using this approach, by excluding orders for which the
maximum lod score is less than -2.
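
The lod score and exclusion rule just described amount to the following arithmetic.
This is a hedged sketch, not the program's own code: it assumes you already have the
maximized natural-log likelihood for the null map and for each order, the function
names are hypothetical, and the numbers in the example are invented purely for
illustration.

```python
import math


def lod_score(ln_like_order, ln_like_null):
    """log10(L_order / L0), computed from natural-log likelihoods."""
    return (ln_like_order - ln_like_null) / math.log(10.0)


def classify_orders(ln_like_null, ln_like_by_order, exclusion_threshold=-2.0):
    """Map each order to (lod score, excluded?), where an order is excluded
    when its maximum lod score falls below the threshold."""
    results = {}
    for order, ln_like in ln_like_by_order.items():
        lod = lod_score(ln_like, ln_like_null)
        results[order] = (lod, lod < exclusion_threshold)
    return results


if __name__ == "__main__":
    # Invented ln-likelihoods, for illustration only.
    ln_l0 = -250.0
    ln_l = {"1-2-3-4": -248.0, "2-1-3-4": -260.0,
            "2-3-1-4": -251.0, "2-3-4-1": -244.0}
    for order, (lod, excluded) in classify_orders(ln_l0, ln_l).items():
        flag = "excluded" if excluded else ""
        print(f"{order}  lod = {lod:8.4f}  {flag}")
```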

Input files:

pedin.dat - LINKAGE format pedigree file containing the disease plus the marker
	loci (allele numbers format) to be analyzed jointly with ILINK in the
	manner described above.  This file can be generated by extracting the
	loci to be analyzed, in assumed chromosome order, from your full pedigree and
	parameter files (with EXTRACT, for example).  Where needed, these data
	should be downcoded with DOWNFREQ so that they can be analyzed with
	ILINK.  You are responsible for setting your own ILINK constants as you see fit,
	and you may use any version of UNKNOWN or ILINK, from LINKAGE or FASTLINK.

datain.dat - LINKAGE format parameter file defining the loci in the pedin.dat input
	file.

Output files:

milink.out - Estimated recombination fractions under each orientation of disease
	against the marker map, and LOD scores for each hypothesis.

Programs needed:

ILINK, UNKNOWN, LSP - You must have these programs installed in the path on your computer
	before running this program - you may use any version of these programs you like,
	either LINKAGE (Lathrop et al., 1984) or FASTLINK (Cottingham et al., 1993).

To use:

From the directory containing these input files, you need to type

multi-ilink

to run the analyses in foreground, or to run them in background type

nohup multi-ilink > multi-ilink.log &


Sample analysis:

If you try this analysis on the pedin.dat and datain.dat files provided in the
multipoint/sample/multi-ilink directory, you will get an output file like the following:

 Summary of output from MULTI-ILINK 

 Intermarker genetic distances treated as nuisance parameter.

         Marker Map                Lod
  1-- 0.500--  2-- 0.101--  3    0.0000
  1-- 0.363--  2-- 0.100--  3    2.5255
  2-- 0.085--  1-- 0.073--  3  -17.0851
  2-- 0.101--  3-- 0.260--  1    6.3842

The intermarker genetic distances change slightly under each order, and in the
order where the disease was least likely to be located (lod score = -17.09)
the marker loci are pushed apart to a recombination fraction of 0.1456.  The above
recombination fractions are estimated assuming no interference, so to compute the
recombination fraction between loci 2 and 3 you must first convert the recombination
fraction between loci 2 and 1 to Haldane cM, and the recombination fraction
between loci 1 and 3 to Haldane cM, then add these genetic distances together (remember
that recombination fractions are NOT additive themselves), and convert the sum back to
a recombination fraction using the Haldane mapping function - in this case yielding a
recombination fraction of 14.56%.  From this analysis we can see that the most likely
location of the disease locus is to the right of locus 3, with a lod score of 6.38.
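
The Haldane conversion described above can be reproduced as follows.  This is a small
sketch using the standard Haldane mapping function (no interference); the function
names are illustrative and not part of MULTI-ILINK.

```python
import math


def haldane_cm(theta):
    """Recombination fraction -> Haldane map distance in centimorgans."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def haldane_theta(cm):
    """Haldane map distance in centimorgans -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-cm / 50.0))


def combine(theta_a, theta_b):
    """Recombination fraction across two adjacent intervals: convert each
    to cM, add the map distances, and convert the sum back."""
    return haldane_theta(haldane_cm(theta_a) + haldane_cm(theta_b))


if __name__ == "__main__":
    # Intervals 2--1 and 1--3 from the third order in the sample output.
    print(round(combine(0.085, 0.073), 4))  # 0.1456
```

Under the Haldane function this is equivalent to theta_a + theta_b - 2*theta_a*theta_b,
which gives the same 0.1456 for the sample values.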

In simulations previously conducted using this method as opposed to LINKMAP as a means
of testing and estimating the location of a disease locus in multipoint linkage
analysis, the expected increase in the lod score ranged from 10% to 40%, and thus this
approach is to be preferred over LINKMAP when the necessary computational resources are
available.

References:

Cottingham, R.W., R.M. Idury, A.A. Schaffer (1993) "Faster sequential genetic linkage
	computations." Am J Hum Genet 53:252-263.

Lathrop, G.M., J.M. Lalouel, C. Julier et al. (1984) "Strategies for multilocus
	linkage analysis in humans." PNAS 81:3443-3446.

Terwilliger, J.D. (1994) "The available possibilities to analyze data of polygenic
	disease statistically." Abstract to IVth Workshop of the Nordic Human
	Genome Initiative.