Documentation for the MULTI-ILINK program

This program computes multipoint lod scores with the marker map treated as a nuisance parameter (Terwilliger, 1994). For computational simplicity, it assumes that the marker order is well known but that the intermarker genetic distances cannot be measured accurately. This is normally the case. Genetic distances are typically estimated from about 100 informative meioses, and if the estimated recombination fraction is 5%, the 95% confidence interval for the true recombination fraction ranges from 1.5% to 11.3%. Errors of this size can lead to substantial errors in multipoint analysis.

To see why, note that what matters in linkage analysis is the multiplicative factor by which your estimate differs from the true value. If the true genetic distance between two markers is theta and you have instead used a distance of k*theta, then the probability of a recombinant meiosis is computed as k*theta instead of theta, i.e., k times too large. In multipoint analysis the effect is more dramatic. Suppose you place the disease locus midway between two flanking markers separated by a small true distance theta. Assuming no interference, the probability of a double recombinant is (theta/2)(theta/2) = theta^2/4. If your estimate of theta is off by a factor of k, this probability is computed as (k*theta/2)(k*theta/2) = k^2*theta^2/4, so it is off by a factor of k^2. If the estimate is 1/2 the true theta, this probability is 4 times too small! One way to compensate for this is to compute the multipoint lod scores without fixing the intermarker distances in advance, estimating them instead with the ILINK program.
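The two numerical claims in the paragraph above can be checked with a short sketch. It assumes no interference and the small-theta approximation used in the text, and all function names are hypothetical (they are not part of MULTI-ILINK). The text does not say which method produced the quoted 95% confidence interval, so the sketch assumes an exact (Clopper-Pearson) binomial interval. For 5 recombinants in 100 meioses this lands close to, though not necessarily identical to, the quoted range.

```python
from math import comb


def double_recombinant_prob(theta: float) -> float:
    """P(double recombinant) with the disease locus midway between two
    markers separated by a small theta, assuming no interference."""
    return (theta / 2) * (theta / 2)


def distortion_factor(theta: float, k: float) -> float:
    """Computed / true double-recombinant probability when k*theta is used
    in place of theta; algebraically this equals k**2."""
    return double_recombinant_prob(k * theta) / double_recombinant_prob(theta)


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided binomial interval, found by bisection (assumed method)."""

    def root(f) -> float:
        # f must be increasing in p on [0, 1]; bisect for f(p) = 0.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else root(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2 (the CDF falls as p rises).
    upper = 1.0 if x == n else root(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


if __name__ == "__main__":
    # Using half the true theta makes the double-recombinant probability 4x too small.
    print(distortion_factor(0.05, 0.5))  # → 0.25
    # 5 recombinants in 100 informative meioses (estimated theta = 5%).
    lo, hi = clopper_pearson(5, 100)
    print(f"95% CI: {lo:.1%} to {hi:.1%}")
```

The bisection keeps the sketch dependency-free; with a statistics library available, beta-distribution quantiles would give the same exact interval directly.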
To compute the likelihood under the hypothesis of no linkage, place the disease locus at a recombination fraction of 0.50 from the first marker, and then maximize the likelihood of your pedigrees over the distances between the marker loci. If there are 3 markers (where 1 = disease (locus 1), 2 = marker 1 (locus 2), and so on), the null map is:

    1-(0.50)-2-(theta1)-3-(theta2)-4

Maximizing the likelihood over theta1 and theta2 gives the null hypothesis likelihood, L0. There are then four possible placements of the disease locus against this ordered set of marker loci:

    1-2-3-4   (likelihood L1)
    2-1-3-4   (likelihood L2)
    2-3-1-4   (likelihood L3)
    2-3-4-1   (likelihood L4)

Under each order, the likelihood is maximized over all three recombination fractions. The multipoint lod score for each order is the common logarithm of the maximum likelihood under that order divided by L0; under order 1, for example, the lod score is log10(L1/L0). These lod scores have the same null hypothesis distribution as ordinary multipoint lod scores generated with LINKMAP, and exclusion mapping is better justified with this approach: exclude the orders whose maximum lod score is less than -2.

Input files:

pedin.dat - LINKAGE format pedigree file containing the disease locus plus the marker loci (in allele numbers format) to be analyzed jointly with ILINK as described above. You can generate this file by extracting the loci to be analyzed, in their assumed chromosomal order, from your full pedigree and parameter files (with EXTRACT, for example). Where needed, the data should be downcoded with DOWNFREQ so that they can be analyzed with ILINK. You are responsible for setting your own ILINK constants as you see fit, and you may use any version of UNKNOWN or ILINK, from LINKAGE or FASTLINK.

datain.dat - LINKAGE format parameter file defining the loci in the pedin.dat input file.
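The bookkeeping described above (one null map plus one order per placement of the disease locus, each scored as log10(L_i/L0), with orders below -2 excluded) can be sketched as follows. This is an illustrative sketch, not MULTI-ILINK's own code: it assumes the maximized likelihoods are already available as common logarithms from separate ILINK runs, and the function names are hypothetical. The exclusion example reuses the lod scores from the sample analysis in this document.

```python
def disease_orders(n_markers: int) -> list[list[int]]:
    """All n_markers + 1 placements of the disease (locus 1) against the
    fixed marker order 2, 3, ..., n_markers + 1."""
    markers = list(range(2, n_markers + 2))
    return [markers[:i] + [1] + markers[i:] for i in range(n_markers + 1)]


def lod(log10_l_order: float, log10_l0: float) -> float:
    """Multipoint lod for one order: log10(L_i / L0), from common-log likelihoods."""
    return log10_l_order - log10_l0


def excluded(lods: dict[str, float], threshold: float = -2.0) -> list[str]:
    """Orders whose maximum lod score falls below the exclusion threshold."""
    return [order for order, score in lods.items() if score < threshold]


if __name__ == "__main__":
    # Three markers (loci 2-4) give the four orders listed in the text.
    for order in disease_orders(3):
        print("-".join(map(str, order)))
    # Lod scores from the sample output: only the 2-1-3 order is excluded.
    print(excluded({"1-2-3": 2.5255, "2-1-3": -17.0851, "2-3-1": 6.3842}))
```

Generating the orders programmatically makes it plain that n markers always yield n + 1 alternative hypotheses, each of which needs its own ILINK maximization.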
Output files:

milink.out - Estimated recombination fractions under each placement of the disease locus against the marker map, and the lod score for each hypothesis.

Programs needed:

ILINK, UNKNOWN, LSP - These programs must be installed in your path before you run this program. You may use any version of them you like, from either LINKAGE (Lathrop et al., 1984) or FASTLINK (Cottingham et al., 1993).

To use:

From the directory containing the input files, type

    multi-ilink

to run the analyses in the foreground, or, to run them in the background, type

    nohup multi-ilink > multi-ilink.log &

Sample analysis:

If you run this analysis on the pedin.dat and datain.dat files provided in the multipoint/sample/multi-ilink directory, you will get an output file like the following:

    Summary of output from MULTI-ILINK
    Intermarker genetic distances treated as nuisance parameter.

    Marker Map                      Lod
    1-- 0.500-- 2-- 0.101-- 3    0.0000
    1-- 0.363-- 2-- 0.100-- 3    2.5255
    2-- 0.085-- 1-- 0.073-- 3  -17.0851
    2-- 0.101-- 3-- 0.260-- 1    6.3842

The estimated intermarker genetic distance changes slightly under each order. In the order where the disease locus was least likely to be located (lod score = -17.08), the marker loci are pushed apart to a recombination fraction of 0.1456. The recombination fractions above are estimated assuming no interference, so the recombination fraction between loci 2 and 3 must be computed as follows: convert the recombination fraction between loci 2 and 1 to Haldane cM, convert the recombination fraction between loci 1 and 3 to Haldane cM, add these genetic distances together (remember that recombination fractions themselves are NOT additive), and convert the sum back to a recombination fraction with the Haldane mapping function. In this case that yields a recombination fraction of 14.56%.
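The Haldane calculation described above can be sketched as follows, assuming the Haldane mapping function d = -50 ln(1 - 2*theta) (d in cM) and its inverse theta = (1 - exp(-d/50))/2. The function names are illustrative only. Applied to the 2-1-3 order from the sample output (0.085 and 0.073), it reproduces the 14.56% quoted in the text.

```python
import math


def haldane_cm(theta: float) -> float:
    """Haldane map distance in cM for a recombination fraction 0 <= theta < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def haldane_theta(cm: float) -> float:
    """Inverse Haldane mapping function: cM back to a recombination fraction."""
    return 0.5 * (1.0 - math.exp(-cm / 50.0))


def combine_intervals(theta_a: float, theta_b: float) -> float:
    """Recombination fraction spanning two adjacent intervals.

    Map distances (not recombination fractions) are additive, so convert each
    interval to Haldane cM, add, and convert the sum back.
    """
    return haldane_theta(haldane_cm(theta_a) + haldane_cm(theta_b))


if __name__ == "__main__":
    # Order 2-- 0.085-- 1-- 0.073-- 3 from the sample output.
    print(round(combine_intervals(0.085, 0.073), 4))  # → 0.1456
    # Same result from the no-interference closed form a + b - 2ab.
    print(round(0.085 + 0.073 - 2 * 0.085 * 0.073, 4))  # → 0.1456
```

Under no interference the two routes are algebraically identical; the cM route is shown because it mirrors the procedure in the text and extends directly to more than two intervals.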
From this analysis we can see that the most likely location of the disease locus is to the right of locus 3 (the second marker), with a lod score of 6.38.

In simulations previously conducted comparing this method with LINKMAP as a means of testing for and estimating the location of a disease locus in multipoint linkage analysis, the expected increase in the lod score was 10-40%. This approach is therefore preferred over LINKMAP when the necessary computational resources are available.

References:

Cottingham, R.W., R.M. Idury, A.A. Schaffer (1993) "Faster sequential genetic linkage computations." Am J Hum Genet 53:252-263.
Lathrop, G.M., J.M. Lalouel, C. Julier et al. (1984) "Strategies for multilocus linkage analysis in humans." PNAS 81:3443-3446.
Terwilliger, J.D. (1994) "The available possibilities to analyze data of polygenic disease statistically." Abstract to IVth Workshop of the Nordic Human Genome Initiative.