MFLINK documentation 

MFLINK is a simple program which automates setting up likelihood 
calculations for linkage analysis using a variety of different 
transmission models, and then collating the results in order to 
produce a (nearly) model free lod score as described in our 
paper: 

Curtis D,  Sham PC. (1995) Model-free linkage analysis using 
likelihoods. Am J Hum Genet 57: 703-716. 

Please see this paper for a full discussion of the methodology 
and interpretation of results. 


VERSION HISTORY

1.6 Fixed bug which stopped some pedigrees working with vitesse.

1.5 Fixed bug which might have caused a problem with X-linked 
analyses (though apparently didn't).

1.4 Fixed bug to allow X-linked analyses.

1.2 Added VITESSE compatibility.

1.1 Documented grid.dat. Removed blank lines from end of example 
files.


NECESSARY FILES 

MFLINK needs three data files, pedfile.dat, linked.dat and 
unlinked.dat. It also needs a copy of the MLINK program which 
has been renamed to be called NOSCORE. Ideally, this will have 
been recompiled with the "score" constant set to be false. This 
prevents MLINK from outputting the log likelihoods with the test 
interval set to have a recombination fraction of 50%. However if 
recompiling MLINK is a problem, then the standard version of 
MLINK can be used. The first set of log likelihoods produced by 
MLINK (the ones with the recombination fraction set to 50%) will 
then simply be ignored by MFLINK when it reads the MLINK output. 
The only disadvantage is that calculating these likelihoods 
makes the procedure take twice as long. You can get a copy of 
MLINK called NOSCORE just by changing to the directory 
containing it and copying the program file to have the 
appropriate name. If you are using MSDOS then you can issue the 
following command: 

COPY MLINK.EXE NOSCORE.EXE 

Or under Unix: 

cp mlink noscore 

If you do not have the appropriate permission to access the 
program file or wish to save disk space then you can set up a 
Unix script file called noscore or a DOS batch file called 
NOSCORE.BAT which contains the single command "mlink". 
Alternatively, under Unix you can set up a symbolic link called 
noscore which links to the mlink executable. For example, on my 
system the LINKAGE executables are kept in the 
/packages/fastlink/bin directory, so I can do this by issuing 
the following command from a directory on my search path: 

ln -s /packages/fastlink/bin/mlink noscore 

However you set up the NOSCORE executable, it is important that 
it lies on the search path so that it can be run by name alone, 
without having to specify the directory in which it resides. 
MFLINK runs NOSCORE by executing the C statement: 

system("noscore") 

This will work under DOS and Unix (and I assume other operating 
systems) provided the NOSCORE executable is on the path. 

NB!!! It is essential that, when MLINK is compiled to produce 
NOSCORE, the constant "byfamily" is set to be true. This means 
that NOSCORE will output the log likelihoods for each individual 
family rather than the log likelihoods totalled over all 
families. Failing to set "score" to false will only make MFLINK 
run twice as slowly, but if "byfamily" is not set to true then 
the program will fail completely. 


DATA FILES 

The three data files consist of pedfile.dat, which is a standard 
LINKAGE pedigree file suitable for input to MLINK, and two MLINK 
locus data files called linked.dat and unlinked.dat. The two 
files both relate to pedfile.dat, but differ in their 
description of the recombination fractions between loci and 
sometimes the locus order. MFLINK is used to test a specified 
position on the genetic map (see the paper for full details). 
The first file, linked.dat, describes the situation where the 
affection locus is at this test position, while unlinked.dat 
describes the situation when the affection locus is unlinked to 
the marker(s). 

The description of the transmission model for the affection 
locus is used only to obtain the population prevalence for the 
disease. MFLINK calculates the prevalence which would be 
produced from the allele frequencies and penetrance values 
provided, and then uses this value to construct its own 
transmission models. The transmission model parameters provided 
are not used for any other purpose, and should simply be chosen 
to produce the desired prevalence value. Please note however 
that no penetrance values should be set to 0 or 1, because doing 
this might cause UNKNOWN to detect an impossible segregation 
pattern. Please note also that there can only be one liability 
class for the affection locus. 

Here is an example linked.dat file: 

 2   0   0   5  << no loci, risk locus, sexlinked(if 1) 
 0  0.0  0.0  0  << mut locus, mut rate, haplotype freq(if 1) 
 1 2 << order of loci 

 1 2  << affection, #alleles [MDPN] 
 0.9 0.1  << gene freqs 
 1 << number of liability classes 
 0.00100 0.00100 0.50000 

 3 4  << numbered alleles, #alleles [5119] 
 0.25 0.25 0.25 0.25 << gene freqs 

 0 0 
  0.01 
 1 2 1 

This file describes an affection and marker locus, the affection 
locus phenotypes being listed first in pedfile.dat. After the 
definition of the second locus comes the information concerning 
the relative positions of the two loci: here the recombination 
fraction between them is set to 0.01, indicating that the test 
position is at a recombination fraction of 0.01 with the marker, 
which might be appropriate for example if the marker were at a 
candidate gene. The corresponding unlinked.dat file would appear 
as follows: 

 2   0   0   5  << no loci, risk locus, sexlinked(if 1) 
 0  0.0  0.0  0  << mut locus, mut rate, haplotype freq(if 1) 
 1 2 << order of loci 

 1 2  << affection, #alleles [MDPN] 
 0.9 0.1  << gene freqs 
 1 << number of liability classes 
 0.00100 0.00100 0.50000 

 3 4  << numbered alleles, #alleles [5119] 
 0.25 0.25 0.25 0.25 << gene freqs 

 0 0 
  0.5 
 1 2 1 

It can be seen that this file is identical to linked.dat except 
that the recombination fraction is set to 0.5, indicating non-
linkage. 

The situation is slightly more complicated if it is desired to 
test a position between flanking markers, because then the locus 
order must be changed to indicate non-linkage. Suppose we set 
linked.dat to test a position midway between two markers: 

 3   0   0   5  << no loci, risk locus, sexlinked(if 1) 
 0  0.0  0.0  0  << mut locus, mut rate, haplotype freq(if 1) 
 2 1 3 << order of loci 

 1 2  << affection, #alleles [MDPN] 
 0.995 0.005  << gene freqs 
 1 << number of liability classes 
 0.00500 0.50000 0.50000 

 3 5  << numbered alleles, #alleles [PFCC] 
 0.2 0.3 0.1 0.1 0.3 << gene freqs 

 3 5  << numbered alleles, #alleles [DRCC] 
 0.2 0.1 0.3 0.1 0.3 << gene freqs 

 0 0 
 0.05 0.05 
 1 2 1 

Now in order to indicate non-linkage, we need to set the 
affection locus to be on one or other side of the two linked 
markers, at a recombination fraction of 0.5, so we change the 
specified locus order as well as the values for the 
recombination fractions: 

 3   0   0   5  << no loci, risk locus, sexlinked(if 1) 
 0  0.0  0.0  0  << mut locus, mut rate, haplotype freq(if 1) 
 2 3 1 << order of loci 

 1 2  << affection, #alleles [MDPN] 
 0.995 0.005  << gene freqs 
 1 << number of liability classes 
 0.00500 0.50000 0.50000 

 3 5  << numbered alleles, #alleles [PFCC] 
 0.2 0.3 0.1 0.1 0.3 << gene freqs 

 3 5  << numbered alleles, #alleles [DRCC] 
 0.2 0.1 0.3 0.1 0.3 << gene freqs 

 0 0 
 0.1 0.5 
 1 2 1 

The recombination fraction between the two markers has been set 
to 0.1, while the affection locus is at a recombination fraction 
of 0.5 with the second marker. 

If you are used to using LCP or DOLINK then the most convenient 
way to produce these data files is probably to set up a 
conventional linkage analysis and then run it with the 
"nodelete" parameter, e.g.: 

pedin nodelete 

When this parameter is provided the linkage data files are not 
deleted by the shell script, and you will be left with the 
pedfile.dat and datafile.dat files which were used in the 
conventional analysis. It is then a fairly simple matter to edit 
datafile.dat to produce linked.dat and unlinked.dat. (The latest 
version of DOLINK will incorporate a feature to set up MFLINK 
analyses automatically.) 


RUNNING MFLINK 

Once you have set up the three data files correctly, begin by 
copying either linked.dat or unlinked.dat to be called 
datafile.dat. Then run the UNKNOWN program, which produces 
speedfile.dat and ipedfile.dat ready to be used by MLINK (which 
we have renamed NOSCORE). At this point, you might like to try 
running NOSCORE directly to see that there are no errors. If it 
works OK, then run MFLINK, and it will automatically set up and 
run all the necessary likelihood calculations. 

What MFLINK does is to carry out likelihood calculations (using 
NOSCORE) under conditions of linkage and non-linkage for a range 
of transmission models. These models range between Mendelian 
recessive and a null effect and then between Mendelian dominant 
and a null effect. For each transmission model, MFLINK writes a 
copy of unlinked.dat called datafile.dat in which the affection 
locus specification has been altered to reflect the desired 
transmission model, and then calls NOSCORE. It then reads in the 
log likelihoods for each family output by NOSCORE (these are 
contained in outfile.dat). Then it copies linked.dat to be 
called datafile.dat, again respecifying the transmission model 
parameters of the affection locus. It runs NOSCORE again and 
inputs the log likelihoods produced. For each transmission 
model, it thus obtains the overall log likelihood under the 
hypothesis of non-linkage (obtained by totalling all the family 
log likelihoods from the first set output), the log likelihood 
under the hypothesis that all families are linked (obtained by 
summing the log likelihoods from the second set) and the log 
likelihood assuming a proportion are linked (obtained by using 
the standard admixture formula maximised over alpha). 

MFLINK provides the following output in mflink.out: 

The log likelihoods under non-linkage, linkage and admixture for 
each transmission model. (These are often negative infinity for 
the Mendelian dominant model.) 

The maximum lod score assuming homogeneity, defined as the 
maximum difference between the log likelihoods under linkage and 
non-linkage for any transmission model. This is maximised over 
one parameter, the heterozygote penetrance. 

The maximum lod score assuming admixture, defined as the maximum 
difference between the log likelihoods under admixture and non-
linkage for any transmission model. This is maximised over two 
parameters, the heterozygote penetrance and the proportion of 
linked families. 

The "model-free" lod score, defined as the difference between 
the maximum log likelihood obtained under non-linkage for any 
transmission model and the maximum log likelihood obtained under 
admixture for any model. It has one degree of freedom, the 
proportion of linked families. 

Please see the paper for some discussion of the interpretation 
of these results. 


COMPILATION 

The source code is C and consists of mflink.c, linkfile.c and 
linkfile.h. Various constants are defined in linkfile.h and 
mflink.c, but I doubt any will need changing except possibly 
MAXPEDS in mflink.c. This is the maximum number of pedigrees in 
the dataset and is currently set to 200, although the DOS 
executable was actually compiled with a value of 100. I hope it 
should be straightforward to compile these files into an 
executable called mflink: 

cc mflink.c linkfile.c -o mflink -lm 

Let me know if there are any snags. I'll try to produce a proper 
makefile sometime, though it hardly needs one. 


EXAMPLE FILES 

Once you have provided a version of MLINK called NOSCORE by one 
of the means listed above and have if necessary compiled MFLINK 
itself then you should be able to run it on the three example 
files supplied called pedfile.dat, linked.dat and unlinked.dat. 
Just copy one of the locus datafiles to be called datafile.dat, 
run UNKNOWN and then run MFLINK with these files present in the 
current working directory and you should see it run through the 
likelihood calculations. The output file mflink.out will be 
created. On my system, these example files yield a maximum lod 
of 0.643, a maximum admixture lod (lod2) of 1.414, and a "model-
free" lod score of 1.414. These results may vary somewhat 
depending on the version of LINKAGE or FASTLINK you have. Note 
that although in this case the "model-free" lod score is the 
same as the admixture lod score maximised over transmission 
models, the former is obtained with only one degree of freedom 
(alpha) whereas the latter incorporates two degrees of freedom 
(alpha and the heterozygote penetrance, which defines each 
transmission model). Thus in this example the "model-free" lod 
score might be taken to provide (a little) more evidence in 
favour of linkage than the admixture lod maximised over models. 


X-LINKAGE 

MFLINK handles X-linked datafiles by following the procedure 
above to define each female transmission model and then taking 
the male penetrances simply to be equal to the female homozygote 
penetrances. This means that the overall male prevalence will 
vary from one transmission model to another, but hopefully this 
would not be a serious problem in practice. 


USING MFLINK WITH VITESSE

As of version 1.2, MFLINK has been set up to work correctly with 
VITESSE, the rapid linkage analysis program developed by Jeff 
O'Connell and Dan Weeks. To invoke MFLINK to use VITESSE rather 
than MLINK, use the -v switch on the command line:

mflink -v

This causes mflink to run with two minor differences. Firstly, 
it will read the file voutfile.dat which VITESSE produces rather 
than outfile.dat which MLINK produces. Secondly, it will run a 
program called VNOSCORE rather than NOSCORE, so you must rename 
VITESSE to be called VNOSCORE using one of the methods described 
above. With the version I have of VITESSE, it isn't actually 
possible to recompile it to not calculate the likelihoods at 
theta=0.5. There is a constant called "lodscore" but setting 
this to FALSE doesn't have the desired effect and may have some 
undesirable ones, so VNOSCORE is in fact just the ordinary 
VITESSE program you would use anyway, but with a different name. 
However presumably it may be possible to recompile future 
versions of VITESSE so that the likelihood is only calculated 
for the recombination fractions specified. When you are using 
MFLINK with VITESSE there is no need to copy either of the locus 
data files to be called datafile.dat or to run UNKNOWN first 
because VITESSE does not use the output from UNKNOWN. Apart from 
these minor differences, MFLINK should run with VITESSE in the 
same way as it runs with MLINK.


CUSTOMISED MODELS 

By default, the transmission models tested are as described 
above. Five dominant, five recessive and a null effect model are 
tested. However it is possible to specify a different testing 
procedure if desired. One way to do this is simply to alter the 
number of dominant and recessive models tested and this is done 
by specifying the -n switch on the command line, for example to 
evaluate 10 of each instead of 5 one would enter: 

mflink -n10 

This would result in a finer search. Note that there is no space 
preceding the number. 

A more flexible way to specify which models are tested is to 
create a file called grid.dat which is placed in the same 
directory as the other data files. If MFLINK finds such a file, 
then it will automatically read it and evaluate the sets of 
models specified. Each line in grid.dat specifies a set of 
models according to the following format: 

s0,s1,s2 e0,e1,e2 n 

Here s0,s1,s2 consists of the penetrance values for the starting 
model and e0,e1,e2 the penetrance values for the end model, and 
n models are tested "equally spaced" between these models. The n 
models tested include the starting model but do not include the 
finishing model. To (hopefully) clarify this a bit, here's how 
grid.dat would look to specify the default analyses for a 
disease with population prevalence of 0.1: 

 0,0,1  0.1,0.1,0.1  5 
 0.1,0.1,0.1  0.1,0.1,0.1  1 
 0,1,1  0.1,0.1,0.1  5 

The first line specifies that one starts with a Mendelian 
recessive model and then also evaluates another 4 (making 5 in 
total) models "towards" but not including the model of null 
effect. The second line specifies that the null effect model 
itself be evaluated since the starting and ending points both 
define this model - the starting point will be evaluated and the 
ending point won't. The third line specifies that 5 dominant 
models be evaluated, again not including the null effect model. 

Now suppose that one were worried that one might miss something 
by not evaluating codominant models. Then one might wish to 
create the following grid.dat file: 

 0,0,1 0.1,0.1,0.1 5 
 0.1,0.1,0.1 0.1,0.1,0.1 1 
 0,1,1 0.1,0.1,0.1 5 
 0,0.5,1 0.1,0.1,0.1 5 

The additional final line means that a range of codominant 
models will be evaluated, again beginning with a model having 
complete homozygote penetrance and no phenocopies, and moving 
towards, but not including, the model where the locus has no 
effect on risk. 

Other reasons for creating a grid.dat file rather than just 
using the defaults might include wishing to restrict the 
analysis to only a limited set of plausible models, for example 
only dominant models or only models with a certain maximum 
penetrance. Alternatively one might wish to specify a wide 
variety of models in order to more fully cover the parameter 
space. However, as we discuss in our paper, we doubt that 
restricting the search to the default dominant and recessive 
models is likely to incur much risk of failing to generate a 
positive result. Please note that if you do search over a wider 
range of models then the number of degrees of freedom 
incorporated in producing the maximised lod scores may be 
increased, since the search is no longer one-dimensional. 
However the "model-free" lod score itself would still only 
incorporate one degree of freedom. 


ABOUT THE ZIP FILE 

The zip file containing the MFLINK distribution was created 
under MSDOS. This has two implications: all the file names are 
stored in upper case, and the text files (the source and example 
files) have a carriage return and linefeed character at the end 
of each line rather than just having a linefeed. If you are 
unzipping the archive on a Unix system you will probably find it 
more convenient to convert the filenames to lower case - on my 
system this is done by running unzip with the -L switch. You may 
also wish to convert the text files by stripping out the 
extraneous carriage return characters, though leaving them in 
may not have any ill effects - on my system this is done by 
running unzip with the -aa switch. 

I'll try to keep up-to-date copies of the MFLINK package at John 
Attwood's ftp site, ftp.gene.ucl.ac.uk, in 
/pub/packages/dcurtis. Sometimes there may be a slightly more 
recent version available via my homepage, by following the 
"software" link. 

Please feel very free to provide feedback - the software is in 
active development and I'm keen to see how it can be improved. 

Dave Curtis - dcurtis@hgmp.mrc.ac.uk 
http://www.gene.ucl.ac.uk/~dcurtis/ 

Dept Psychological Medicine, Institute of Psychiatry, De 
Crespigny Park, London SE5 8AF, UK. +44 171 919 3536