FASTMAP DOCUMENTATION 

        by David Curtis 

        
        FASTMAP implements an algorithm to provide an approximate 
        multipoint lod score for a disease against a number of markers 
        from supplied two point lod scores. At time of writing this 
        algorithm has been accepted for publication in Human Heredity 
        (Curtis D & Gurling HMD. A procedure for combining two-point lod 
        scores into a summary multipoint map.  Human Heredity 1993; 43; 
        173-185). You should refer to this for an account of how FASTMAP 
        works and an evaluation of its performance with real and 
        simulated linkage data. The algorithm, program and source code 
        are made freely available, though the source code may not be 
        commercially exploited. Please cite this publication when 
        writing up any work for which you have found FASTMAP useful. 

        FASTMAP takes as input two-point lod scores from a number of 
        markers and as output produces a table of estimated multipoint 
        lod scores, a graph file suitable for graphing these with the 
        Shareware program EASIGRAF (supplied with EASISTAT) and a 
        debugging file which contains additional information about the 
        approximations made. The approximation is produced very quickly, 
        at least in relation to the time taken to produce a full 
        multipoint. Overall the approximation is unbiased and is usually 
        quite accurate, although occasionally there can be a fairly 
        large difference from the true multipoint lod scores as produced 
        for example by LINKMAP.

        The version currently distributed may be regarded to some extent 
        as a prototype, although I think I have now got it working about 
        as well as I am going to. I would be extremely interested to 
        hear any comments concerning its performance. I also hope that 
        others better qualified than myself may be able to develop the 
        basic algorithm further and I would be glad to assist anyone in 
        explaining how the program is supposed to work. This 
        documentation contains some additional notes about the 
        implementation which were not included in the article submitted 
        for publication. Also included with these files are some more 
        detailed breakdowns of FASTMAP's performance in different 
        simulations, contained in the file FMAPEVAL.DOC.

        
        COPYRIGHT

        I hold the copyright to the source code. I hereby authorise 
        anyone to use, make adjustments to and redistribute this source 
        code provided only that they do not do so for profit and that my 
        original contribution is acknowledged, and that any alterations 
        from the original are clearly marked. Anyone who wishes to 
        distribute the code or programs compiled from it for profit may 
        only do so with prior agreement from me. However the algorithm 
        and ideas embodied in the source code may be freely used by 
        anybody for any purpose. Naturally I would hope that such a 
        person would acknowledge my contribution, and in particular I 
        would urge anyone who finds the procedure helpful to cite the 
        relevant reference. I would also be grateful if anyone who did 
        come up with any useful improvements might keep me informed of 
        them, although I would be very happy to see others take over 
        development of this idea. 

        
        PROGRAM INPUT 

        Input is either from the keyboard (standard input which may be 
        redirected) or from an input file specified on the command 
        line, e.g.: 

        fastmap 
        
        (then input is typed in interactively)

        or:

        fastmap < input.dat 

        or:

        fastmap input.dat 
                
        When input is from standard input the program prompts the user 
        for the required values, but the format of the input is 
        identical regardless of whether it is from the keyboard or a 
        file. 

        Line 1: 

        One to three filenames. The first is for the tabulated output 
        file of lod score[s] at each map position. The second filename 
        if specified is a graph file for input into the EASIGRAF or 
        ACE/gr program. The third filename if specified contains 
        debugging information which reports various aspects of the 
        estimates obtained by the program. 

        Line 2: 

        Values for the minimum and maximum distances (in centimorgans) 
        of the map over which lod scores are to be calculated. If in 
        the next line a number of fixed distances are given, then the 
        only effect of these two values is to define the horizontal 
        scaling of the graph. 

        Line 3: 

        Either: one number, which consists of the number of equidistant 
        points at which the lod score is to be evaluated between the 
        minimum and maximum distance given above. Or: several values 
        giving specific distances (in centimorgans) at which the lod 
        score is to be evaluated. 

        Line 4: 

        The number of pedigrees for which data will be input. If only 
        total lod scores are available then enter 1 here. However 
        FASTMAP should perform better if the individual lod scores are 
        available for each pedigree. You can get these from MLINK by 
        setting byfamily to true and then recompiling.

        Line 5: 

        The name of the disease locus (up to 20 characters), followed 
        optionally by values for the "reliability" with which genotype 
        predicts phenotype. If no value for reliability is input then 
        the program will choose best-fitting values for each pedigree. 
        If one value is input then this value will be used for every 
        pedigree. Alternatively, a number of values equal to the number 
        of pedigrees may be input, in which case each pedigree can be 
        assigned a different value. 

        There then follows for each marker (which should be entered in 
        the order they appear on the map): 

        One line: 

        The name of the marker (up to 20 characters), followed by one 
        value giving the position of the marker on the map (in 
        centimorgans) followed by either one value giving the 
        probability that the marker will be informative for a given 
        meiosis, or alternatively a number of allele frequencies (which 
        should sum to 1) from which a conventional PIC value is 
        calculated by the program. 

        Second and subsequent lines (one for each pedigree): 

        A number of pairs of values for recombination fraction (in 
        ascending order) and observed two-point lod score. To indicate 
        that a marker was uninformative this line should consist of two 
        zeros (separated by a space). If the marker was not tested in a 
        particular pedigree this should be indicated by leaving the 
        line completely blank. 

        Input finishes when the end of file is reached, or when a blank 
        line is encountered instead of a line describing the next 
        marker. Information pertaining to each marker must be entered 
        in the order in which the markers appear along the map - the 
        markers must be in order of ascending distance.
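
        The layout described above can also be assembled by a short 
        program. The following is a minimal Python sketch and is not 
        part of FASTMAP: the function name build_fastmap_input and the 
        structure of its arguments are hypothetical, and it assumes 
        that values on a line are separated by single spaces. A marker 
        not tested in a pedigree is passed as None and written as a 
        blank line; an uninformative marker is passed as the single 
        pair (0, 0).

```python
def build_fastmap_input(out_files, min_cm, max_cm, points, n_peds,
                        disease, markers, reliabilities=()):
    """Return the text of a FASTMAP input file (illustrative sketch).

    markers is a list of (name, position_cm, freqs, per_pedigree) tuples.
    per_pedigree holds one entry per pedigree: a list of
    (recombination fraction, lod score) pairs, or None if the marker
    was not tested in that pedigree.
    """
    def nums(values):
        return " ".join(str(v) for v in values)

    lines = [
        " ".join(out_files),        # line 1: one to three filenames
        nums([min_cm, max_cm]),     # line 2: map limits in centimorgans
        nums(points),               # line 3: point count, or distances
        str(n_peds),                # line 4: number of pedigrees
        # line 5: disease locus name, optionally followed by reliabilities
        " ".join([disease, nums(reliabilities)]).strip(),
    ]
    # Markers must appear in order of ascending map position.
    for name, position, freqs, per_pedigree in sorted(
            markers, key=lambda marker: marker[1]):
        if len(per_pedigree) != n_peds:
            raise ValueError("need one lod score line per pedigree: " + name)
        lines.append(" ".join([name, str(position), nums(freqs)]))
        for pairs in per_pedigree:
            if pairs is None:
                lines.append("")    # marker not tested: blank line
            else:
                ordered = sorted(pairs)  # ascending recombination fraction
                lines.append(nums(x for pair in ordered for x in pair))
    return "\n".join(lines) + "\n"
```

        The returned text can be written to a file and passed to 
        FASTMAP on the command line or by redirection.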


        PROGRAM CONSTANTS:

        The following constants are defined in fastmap.h:

        MAXPEDS     - the maximum number of pedigrees to be used

        MAXMARKERS  - the maximum number of markers to be used

        MAXPAIRS    - the maximum number of pairs of values for 
                      recombination fraction/lod score to be entered on 
                      each line 

        MAXDISTS    - the maximum number of specified distances at 
                      which the lod score can be evaluated (this has no 
                      effect on the number of equidistant points 
                      between the minimum and maximum if that option is 
                      chosen instead) 

        MINFRACTION - value specifying fraction of information from 
                      a given marker which can be discarded, and 
                      fractional overlap between markers which can be 
                      ignored

        If desired these constants can be altered and the program 
        recompiled.

        
        NOTES ABOUT INPUT
        
        1. Reliability values

        The "reliability" value is the probability of observing the 
        "expected" phenotype for a given genotype in one offspring of 
        an informative phase known meiosis - the combined probability 
        of the offspring not being a nonpenetrant carrier nor a 
        phenocopy. It can take values between 0.5 and 1. In the context 
        of the complex pedigree from which the two-point lod scores are 
        obtained, it provides some measure of the extent to which the 
        disease genotype is known for each individual, given all the 
        phenotypic information in the pedigree. In a large complex 
        pedigree, this reliability value may be relatively high despite 
        penetrance values being low or phenocopy rates high. This is 
        because there can often be a fairly high degree of certainty of 
        an individual's genotype, for example because of the pattern of 
        illness in his children. 

        The effect of different reliability values is to alter the 
        sharpness of curvature of the graph of expected lod score 
        against recombination fraction. High values produce more 
        sharply peaked curves which (if there are any apparent 
        recombinants) go down to minus infinity at zero recombination, 
        whereas lower values produce flattened-out curves.
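
        The effect can be illustrated with a small Python sketch. It 
        is an assumption made for illustration, not a transcription of 
        the FASTMAP source: each phase known meiosis is taken to be 
        scored correctly with probability equal to the reliability, so 
        that a meiosis appears recombinant with probability 
        theta*r + (1-theta)*(1-r). The function name expected_lod is 
        hypothetical.

```python
import math

def expected_lod(theta, n_rec, n_nonrec, reliability=1.0):
    """Lod score for n_rec apparent recombinants and n_nonrec apparent
    nonrecombinants under the assumed misclassification model."""
    # Probability that a meiosis appears recombinant (assumed model).
    apparent = theta * reliability + (1.0 - theta) * (1.0 - reliability)

    def term(count, prob):
        if count == 0:
            return 0.0
        if prob <= 0.0:
            return float("-inf")   # impossible observation
        return count * math.log10(prob)

    # Subtract the log likelihood at theta = 0.5, where every meiosis
    # has probability 0.5 whatever the reliability.
    return (term(n_rec, apparent) + term(n_nonrec, 1.0 - apparent)
            + (n_rec + n_nonrec) * math.log10(2.0))
```

        With one apparent recombinant in ten meioses, a reliability of 
        1 gives minus infinity at zero recombination, whereas a 
        reliability of 0.9 gives a finite value there because the 
        apparent recombinant can be explained as a misclassification.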

        If a reliability value is not specified for a pedigree, FASTMAP 
        will find the value which gives the best fit to the input lod 
        score values for all the markers. (Note that reliability values 
        can only be fitted if at least one marker contains more than 
        two pairs of recombination fraction/lod score values, otherwise 
        a reliability value of 1 will be chosen.) If you are dealing 
        with an incompletely penetrant disease or one with phenocopies 
        you should begin by letting FASTMAP generate fitted values for 
        the reliability. Such a fitted value is constrained to lie 
        between 0.51 and 0.99. If you are dealing with a fully-
        penetrant trait then you may wish to specify a reliability of 
        1. 

        Fitting the reliability values takes a considerable amount of 
        time compared to the rest of the procedure. FASTMAP outputs the 
        values that it has chosen, and if you find that with different 
        markers the same pedigree always produces about the same 
        reliability value then you can save time by specifying this 
        value in the input file. If every pedigree has the same value 
        then you can just specify one value instead of one for each 
        pedigree. I find that with moderately complex pedigrees a value 
        of 0.99 is appropriate even when dealing with a disease with 
        fairly low penetrance.


        2. PIC values, etc

        Normally, for each marker FASTMAP calculates a conventional PIC 
        value from input allele frequencies. This is supposed to 
        provide a value for the proportion of meioses informative for 
        the disease locus which can be expected to be also informative 
        for the marker. However the user does have the option of 
        entering this probability directly, and there are probably two 
        circumstances when you may wish to do this.

        The first case in which this is desirable is when the two-point 
        lod score has been derived from more than one allelic system. 
        If there are two polymorphic systems at the same locus, or very 
        close to each other, then it may be preferable to calculate 
        two-point lod scores with them jointly (e.g. with MLINK) rather 
        than to enter the results separately into FASTMAP. In this case 
        a joint PIC value should be calculated, for the probability 
        that at least one system will be informative at a given locus. 
        This is PIC=1-(1-PIC1)*(1-PIC2). (The PIC values can be 
        obtained by inspection of the debugging file after the 
        individual markers have been entered with their allele 
        frequencies.)
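
        As a worked illustration, the following Python sketch 
        computes a PIC value from allele frequencies and combines two 
        PIC values using the formula above. It assumes that the 
        "conventional" PIC is the usual definition of Botstein et al. 
        (1980); the function names pic and joint_pic are hypothetical 
        and are not taken from the FASTMAP source.

```python
from itertools import combinations

def pic(freqs):
    """Conventional PIC from allele frequencies (assumed definition):
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2.0 * p * p * q * q
                     for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction

def joint_pic(pic1, pic2):
    """Probability that at least one of two systems is informative."""
    return 1.0 - (1.0 - pic1) * (1.0 - pic2)
```

        For two equally frequent alleles pic([0.5, 0.5]) gives 0.375, 
        and two such systems analysed jointly would be entered with a 
        value of joint_pic(0.375, 0.375), which is 0.609375.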

        The second case when one might want to consider not using the 
        conventional PIC is to my mind much more dubious, and is when 
        dealing with a recessive disease. It is true that for certain types 
        of mating the PIC value does not give the true probability for 
        a meiosis to be informative. For example if two parents who are 
        carriers of a recessive disease have the same genotype and are 
        heterozygotes, and if the disease is known to be in phase with 
        the same marker allele in each parent, then if the child is 
        affected but is heterozygous for the marker we can conclude 
        that there has been one recombinant and one nonrecombinant 
        meiosis. However for a dominant disease we would not be able to 
        conclude anything from the situation of two such heterozygote 
        parents (one affected) producing a heterozygote child. There is 
        thus a case for using a slightly higher value than the 
        conventional PIC when dealing with recessive diseases. However 
        the difference from the conventional PIC is small. It is 
        maximal for a two-allele system with equal allele frequencies, 
        when I calculate that the proportion of matings between two 
        carriers producing affected offspring which are informative is 
        0.469, compared with a conventional PIC of 0.375. However when 
        dealing with a complex pedigree information will additionally 
        be obtained from other types of matings for which the ordinary 
        PIC is probably more appropriate. I would conclude that the 
        size of the effect is likely to be negligible in practice. 
        This view is to some extent supported by the simulations 
        carried out with a recessive disease, which used conventional 
        PIC values but demonstrated performance which was overall at 
        least as good as for a dominant disease. Nevertheless, the 
        option to enter values other than the PIC is available to the 
        user if desired.
        
        3. Recombination fractions and lod scores

        FASTMAP fits a number of recombinant and nonrecombinant meioses 
        to the observed two-point lod scores, and may fit a reliability 
        value as well. There are three distinct ways in which this 
        fitting is accomplished, depending on the number of pairs of 
        values which are entered for recombination fraction and lod 
        score.

        If only one pair of values is entered then this is taken to be 
        for the recombination fraction at which the maximum lod score 
        is obtained. An exact number of recombinant and nonrecombinant 
        meioses which would produce this maximum lod can readily be 
        calculated, contingent on a reliability value. It is only 
        possible to use this form of input if there is an available lod 
        score at some recombination fraction which is positive. In 
        addition it is not possible to fit a reliability value which 
        depends on the curvature of the lod score graph. 
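
        The calculation for a single pair can be sketched in Python 
        as follows. This is an illustration under an assumed model 
        rather than the FASTMAP code itself: a meiosis is taken to 
        appear recombinant with probability 
        q = theta*r + (1-theta)*(1-r), where r is the reliability, and 
        the lod score is maximal when q equals the proportion of 
        apparent recombinants. The function name is hypothetical and 
        fractional numbers of meioses are allowed.

```python
import math

def meioses_from_max_lod(theta_max, lod_max, reliability=1.0):
    """Return (recombinants, nonrecombinants) whose lod score curve
    peaks at theta_max with height lod_max (assumed model)."""
    if not 0.5 < reliability <= 1.0:
        raise ValueError("reliability must be above 0.5 and at most 1")
    if lod_max <= 0.0 or not 0.0 <= theta_max < 0.5:
        raise ValueError("needs a positive lod score at theta below 0.5")
    # Probability that a meiosis appears recombinant.
    q = theta_max * reliability + (1.0 - theta_max) * (1.0 - reliability)

    def xlog(prob):
        # prob * log10(2 * prob), with the limit 0 as prob tends to 0
        return prob * math.log10(2.0 * prob) if prob > 0.0 else 0.0

    # At the maximum each meiosis contributes this much lod on average.
    lod_per_meiosis = xlog(q) + xlog(1.0 - q)
    total = lod_max / lod_per_meiosis
    return q * total, (1.0 - q) * total
```

        For example a maximum lod score of about 1.598 at a 
        recombination fraction of 0.1, with a reliability of 1, 
        corresponds to one recombinant and nine nonrecombinants. The 
        check on lod_max reflects the restriction that a positive lod 
        score is needed for this form of input.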

        If two pairs of values are entered then again it is possible to 
        find an exact solution which would produce a lod score curve 
        going through these two points. Again the solution is 
        contingent on the reliability value specified, which cannot be 
        fitted. This option can be used even when the lod scores are 
        all negative. However I would advise against only entering two 
        pairs of values. The reason is that the shape of the actual and 
        fitted curves may not be exactly the same, and it is easy to 
        imagine that producing a solution which passes exactly through 
        the two points specified may be wildly inaccurate at other 
        recombination fractions. 

        When more than two pairs of values are entered, numbers of 
        meioses are chosen to produce a line which most closely 
        approximates to the points specified. This closeness is in the 
        sense that the sum of squares distance between points on the 
        line and observed lod score values is minimised. In this 
        situation a reliability value can be fitted as well as the 
        number of recombinant and nonrecombinant meioses. Because of 
        the way the closeness of fit is measured, it is possible to 
        bias the fitting to give more priority to some recombination 
        fractions than others. For example if many pairs of values at 
        small recombination fractions were entered then more attention 
        would be paid to getting the line to fit well at small 
        recombination fractions than large ones. Actually, since lod 
        scores at large recombination fractions are relatively small 
        anyway, it is the lod scores at smaller recombination fractions 
        which generally have more effect on the values eventually 
        arrived at. Lod scores at very small recombination fractions can 
        be very large indeed, so you are (strongly) advised to omit 
        these (e.g. at recombination fractions less than 0.01).

        To summarise, my advice for the information to input would be a 
        series of lod score values at different recombination fractions 
        ranging from 0.01 to 0.4. FASTMAP was evaluated using lod scores 
        at 0.01, 0.05, 0.1, 0.2, 0.3 and 0.4 and this gave satisfactory 
        results. If three or more pairs of values are given for at 
        least one of the markers then this allows a reliability value 
        to be fitted to the shape of the curve. Avoid entering strongly 
        negative values at very low recombination fractions to avoid 
        distorting the fitted curve too wildly (the price of this is 
        that the estimate may be inaccurate very close to the marker 
        positions, but this is unavoidable).

        EXAMPLE INPUT FILE:

UPM6DF.OUT UPM6DF.GRP UPM6DF.DBG
-20 60 
100 
3 
UP 
MS5H 0 .2 .2 .2 .2 .2
0.010 -0.9811  0.050 -0.5865  0.100 -0.3641  0.200 -0.1526  0.300 -0.0566  0.400 -0.0127  
0.010 -2.8312  0.050 -1.9729  0.100 -1.4685  0.200 -0.8191  0.300 -0.4055  0.400 -0.1670  
0.010 -2.4945  0.050 -1.7574  0.100 -1.1036  0.200 -0.3999  0.300 -0.1076  0.400 -0.0125  
L6-3 21.2 .43 .57
0.010 -0.0902  0.050 -0.0747  0.100 -0.0579  0.200 -0.0316  0.300 -0.0138  0.400 -0.0034  
0.010 -1.5290  0.050 -0.9314  0.100 -0.6415  0.200 -0.3675  0.300 -0.2186  0.400 -0.1029  
0.010  0.0000  0.050  0.0000  0.100  0.0000  0.200  0.0000  0.300  0.0000  0.400  0.0000  
HD2G 42.4 .24 .76
0.010 -0.0007  0.050 -0.0006  0.100 -0.0005  0.200 -0.0003  0.300 -0.0001  0.400 -0.0000  
0.010 -2.3743  0.050 -1.4876  0.100 -0.9106  0.200 -0.3696  0.300 -0.1342  0.400 -0.0296  
0.010 -0.8302  0.050 -0.7315  0.100 -0.4721  0.200 -0.0962  0.300  0.0329  0.400  0.0282  

        OUTPUT FILES

        FASTMAP produces up to three output files with the names 
        specified on the first line of the input file. 
        
        1. Table output

        The first output file is a simple table of distance against lod 
        score - total lod score and a breakdown by pedigree. Because the 
        lod score may be evaluated at a large number of positions (100 in 
        the example above) the pedigrees are arranged in columns, rather 
        than rows as might seem more natural. 

        2. Graph file output

        The latest version of FASTMAP allows preparation of graph files 
        for one of two graphing programs, EASIGRAF which runs under DOS 
        or ACE/gr which runs on workstations and terminals using the X 
        graphics system.

        a) EASIGRAF graphs

        The second file, if specified, is a graph file for input into 
        EASIGRAF, a Shareware graphing program supplied with the 
        EASISTAT package (obtainable from me or the same source as you 
        acquired FASTMAP). This displays a graph of lod score against 
        distance - again both the total lod score and for each pedigree. 
        A neat feature is that it also displays each marker on the same 
        graph. It is run by specifying the name of the graph file on the 
        command line, e.g.:

        EASIGRAF filename.grp
        
        Please consult the EASISTAT documentation for details on 
        how various aspects of the display may be altered. Essentially, 
        you can use the "Axes" menu to control aspects of the labelling 
        and scaling of the X and Y axes, and the "Data" menu to control 
        which columns are displayed from the graph file (the first 
        column corresponds to map distance, the second to total lod and 
        subsequent columns for each pedigree's lod score). If you wish 
        to only display the total lod score this can be done by pressing 
        D for the "Data" menu, then pressing 5 to select XY 
        columns, then entering 1,2 to graph the second column against 
        the first. Then keep pressing Enter to return to the main menu.

        There are a couple of points worth mentioning specifically. The 
        marker labels are implemented as "floating titles" for EASIGRAF, 
        which means they always appear in the same position on the 
        screen. This means that if you change the horizontal scale of 
        the graph the marker labels will no longer be in the correct 
        position (you can change the vertical scale with no problems). 
        
        When the graph file is first read in by EASIGRAF the horizontal 
        scale is determined by the minimum and maximum distances which 
        were entered to FASTMAP on line 2 of the input file. If the data 
        is regraphed (for instance if you use the "Data" menu to graph 
        just the total lod score against distance, columns 1 and 2 of 
        the graph file) then the graph will be rescaled. The new minimum 
        and maximum distances will then be determined by the smallest 
        and largest distances for which a lod score was calculated. If 
        you selected the option to calculate scores at equidistant 
        points between the minimum and maximum, then the scale of the 
        graph will be unchanged. However if lod scores were only 
        calculated for specific points then the smallest and largest of 
        these distances will determine the new scale and the floating 
        titles may appear in the wrong place. If you wish to change the 
        horizontal scaling of the graph, the best way to do it is to run 
        FASTMAP again with different minimum and maximum distances 
        specified, otherwise the floating titles for the markers will 
        appear in the wrong place. 

        Another point about the marker labels is that if the markers are 
        close together then the labels may overwrite each other. To fix 
        this just alter the vertical position of the relevant floating 
        title. Select "Edit titles" from the "Titles" menu, then select 
        "Edit TITLEF's". Go through pressing Enter till you get to the 
        desired label. Leave the text unchanged, but backspace and 
        change the Y value for the position (e.g. from 0.0 to 0.1) and 
        retype the rotation to 90. Then press Enter and Escape 
        appropriately to return to the main menu. The marker label will 
        be moved up a bit, clear of the other labels. 

        b) ACE/gr graphs

        ACE/gr is a graphing program which runs on workstations and 
        terminals using X (the command used to run this program can be 
        either xvgr or xmgr). ACE/gr was written by Paul Turner and is 
        available in source form from ftp.ccalmr.ogi.edu in 
        CCALMR/pub/acegr. If desired then the graph file specified as 
        the second output file can be produced in a format suitable for 
        display by this program rather than by EASIGRAF (ACE/gr is more 
        powerful than EASIGRAF and will produce higher quality output). 
        In order to specify that the graph file should be produced in 
        ACE/gr format, FASTMAP must be run with the command line switch 
        -x (or under DOS /x). The format is as follows: 

        fastmap [input.dat] [-x[labelpos]]    (Unix)
        
        fastmap [input.dat] [/x[labelpos]]    (MSDOS)

        The -x switch can be followed immediately by a number 
        (labelpos) which determines the position of the marker names
        on the graph. By default the names of the markers will 
        appear on the graph at a height equal to a lod score of -10, but 
        this can be changed by specifying a different value for 
        labelpos. For example, to have the marker names appear above the 
        graph at a height equal to a lod score of 3, one would enter:

        fastmap input.dat -x3

        If you have the example files EPLDALL.INP and UPM6DF.INP then 
        appropriate commands are:

        fastmap epldall.inp -x1

        and:

        fastmap upm6df.inp -x-17

        The graph is very similar to the one produced for EASIGRAF. 
        However with ACE/gr it is possible to rescale the graph both 
        vertically and horizontally because the marker names are placed 
        using the same coordinate system as the data values (instead of 
        occupying fixed points on the screen as with EASIGRAF). In order 
        to display the graph run the program (called xvgr or xmgr) and 
        select the "Read sets" command from the "File" menu. Read in the 
        file. Click on the autoscale button (AS) and the graph of lod 
        scores should be displayed with the markers in the appropriate 
        positions. As with EASIGRAF, it is possible to make adjustments 
        to the final graph either by using the facilities of the program 
        or by editing the graph file before reading it in.
        
         
        3. Debug file

        The output from this is fairly complex, and should be studied in 
        conjunction with the source code and description of the 
        algorithm. A detailed description of its contents is given later 
        in the documentation.


        LOD SCORES ASSUMING HETEROGENEITY

        As well as simply totalling lod scores across pedigrees, it is 
        possible to automatically calculate lod scores under the 
        assumption of heterogeneity - for example that a locus may 
        influence susceptibility to a disease in only a certain 
        proportion of families. This proportion is conventionally termed 
        alpha, and desired values of alpha can be specified using the 
        command line switch -a (or under DOS /a). The format is as follows: 

        fastmap [input.dat] [-aalpha1 [-aalpha2 ...]]  (Unix)

        fastmap [input.dat] [/aalpha1 [/aalpha2 ...]]  (MSDOS)

        Any number of alpha values (up to 10) can be provided. For 
        example to obtain lod scores under the assumptions that 60% or 
        80% of families might be linked one would enter:

        fastmap input.dat -a0.6 -a0.8

        The adjusted lod scores are appended to the others in both the 
        table and graph files. To obtain clear graphs you will probably 
        want to switch off display of the individual lod scores, either 
        by editing the graph files or by using the relevant functions of 
        the graphing programs themselves. 
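
        For readers unfamiliar with the calculation, the following 
        Python sketch shows the standard admixture form of the lod 
        score under heterogeneity at a single map position. That 
        FASTMAP computes exactly this quantity is an assumption, and 
        the function name heterogeneity_lod is hypothetical.

```python
import math

def heterogeneity_lod(family_lods, alpha):
    """Lod score at one map position assuming a proportion alpha of
    families is linked (standard admixture formulation).

    Each family contributes log10(alpha * 10**lod + (1 - alpha)).
    """
    return sum(math.log10(alpha * 10.0 ** lod + (1.0 - alpha))
               for lod in family_lods)
```

        With alpha equal to 1 this reduces to the ordinary total lod 
        score, and with alpha equal to 0 it is zero at every position. 
        Intermediate values reduce the penalty imposed by families 
        with strongly negative lod scores.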

        Note that there is some debate concerning the statistical 
        properties of the lod score under the assumption of 
        heterogeneity as a test for linkage. In addition, the properties 
        of the FASTMAP approximation have not been explored with regard 
        to this situation. The mean lod score for each family obtained 
        by FASTMAP is fairly unbiased with respect to the true 
        multipoint lod score, and this means that the total lod score 
        will also be unbiased. However, it is possible that if the 
        variance of individual FASTMAP lod scores were markedly 
        increased or reduced compared to the true lod scores then the 
        adjusted lod score obtained under the assumption of 
        heterogeneity might be different to what it would be if a full 
        multipoint analysis were performed. 


        USING FASTMAP IN PRACTICE

        Supplied with these files is a utility program called TABLE 
        which produces the pairs of recombination fractions and lod 
        scores needed to input to FASTMAP. It is run on the output of 
        MLINK, although it does assume that the output from each two-
        point analysis will be in a separate results file. To get these 
        pairs TABLE is run with the /I switch, e.g.:

        TABLE filename.res /I

        This would make a new file called filename.inp containing the 
        pairs of values at recombination fractions between 0.01 and 0.4.

        Of course you would still have to input the additional 
        information about the number of pedigrees, etc. Still things can 
        be made even easier. The setup I have is to have different files 
        containing one line of information about each locus (its name, 
        position and allele frequencies) in one subdirectory. So there 
        might be a file called F13A.INP with the following contents:

        F13A -50 .2 .2 .2 .2 .2

        (You do have to be careful that the file has one and only one 
        line feed at the end of it, otherwise you would get extraneous 
        blank lines in your input file to FASTMAP.)

        Then one can have a couple of simple batch files along the lines 
        of: 

            SETUPINP.BAT
                    
            echo %1.out %1.grp >%1.inp
            echo %2 %3 >>%1.inp
            echo %4 >>%1.inp
            echo %5 >>%1.inp
            echo %6 %7 >>%1.inp
        
        and:

            ADDINP.BAT
 
            type d:\ls4\%2.inp >>%1.inp
            table %3.res /i
            type %3.inp >> %1.inp
        
        These assume that the one line files for each locus are in the 
        directory D:\LS4.

        Then a batch file which will take all the relevant two-point 
        results files, prepare them to make an input file for FASTMAP 
        and run FASTMAP could look like this:

            DOFAST6.BAT

            CALL SETUPINP EPHDALL -80 60 100 25 EPHD 
            CALL ADDINP EPHDALL F13A EPHDF13A
            CALL ADDINP EPHDALL 6S89 EPHD6S89
            CALL ADDINP EPHDALL 6109 EPHDF109
            CALL ADDINP EPHDALL 6105 EPHDF105
            CALL ADDINP EPHDALL 6S10 EPHD6S10
            CALL ADDINP EPHDALL C4 EPHDC4
            CALL ADDINP EPHDALL DQA EPHDDQA
            CALL ADDINP EPHDALL TCTE EPHDTCTE

            FASTMAP EPHDALL.INP


        The call to SETUPINP.BAT produces the first few lines of 
        EPHDALL.INP, with no "reliability" value specified. Each of 
        the following lines calls ADDINP.BAT for one marker. ADDINP.BAT 
        takes the one line locus description in D:\LS4\F13A.INP etc. 
        and adds it to EPHDALL.INP, then runs TABLE on EPHDF13A.RES 
        etc. and adds e.g. EPHDF13A.INP onto EPHDALL.INP. Finally 
        FASTMAP is run with EPHDALL.INP as input. 

        Of course you don't have to go to these lengths, but as you grow 
        more familiar with FASTMAP you might like to bear these examples 
        in mind.

        Gary Williams at HGMP Harrow has produced the following 
        equivalent shell scripts to prepare input files under Unix. To 
        produce the filename.inp files the TABLE program should be run 
        with a -i switch under Unix:

        table filename.res -i

        Then the following scripts are equivalent to the batch files 
        described above.

        File setupinp:
        
            #!/bin/csh -f
            echo $1.out $1.grp > $1.inp
            echo $2 $3 >> $1.inp
            echo $4 >> $1.inp
            echo $5 >> $1.inp
            echo $6 $7 >> $1.inp
        
        File addinp:
        
            #!/bin/csh -f
            cat $2.inp >> $1.inp
            table $3.res -i
            cat $3.inp >> $1.inp
               
        File dofast6:
        
            #!/bin/csh -f
            setupinp ephdall -80 60 100 25 ephd
            addinp ephdall f13a ephdf13a
            addinp ephdall 6s89 ephd6s89
            addinp ephdall 6109 ephdf109
            addinp ephdall 6105 ephdf105
            addinp ephdall 6s10 ephd6s10
            addinp ephdall c4 ephdc4
            addinp ephdall dqa ephddqa
            addinp ephdall tcte ephdtcte
            fastmap ephdall.inp
        
        
        All these script files should be made executable by the command: 

        chmod +x filename
         
        
        PROBLEMS WITH FASTMAP

        If FASTMAP seems to be producing poor approximations, there are 
        a number of things you may want to 
        look at. Certainly it may be helpful to examine the debugging 
        file to see if any information gives a clue as to what may be 
        happening. You can check how good FASTMAP is at fitting to the 
        supplied two-point data by only inputting the data for one 
        marker at a time and checking to see how closely the output 
        corresponds to the input. If you have supplied a "reliability" 
        value then it would be worth removing this and letting FASTMAP 
        fit to the supplied lod score values with the reliability 
        unconstrained. Make sure that whenever possible you enter 
        information by pedigree, rather than as total lod scores summed 
        over all pedigrees. However there are some occasions when FASTMAP 
        will not produce a very good approximation, for example if there 
        just happens to be an unexpectedly large number of 
        recombinations between markers, or if two markers just happen to 
        be informative for all the same matings, and so on. I would be 
        interested to see examples of such bad performance, to see if 
        there are any further improvements which could be made. 
        
        
        DETAILED CONTENTS OF DEBUG FILE

        It contains the following information:

        For each marker, the proportion of meioses for which it is 
        expected to be informative. (This may either be input directly 
        by the user, or is the PIC value calculated from the allele 
        frequencies supplied instead.)

        All the following information is repeated once for each 
        pedigree.

        The reliability value is output, which may be supplied by the 
        user or fitted by the program.

        For each marker the estimated equivalent number of recombinant 
        and nonrecombinant meioses that would produce lod scores close 
        to those observed is output.

        The total estimated number of meioses informative for the 
        disease locus is output (based on the estimated number of 
        informative meioses for each marker and the probability of each 
        marker being informative). 

        For each marker, based on this total, the fraction of meioses 
        for which that marker is deemed to be actually informative.

        The following information is repeated once for every interval on 
        the map. Information pertaining to each marker to the right of 
        the disease locus goes into one column, and each to the left in 
        a row. The information consists of the number of recombinant 
        meioses which are expected to be informative for a given marker, 
        and for no other marker between it and the disease locus. 
        
        The top row and leftmost column are for the meioses which are 
        only informative for a marker in the right group or in the left 
        group (but not both). In the top row the number of 
        nonrecombinants with each righthand marker is printed just 
        above and to the left of the number of recombinants. In the 
        leftmost column the number of nonrecombinants with each marker 
        is two lines above the number of recombinants. 
        
        The first set of values, which concerns the first interval, will 
        all be in one row. The first pair of numbers is the estimated 
        number of nonrecombinants and recombinants for the first marker. 
        The second pair relates to the second marker, but excludes those 
        meioses for which the first marker is expected to have already 
        been informative, and so on. 

        Reading down each column and along each row into the table one 
        can see the meioses which are expected to be informative for a 
        marker in the lefthand group and in the righthand group 
        simultaneously. These meioses are categorised as to whether they 
        are nonrecombinant or recombinant for each marker. Here is an 
        example debug file containing information about 1 pedigree and 3 
        markers:

DQA.prob_inf=0.600000
C4.prob_inf=0.600000
6S10.prob_inf=0.600000
ped   1, "reliability" = 0.990:
DQA:  0.837 nonrec,  0.000 rec
C4:  0.837 nonrec,  0.000 rec
D6S10:  0.000 nonrec,  1.599 rec

Estimated total informative meioses for ped   1: 1.914425
DQA.fraction_used: 0.437224
C4.fraction_used: 0.437224
D6S10.fraction_used: 0.835266


            0.837     0.471     0.000  
              0.000     0.000     0.597


            0.471     0.000  
              0.000     1.006

   0.423    0.364     0.000  
              0.000     0.049
   0.000    0.000     0.000  
              0.000     0.000


            0.000  
              1.006

   0.294    0.000  
              0.543
   0.000    0.000  
              0.000

   0.421    0.000  
              0.048
   0.000    0.000  
              0.000


   0.000 
         
   1.599 
         

   0.294 
         
   0.000 
         

   0.000 
         
   0.000 
         
        The first three lines say that each marker had a probability of 
        0.6 of being informative (this information had been entered 
        directly). The "reliability" value was set to be 0.99. From the  
        observed lod scores, the estimated equivalent numbers of meioses 
        were 0.837 nonrecombinants with no recombinants for the first 
        two markers, and 1.599 recombinants with no nonrecombinant 
        meioses for the third. The estimated total number of potentially 
        informative meioses in the whole pedigree was 1.91, yielding the 
        stated values for the fractions for which each marker actually 
        was informative. (So the third marker, with a higher estimated 
        total number of meioses, turned out to be slightly more 
        informative than expected, while the first two were slightly 
        less.)
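
        The fractions quoted above can be checked by hand. The sketch 
        below is not taken from FASTMAP.C; fraction_used() is a 
        hypothetical helper which assumes that the fraction is simply 
        the marker's equivalent meioses (nonrecombinant plus 
        recombinant) divided by the estimated total for the pedigree. 
        The figures in the example are consistent with that reading.

```c
#include <assert.h>

/* Hypothetical helper, not part of FASTMAP.C.
   Assumption: fraction = (nonrec + rec) / estimated total informative
   meioses for the pedigree, as the example figures suggest. */
double fraction_used(double nonrec, double rec, double total_informative)
{
    assert(total_informative > 0.0);    /* guard against division by zero */
    return (nonrec + rec) / total_informative;
}
```

        With the example values, fraction_used(0.837, 0.0, 1.914425) 
        comes to about 0.4372 and fraction_used(0.0, 1.599, 1.914425) 
        to about 0.8352, against the printed 0.437224 and 0.835266. The 
        small differences arise because the debug file shows the 
        numbers of meioses to only three decimal places.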
        
        The first row shows the likely distribution of these meioses. 
        The first marker has 0.837 nonrecombinants. The second marker 
        has 0.471 remaining from its original 0.837 once we have 
        excluded the ones for which the first was informative. By the 
        time we get to the third marker there remain 0.597 of its 
        recombinant meioses for which neither of the first two were 
        informative. (We expect that some of the meioses which were 
        nonrecombinant at the position of the first and/or second 
        markers may have become recombinant by the time we get to the 
        third. Incidentally, although the distances are not shown in the 
        debugging file there is a recombination fraction of 0.01 between 
        the first two markers and 0.04 between the second and third.)

        Now we move on to the next interval. Here we see that there are 
        0.364 for which the first two markers are both nonrecombinant. 
        The first marker is now in the leftmost group. There are another 
        0.049 meioses for which it is nonrecombinant and the third 
        marker is recombinant, and there are 0.423 meioses for which it 
        is nonrecombinant and no other marker is informative. The third 
        marker is also recombinant for 1.006 meioses which are not 
        informative for either of the first two.

        In the next interval we again see the 1.006 recombinant meioses 
        for which only the third marker is informative. There are 0.543 
        meioses for which it is recombinant and the second marker is 
        nonrecombinant, and another 0.048 for which the first marker is 
        nonrecombinant. There are 0.294 meioses for which the second 
        marker is nonrecombinant and the third uninformative. There are 
        0.421 meioses for which the first marker is nonrecombinant and both 
        the others noninformative.

        In the final interval all markers are now in the lefthand group. 
        We begin with the third marker which has 1.599 recombinant 
        meioses. Excluding these, there remain 0.294 meioses which are 
        nonrecombinant for the second marker. On this occasion we 
        estimate that there are no meioses which are informative for the 
        first marker and neither of the others. 
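
        One way to check that a table is being read correctly is to add 
        up the cells belonging to one marker and compare the sum with 
        the equivalent number of meioses listed for that marker near 
        the top of the debug file. The following is only a sketch of 
        that check, using the third interval of the example. The names 
        sum_cells, third_marker_rec and second_marker_nonrec are 
        hypothetical and do not appear in FASTMAP.C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add up the table cells belonging to one marker. */
double sum_cells(const double *cells, size_t n)
{
    double total = 0.0;
    size_t i;

    assert(cells != NULL || n == 0);
    for (i = 0; i < n; ++i)
        total += cells[i];
    return total;
}

/* Third interval of the example debug file.
   Recombinants with the third marker: informative for it alone, together
   with the second marker, and together with the first marker. */
static const double third_marker_rec[] = { 1.006, 0.543, 0.048 };

/* Nonrecombinants with the second marker: third marker uninformative,
   and third marker recombinant. */
static const double second_marker_nonrec[] = { 0.294, 0.543 };
```

        Calling sum_cells(third_marker_rec, 3) gives about 1.597, 
        against the 1.599 recombinants listed for the third marker, and 
        sum_cells(second_marker_nonrec, 2) gives about 0.837, matching 
        the nonrecombinants listed for the second marker. The small 
        discrepancy in the first sum is what one would expect from 
        rounding each cell to three decimal places.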


        NOTES ABOUT IMPLEMENTATION

        FASTMAP.EXE is a DOS executable which should run on any IBM PC 
        compatible running MSDOS. If a maths coprocessor is present it 
        will speed up calculations, but it is not required. I have been 
        running it on a 486 which gives good performance - estimated 
        multipoints using 25 pedigrees and 8 markers with reliability 
        values to be fitted by the program were produced in 70 seconds.
        The Sun SPARCServer I have access to produced the same results 
        in 13 seconds. 

        The file FASTMAP.C is supplied and should compile OK on most 
        compilers with little if any modification. I have compiled it 
        with the Zortech DOS compiler and on a Sun. If you compile it on 
        a DOS machine you may want to ensure that a large stack is 
        provided, and you should use a large memory model so there is 
        room for the data tables. 

        FASTMAP.H begins with a few #defines to control compilation. You 
        may want to modify these for your own compiler. The issues are 
        whether the compiler can accept ANSI C/C++ style prototypes, 
        whether it can use enums (this is pretty unimportant), and where 
        to find a prototype for calloc (mine is in stdlib.h). There may 
        also be compiler-specific ways to modify the stack size, and 
        with the Zortech compiler this is accomplished with the 
        _stack=30000 statement. Some libraries contain the function 
        index() instead of strchr(). Both do the same thing, so you may 
        need to use the "#define strchr index" statement. 
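
        As an illustration of the strchr/index substitution, the 
        fragment below shows one way it might be arranged. USE_INDEX is 
        a hypothetical switch, not one of the #defines actually present 
        in FASTMAP.H, and value_after_equals() is an invented example 
        function. The assumption that index() is declared in strings.h 
        holds on many older Unix systems but should be checked locally.

```c
#include <assert.h>
#include <string.h>

/* USE_INDEX is a hypothetical switch (not from FASTMAP.H). Define it
   only when the library provides index() but not strchr(). */
#ifdef USE_INDEX
#include <strings.h>    /* assumed location of index() */
#define strchr index
#endif

/* Return the text following the first '=' in s, or NULL if s
   contains no '='. */
const char *value_after_equals(const char *s)
{
    const char *p;

    assert(s != NULL);
    p = strchr(s, '=');
    return p ? p + 1 : NULL;
}
```

        Given a line such as "DQA.prob_inf=0.600000" from the debug 
        file, this returns a pointer to "0.600000" whichever of the two 
        library functions is in use.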

        As well as declaring functions and variables, the header file 
        defines a few program constants (listed above) which can be 
        changed if desired.

        A general point about coding style is that I have tended to keep 
        a fair amount of information in structures, which are passed to 
        functions either by value or reference. This largely reflects my 
        exposure to C++ and an attempt to make the code somewhat object-
        orientated. This and other factors may mean that the code is not 
        as efficient as it could be, but on the other hand it should 
        make it easier to modify if improvements can be found for the 
        basic algorithm. Another slight inefficiency may be the liberal 
        use of doubles rather than floats. A major reason for this is 
        that I have used the old-fashioned argument-passing style so 
        that the code will be compatible with K & R compilers. However 
        ANSI compilers will then report errors if arguments are declared 
        as floats (what happens is that all float arguments are 
        passed as doubles and ANSI compilers will not make the 
        automatic cast back to float when this style of argument-passing 
        is used). Since using doubles does not actually incur a 
        prohibitive overhead, I have tended to use them throughout to 
        avoid having to worry about this problem.
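
        The two argument-passing styles can be sketched as follows. 
        This is an illustration only: OLD_STYLE_ARGS is a hypothetical 
        switch standing in for the kind of #define described above (the 
        real name in FASTMAP.H may differ) and scale_lod() is an 
        invented function.

```c
/* OLD_STYLE_ARGS is a hypothetical switch, not the actual name used in
   FASTMAP.H. Define it only for a compiler without ANSI prototypes. */
#ifdef OLD_STYLE_ARGS

double scale_lod();             /* K & R declaration: no argument types */

double scale_lod(lod, factor)   /* K & R definition */
double lod, factor;
{
    return lod * factor;
}

#else

double scale_lod(double lod, double factor);    /* ANSI prototype */

double scale_lod(double lod, double factor)
{
    return lod * factor;
}

#endif

/* Had the old-style definition declared "float lod, factor;", callers
   would still pass doubles, and an ANSI prototype written as
   "double scale_lod(float, float)" would not be compatible with it.
   Declaring the arguments as double keeps both branches in agreement. */
```

        Either branch can be called in the same way, for example 
        scale_lod(2.0, 0.5), which is why using doubles throughout 
        lets one source file serve both kinds of compiler.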
        
        I have now commented the code fairly comprehensively, and I hope 
        that in conjunction with the paper it should be possible to work 
        out what is going on. 
        

        AVAILABILITY

        FASTMAP is available directly from me on receipt of a formatted 
        floppy disk. However I would prefer people to obtain it from one 
        of the software libraries on the Internet listed below. The EASISTAT 
        package is available from the same sources, but requires another 
        720 K of disk space, so if you wish to obtain it from me then 
        please enclose the appropriate number of extra formatted disks.

        gene-server:
        Internet    gene-server@bchs.uh.edu
        BITNET/
        EARN        gene-server%bchs.uh.edu@CUNYVM
        UUCP        gene-server@bchs.UUCP (new style)
        Send mail with  Subject: SEND DOS HELP
        Anonymous ftp: ftp.bchs.uh.edu (in /pub/gene-server/dos)
        
        The following are mirror sites for the above collection.

        European:
        
        Anonymous FTP: nic.funet.fi
        E-mail: mailserver@nic.funet.fi
        Send mail message: HELP

        European EMBL server:
        NetServ@EMBL-Heidelberg.DE
        Send mail message: DIR DOS_SOFTWARE
        Anonymous ftp: ftp.embl-heidelberg.de (/pub/software/dos)
        Manager: Rainer Fuchs, Fuchs@EMBL-Heidelberg.DE
        Problems: NetHelp@EMBL-Heidelberg.DE

        USA anonymous FTP: ftp.bio.indiana.edu 

        Please feel very free to contact me (email preferred) with 
        comments, questions, etc. I would be very interested in people's 
        views on how well it performs and how useful (or not) it is. 

        Dave Curtis

        Academic Department of Psychiatry
        St Mary's Hospital Medical School
        Praed Street
        London W2 1NY, England                   Phone:  071 725 1638

        Janet:       dcurtis@UK.AC.CRC
        Elsewhere:   dcurtis@CRC.AC.UK
        EARN/Bitnet: dcurtis%CRC@UKACRL
        Usenet: ...!mcsun!ukc!mrccrc!D.Curtis