Documentation for SIMWALK
                     version 1.50 dated 1995.09.01
      A computer program for haplotype and location score analysis
   on pedigrees using random walk and simulated annealing algorithms.
                         Written by Eric Sobel
                         in collaboration with
        Kenneth Lange, Jeffrey R. O'Connell, and Daniel E. Weeks
                               (c) 1995

The latest version of this Fortran 77 (ANSI standard) program can be
obtained via anonymous ftp from watson.hgen.pitt.edu in the directory
pub/simwalk (the URL is ftp://watson.hgen.pitt.edu/pub/simwalk).
For more detailed distribution information, see the file README.150 .
We are maintaining a user e-mail list, so please register by sending
e-mail to dweeks@watson.hgen.pitt.edu or daniel.weeks@well.ox.ac.uk.


ABSTRACT:
 (HAPLOTYPE ANALYSIS)
  The program SIMWALK performs a random walk in the space of legal
  genetic descent states of a pedigree often containing only partial
  phenotyping of any number of codominant marker loci and, optionally,
  one trait locus. The program's input is the pedigree and locus data
  and a marker map. A first legal state is found using an iterative
  genotype elimination technique. Using simulated annealing during a
  random walk gives an estimate for the genetic descent state with the
  largest likelihood, i.e., the best haplotype vector for the pedigree,
  which is the output.

 (LOCATION SCORE ANALYSIS)
  With access to the general pedigree analysis computer package MENDEL
  version 3.30 or later, SIMWALK can perform a location score analysis.
  Location scores indicate the relative likelihood of several positions
  of the trait locus among the marker loci given the pedigree data and
  the marker map. In the location score analysis, with the estimate
  for the most likely genetic descent state as the initial position,
  a random walk is performed using the Metropolis acceptance criterion.
  By sampling from this random walk, a number of completely typed
  representative pedigrees is obtained, each drawn in proportion to its
  true likelihood. These pedigrees are then used to estimate the location
  score curve for the original pedigree.
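
  As an illustration only, the Metropolis acceptance criterion named
  above can be sketched in Python as follows. This is the generic
  textbook rule for a symmetric proposal, not an excerpt from the
  SIMWALK source; the function name and arguments are hypothetical,
  and the current likelihood is assumed to be positive.

```python
import random


def metropolis_accept(current_likelihood, proposed_likelihood, rng=random):
    """Generic Metropolis rule (assumes a symmetric proposal distribution).

    A proposed state that is at least as likely as the current one is
    always accepted; a less likely state is accepted with probability
    equal to the likelihood ratio. current_likelihood must be > 0.
    """
    if proposed_likelihood >= current_likelihood:
        return True
    return rng.random() < proposed_likelihood / current_likelihood
```

  Under this rule the walk visits states with a long-run frequency
  proportional to their likelihood, which is the property the sampling
  step relies on.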


COMPILING INSTRUCTIONS:
  Executable versions of this program are available for many common
  platforms at the distribution site. However, since the source code is
  also available at the distribution site, one can create one's own
  executable given a Fortran 77 compiler for one's computer system.
  If one does NOT have access to the MENDEL package, then to create the
  SIMWALK executable capable of haplotype analysis
  simply compile together the two files SIMWALK.F and NOMENDEL.F .
  (Under Unix, after obtaining the file Makefile from the distribution
  site, simply type the command 'make' to create a haplotyping SIMWALK.)
  If one does have access to the MENDEL package, then to create the
  SIMWALK executable capable of haplotype and location score analysis
  simply compile together the two files SIMWALK.F and MENDEL.F .
  (When using the Language Systems Fortran compiler to create a
  Macintosh executable, one may optionally make the executable easier
  to use: locate the string 'MAC!' in the file SIMWALK.F and uncomment
  the indicated lines.)


DATA CONSTRAINTS:
  Due to the nature of Fortran 77 (e.g., the lack of dynamic memory
  allocation) constraints on the data must be included in the program.
  These upper bounds can be increased by altering the code (at the
  'PARAMETER' statements) and then recompiling. The program will inform
  the user if the data exceed any of the upper bounds of the program.
  There is no limit on the number of pedigrees which can be analyzed.
  The default major constraints [and their PARAMETER names] are:
   LOCUS DATA
    Maximum number of marker loci (not including trait) =  23 [MXMKLC]
    Maximum number of alleles per locus                 =  24 [MXMKAL]
    Maximum number of phenotypes per locus              =   8 [MXMKPH]
    Maximum number of genotypes per phenotype           =   8 [MXMKGN]
   PEDIGREE DATA
    Maximum number of founders per pedigree             =  32 [MXFNDR]
    Maximum number of generations per pedigree          =  16 [MXDPTH]
    Maximum number of people per pedigree               = 128 [MXPEO ]
    Maximum number of children per person               =  16 [MXKID ]
    Maximum number of spouses per person                =   7 [MXSPOU]
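
  Before running the program, a data set can be screened against
  these defaults. The following Python sketch is not part of SIMWALK;
  the function name and the shape of its input are hypothetical. Only
  the PARAMETER names and default values are taken from the table
  above.

```python
# Default upper bounds copied from the table above; the keys are the
# PARAMETER names used in the Fortran source.
DEFAULT_LIMITS = {
    "MXMKLC": 23,   # marker loci (not including trait)
    "MXMKAL": 24,   # alleles per locus
    "MXMKPH": 8,    # phenotypes per locus
    "MXMKGN": 8,    # genotypes per phenotype
    "MXFNDR": 32,   # founders per pedigree
    "MXDPTH": 16,   # generations per pedigree
    "MXPEO": 128,   # people per pedigree
    "MXKID": 16,    # children per person
    "MXSPOU": 7,    # spouses per person
}


def exceeded_limits(counts, limits=DEFAULT_LIMITS):
    """Return {name: (observed, limit)} for every bound that is exceeded.

    counts maps PARAMETER names to the largest value observed in the
    user's data (hypothetical input; computing it is left to the caller).
    """
    return {
        name: (value, limits[name])
        for name, value in counts.items()
        if name in limits and value > limits[name]
    }
```

  If the executable was recompiled with larger PARAMETER values, pass
  a modified copy of the dictionary as the limits argument.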


INPUT FILES:
  There are three input files: the locus file; the pedigree file;
    and the BATCH.DAT file which contains the user's choices for the
    program's parameters. Available at the distribution site are example
    files called respectively: LOCUS.DAT, PEDIGREE.DAT and BATCH.DAT .
    In general these files follow the same format as required by MENDEL.
    Please also see the accompanying file FORMATS.TXT which contains
    a brief description of MENDEL's file format specifications.

  The locus file is in the same format required by MENDEL except:
    all loci must be autosomal;
    all marker loci must be codominant;
    the trait locus, if present, must be the initial locus;
    the trait allele names must be at most 3 characters long;
    the trait genotypes must be unordered.
    The following two conditions, which are not internally verified,
      are also required for the TRAIT locus only:
      If some allele appears within one phenotype in combination with
      more than one allele, then all genotypes containing that allele
      must be compatible with that phenotype, e.g., 1/1 & 1/2 => 1/*,
      where * is a wildcard representing any trait allele.
      If two non-overlapping genotypes appear within one phenotype,
      then all genotypes must be compatible with that phenotype,
      e.g., 1/1 & 2/3 => */*.

  The pedigree file is in the same format required by MENDEL.

  The BATCH.DAT file is similar in format to that required by MENDEL,
    i.e., the data are contained in a series of menu-driven choices:
    each instruction is formatted as a BLANK LINE followed by a line
    containing the MENU ITEM NUMBER, in I6 format, followed by the DATA
    values. Each data value is on a separate line, except in item #10.
    Unless otherwise noted each menu item has only one data value.
    The order in which the menu items appear in the BATCH.DAT file is
    arbitrary. Menu item #10 is REQUIRED; all others are optional.
    If menu item #4 is set to no, i.e., one wishes to find haplotypes
    as opposed to finding location scores, then the following menu
    items will have no effect: #1, #11, #12, #14(part 2), #15(part 2),
    #16(part 2), #17(part 2), and #19.
    The menu items are:
     #1) Problem title.
          format A40 [Default value: Linkage Analysis By Random Walk]
     #2) Locus input file name.
          format A12 [Default value: LOCUS.DAT]
     #3) Pedigree input file name.
          format A12 [Default value: PEDIGREE.DAT]
     #4) Should a location score analysis be performed?
          format A1 (Y or N) [Default value: N (i.e., only haplotyping)]
     #5) An integer label for this run of the program. This label will
         be appended onto the names of the output files to make them
         unique. For example, if the label is nn, then for pedigree
         number mmm the haplotype analysis will be in file HAPLO-nn.mmm
          format I6 [Default value: 1]
     #6) Female symbol and male symbol (NOT case sensitive).
          format A1 [Default values: F and M]
          (Number of lines of data = 2.)
     #7) Number of quantitative variables.
          format I6 [Default value: 0]
     #8) Is there a trait locus listed in the locus and pedigree files?
         If so, it must be the initial locus.
          format A1 (Y or N) [Default value: Y]
     #9) Reordering of the MARKER loci from the order in the input
         locus and pedigree files to the genomic order. It is assumed
         the trait, if present, is already in the initial position.
         The trait is considered to be in position 0. The markers are
         said to be in positions 1,...,#-of-marker-loci. Labelling
         the lines of this menu item 0,1,...,#-of-MARKER-loci:
         line 0 has the number of markers, not including the trait;
         line j has the GENOMIC marker position for the marker
         appearing in the input files at position j. All output files
         and messages use the genomic ordering for the marker loci.
          format I6 (all lines) [Default values: same order as input]
           (Number of lines of data = number-of-MARKER-loci + 1.)
    #10) Recombination frequencies between the markers in their GENOMIC
         order, i.e., after they have been reordered, if necessary
         (see above). These parameters are REQUIRED. The markers are
         said to be in positions 1,...,#-of-marker-loci. Labelling
         the lines of this menu item 0,1,...,(#-of-MARKER-loci)-1:
         line 0 has the number of markers, not including the trait;
         line j has the recombination frequency between the GENOMIC
         MARKERS j and j+1, for females and then males.
          format I6 (line 0) & 2F8.5 (other lines) [No default values]
           (Number of lines of data = number-of-MARKER-loci)
    #11) Is this a continuation of a previous analysis whose results
         are in the partial-results file and into which one wishes to
         include additional pedigrees?
         (The pedigree file should now contain only the additional
         pedigrees and the locus and batch files should be identical to
         the earlier run except for this menu item and perhaps menu
         item #5. By changing the run-label in menu item #5 no output
         files will be overwritten. All references to the number of a
         pedigree in any output file or error message will reflect all
         previous pedigrees which were part of this continuation.)
          format A1 (Y or N) [Default value: N]
    #12) Number of sampled pedigrees to find for each original pedigree.
          format I6 [Default value: 1000]
    #13) Number of parallel runs, i.e., the number of complete runs
         starting from the initial pedigree. At completion the single
         best result found over the set of parallel runs is reported.
          format I6 [Default value: 1]
    #14) Multiplicative factor for the number of steps:
         (1) between temperature changes during simulated annealing and
         (2) between realizations during the location score random walk
         (The number of steps = max{1000, MF*TA*P} where
         MF=this multiplicative factor &
         TA=total number of alleles over all markers &
         P=number of people in the current pedigree.)
          format I6 [Default values: 10 and 10]
          (Number of lines of data = 2.)
    #15) Mean number of transitions per step:
         (1) during simulated annealing and
         (2) during the location score random walk.
          format F6.2 [Default values: 2 and 2]
          (Number of lines of data = 2.)
    #16) Fraction of time the next transition within the same step will
         pivot on a neighboring person and locus of the previous pivot
         (1) during simulated annealing and
         (2) during the location score random walk.
          format F6.2 [Default values: 0.5 and 0.5]
          (Number of lines of data = 2.)
    #17) Multiplicative factor of the relative weight given untyped
         people versus typed people when choosing the pivot person:
         (1) during simulated annealing and
         (2) during the location score random walk.
          format I6 [Default values: 10 and 10]
          (Number of lines of data = 2.)
    #18) Output the individual pedigrees into the files INPED-nn.mmm ?
         The pedigrees will reflect any reordering of the loci, any
         renaming of the alleles and any obligate phenotype additions.
         Here nn is the integer label for this run of the program
         and mmm is the number of the pedigree in this run.
          format A1 (Y or N) [Default value: Y]
    #19) Output the location scores computed from each original
         pedigree individually in files SCORE-nn.mmm ?
         Here nn is the integer label for this run of the program
         and mmm is the number of the pedigree in this run.
          format A1 (Y or N) [Default value: N]
    #20) Output simulated annealing results in files HAPLO-nn.mmm ?
         Here nn is the integer label for this run of the program
         and mmm is the number of the pedigree in this run.
          format A1 (Y or N) [Default value: Y]
    #21) Create the PDRAW-nn.DAT file containing the estimate of the
         best haplotype vector for each pedigree in a format compatible
         with PEDPREP and thus with Ped/Draw?
          format A1 (Y or N) [Default value: N]
    #22) Include the trait locus during the simulated annealing,
         i.e., include the trait in the haplotype analysis?
         The trait is placed midway in each requested marker interval;
         see the following menu item to specify the range of intervals.
          format A1 (Y or N) [Default value: N]
    #23) First and last marker intervals in which to place the trait
         during annealing, where 'j'=interval between markers j & j+1.
         Under the default values, there will be (#-of-marker-loci)+1
         runs, each placing the trait locus in a different interval.
         Upon completion, the haplotyping results with the trait locus
         in the best-supported interval are reported.
         Clearly this menu item is only relevant if the trait is to be
         included in the haplotype analysis (see previous menu item).
          format I6 [Default values: 0 and number-of-marker-loci]
          (Number of lines of data = 2.)
    #24) Number of haplotypes such that if a pedigree has more than
         this number of haplotypes with at least two recombinants each,
         then it will be placed in the RERUN-nn.PED file.
          format I6 [Default value: 2]
    #25) Number of temperature changes in simulated annealing.
          format I6 [Default value: 1000]
    #26) Factor by which the temperature changes in simulated annealing.
          format F6.2 [Default value: 0.99]
    #27) Initial temperature.
          format F6.2 [Default value: 500.0]
    #28) Number of pre-simulated annealing steps.
          format I6 [Default value: 0]
    #29) Number of random steps between free runs.
          format I6 [Default value: 0 (i.e., no free runs allowed)]
    #30) Random seeds: three integers from the interval [1, 30000].
          format I6 [Default values: 27713, 2321 and 18777]
          (Number of lines of data = 3.)
    #40) Accept the problem, i.e., end of data file.
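
  To make the layout concrete, the Python sketch below assembles a
  small BATCH.DAT text from the rules given above: a blank line, the
  menu item number in I6 format, then the data values. It is a sketch
  under stated assumptions, not a tool shipped with SIMWALK. All
  function names are hypothetical, and it assumes that the data values
  start on the line after the menu item number. The example BATCH.DAT
  file at the distribution site remains the authoritative reference.

```python
def fmt_i6(n):
    """Integer right-justified in 6 columns (Fortran I6)."""
    return f"{n:6d}"


def fmt_f8_5(x):
    """Real number in 8 columns with 5 decimals (Fortran F8.5)."""
    return f"{x:8.5f}"


def batch_item(number, data_lines=()):
    """One instruction: blank line, item number in I6, then its data lines."""
    return ["", fmt_i6(number), *data_lines]


def recombination_item(thetas):
    """Menu item #10 (required).

    thetas holds one (female, male) recombination frequency pair per
    adjacent marker pair in GENOMIC order, so the number of markers is
    len(thetas) + 1. Each pair shares one line in 2F8.5 format.
    """
    data = [fmt_i6(len(thetas) + 1)]
    data += [fmt_f8_5(female) + fmt_f8_5(male) for female, male in thetas]
    return batch_item(10, data)


def build_batch(thetas, run_label=1, location_scores=False, seeds=None):
    """Return BATCH.DAT text using menu items #4, #5, #10, #30 and #40."""
    lines = []
    lines += batch_item(4, ["Y" if location_scores else "N"])
    lines += batch_item(5, [fmt_i6(run_label)])
    lines += recombination_item(thetas)
    if seeds is not None:
        lines += batch_item(30, [fmt_i6(s) for s in seeds])
    lines += batch_item(40)  # accept the problem, i.e., end of data file
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only: three markers, haplotyping run labelled 2.
    print(build_batch([(0.05, 0.05), (0.10, 0.08)], run_label=2,
                      seeds=(101, 202, 303)), end="")
```

  Menu items left out of the file keep their default values, so only
  item #10 and the items one wishes to change need to be written.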


OUTPUT FILES:
  All output files reflect any requested reordering of the marker loci.
  Also for those markers with allele names longer than 3 characters
  (2 characters if the PDRAW-nn.DAT file is requested to be created),
  all their alleles are renamed sequentially starting with 1.
  After the program runs, with the user-specified label nn,
  several of the files from the following list may be available.

 (GENERAL OUTPUT FILES:)
  The ERROR.OUT file contains any error messages which were generated.
    The run completed SUCCESSFULLY only if this file does NOT exist
    after the run finishes.
  The INPED-nn.mmm files contain, in MENDEL pedigree file format,
    the original pedigrees, one per file. The pedigrees will reflect
    any reordering of the loci, any renaming of the alleles and any
    obligate phenotype additions to the pedigree.
    Creation of these files is controlled through menu item #18.
    Here nn is the integer label for this run of the program
    and mmm is the number of the pedigree.
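
  Because success is signalled by the absence of ERROR.OUT, a wrapper
  script can test for that file after each run. The Python sketch
  below is one way to do so; the function names are hypothetical, and
  it assumes the file is written to the run directory under exactly
  the name ERROR.OUT.

```python
from pathlib import Path


def run_succeeded(run_dir="."):
    """True if no ERROR.OUT file is present in run_dir."""
    return not (Path(run_dir) / "ERROR.OUT").exists()


def read_errors(run_dir="."):
    """Return the text of ERROR.OUT, or an empty string if it is absent."""
    error_file = Path(run_dir) / "ERROR.OUT"
    return error_file.read_text() if error_file.exists() else ""
```

  To avoid being misled by an ERROR.OUT file left over from an earlier
  run, it may be prudent to remove any such file before starting.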

 (HAPLOTYPING OUTPUT FILES:)
  The HAPLO-nn.mmm files contain the result of the simulated annealing
    haplotype analysis on each of the original pedigrees.
  The PDRAW-nn.DAT file contains, in a form suitable for PEDPREP,
    all the best-haplotype pedigrees from the simulated annealing runs.
    Creation of this file can be controlled through menu item #21.
  The QUICK-nn.ALL file contains a quick view of the haplotype analysis
    for each pedigree. The allele source information for each non-
    founder is given, showing the locations of the recombination events.
  The RERUN-nn.BAT file contains, in this program's BATCH.DAT format,
    the program's menu items necessary to rerun this program using the
    RERUN-nn.LOC and RERUN-nn.PED files.
  The RERUN-nn.LOC file contains, in a form suitable for rerunning this
    program, the locus file exhibiting the reordered marker loci.
  The RERUN-nn.PED file contains, in a form suitable for rerunning this
    program, the original pedigrees in which more than the user-specified
    number (menu item #24) of best haplotypes had at least two
    recombinants each.
  The TABLE-nn.OUT file contains a summary table of the results of the
    haplotype analysis.

 (LOCATION SCORE OUTPUT FILES:)
  The PARTIALR.OUT file contains, in a form suitable for rerunning this
    program with additional pedigrees, the total location scores found
    up to the last completed pedigree, i.e., the partial results.
  The SCORE-nn.ALL file contains, in MENDEL output format, the overall
    location scores and the SCORE-nn.mmm files contain the location
    scores computed from each original pedigree individually.
    Creation of the latter files is controlled through menu item #19.
  The TRANS-nn.OUT file contains some statistics on the transitions
    attempted during the location score random walk and their effects.

  Several additional files are generated during execution then deleted.


 (LEGENDS FOR HAPLOTYPING OUTPUT FILES:)
  In the HAPLO-nn.mmm files the best-haplotype pedigree is written
  in MENDEL format with the following information included in order,
  for each person at each locus.
     The inferred maternal allele.
     A separator which indicates the recombination events in the
       SUBSEQUENT interval: | = no recombination;
                            / = recombination in maternal haplotype;
                            \ = recombination in paternal haplotype;
                            + = recombination in both     haplotypes.
     The inferred paternal allele.
     An asterisk if the phase at this locus is NOT fixed by the parents.
     The source of the maternal allele: 1 = mother's maternal allele;
                                        2 = mother's paternal allele.
     The source of the paternal allele: 1 = father's maternal allele;
                                        2 = father's paternal allele.
     The phenotype at this locus in the original pedigree file.
  Following the pedigree data in the HAPLO-nn.mmm files are some 
  summary statistics on this estimate of the best haplotype vector.
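
  The separator legend above can be captured in a small lookup table.
  The Python sketch below tallies recombination events from a list of
  separator characters that the caller has already extracted for one
  person. It does not parse HAPLO-nn.mmm files, and the names used
  are hypothetical.

```python
# Separator -> (recombination in maternal haplotype,
#               recombination in paternal haplotype),
# following the legend for the HAPLO-nn.mmm files.
SEPARATOR_MEANING = {
    "|": (False, False),
    "/": (True, False),
    "\\": (False, True),
    "+": (True, True),
}


def count_recombinations(separators):
    """Return (maternal, paternal) recombination counts for one person."""
    maternal = sum(SEPARATOR_MEANING[s][0] for s in separators)
    paternal = sum(SEPARATOR_MEANING[s][1] for s in separators)
    return maternal, paternal
```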

  In the QUICK-nn.ALL file each pedigree is included. For each
  non-founder there are two lines of data after the trait phenotype.
  The first line is the maternal marker haplotype's source information
  and the second is the paternal marker haplotype's source information.
  A '1' indicates a grand-maternal origin for the allele at this locus;
  a '2' indicates a grand-paternal origin for the allele at this locus.
  Thus a change in either haplotype from 1 to 2 or from 2 to 1 indicates
  a recombination event in that marker interval.
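
  The rule just stated amounts to finding the positions at which the
  source code changes. In the Python sketch below, the input is the
  sequence of 1/2 codes for one haplotype, in genomic marker order.
  The function name is hypothetical, and the intervals are numbered
  as in menu item #23 (interval j lies between markers j and j+1).

```python
def recombination_intervals(sources):
    """Return the 1-based marker intervals in which the source changes.

    sources is a list (or string) of grandparental source codes, one
    per marker in genomic order, e.g. [1, 1, 2, 2, 1] or "11221".
    """
    pairs = zip(sources, sources[1:])
    return [j for j, (left, right) in enumerate(pairs, start=1)
            if left != right]
```

  For instance, the codes 1 1 2 2 1 change between markers 2 and 3 and
  again between markers 4 and 5, so the function returns [2, 4].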

  In the PDRAW-nn.DAT file the output is similar to the HAPLO-nn.mmm
  files except that the symbol marking the status of the inferred
  phase may take one of three values:
  ! = original phenotype was unknown but inferred phase is     fixed;
  * = original phenotype was   known but inferred phase is NOT fixed;
  & = original phenotype was unknown and inferred phase is NOT fixed.
  When the PDRAW-nn.DAT file is processed by PEDPREP and then the
  pedigree displayed by the Macintosh program Ped/Draw, these symbols
  are visible while the original phenotype is not visible.


USAGE NOTES:
  Please see the file EXAMPLE.TXT for an annotated example haplotyping
  session using SIMWALK.

  Since SIMWALK uses simulated annealing to search a space of often
  immense size, it may not converge to the best answer on the first run.
  It may be necessary to run SIMWALK several times on your data in order
  to be reasonably confident of finding the optimal haplotype
  configuration.
  If you do rerun the program with the same data and parameters,
  then remember to alter the seeds to the random number generator
  (see menu item #30); otherwise the results will be identical.
  Also you may wish to change the run label (see menu item #5)
  so that the new results do not overwrite the old output files.
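
  To judge how much work one annealing run represents, the step-count
  formula of menu item #14 and the temperature settings of menu items
  #25 to #27 can be evaluated directly. The Python sketch below does
  so. The function names are hypothetical, and the geometric cooling
  rule (temperature multiplied by the factor at each change) is an
  assumption based on the wording of menu item #26, not a statement
  about the SIMWALK source.

```python
def steps_between_changes(mf, total_alleles, people):
    """Steps between temperature changes: max{1000, MF*TA*P} (item #14)."""
    return max(1000, mf * total_alleles * people)


def temperature_after(k, initial=500.0, factor=0.99):
    """Temperature after k changes, ASSUMING geometric cooling.

    Defaults are the documented values of menu items #27 and #26.
    """
    return initial * factor ** k


if __name__ == "__main__":
    # Illustrative pedigree: 50 marker alleles in total, 20 people.
    per_stage = steps_between_changes(10, 50, 20)
    changes = 1000  # default of menu item #25
    print("steps between temperature changes:", per_stage)
    print("approximate steps per annealing run:", per_stage * changes)
    print("assumed final temperature:", temperature_after(changes))
```

  Raising the multiplicative factor in menu item #14 or the number of
  parallel runs in menu item #13 lengthens the search accordingly.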

  To use SIMWALK on data in LINKAGE-format, first extract the disease
  locus and the codominant marker loci using the program lsp from the
  LINKAGE package; this creates the files datafile.dat and pedfile.dat.
  Next run LINKMEND to convert from LINKAGE-format to MENDEL-format.
  This will create the files locus.dat and pedm.dat. (Remember that
  Unix systems have case sensitive file names.) Now create a BATCH.DAT
  file following the instructions above. Finally, run SIMWALK.

  To draw the pedigree data using the Macintosh program Ped/Draw,
  have SIMWALK produce a file called PDRAW-nn.DAT (see menu item #21).
  Run this file through PEDPREP to generate a Ped/Draw-format data file.

  (The programs LINKMEND and PEDPREP may be obtained via anonymous ftp
  from watson.hgen.pitt.edu .)


REFERENCES:
 If you publish results generated by SIMWALK, then please cite the first
 two articles from the following reference list.

  Sobel E, Lange K, O'Connell JR and Weeks DE (1995)
      Haplotyping algorithms; in "Genetic Mapping and DNA Sequencing"
      (IMA Volumes in Mathematics and its Applications,
      Speed TP and Waterman MS, editors)
      Springer-Verlag, New York (in press).
  Weeks DE, Sobel E, O'Connell JR and Lange K (1995)
      Computer programs for multilocus haplotyping of general pedigrees
      Am J Hum Genet 56:1506-1507.
  Sobel E and Lange K (1993)
      Metropolis sampling in pedigree analysis
      Stat Meth in Med Res 2:263-282.
  Lange K and Sobel E (1991)
      A random walk method for computing genetic location scores
      Am J Hum Genet 49:1320-1334.
  Lange K and Matthysse S (1989)
      Simulation of pedigree genotypes by random walks
      Am J Hum Genet 45:959-970.
  Lange K, Weeks DE and Boehnke M (1988)
      Programs for pedigree analysis: MENDEL, FISHER and dGENE
      Genet Epidemiol 5:471-472.
  Lange K and Goradia T (1987)
      An algorithm for automatic genotype elimination
      Am J Hum Genet 40:250-256.


Finally, please send any bug reports, queries, suggestions or comments
to one of the addresses below.

Thank you,

  -- Eric Sobel and Dan Weeks --
_________________________________________________________________

Daniel E. Weeks
 The Wellcome Trust Centre           Department of Human Genetics
 for Human Genetics                  University of Pittsburgh
 University of Oxford                Crabtree Hall, Room A310
 Windmill Road                       130 DeSoto Street
 Oxford OX3 7BN                      Pittsburgh, PA 15261
 U.K.                                U.S.A.

 Tel: (+44) 865 740 043 (desk)
 Tel: (+44) 865 742 441 (main)       Tel: 1 412 624-3066
 Fax: (+44) 865 742 196              Fax: 1 412 624-3020
 e-mail: daniel.weeks@well.ox.ac.uk  e-mail: dweeks@watson.hgen.pitt.edu


Eric Sobel
 Department of Biomathematics
 School of Medicine
 University of California, LA
 10833 LeConte Avenue
 Los Angeles, CA 90095-1766
 U.S.A.

 Tel: 1 310 825-9623
 Fax: 1 310 825-8685
 e-mail: esobel@ucla.edu