APM(1)                    APM GENETICS PROGRAMS                    APM(1)

NAME
       apm - compute the APM statistic for multiple marker loci (from ML files)

DESCRIPTION
       This program is designed to efficiently compute the APM statistic for
       a number of pedigrees over a number of marker loci. It can store the
       results of kinship calculations internally, which greatly speeds up
       computations for different markers on the same pedigree. It can also
       store kinship information in a file to be used later.

       Storing the kinship values internally requires large amounts of
       memory. So that the program can be used on systems with limited
       memory, it uses a scheme that limits the amount of information
       stored. Unfortunately, some Pascal compilers halt with an error when
       new() cannot allocate memory; this makes the memory limit very
       necessary but of limited use. Users of those compilers may have to
       set a very small limit.

       The program has also been modified to exclude parent-child pairs from
       the APM statistic. It has been found that this yields greater power
       in detecting linkage.

LIMITS
       As distributed, the program can handle up to 400 families with up to
       60 members each, and 20 marker loci with up to 40 alleles each.
       Pedigree titles can be up to 80 characters long and locus names up to
       10 characters long. The maximum numbers of families, members, loci,
       and alleles can all be changed by altering constants at the beginning
       of the program. Other constants depend on these, as described in the
       comments, and should be changed as well. The problem, of course, with
       simply making these numbers huge is that some compilers limit the
       amount of memory used for variables in each block.

INPUT FILE FORMATS
       The program uses a file format that supports multiple loci. For each
       locus, the file must include the locus name, the number of alleles,
       the allele frequencies, and the genotypes of each of the affecteds.
       The format of the pedigree data file is:

              . . .
              <number of members> <number of affecteds> <number of typed loci>
              <list of mothers>
              <list of fathers>
              <list of all affecteds>
              <locus number> <list of genotypes>
              <locus number> <list of genotypes>
              . . .
              <title of pedigree #2>
              . . .

       The locus numbers are the numerical IDs of the loci, as determined by
       the order of their entries at the beginning of the file (the first
       locus is numbered 1, the second 2, and so on). The lines containing a
       locus number and a list of genotypes must be ordered by increasing
       locus number. The genotypes are thus arranged as a table, with the
       affected IDs along the top in increasing order (moving right) and the
       locus IDs down the left in increasing order (moving down). For
       example, a valid table of genotypes might be:

                   3    4    5    7    <- list of all affecteds
              1   1 1  2 1  1 1  1 2   <- locus number and list of genotypes
              3   1 2  1 1  0 0  1 1   <- ""
              4   1 1  1 2  1 1  1 1   <- ""

       This means that, for the first locus, 3 has genotype 1/1, 4 has
       genotype 2/1, 5 has genotype 1/1, and 7 has genotype 1/2. For the
       third locus, 3 is typed as 1/2, 4 as 1/1, 5 is untyped (written 0 0),
       and 7 has genotype 1/1. And so on. The family is untyped for locus 2
       and for any loci after the fourth.

       Here is a real example data file:

              2
              3
              3 ACK1
              0.450 0.300 0.250
              3 ACK2
              0.550 0.200 0.250
              2 ACK3
              0.465 0.535
              TESTPED 1
              15 3 3
              0 0 2 2 2 0 0 0 5 5 5 7 12 12 12
              0 0 1 1 1 0 0 0 6 6 6 8 11 11 11
              4 10 13
              1 2 2 2 2 1 2
              2 3 3 0 0 2 3
              3 0 0 1 1 1 2
              TESTPED 2
              8 2 1
              0 0 1 1 1 0 6 6
              0 0 2 2 2 0 5 5
              4 7
              2 2 3 3 3

       As previously mentioned, apm can produce a file containing kinship
       data, which it can later reuse. This file holds the results of the
       summations accumulated in the kinship calculations for each family
       and for each list of affecteds.
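       As a rough illustration of the kinship calculations mentioned above,
       the Python sketch below (ours, not apm's Pascal code) computes the
       ordinary pairwise kinship coefficient Phi(i,j), the probability that
       genes drawn at random from i and from j at the same locus are
       identical by descent, using the standard recursion on the mothers and
       fathers lists of TESTPED 1. It also reads the genotype lines to find
       which affecteds are typed at each locus (a 0 0 genotype means
       untyped). The names make_kinship, typed_affecteds, and pair_sum are
       hypothetical, and the sketch assumes that every parent has a smaller
       member number than its children. It does not compute the
       higher-order Phi terms stored in the kinship data file.

```python
from functools import lru_cache

# Example pedigree TESTPED 1, copied from the example data file.
# Entry k-1 of MOTHERS/FATHERS is the parent of member k; 0 means unknown.
MOTHERS = [0, 0, 2, 2, 2, 0, 0, 0, 5, 5, 5, 7, 12, 12, 12]
FATHERS = [0, 0, 1, 1, 1, 0, 0, 0, 6, 6, 6, 8, 11, 11, 11]
AFFECTEDS = [4, 10, 13]
# Locus number, then one allele pair per affected; "0 0" means untyped.
GENOTYPE_LINES = ["1 2 2 2 2 1 2", "2 3 3 0 0 2 3", "3 0 0 1 1 1 2"]


def make_kinship(mothers, fathers):
    """Return a function phi(i, j) giving the kinship coefficient of i and j.

    Assumption of this sketch (true for the example pedigrees): every
    parent has a smaller member number than its children.
    """
    @lru_cache(maxsize=None)
    def phi(i, j):
        if i == 0 or j == 0:            # unknown parent: contributes nothing
            return 0.0
        if i == j:                      # 1/2 * (1 + inbreeding coefficient)
            return 0.5 * (1.0 + phi(fathers[i - 1], mothers[i - 1]))
        if i < j:                       # recurse on the higher-numbered member,
            i, j = j, i                 # who cannot be an ancestor of the other
        return 0.5 * (phi(fathers[i - 1], j) + phi(mothers[i - 1], j))

    return phi


def typed_affecteds(line, affecteds):
    """Parse one genotype line into (locus number, affecteds typed there)."""
    fields = [int(x) for x in line.split()]
    locus, alleles = fields[0], fields[1:]
    typed = [a for k, a in enumerate(affecteds)
             if (alleles[2 * k], alleles[2 * k + 1]) != (0, 0)]
    return locus, typed


def pair_sum(phi, members):
    """Sum of phi over all unordered pairs of distinct members."""
    return sum(phi(a, b) for n, a in enumerate(members) for b in members[n + 1:])


phi = make_kinship(MOTHERS, FATHERS)
for line in GENOTYPE_LINES:
    locus, typed = typed_affecteds(line, AFFECTEDS)
    print(locus, typed, pair_sum(phi, typed))
# 1 [4, 10, 13] 0.3125
# 2 [4, 13] 0.0625
# 3 [10, 13] 0.125
```

       For these example pedigrees, the pairwise sums printed for each locus
       agree with the first value of the corresponding record in the example
       kinship data file (3.125000e-01 for 4 10 13, 6.250000e-02 for 4 13,
       1.250000e-01 for 10 13). This is an observation about the examples,
       not a statement of how apm defines E(Z).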
       Care must be taken that the kinship data file does not go out of date
       with respect to the pedigree data file. The only changes to the
       pedigree data file that do not invalidate the kinship data file are
       changes in the locus names, the number of alleles, the allele
       frequencies, and the list and number of affecteds for any or all
       pedigrees. If a pedigree name or the overall structure of a pedigree
       is changed, the old kinship data file is no longer valid. (If only the
       name is changed, it can also be changed in the kinship data file,
       which makes that file usable with the new pedigree file.)

       This is the format of the file of kinship data:

              <title of pedigree #1>
              <number of records for this pedigree>
              <first list of affected members>
              <E(Z), the overall mean of the mean marker similarities E(Zij)>
              <the sum of all Phi[(i,j,k,l)]>
              <the sum of all Phi[(i,j)(k,l)]>
              <the sum of all Phi[(i,j,k)(l)], Phi[(i,j,l)(k)], Phi[(i,k,l)(j)],
               Phi[(j,k,l)(i)], Phi[(i,l)(j,k)], and Phi[(i,k)(j,l)]>
              <the sum of all Phi[(i,j)(k)(l)] and Phi[(i)(j)(k,l)]>
              <the sum of all Phi[(i,k)(j)(l)], Phi[(i,l)(j)(k)],
               Phi[(j,l)(i)(k)], and Phi[(j,k)(i)(l)]>
              <the sum of all Phi[(i)(j)(k)(l)]>
              <second list of affected members>
              . . .
              <title of pedigree #2>
              . . .

       where Phi[] is the kinship coefficient. An example file is:

              TESTPED1
              3
              10 13
              1.250000e-01
              3.125000e-02
              0.000000e+00
              3.437500e-01
              6.250000e-02
              4.375000e-01
              1.250000e-01
              4 13
              6.250000e-02
              1.562500e-02
              0.000000e+00
              2.968750e-01
              3.125000e-02
              4.687500e-01
              1.875000e-01
              4 10 13
              3.125000e-01
              1.250000e-01
              2.343750e-02
              1.898438e+00
              6.406250e-01
              4.203125e+00
              2.109375e+00
              TESTPED2
              1
              4 7
              1.250000e-01
              3.125000e-02
              0.000000e+00
              3.437500e-01
              6.250000e-02
              4.375000e-01
              1.250000e-01

OUTPUT FILE FORMATS
       The output files out1.dat, outsqr.dat, and out1p.dat are intended for
       use with the sim program.
       The formats of all of these files are the same; they differ in the
       function used to weight the allele frequencies (f(p) = 1,
       f(p) = 1/sqrt(p), and f(p) = 1/p, respectively). The format is:

              <magic number>
              <number of pedigrees>
              <number of loci>
              <number of alleles for locus #1> <name of locus #1>
              <frequency of allele #1> <frequency of allele #2> . . .
              <number of alleles for locus #2> <name of locus #2>
              . . .
              <title of pedigree #1>
              <number of members> <number of affecteds> <number of loci>
              <list of mothers>
              <list of fathers>
              <locus number>
              <number of affecteds at this locus>
              <affected #1> <affected #2> . . .
              <cumulative mean>
              <cumulative variance>
              <locus number>
              <number of affecteds at this locus>
              . . .
              <title of pedigree #2>
              . . .
              <statistic for first locus> <p-value>
              <statistic for second locus> <p-value>
              . . .

       The output file out1.dat for the above pedigree file looks like this:

              1
              2
              3
              3 ACK1
              0.45000 0.30000 0.25000
              3 ACK2
              0.55000 0.20000 0.25000
              2 ACK3
              0.46500 0.53500
              TESTPED1
              15 3 3
              0 0 2 2 2 0 0 0 5 5 5 7 12 12 12
              0 0 1 1 1 0 0 0 6 6 6 8 11 11 11
              1
              3
              4 10 13
              1.26656
              0.27363
              2
              2
              4 13
              0.44219
              0.07280
              3
              2
              10 13
              0.56464
              0.05909
              TESTPED2
              8 2 1
              0 0 1 1 1 0 6 6
              0 0 2 2 2 0 5 5
              2
              2
              4 7
              0.47938
              0.06961
              1.40212 0.08043
              0.20678 0.41808
              -0.26594 0.60486

       In addition to those three files, a summary file named table.out is
       produced. It contains the locus information, the statistics and their
       p-values for each family, and the overall statistics and their
       p-values.
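       The numbers in the example table.out appear to be related by simple
       formulas. The Python sketch below is inferred only by reproducing the
       example values, not taken from apm's source, and its function names
       are hypothetical: each family's statistic matches
       (observedx - mean) / sqrt(variance), the overall statistic for a
       locus matches the sum of the family statistics divided by the square
       root of the number of families, and each p-value matches the upper
       tail of the standard normal distribution.

```python
import math


def family_statistic(mean, variance, observed):
    """Per-family score: (observed - mean) / sqrt(variance) (inferred from table.out)."""
    return (observed - mean) / math.sqrt(variance)


def combined_statistic(family_stats):
    """Overall score for a locus: sum of family scores / sqrt(number of families) (inferred)."""
    return sum(family_stats) / math.sqrt(len(family_stats))


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Locus ACK1, f(p) = 1, family TESTPED1: mean, variance, observedx from table.out
z1 = family_statistic(1.26656, 0.27363, 2.00000)
print(f"{z1:.4f} {upper_tail_p(z1):.4f}")   # 1.4021 0.0804 (table.out: 1.40212, 0.08043)

# Locus ACK2, f(p) = 1, combining families TESTPED1 and TESTPED2
z2 = combined_statistic([family_statistic(0.44219, 0.07280, 0.50000),
                         family_statistic(0.47938, 0.06961, 0.50000)])
print(f"{z2:.4f} {upper_tail_p(z2):.4f}")   # 0.2068 0.4181 (table.out: 0.20678, 0.41808)
```

       Small differences in the last digit come from the rounded values
       printed in table.out.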
       The format is self-explanatory; here is the file produced alongside
       out1.dat above:

              allele frequencies:
              ACK1  0.45000  0.30000  0.25000
              ACK2  0.55000  0.20000  0.25000
              ACK3  0.46500  0.53500

              family              mean     variance observedx Na  statistic
              TESTPED1   <--- pedigree title
              LOCUS 1 ACK1
              f(p) = 1            1.26656  0.27363  2.00000    3    1.40212
              f(p) = 1/sqrt(p)    2.12586  0.70359  3.65148    3    1.81882
              f(p) = 1/p          3.62500  2.19444  6.66667    3    2.05329
              LOCUS 2 ACK2
              f(p) = 1            0.44219  0.07280  0.50000    2    0.21426
              f(p) = 1/sqrt(p)    0.68899  0.16435  1.00000    2    0.76717
              f(p) = 1/p          1.12500  0.54403  2.00000    2    1.18630
              LOCUS 3 ACK3
              f(p) = 1            0.56464  0.05909  0.50000    2   -0.26594
              f(p) = 1/sqrt(p)    0.79652  0.11693  0.73324    2   -0.18508
              f(p) = 1/p          1.12500  0.23499  1.07527    2   -0.10259
              TESTPED2   <--- pedigree title
              LOCUS 2 ACK2
              f(p) = 1            0.47938  0.06961  0.50000    2    0.07817
              f(p) = 1/sqrt(p)    0.75565  0.15779  1.00000    2    0.61515
              f(p) = 1/p          1.25000  0.55682  2.00000    2    1.00509

              f(p) = 1
              The statistic for locus ACK1 for all 1 families is 1.40212 with p-value 0.08043
              The statistic for locus ACK2 for all 2 families is 0.20678 with p-value 0.41808
              The statistic for locus ACK3 for all 1 families is -0.26594 with p-value 0.60486

              f(p) = 1/sqrt(p)
              The statistic for locus ACK1 for all 1 families is 1.81882 with p-value 0.03447
              The statistic for locus ACK2 for all 2 families is 0.97745 with p-value 0.16417
              The statistic for locus ACK3 for all 1 families is -0.18508 with p-value 0.57342

              f(p) = 1/p
              The statistic for locus ACK1 for all 1 families is 2.05329 with p-value 0.02003
              The statistic for locus ACK2 for all 2 families is 1.54955 with p-value 0.06062
              The statistic for locus ACK3 for all 1 families is -0.10259 with p-value 0.54087

       NOTE: The p-values may be unreliable for small numbers of families. We
       recommend using the simulation program "sim" along with the histogram
       program "hist" to generate empirical p-values.

BUGS
       They are all still hiding under their rocks. If you find one, please
       catch it and mail it to us!

REFERENCES
       See the accompanying REFERENCES file.
APM Release 2.0 Last change: 5 Jul 1993