PROGRAM ANNOUNCEMENT:

A new version of the ANALYZE program package has been released. It simplifies running a large array of parametric and nonparametric tests for linkage and association on data entered in LINKAGE format pedigree and parameter files. The package uses MLINK, ILINK, LINKMAP, UNKNOWN, and LSP from the LINKAGE package (either regular version 5.* or FASTLINK version 2.*), together with several additional programs, to perform these statistical tests. All code is provided except LINKAGE, which you must have pre-installed on your workstation.

UPDATES: 2/14/96

    Added support for sexlinked as well as autosomal data
    Corrected minor bug in hrrlamb and hrrmult maximization routine
        (pointed out by Michael Knapp - cf. Terwilliger (1996a))
    Improved formatting for output files
    Executables made available with statically linked libraries
        (for those without Pascal compilers)
    Improved stability across systems
    More documentation about program constants

What programs are available and what do they do?
ANALYSIS:
    1) Lod score analysis with heterogeneity (uses MLINK, UNKNOWN, LSP, and HOMOG)
    2) Transmission/disequilibrium test and likelihood based extension to multiple alleles
    3) Haplotype Relative Risk test for multiallelic markers
    4) Sibpair analysis
    5) Lod score analysis on large pedigrees broken into nuclear pedigrees with heterogeneity
    6) Polylocus linkage analysis
    7) Multilocus Haplotype Relative Risk Analysis

NONPARAMETRIC:
    1) Sibpair Analysis
    2) Haplotype Relative Risk Analysis
    3) Transmission/disequilibrium test analysis
    4) Multilocus Haplotype Relative Risk Analysis

MULTI-ILINK
    Multipoint linkage analysis with intermarker distances as a nuisance parameter (uses ILINK and UNKNOWN)

MULTIHOMOG
    Multipoint linkage analysis with heterogeneity (uses LINKMAP, LSP, UNKNOWN)

DOWNFREQ
    Estimates allele frequencies and allows the user to downcode polymorphic marker loci

EXTRACT
    Utility to reorder marker loci and extract subsets of marker loci from LINKAGE format pedigree and parameter files

Where and how can I get the programs?

The programs are officially distributed at two ftp sites. You can get them either from

    ftp.well.ox.ac.uk in directory pub/genetics/analyze
or
    linkage.cpmc.columbia.edu in directory software/analyze

There are a number of files in these directories:

    OSF_analyze.tar.Z             contains programs, source code, samples, and documentation for DEC Alpha OSF/1 v. 3.0
    OSF_analyze_source.tar.Z      contains the same without executables (can be compiled under OSF, ULTRIX)
    SOLARIS_analyze.tar.Z         contains programs, source, samples, and documentation for SUN Solaris 2.4
    SOLARIS_analyze_source.tar.Z  contains the same without executables (can be compiled under Solaris, SunOS, etc.)
    SAMPLE_analyze.tar.Z          contains sample input and output files
    DOC_analyze.tar.Z             contains program documentation in ASCII format for each program in the set
SAMPLE FTP SESSION AT ftp.well.ox.ac.uk

    1) ftp ftp.well.ox.ac.uk
    2) login as anonymous, leaving your EMAIL address as a password
    3) cd pub/genetics/analyze
    4) binary
    5) get OSF_analyze.tar.Z
    6) close
    7) quit

Then on your own machine

    8) uncompress OSF_analyze.tar.Z
    9) tar xvf OSF_analyze.tar

(Installation instructions are given in the README file)

Please leave your correct EMAIL address as password to ensure you will be added to the mailing list. Otherwise, send me EMAIL at joe@well.ox.ac.uk, so I can add you manually and you will receive information about program updates as they become available.

Why should I use the ANALYZE package?

The ANALYSIS program set makes the process of data analysis very simple. Once the programs are installed, you need only two input files, pedin.dat and datain.dat, in LINKAGE format, containing your disease locus and many marker loci coded as allele numbers, in chromosome order. You then type "analysis" on your workstation, and the program manipulates your files and runs the different statistical tests you would normally perform when screening the genome for a new disease locus. As output you get a detailed report on each statistic and a summary table, like the following for the sample data:

Summary of statistical results from ANALYZE
Autosomal Data:

Loc  x(cM)    Z(t)   Theta  HetChi  Z(a,t)  Theta  Alpha  Sibpair   TDT       HRR-LRT   HRR-2xn
2    0.00000  0.909  0.38           3.849   0.10   0.35   0.005694  0.000034  0.017889  0.015534
3    0.00502  5.583  0.28   33.511          0.08   0.39   0.000000  0.000000  0.000000  0.000000

Loc  x(cM)    N-Z(t)  N-HetChi  N-Z(a,t)  P-Z(t)  P-HetChi  P-Z(a,t)  MP-Z(t)  MP-HetChi  MP-Z(a,t)
2    0.00000  3.376   0.012               1.243             5.639     0.601               5.622
3    0.00502  8.841   0.337               5.583   33.511              5.505    33.607

Maximum Multilocus HRR = 53.50765 ~ Chi-Square(1) - One-sided

Loc       = Locus Number
x(cM)     = Map Distance from the first marker (Locus 2) in Haldane cM
Z(t)      = Maximum Lod Score
HetChi    = Chi-Square for Heterogeneity when Z(t) >= 3
Z(a,t)    = Maximum Lod Score with Heterogeneity - when Z(t) < 3
Sibpair   = p-value for Affected Sib Pair Mean Test
TDT       = p-value for Likelihood-Based TDT
HRR-LRT   = p-value for Likelihood-Based Haplotype Relative Risk
HRR-2xn   = p-value for 2 x n Contingency Table Chi-Square HRR test
N-Z(t)    = Maximum Lod Scores When Extended Pedigrees are Broken into Nuclear Pedigrees
N-HetChi  = Chi-square for Heterogeneity (Nuclear Pedigrees)
N-Z(a,t)  = Maximum Lod Score with Heterogeneity (Nuclear Pedigrees)
P-Z(t)    = Polylocus Maximum Lod Score
P-HetChi  = Chi-square for Heterogeneity (Polylocus)
P-Z(a,t)  = Maximum Polylocus Lod Score with Heterogeneity
MP-Z(t)   = Maximum Multipoint Polylocus Lod Score
MP-HetChi = Chi-square for Heterogeneity (Multipoint Polylocus)
MP-Z(a,t) = Maximum Multipoint Polylocus Lod Score with Heterogeneity

In each group of three columns, only one of the HetChi and Z(a,t) entries is filled in for a given locus, depending on whether the corresponding Z(t) reaches 3.

This table summarizes all the statistics for each marker along the chromosome, and includes the map distance of each marker from the first marker in Haldane cM. There is also a more detailed output file, which gives a full synopsis of the tests performed; it is attached at the bottom of this announcement.

The NONPARAMETRIC program works in the same manner, and produces the following sample output file (a detailed version is available as well).

Summary of statistical results from NONPARAM
Autosomal data:

                             P-VALUES
              --------------------------------------------
Loc  x(cM)    Sibpair   TDT       HRR-LRT   HRR-2xn
2    0.00000  0.005694  0.000034  0.017889  0.015534
3    0.00503  0.000000  0.000000  0.000000  0.000000

Maximum Multilocus HRR = 53.50765 ~ Chi-Square(1) - One-sided

Loc     = Locus Number
x(cM)   = Map Distance from the first marker (Locus 2) in Haldane cM
Sibpair = p-value for Affected Sib Pair Mean Test
TDT     = p-value for Likelihood-Based TDT
HRR-LRT = p-value for Likelihood-Based Haplotype Relative Risk
HRR-2xn = p-value for 2 x n Contingency Table Chi-Square HRR test

MULTI-ILINK and MULTIHOMOG are equally simple to use, and produce easy-to-interpret output files with a minimum of effort.
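The x(cM) column in the summary tables is described as a map distance under Haldane's mapping function. As a minimal sketch of that conversion (not code from the ANALYZE package; the function names are hypothetical), Haldane's function maps a recombination fraction theta to a distance x = -0.5 * ln(1 - 2*theta), expressed in Morgans. The value theta = 0.005 used below is an assumption chosen for illustration; it happens to give 0.005025, the figure printed under "Locus Map Position" in the detailed output, but the sample parameter file is not shown here, so that correspondence is a guess.

```python
import math


def haldane_distance(theta: float) -> float:
    """Haldane map distance in Morgans for a recombination fraction theta."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must satisfy 0 <= theta < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * theta)


def haldane_theta(distance: float) -> float:
    """Inverse map function: recombination fraction for a distance in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance)))


if __name__ == "__main__":
    theta = 0.005  # assumed value, not read from the sample parameter file
    x = haldane_distance(theta)
    print(f"theta = {theta:.3f} -> x = {x:.6f} Morgans = {100 * x:.4f} cM")
    print(f"round trip: theta = {haldane_theta(x):.6f}")
```

Multiplying the result by 100 converts Morgans to centimorgans; the inverse function is what one would use to turn a reported map position back into a recombination fraction.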
Documentation describing the detailed function of each of these program sets is available from the ftp servers above.

I hope you will find the new version of these programs to be a useful part of your genome screening operations.

Joseph D. Terwilliger, Ph.D.
joe@well.ox.ac.uk
jdt3@columbia.edu
jterwilliger@ktl.fi

DETAILED OUTPUT FILE: ANALYZE.FINAL

********************************************************************************
MLINK
Pedigree File            : pedin.tpd
Parameter File           : datain.tdt
Output Pedigree File     : pedfile.dat
Output Parameter File    : datafile.dat
Log File                 : lsp.log
Stream File              : lsp.stm
Date Run                 : 12-Feb-96 19:20:41
Sex Difference           : 0
Recomb. Fraction to Vary : 1
Increment Value          : 0.02000000
Stop Value               : 0.48000000
Locus Order              : 1 2
Male Recomb. Fractions   : 0.00000000
********************************************************************************

Output of analysis with HOMOG

For significance testing use the following standard critical values:
    HOMOGENEITY:   Z(theta) > 3
    HETEROGENEITY: Z(alpha,theta) > 3.3

Under homogeneity   - Maximum Lod Score = 0.909022   Theta = 0.380
Under heterogeneity - Maximum lod score = 3.849152   Theta = 0.100
                      Proportion of Linked Families, Alpha = 0.350

Under Heterogeneity there is significant evidence of linkage.

3.3-lod-unit support interval for alpha and theta is as follows:
    Alpha: ( 0.01, 1.00)
    Theta: ( 0.00, 0.44)

********************************************************
*****                                              *****
*****  You are using TDTLIKE - Alpha Test Version  *****
*****  for computing TDT-like likelihood ratio     *****
*****  statistics based on an algorithm of         *****
*****  J. Terwilliger (AJHG 56:777-787 (1995))     *****
*****                                              *****
********************************************************

Locus 1
Alleles which appear at least 5 times shown.
                                           Multiple test corrected
ORIG #   CASE   CONTROL   TDT              One-Sided P-Value
1        209    129       18.9349117279    0.0000397356
2        46     83        10.6124029160    1.0000000000
3        76     129       13.7024393082    1.0000000000
4        91     93        0.0217391308     0.9880542409
5        87     75        0.8888888955     0.6593401134

Multiallelic Statistic - Based on Terwilliger (AJHG - March 1995)
Maximum Likelihood Estimate of TDT Lambda = 0.62000
-2ln(L) difference = 15.8929973831
P-Value = 0.0000338507

*************************************************************
*                                                           *
*          Program HRRLAMB - Version 2.1 (1/31/96)          *
*                                                           *
*                  AJHG 56:777-787 (1995)                   *
*                                                           *
*************************************************************

Disease allele frequency = 0.01000000

=========================================
CASE    | 43. |  1. | 13. | 12. | 11. |
CONTROL | 27. | 10. | 14. | 18. | 11. |
=========================================

Estimated parameters for likelihood ratio test:

Allele frequencies:
Allele   H0:          H1
1        0.43750000   0.34604554
2        0.06875000   0.08098217
3        0.16875000   0.19567004
4        0.18750000   0.21743885
5        0.13750000   0.15986340
Lambda   0.00000000   0.291021

LRT Chi-Square = 4.40863   p-value = 0.017888760783279   Lambda = 0.291021
NO SIGNIFICANT EVIDENCE OF LINKAGE DISEQUILIBRIUM BY LRT TEST

2 x n table Chi-square = 12.25782   P-value = 0.015533597133660
NO SIGNIFICANT EVIDENCE OF LINKAGE DISEQUILIBRIUM BY 2 x N TABLE CHI-SQUARE TEST

*********************************************************************
*      Program SIBPAIR - Sibpair analysis on Nuclear Families       *
*********************************************************************

                                                       P-values
           0      1      2   |  SHARED    NOT   |  0 vs 2       Mean test
                             |                  |
NA/NA     6.0   15.0    7.0  |   34.0    34.0   |  0.390755713  1.000000000
NA/A     20.7   38.8   17.2  |  102.3   116.0   |  0.284681141  0.177501172
A/A      17.3   53.0   28.3  |  158.7   116.7   |  0.051785618  0.005694087

Overall Chisquare For 0-2 Sharer Comparisons = 2.97342   p-value = 0.1130575
Overall Chisquare For Mean Test Comparisons  = 7.26225   p-value = 0.0132432

Affected Sib-Pair Mean Test P-value = 0.005694087

Output of analysis with HOMOG

For significance testing use the following standard critical values:
    HOMOGENEITY:   Z(theta) > 3
    HETEROGENEITY: Z(alpha,theta) > 3.3

Under homogeneity - Maximum Lod Score = 3.375640   Theta = 0.220

Under homogeneity there is significant evidence of linkage.

Test for heterogeneity GIVEN Linkage
Chi-square for heterogeneity = 0.011801   Theta = 0.220   Alpha = 0.970
No significant evidence of heterogeneity.

********************************************************************************
MLINK
Pedigree File            : pedin.tpd
Parameter File           : datain.tdt
Output Pedigree File     : pedfile.dat
Output Parameter File    : datafile.dat
Log File                 : lsp.log
Stream File              : lsp.stm
Date Run                 : 12-Feb-96 19:26:57
Sex Difference           : 0
Recomb. Fraction to Vary : 1
Increment Value          : 0.02000000
Stop Value               : 0.48000000
Locus Order              : 1 3
Male Recomb. Fractions   : 0.00000000
********************************************************************************

Output of analysis with HOMOG

For significance testing use the following standard critical values:
    HOMOGENEITY:   Z(theta) > 3
    HETEROGENEITY: Z(alpha,theta) > 3.3

Under homogeneity - Maximum Lod Score = 5.582551   Theta = 0.280

Under homogeneity there is significant evidence of linkage.

Test for heterogeneity GIVEN Linkage
Chi-square for heterogeneity = 33.510799   Theta = 0.080   Alpha = 0.390
Significant evidence of heterogeneity at p < 0.0001 level!

********************************************************
*****                                              *****
*****  You are using TDTLIKE - Alpha Test Version  *****
*****  for computing TDT-like likelihood ratio     *****
*****  statistics based on an algorithm of         *****
*****  J. Terwilliger (AJHG 56:777-787 (1995))     *****
*****                                              *****
********************************************************

Locus 1
Alleles which appear at least 5 times shown.
                                           Multiple test corrected
ORIG #   CASE   CONTROL   TDT              One-Sided P-Value
1        271    71        116.9590606689   0.0000000000
2        81     169       30.9759998322    1.0000000000
3        172    284       27.5087718964    1.0000000000

Multiallelic Statistic - Based on Terwilliger (AJHG - March 1995)
Maximum Likelihood Estimate of TDT Lambda = 0.79000
-2ln(L) difference = 122.5419672207
P-Value = 0.0000000000

*************************************************************
*                                                           *
*          Program HRRLAMB - Version 2.1 (1/31/96)          *
*                                                           *
*                  AJHG 56:777-787 (1995)                   *
*                                                           *
*************************************************************

Disease allele frequency = 0.01000000

=============================
CASE    | 39. |  9. | 32. |
CONTROL |  1. | 23. | 56. |
=============================

Estimated parameters for likelihood ratio test:

Allele frequencies:
Allele   H0:          H1
1        0.25000000   0.02719901
2        0.20000000   0.25945108
3        0.55000000   0.71334991
Lambda   0.00000000   0.474385

LRT Chi-Square = 50.69729   p-value = 0.000000000000561   Lambda = 0.474385
SIGNIFICANT EVIDENCE OF LINKAGE DISEQUILIBRIUM BY LRT TEST

2 x n table Chi-square = 48.77045   P-value = 0.000000000025682
SIGNIFICANT EVIDENCE OF LINKAGE DISEQUILIBRIUM BY 2 x N TABLE CHI-SQUARE TEST

*********************************************************************
*      Program SIBPAIR - Sibpair analysis on Nuclear Families       *
*********************************************************************

                                                       P-values
           0      1      2   |  SHARED    NOT   |  0 vs 2       Mean test
                             |                  |
NA/NA     2.0    8.0    4.0  |   24.0    25.0   |  0.207109630  0.943191707
NA/A     22.2   39.0   12.8  |  103.0   128.0   |  0.057323437  0.049996715
A/A       6.7   43.3   43.0  |  201.7    89.3   |  0.000000129  0.000000000

Overall Chisquare For 0-2 Sharer Comparisons = 29.06830   p-value = 0.0000002
Overall Chisquare For Mean Test Comparisons  = 46.06913   p-value = 0.0000000

Affected Sib-Pair Mean Test P-value = 0.000000000

Output of analysis with HOMOG

For significance testing use the following standard critical values:
    HOMOGENEITY:   Z(theta) > 3
    HETEROGENEITY: Z(alpha,theta) > 3.3

Under homogeneity - Maximum Lod Score = 8.841280   Theta = 0.180

Under homogeneity there is significant evidence of linkage.

Test for heterogeneity GIVEN Linkage
Chi-square for heterogeneity = 0.337201   Theta = 0.020   Alpha = 0.450
No significant evidence of heterogeneity.

Locus   Map Position
2       0.000000
3       0.005025

OUTPUT FROM PROGRAM POLYLOCUS
Analysis with Polylocus method of Terwilliger and Ott
Genetic Epidemiology 10(6):477-82 (1993)

Primary Locus is now Locus Number 2:

Polylocus Maximum Lod Score = 1.243439
    Theta = 0.380000
Polylocus Maximum Lod Score with Heterogeneity = 5.639427
    Theta = 0.060000   Alpha = 0.300
MULTIPOINT Polylocus Maximum Lod Score = 0.601390
    Map Position = -0.458145
MULTIPOINT Polylocus Maximum Lod Score with Heterogeneity = 5.621728
    Map Position = 0.057705   Alpha = 0.290000

Primary Locus is now Locus Number 3:

Polylocus Maximum Lod Score = 5.582537
    Theta = 0.280000
Polylocus Maximum Lod Score with Heterogeneity = 12.859310
    Theta = 0.080000   Alpha = 0.390
MULTIPOINT Polylocus Maximum Lod Score = 5.505377
    Map Position = -0.346574
MULTIPOINT Polylocus Maximum Lod Score with Heterogeneity = 12.803120
    Map Position = 0.116597   Alpha = 0.410000

*************************************************************
*          Program HRRMULT - Version 2.1 (2/5/96)           *
*                  AJHG 56:777-787 (1995)                   *
*        Multilocus Haplotype Relative Risk Analysis        *
*************************************************************

Alpha     N     Inter.  Position  Map Position  Lod Score  -2ln(LR)
1.000000  485.  0       0         -4.25866      0.00000    0.00000
1.000000  20.   0       1         -0.10034      4.41145    20.31549
1.000000  20.   0       2         -0.08935      5.51186    25.38307
1.000000  20.   0       3         -0.07859      6.76540    31.15581
1.000000  20.   0       4         -0.06807      8.10525    37.32604
1.000000  20.   0       5         -0.05776      9.34492    43.03493
1.000000  20.   0       6         -0.04766      10.15622   46.77110
0.903380  20.   0       7         -0.03775      10.26342   47.26480
0.749131  20.   0       8         -0.02804      10.25928   47.24572
0.622254  20.   0       9         -0.01852      10.25520   47.22692
0.517765  20.   0       10        -0.00917      10.25120   47.20852
0.431452  20.   1       0         0.00000       10.24726   47.19038
0.431175  20.   1       1         0.00100       10.41172   47.94774
0.430277  20.   1       2         0.00200       10.56382   48.64819
0.999986  396.  1       3         0.00301       11.47395   52.83950
0.604671  239.  1       4         0.00402       11.61899   53.50741
0.474436  142.  1       5         0.00503       11.61904   53.50765
0.474436  142.  2       0         0.00503       11.61904   53.50765
0.999994  87.   2       1         0.01420       11.50964   53.00385
0.999997  46.   2       2         0.02355       11.23010   51.71653
0.999996  31.   2       3         0.03307       11.07330   50.99442
0.994983  23.   2       4         0.04278       10.97677   50.54991
1.000000  20.   2       5         0.05268       10.85818   50.00378
1.000000  20.   2       6         0.06278       10.08478   46.44213
1.000000  20.   2       7         0.07309       8.81444    40.59199
1.000000  20.   2       8         0.08362       7.38083    33.99000
1.000000  20.   2       9         0.09437       6.01230    27.68767
1.000000  20.   2       10        0.10536       4.80767    22.14015
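
The last two columns of the HRRMULT listing carry the same information on two scales. A lod score is a base-10 log likelihood ratio, so the corresponding -2ln(LR) statistic is the lod score multiplied by 2*ln(10), roughly 4.605. The short sketch below checks that relationship against three rows copied from the table; it is an illustration only, not part of the ANALYZE package, and the function names are hypothetical.

```python
import math

TWO_LN10 = 2.0 * math.log(10.0)  # about 4.605


def lod_to_minus2lnlr(lod: float) -> float:
    """Convert a base-10 lod score to the -2ln(LR) scale."""
    return TWO_LN10 * lod


def minus2lnlr_to_lod(stat: float) -> float:
    """Convert a -2ln(LR) statistic back to a base-10 lod score."""
    return stat / TWO_LN10


# (Lod Score, -2ln(LR)) pairs copied from the HRRMULT table in this listing
REPORTED = [
    (4.41145, 20.31549),
    (10.24726, 47.19038),
    (11.61904, 53.50765),
]

if __name__ == "__main__":
    for lod, reported in REPORTED:
        computed = lod_to_minus2lnlr(lod)
        print(f"lod {lod:9.5f}  computed {computed:9.5f}  reported {reported:9.5f}")
```

The computed and reported values agree to within the rounding of the printed lod scores, and the largest entry, 53.50765, is the "Maximum Multilocus HRR" figure quoted in the summary tables.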