Documentation for 'theta'.
--------------------------
(August 1996)


Contents
--------

1. Description and Purpose.
2. Options.
3. File formats.
4. Hints and tips.
5. References.


1. Description and Purpose
--------------------------

The 'theta' program estimates log-likelihood ratios for recombination
fractions. It is applied to output from the 'block' program, and can
perform three different types of estimation:

 - Given output from a single run of 'block' at theta0, it can
   estimate the log-likelihood ratio of another recombination
   fraction (theta1) over theta0. It can do this for all
   recombination fractions between 0 and 0.5, thus creating a graph
   that shows the most probable recombination fraction given these
   results.

 - Given output from two runs of 'block', the first at theta0 and the
   second at theta1, it can estimate the log-likelihood ratio of
   theta1 over theta0 using the square-root method described in [1].

 - Like the above, but using the iterative method described in [1].

A two-point linkage analysis can thus be performed by running 'block'
on a linkage problem, and then analyzing the output with 'theta'.


2. Options
----------

The 'theta' program is run from the command line and accepts a number
of options, which are explained below. Help can also be obtained with
the '-h' option.

The command-line format of the 'theta' program is:

   theta [-hDPGFVE] [-X] [-s] [-e] file1 [file2] [theta1]

'file1' is the name of a file with results obtained by running
'block' for some time, and must always be given. 'file2' is the name
of a second file with results from 'block', and must be given if
option '-X2' or '-X3' is used. If '-X1' is used, only 'file1' should
be specified.

Option   Description

-D       When this option is given, the program will calculate
         log-likelihood ratios for theta values between 0 and 0.5,
         and attempt to display the graph with 'gnuplot'.
         If 'gnuplot' is not present on your system, you can access
         the output directly in the 'gnuplot.input' file. If
         'gnuplot' is present, the graph is also written as a
         PostScript file to 'graph..ps'.

-e       When option '-D' or '-P' is used, log-likelihood ratios are
         normally calculated for theta values between 0 and 0.5.
         Together with '-s', this option makes it possible to specify
         a different starting and ending point. '-e' specifies the
         ending point. Options '-s' and '-e' can thus be used to
         magnify part of the graph.

-E       When option '-D' or '-P' is used, the endpoints of the theta
         range sometimes produce extreme values, making the graph
         hard to read. This option makes the program skip the
         log-likelihood ratios for these endpoints.

-F       The numbers written to the 'gnuplot.input' file are printed
         using the C printf("%f") format. This is the default, and
         usually works well with 'gnuplot'.

-G       Like '-F', except that the numbers are written using the C
         printf("%g") format.

-h       Print a help page with short descriptions of the options.

-P       Like option '-D', except that the graph is not displayed.

-s       Like option '-e', but specifies the starting point of the
         graph.

-V       The standard deviation is computed for each log-likelihood
         ratio, using the method described in [2].

-X       Specifies which method to use:

         1 - One file is used, containing the output from a linkage
             problem run with 'block' at a specific theta value
             (theta0). A graph of the log-likelihood ratios can be
             obtained using option '-D' or '-P', or the
             log-likelihood ratio of theta1 over theta0 can be found
             by specifying 'theta1'.

         2 - The square-root method described in [1] is used to
             compute the log-likelihood ratio of theta1 over theta0,
             using the results from runs at both theta1 and theta0,
             supplied as 'file1' and 'file2' (see above).

         3 - Like '-X2', except that the iterative method described
             in [1] is used.
             Again, results from two runs with 'block' must be
             supplied.


3. File formats
---------------

This section describes the few files that are read and written by
the 'theta' program.

'file1' - This file must always be specified. It should contain the
   results from the 'block' program when applied to a linkage
   problem. The first line lists the recombination fraction at which
   the results were produced. The second line lists the number of
   blocks in the scheme. Then follow the number of recombinations,
   the number of non-recombinations, and a list of estimated
   recombination fractions for bottom-level individuals.

'file2' - This file must be specified if option '-X2' or '-X3' is
   used. Otherwise, it should not be specified. Like 'file1', it
   should contain results from 'block' when applied to a linkage
   problem. The results in 'file2' should be produced with a
   different recombination fraction than those in 'file1'.

'gnuplot.input' - This file contains the input to the 'gnuplot'
   program. The first column contains the 'theta1' value, i.e., the
   recombination fraction whose log-likelihood ratio over theta0 is
   estimated, where the results were obtained by sampling at theta0.
   The second column contains the estimated log-likelihood ratio for
   this theta1 value. If option '-V' is given, three more columns are
   present. The first two contain the log-likelihood ratio minus and
   plus the standard deviation, and the last contains the standard
   deviation itself. The standard deviation is estimated using the
   method described in [2].

'graph..ps' - This file contains the graph of the log-likelihood
   ratio in PostScript format. The highest point on this curve shows
   the estimated most likely recombination fraction.


4. Hints and tips
-----------------

The highly useful 'gnuplot' program can be obtained by anonymous ftp
from, e.g., prep.ai.mit.edu:/pub/gnu.


5. References
-------------

[1] Claus S. Jensen, Augustine Kong: "Linkage Analysis in Large
    Pedigrees With Many Loops - Blocking Gibbs", unpublished, to be
    submitted.

[2] Charles J. Geyer: "Markov Chain Monte Carlo Maximum Likelihood",
    in Proceedings of the 23rd Symposium on the Interface Between
    Computer Science and Statistics, 1991, pages 156-163.
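

Example: reading 'gnuplot.input' without 'gnuplot'

The column layout of 'gnuplot.input' given in section 3 can be read
directly when 'gnuplot' is not available. The Python sketch below is
not part of 'theta'. It assumes that the file holds one row per
theta1 value with whitespace-separated numeric columns, and that
blank lines and lines starting with '#' carry no data; both are
assumptions, not documented behaviour. The function names and the
sample numbers are made up for illustration.

```python
# Sketch only: assumes 'gnuplot.input' holds whitespace-separated numeric
# columns, one theta1 value per row, as described in section 3.

def parse_rows(text):
    """Return a list of (theta1, log_ratio) pairs from gnuplot.input text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Column 1 is theta1, column 2 the estimated log-likelihood ratio.
        # With '-V' three more columns follow; they are ignored here.
        rows.append((float(fields[0]), float(fields[1])))
    return rows


def best_theta(rows):
    """Return the (theta1, log_ratio) pair with the highest ratio."""
    return max(rows, key=lambda row: row[1])


# Made-up numbers for illustration, not output from a real run.
sample = """\
0.050000 -1.250000
0.100000 0.400000
0.150000 0.900000
0.200000 0.300000
"""
theta1, ratio = best_theta(parse_rows(sample))
print(theta1, ratio)
```

To apply this to a real run, pass the contents of 'gnuplot.input'
(for example, open("gnuplot.input").read()) to parse_rows() in place
of the sample string. The pair returned by best_theta() corresponds
to the highest point of the curve, i.e., the estimated most likely
recombination fraction.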