********************************************************************************** QR2 Version 1.0 ***************** QR2 is an algorithm which approximates in the least-squares sense a dissimilarity, observed on a finite set of n objects, by a tree distance. Topological constraints on the structure of the tree may be incorporated. The principle is described in: O. Gascuel & D. Levy, "A reduction algorithm for approximating a (nonmetric) dissimilarity by a tree distance", to appear in Journal of Classification, 1995, available on request. The implementation given here is written in C++. It may be used on any system using gcc 2.4.5 or any later version. Its interface works on any terminal. It requires about 400K of disk space. This research was supported in part by the GIP GREG, the GDR1029 and the IA2 network. December 1994, Olivier Gascuel & Denise Levy. Departement d'Informatique Fondamentale, LIRMM 161 Rue Ada 34392 Montpellier Cedex 5, France Email: gascuel@lirmm.fr & levy@lirmm.fr ********************************************************************************** Copyright Notice ****************** (c) Copyright 1994-1995 by Olivier Gascuel & Denise Levy. Permission is granted to copy and use this program, provided that no fee is charged for it and that this copyright notice is not removed. ********************************************************************************** Contents of this package ************************** file 1 ReadMe - this description file 2 Makefile - the "Makefile" to be used file 3 qr2.cc - the source code file 4 qr2.in - an example of dissimilarity input file file 5 qr2.top - an example of topological constraints file file 6 qr2.out - the tree distance obtained from qr2.in with the default values for every parameters. ********************************************************************************** How to transfer and install QR2 ********************************* Transfer: ********* 1/ Connect to our system by typing "ftp lirmm.lirmm.fr" or "ftp 193.49.104.10", 2/ in response to the "Name" prompt type "anonymous", 3/ to the "Password" prompt type your full e-mail address, 4/ move into the directory "pub/genome/phylo" by typing "cd pub/genome/phylo", 5/ tell ftp that the data will be binary by typing "bin", 6/ if you have gzip, get the compressed package with "get qr2.tar.gz" (12kb), if you have compress, get the compressed package with "get qr2.tar.Z" (21kb), otherwise, expand and get the package in the same time with "get qr2.tar" (49kb), 7/ leave ftp with "bye". Expand: ******* If you have gzip, use "gzip -dc qr2.tar.gz | tar xvf -", if you have compress, use "uncompress -c qr2.tar.Z | tar xvf -", otherwise use "tar xvf qr2.tar" for the uncompressed version. A directory qr2 with all the files of the package will be created in your working directory, move into this directory with "cd qr2". Compile: ******* Modify, if necessary, in the "Makefile" file the INCFLAGS and LIBS options, the defaults options being given for a SPARC under SunOS. Then compile with "make". Install: ******** Modify in the "Makefile" file the parameter INSTALL-DIR by giving the name of the directory you want to use for qr2. Then copy in this directory all the necessary files with "make install". ********************************************************************************** How tu use QR2 **************** What the program does: ********************** QR2 reads in an input file a dissimilarity matrix D0 and then constructs a sequence of matrices D1,D2,...D* which are closer and closer to a tree distance. The process stops when the result's relative precision has reached a minimum required, or when the number of iterations has reached a maximum required. The result D* is stored in an output file. Topological constraints on the tree distance to be constructed may be given in a topology file. Overview of the input and output formats: ***************************************** The dissimilarity matrix must be given in the input file in the following format: The 1st line, reserved for a title or a comment is not to be considered. The following lines must each contain a label of whatever length, followed by at least one blank, then (for the line i) i-1 integer or real values separated by at least one blank, the rest of the line is not to be considered. The reading stops at the 1st line starting by an RC. For example, here is a dissimilarity input file: The famous Case's nine frogs Aurora Boylii 10 Cascadae 13 7 Muscosa 12 7 7 Temporaria 57 50 40 45 Pretiosa 22 9 11 15 48 Catesbeiana 86 65 54 48 85 54 Pipiens 89 67 66 49 83 55 54 Tarahumarae 97 72 79 67 107 60 59 48 The result is stored in the same format in the output file: The famous Case's nine frogs + qr2 1-Aurora 2-Boylii 13.95304 3-Cascadae 18.33281 5.50947 4-Muscosa 22.55778 9.49069 7.03523 5-Temporaria 61.08271 47.98097 45.40525 41.25273 6-Pretiosa 27.31592 14.15935 11.70401 7.55764 40.64007 7-Catesbeiana 74.97141 61.83940 59.31320 55.06092 86.81488 54.00019 8-Pipiens 78.30700 65.17381 62.65567 58.39711 90.07399 57.29771 51.09470 9-Tarahumarae 89.47934 76.34250 73.83752 69.58285 101.74937 68.10315 61.90530 48.00000 Topological constraints must be given in a topology file in the following format: The 1st line, reserved for a title or a comment is not to be considered. The following lines must each contain 2 sets of nodes separated by a `/' which have to be included in 2 distinct subtrees in the tree distance to be built. In each set, nodes must be indicated by their number, separated by at least one blank. One can give in the second set a `*' to indicate `all remaining nodes'. The reading stops at the 1st line starting by an RC. For example, here is a topological constraint file: Examples of topological constraints 1 4 / 2 3 5 5 8 6 /* The 1st constraint expresses that the tree to be built must be composed of 2 subtrees one containing leaves 1 and 4, and the other subtree leaves 2, 3 and 5. The 2nd constraint expresses that the tree must contain a subtree whose leaves are 5, 8 and 6. +--------------7-Castebiana +-----------------1-Aurora .___| | | | +-------------8-Pipiens |------4-Muscosa | +---| | _| +-------------------9-Tarahumarae | +----2-Boylii | |----| | +--------------------5-Temporaria -| +--3-Cascadae | | | +-----------| +---6-Pretiosa | +------------7-Castebiana | | | +------------| +-| +-4-Muscosa | | +-------9-Tarahumarae | | |___| +---| +-3-Cascadae | +----6-Pretiosa |____| | | | +---------1-Aurora +---|-------------------------5-Temporaria +--| | +-2-Boylii +------------------------8-Pipiens The 9 fogs'tree obtained by QR2, without topological constraint The same one with the the above constraints These unrealistic constrains leed to a degenerate tree. Here is a more realistic example with the constraints: 1 2 3 4 5 /* : +--------------7-Castebiana +------------7-Castebiana ___| ___| | | +-------------8-Pipiens | | +-----------8-Pipiens | +---| | +---| -| +-------------------9-Tarahumarae _| +-----------------9-Tarahumarae | | | +--------------------5-Temporaria | +---6-Pretiosa +-----------| | | | +---6-Pretiosa +-----------| +-------------------5-Temporaria | | | | +-| +-4-Muscosa +--| +-4-Muscosa | | | | +---| +-3-Cascadae +--| +-3-Cascadae |____| |____| | +---------1-Aurora | +--------1-Aurora +--| +--| +-2-Boylii +-2-Boylii The 9 fogs'tree without topological constraint The same one with the constraints The options and how to invoke them: *********************************** QR2 checks its options in the command line. Each option name is a specific letter which must be preceded by `-' and followed by an argument if required. They may be given in any order. qr2 -h gives the following explanations: usage: qr2 [-i in_file] [-1 stop1] [-2 stop2] [-t top_file] [-o out_file] [-p print] in_file: dissimilarity file name default: qr2.in stop1: minimum precision required default: 0.01 stop2: maximum number of iterations default: 3*n (n is the number of species) top_file: topological constraints file name example: qr2.top out_file: tree distance file name default: qr2.out print: 1,2,3 for: i/o, data, iterations default: 1 Warning: to obtain a relative precision of 0.01, 3*n iterations are usually enough, but if topologies constraints are given, a greater number of iterations may be necessary to obtain the required result. Evaluation criteria ********************* Two evaluation criteria are given. A metric one, the variance accounted for which is directly in relation with the least-squares criterion, and a topological one, a report of all the changing and maintaining topologies of quadruples. In the given dissimilarity, each quadruple has a particular `natural' topology, in the tree distance obtained, these topologies may be maintained or changed, moreover when topology constraints are given some quadruple are forced to change for a particular topology or to maintain their natural one. A global report of all these changing/maintaining is calculated. With print options > 1, topological, metric and time execution reports are given. Speed with different numbers of objects and different machines: *************************************************************** SPARC2 /32M/ SunOS n=10 -> User time used: 0.11 sec - System time used: 0.0 sec n=20 -> User time used: 4.23 sec - System time used: 0.0 sec n=30 -> User time used: 34.98 sec - System time used: 0.1 sec n=40 -> User time used: 154.80 sec - System time used: 0.3 sec SPARC5 /70Mhz, 8M/ Solaris2 native n=40 -> User time used: 152.17 sec - System time used: 0.8 sec SPARC5 /70Mhz, 8M/ SunOS migrated n=40 -> User time used: 79.21 sec - System time used: 0.5 sec SPARC10 /32M/ SunOS n=40 -> User time used: 68.62 sec - System time used: 1.4 sec **********************************************************************************