Multalin documentation
======================
( version 5.0, 5.1, 5.2 )

Introduction
------------
Welcome to Multalin! This software lets you align several biological
sequences simultaneously on computers running UNIX.

What is a multiple sequence alignment? It is the arrangement of several
protein or nucleic acid sequences with postulated gaps so that similar
residues are juxtaposed. A score is attached to identities and to
conservative or non-conservative substitutions (the score measuring the
similarity), and a penalty is attached to gaps; an ideal program would
maximize the total score, taking account of all possible alignments and
allowing for a gap of any length at any position. Unfortunately the
computing requirements, both of time and memory, grow as the nth power
of the sequence length, where n is the number of sequences, so this
ideal alignment can be found only for two sequences or three short
sequences. In the general case, to be practicable, programs must
restrict the conditions of the optimization. Nevertheless it is
undeniably useful to have an automatic system available for multiple
sequence alignment, to provide a starting point for subsequent human
analysis.

Multalin creates a multiple sequence alignment from a group of related
sequences using progressive pairwise alignments. The method used is
described in "Multiple sequence alignment with hierarchical
clustering", F. Corpet, 1988, Nucl. Acids Res. 16, 10881-10890.

NEW in version 5.0
------------------
Comparison tables can include negative entries.
GCG tables can be used.
The gap penalty can be length dependent.
Gaps at sequence extremities can be scored or not.

NEW in version 5.1
------------------
The number of iterations is now limited to 10 (see F. Iteration).
A bug that prevented the comparison of only two sequences has been
fixed.
SCO and sco, CLU and clu are now valid extensions for score files and
clustering files.
Portability has been tested on more platforms.
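
To illustrate the scoring options listed under "NEW in version 5.0",
here is a minimal Python sketch that scores two already aligned rows.
Everything in it is an assumption made for illustration: the function
names, the match and mismatch values, and the form of the length
dependent penalty (a fixed part plus a part proportional to the gap
length) are not taken from the Multalin sources.

```python
import re


def gap_runs(row, gap="."):
    """Yield (start, length) for each run of gap characters in a row."""
    for m in re.finditer(re.escape(gap) + "+", row):
        yield m.start(), len(m.group())


def score_pair(a, b, match=2, mismatch=-1, gap_fixed=3, gap_per_residue=1,
               score_end_gaps=False, gap="."):
    """Score two aligned rows of equal length (hypothetical parameters)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    # Residue against residue: identities and substitutions.
    for x, y in zip(a, b):
        if x != gap and y != gap:
            total += match if x == y else mismatch
    # Gaps: one length dependent penalty per run of gap characters.
    for row in (a, b):
        for start, length in gap_runs(row, gap):
            at_extremity = start == 0 or start + length == len(row)
            if at_extremity and not score_end_gaps:
                continue  # gaps at sequence extremities are left unscored
            total -= gap_fixed + gap_per_residue * length
    return total


print(score_pair("ACGT..AC", "ACGTTTAC"))                   # 7
print(score_pair("..ACGT", "TTACGT"))                       # 8
print(score_pair("..ACGT", "TTACGT", score_end_gaps=True))  # 3
```

A real comparison table would replace the single match and mismatch
values with one entry per pair of symbols, possibly negative.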

NEW in version 5.2
------------------
The similarity coefficient at a position is still the mean of all
pairwise coefficients at this position, BUT only the sequences for
which the position is internal are counted.

Example:

CCPC50               QDG DAAKGEKEFN .KCKACHMI QAPDGTDII. KGGKTGPNLY
CCRF2C               ..G DAAKGEKEFN .KCKTCHSI IAPDGTEIV. KGAKTGPNLY
CCRF2S               QEG DPEAGAKAFN .QCQTCHVI VDDSGTTIAG RNAKTGPNLY
CCQF2R               .EG DAAAGEKVSK .KCLACHTF DQGGAN.... ...KVGPNLF
CCQF2P               .AG DAAVGEKIAK AKCTACHDL NKGGPI.... ...KVGPPLF
                                    |
Multalin sequence #  245 5555555555 555555555 5555555555 5555555555
Clustalv sequence #  245 5555555555 155555555 5555553331 3335555555

I think it is important to take the mean over all sequences so that new
gaps are preferentially inserted at the same positions as old gaps. But
this causes problems when sequence lengths are inhomogeneous, so I have
made this modification.

Cautions
--------
Before aligning large sequences, you may want to test Multalin with
shorter sequences and watch the system resource usage (see the UNIX
'ps' command) during the alignment.
When the swap partition of the hard disk is full, Multalin can use an
internal swap mode on the user partition.

Portability
-----------
The file "portab.h" contains preprocessor commands that allow the
program to be compiled and run on different computers and operating
systems.
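
As a minimal sketch of the counting rule described under "NEW in
version 5.2", the Python code below recomputes the two "sequence #"
rows of the example. It is not Multalin's own code. It assumes that a
position is internal for a sequence when it lies between the first and
last residue of that sequence, and that the Clustalv row counts the
sequences that have a residue at the position; both readings are
inferred from the example, and the function names are hypothetical.

```python
# Alignment rows copied from the example under "NEW in version 5.2".
ALIGNMENT = [
    "QDG DAAKGEKEFN .KCKACHMI QAPDGTDII. KGGKTGPNLY",
    "..G DAAKGEKEFN .KCKTCHSI IAPDGTEIV. KGAKTGPNLY",
    "QEG DPEAGAKAFN .QCQTCHVI VDDSGTTIAG RNAKTGPNLY",
    ".EG DAAAGEKVSK .KCLACHTF DQGGAN.... ...KVGPNLF",
    ".AG DAAVGEKIAK AKCTACHDL NKGGPI.... ...KVGPPLF",
]
GAP = "."


def internal_counts(rows):
    """Per column, count the sequences for which the column is internal."""
    counts = [0] * len(rows[0])
    for row in rows:
        residues = [i for i, ch in enumerate(row) if ch != GAP]
        if not residues:
            continue  # a row made only of gaps has no internal position
        for i in range(residues[0], residues[-1] + 1):
            counts[i] += 1
    return counts


def residue_counts(rows):
    """Per column, count the sequences that have a residue (no gap)."""
    return [sum(row[i] != GAP for row in rows) for i in range(len(rows[0]))]


def as_blocks(counts, template):
    """Print counts with the same blank separators as the template row."""
    digits = iter(counts)
    return "".join(" " if ch == " " else str(next(digits)) for ch in template)


rows = [r.replace(" ", "") for r in ALIGNMENT]
print(as_blocks(internal_counts(rows), ALIGNMENT[0]))
# 245 5555555555 555555555 5555555555 5555555555
print(as_blocks(residue_counts(rows), ALIGNMENT[0]))
# 245 5555555555 155555555 5555553331 3335555555
```

The two rows differ only where a gap lies inside a sequence: internal
gaps still count under the first rule, while leading gaps do not.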

Compilation
-----------
Source files:

   afichseq.c   aligne.c     basproc1.c   cluster.c    coefdisk.c
   commande.c   copright.c   disk.c       drivers.c    fast.c
   gcgdisk.c    lstfseq.c    ma.c         maglob.c     mbgbdisk.c
   msfdisk.c    msgerr.c     muldisk.c    parametr.c   portab.c
   swapd.c      util.c

To create the 'ma' program in a UNIX environment:

1/ Compile these files to obtain object files:

   cc -<option> -c <source files>

2/ Build the executable file 'ma' by linking all object files:

   cc -<option> afichseq.o aligne.o basproc1.o cluster.o coefdisk.o
      commande.o copright.o disk.o drivers.o fast.o gcgdisk.o
      lstfseq.o ma.o maglob.o mbgbdisk.o msfdisk.o msgerr.o muldisk.o
      parametr.o portab.o swapd.o util.o -o ma

where <option> is the option which allows compilation according to the
ANSI C directives (1988).

Installation
------------
Two possibilities:

1  Copy 'ma', 'ma.MSG' and the symbol comparison tables ('.tab' files)
   into the same directory. If you run ma from another directory, use
   its full path name (do not rely on the PATH).

2  Copy 'ma.MSG' and the *.tab files into the same directory and set
   the environment variable MULTALIN to its name, e.g.:

      DOS:   set MULTALIN=C:\MULTALIN.DIR\
      Unix:  setenv MULTALIN /usr/Multalin/

   Copy 'ma' into a directory that is included in the PATH.

 ________________________________
|                                |
|        Running Multalin        |
|________________________________|

You can run Multalin in two modes:
   * command line mode.
   * interactive mode, which helps you to select program parameters
     and options.

Help: type 'ma -h' or 'ma -?' to obtain the help screen.

1/ Command line mode
=====================
Syntax: ma [[