Readme is this file. It contains description of all files on the floppy and
instructions about how to run the program. The programs perform phylogenetic
alignment of homologous sequences (DNA/Protein) as described in the articles
"A Unified Approach to Alignments and Phylogenies" in Methods of Enzymology,
Spring 90 (VOL 183) and two articles in Mol.Biol.Evol(Nov.89). The first 
article is the better starting point. If you have received this floppy after a
reprint request, but not the reprint it is because I don't have any more
reprints.  I don't have any reprints for the Meth.Enz. article.  


This program was written, when I worked in different groups.  I am
grateful for support thanks to Norman Kaplan and Charles Langley for 18 months
in North Carolina, a similar period at UCSD thanks to Russell Doolittle, and
two shorter periods with Michael Waterman and Temple Smith. Besides these people
my understanding of these problems have gained from discussions with Joseph 
Felsenstein and David Sankoff. Tom Darden has been helpfull in answering
my questions about C.


Please
refer to the date of the version, that you are using. If the program crashes, 
which it very rarely does and only with very complex data, I will be interested
in receiving following: the data set, the parameters used, the value of 
MEMORY in typedef.h and name of computer that you use. Then I will try to
correct it in a future version.  Please communicate errors to me, this is a 
major help in improvements if you do. 

  I presently use an IRIS 4D/220S. There might be problems on other machines
as I have not tested the programs on other computers recently. If you use the
the result in some publication, I would be interested in receiving a reprint. 
This is not to check that I have been quoted, but I should like to know about
the results obtained by the program (besides generally being interested in 
evolutionary biology).
 
My Workaddress is Jotun Hein,National Institue of Genetics, Mishima, Shizuoka-
Ken, 411 Japan
My E-mail address is JHEIN@NIGUTS.NIG.AC.JP.

(c) Copyright Nov. 1990 by Jotun Hein.  These programs can be copied as long as
   this message is not deleted and the programs are distributed free of charge.

This file has following content:
WHAT THE PRODRAM DOES AND DOESN'T
FILES
THE FUNCTION OF DIFFERENT MODULES
HOW TO RUN THE PROGRAM
  :BEFORE COMPILATION
  :COMPILATION
  :AN EXAMPLE
  :OUTPUT TO SCREEN
LIMITS IN PROGRAM
NEW IN THIS VERSION
FUTURE DEVELOPMENTS
  :THIS PROGRAM
  :OTHER PROGRAMS


------- WHAT THE PROGRAM DOES AND DOESN'T-------------------------------

WHAT IS DOES:

  It solves following two problems simultaneously:

Parsimony Tree Reconstruction:  A very predominant method of choosing a 
phylogeny for a set of sequences is to find the phylogeny that postulates the
least change.  Such programs have been made since the late 60s/early 70s, but 
they could only be applied to prealigned sequences or sequences, that for some
reason had not been subject to insertion-deletions.  

Pairwise alignment: Since late 60s/early 70s (again) methods have existed that
could find the set of substitutions and insertion-deletions that represented the
least evolution or the two sequences.  This is equivalent to align two 
sequences.  Unfortunately this was restricted to two sequences.

It can analyze both DNA and protein (either or, though).  I have run all 
existing globins (>300) and up to 20 complete retroviuses (8-10kb (DNA). The 
last analysis took about 2hrs CPU time, and in this case it is regrettable, that
it cannot analyze sequences that is a mix of DNA and protein. But it should be
able to handle most reasonable alignment problems. 

WHAT IT DOESN'T:

  There are things the program cannot do or should not be used for:

i: It cannot test whether homologies are statistically significant,
it can only reconstruct phylogenies and make alignments.  The sequences should
be homologous and there should be a history to be found. If you give it a set
of COMPLETELY unrelated sequences, it is possible that it will not be able to
align them, since it cannot allocate enough memory.   

ii: The sequences should not vary in length because they have been sequenced
unequally much.  Length differences should be due to evolution. Thus it should
not be used to look for local homologies.

iii: Sequences that has been subject to recombinations or gene conversions
should be treated with caution.

iv: It is part of the the principle of reconstructing the history of a set
of sequences that nucleotides are only put in the same column if the are
same column are connected by a path in the tree where the nucleotide has
neither been inserted or deleted. This has consequences, that occasionnally
gives an alignment, that people feel could be improved, i.e. could be changed
a little to look 'nicer'. To illustrate the phenomena, assume 100 sequences
is to be aligned.  98 have length 0 and 2 have length 1 and consists of a 'T'.
How should these 100 sequences be aligned.  If the 2 'T's are brothers in the
tree, they will be in the same column, their immediate ancestor is a 'T' and
all other positions in the column are occupied by gap signs.  The weight of
the alignment corresponds to one insertion-deletion.  On the other hand if the
2 'T's are at tips very distant in the tree, they should be in two different
columns, and the weight of the alignment will correspond to 2 insertion-
deletions. Had they been put in the same column all internal nodes in the
path connecting them will be assigned 'T's and a large set of insertion-
deletions would be postulated, since the 'T's only survived in two tips.

-------------------------FILES------------------------------------------
Program.
The program is written in the programming language C consists of following
modules:

au.c
dis.c
draw.c
graph.c
hash.c
inout.c
main.c
mat.c
metric.c
pars.c
pre.c
story.c
stati.c
tree.c
typedef.h

The floppy contains following additional files:

Makefile that will automatically compile the modules under unix.
mut.dat
par.dat
test1.pro
test2.dna
user.tree


----------------------  FUNCTION OF DIFFERENT MODULES -------------------

au.c
  It contains a series of standard functions like finding the minimum of 2 or 3
  variables, checks if memory allocation works, freeing of dynamically
  allocated memory etc.

dis.c
  This performs nearest neighbour interchanges on the distance tree to improve
  its fit do the distance data.

draw.c
   Makes drawing of phylogeny.

graph.c
  This performs certain standard operations on graphs. It adds edges to the
  graph that represents deletions (dummy-edges). It prunes the graph, i.e tries
  to find a smaller subgraph that generates the same sequences.  This algorithm
  is not perfect at present, it can cut down the graph by too much, but almost
  never occurs.  The pruning is necessary only for DNA because runs of identical
  nucleotides can be aligned in many ways all equally good, creating a large
  redundancy in the graph.

hash.c
   Hashes sequences to find a fast prelimenary alignment.

inout.c
   Reads sequences from file and writes out the final results.

main.c
  This contains main, reads in the parameters to be used and declares global
  variables to be used in many modules.

mat.c
 This performs the dynamic programming algorithm comparing sequences and the
  algorithm comparing graphs.  These algorithms are kept separate, as sequence
  comparison is much faster than graph comparison.  This also performs the 
  backtracking, reconstructing sequence graphs. It is done by
  a breath first search in the matrix created by mat.c, never visiting the same
  node twice.  The graph is created as minimal nodes and edges are encountered.

metric.c
  This uses the metric space properties of sequences to determine, which
  distances should be calculated and which not. Upper and lower bounds can
  be determined on a distance between two sequences, using other known
  distances.  Such a distance to be calculated if it is small (upper
  bound low), or if it is very uncertain (difference between upper and lower
  bound is big) and not too big (lower bound high).
  This method is only used for more than 30 sequences.

pars.c
  Nearest neighbour interchanges on the parsimony tree to make it more
  parsimonious.  Internal edges that are long or next to long edges are not
  investigated, since parsimony is bad guidance for these.

pre.c
  Finds statistically significant common segments to the
  two sequences, that is used to speed up the computations. 
 
story.c
  Chooses between the set of equally parsimonious history only one.

stati.c
  Calculates different statistics like mutations and insertion-deletetions
  and branch lengths in proposed tree.

tree.c
  This constructs a distance tree from the sparse distance matrix created by
  metric.c or it reads user specified tree from user.tree if this option is 
  used.

typedef.h
  Have definitions of records and constants declarations.


--------------- HOW TO RUN THE PROGRAM -------------------------

---------------------------  :BEFORE COMPILATION -------------------------

You must have access to a computer with a C compiler. I am not sure that
the program will run on PC's.  I suspect it is difficult to run it on
computers with less computing power than a VAX 11/730.
It is possible that you should change the parameter ,MEMORY, in typedef.h. The
more powerfull computer, the higher can this number be. On a VAX 11/730 I would
guess that 25 000 is appropriate, on VAX 11/780 100 000. This number is the area
of the array used for the dynamical programming algorithm. Each entry in the 
area corresponds to 3 short int, i.e 6 bytes on most machines. So 25 000 implies
that the comparisons will try to done by allocating 150 000 bytes. Another 
major consumer of memory is the sequences and sequence graphs.  If you have set 
MEMORY too low, the program will come to a halt and will advice you to increase
MEMORY. Try with 50 000 first.  

Three files must be present to run the program:
i) par.dat.
ii) file with sequences.
iii) mut.dat file with mutation distance matrix. Mut.dat is not shown as there
     is no choice presently for the user to be made concerning this file.

a fourth file must be present if the option usertree is to be used:
iv) file with user specified tree.

Decide with which parameters the program should be run and write these
parameters in the file par.dat.  The program is completely non-interactive.
Make the file with the input sequences and be careful that they have the right
format.

Next line is the first line in an example of par.dat, the first input file:
1  4   8   3     
0  0  
test1.pro
test.tree
test.ali
user.tree

PRO(1)/DNA(0) nuseq a b  
ancestorout usertree 
fileseq
filetree
fileali
usertreefile


COMMENTS ON PARAMETER VALUES IN THE par.dat EXAMPLE

protein or DNA corresponds to 1 and 0, respectively. Thus protein in this case

nuseq is number of sequences to be found on fileseq. 4 in this case.

a,b: gap penalty function is gk=a+k*b.  8 + 3*k in this case. b should not
     be put to less than 3. a should just be non-negative.

ancesterout: only present sequences in alignment (0) or also ancestral
             sequences (1).

usertree: should the program reconstruct the phylogeny (0) or will it be
          supplied (1). No in this case, phylogeny must be reconstructed.
          It is possible to use this option and then try to find a better
          tree by setting parcyc larger than 0.

fileseq: file with sequences on it. test1.pro in this case.

filetree: file where  phylogeny will be written. test.tree in this case.

fileali: file where main output will be written. test.ali in this case.

user.tree: name of file with usertree. user.tree in this case, but not needed.

--------------------  :COMPILATION -----------------------------------------

If you run under UNIX: 
1: write make.   This will create align. To run then 
2: write align.  The program should then be running.

If you run under VMS:
 1: write define lnk$library sys$library:vaxcrtl.olb
 2: write cc main
    write cc mat
    etc. for all modules.
3  write link main,mat, ..., (all modules) 
   you should then have create main.exe, that you can run.
4  write run main.  The program should then be running.

---------------------AN EXAMPLE----------------------------------------

Next line is first line on test1.pro, the second inputfile
>
HEMOGLOBIN ALPHA CHAIN - HUMAN AND CHIMPANZEES
 V L S P A D K T N V K A A W G K V G A H A G E Y G A E A L E
 R M F L S F P T T K T Y F P H F D L S H G S A Q V K G H G K
 K V A D A L T N A V A H V D D M P N A L S A L S D L H A H K
 L R V D P V N F K L L S H C L L V T L A A H L P A E F T P A
 V H A S L D K F L A S V S T V L T S K Y R *

>
HEMOGLOBIN BETA CHAIN - HUMAN, CHIMPANZEES, AND GORILLA
 V H L T P E E K S A V T A L W G K V N V D E V G G E A L G R
 L L V V Y P W T Q R F F E S F G D L S T P D A V M G N P K V
 K A H G K K V L G A F S D G L A H L D N L K G T F A T L S E
 L H C D K L H V D P E N F R L L G N V L V C V L A H H F G K
 E F T P P V Q A A Y Q K V V A G V A N A L A H K Y H *

>
MYOGLOBIN - HUMAN
 G L S D G E W Q L V L N V W G K V E A D I P G H G Q E V L I
 R L F K G H P E T L E K F D K F K H L K S E D E M K A S E D
 L K K H G A T V L T A L G G I L K K K G H H E A E I K P L A
 Q S H A T K H K I P V K Y L E F I S E C I I Q V L Q S K H P
 G D F G A D A Q G A M N K A L E L F R K D M A S N Y K E L G
 F Q G *

>
LEGHEMOGLOBIN I - YELLOW LUPIN
 G V L T D V Q V A L V K S S F E E F N A N I P K N T H R F F
 T L V L E I A P G A K D L F S F L K G S S E V P Q N N P D L
 Q A H A G K V F K L T Y E A A I Q L E V N G A V A S D A T L
 K S L G S V H V S K G V V D A H F P V V K E A I L K T I K E
 V V G D K W S E E L N T A W T I A Y D E L A I I I K K E M K
 D A A *
The line above is the last line read on test1.pro, as it contains end of
fourth sequence.

A sequence must obey following format.: A ">" on FIRST position on a line,
the sequence starts on second line after that and continues until a "*" is
encountered. Always check if the sequences obey the format and there are enough
sequences on the specified file.

Next line is the only line in user.tree, the fourth and optional inputfile.
((1,2)6,(3,4)5)7;

This describes the topology of the phylogeny in test.tree.

Next line is first line in test.ali, the first output file:
1 EMOGLOBIN ALPHA CHAIN - HUMAN AND CHIMPANZEES
2 EMOGLOBIN BETA CHAIN - HUMAN, CHIMPANZEES, AND GORILLA
3 YOGLOBIN - HUMAN
4 EGHEMOGLOBIN I - YELLOW LUPIN

RELATIONSHIP BETWEEN PAIRS OF SEQUENCES: 
     lower triangle distance between pairs
     upper triangle percent homology in alignment
        1     2     3     4

     1       45    27    16
     2   218       25    17
     3   313   339       20
     4   381   394   379   


weight of insertion-deletion of length k: 8 + k*3
number of sequences: 4   
total number of pairwise comparisons used: 9   
163 is the total length of alignment

 TABLES OF GENETIC EVENTS: 


   lower triangle: symmetrisized substitutions
   upper triangle: mutation distance matrix

  C  S  T  P  A  G  N  D  E  Q  H  R  K  M  I  L  V  F  Y  W

C    2  4  4  4  3  4  5  6  5  4  4  6  4  4  4  4  3  3  3 

S 0     1  2  1  1  1  2  3  3  3  3  3  4  4  4  2  3  3  4 

T 0  7     2  1  2  2  3  3  3  4  3  2  3  3  4  2  4  4  5 

P 0  3  0     1  2  4  3  3  3  3  3  4  4  4  3  2  3  4  4 

A 2  8  6  4     1  3  2  2  3  4  4  3  3  4  4  1  3  4  4 

G 0  5  5  2 13     3  2  2  4  5  3  4  5  4  4  2  4  4  3 

N 0  3  1  0  2  1     1  2  3  2  3  2  5  4  5  4  4  3  6 

D 0  3  1  2  4  3  5     1  2  3  4  3  4  5  5  3  5  4  6 

E 0  1  2  2  2  8  2  8     2  4  3  2  4  5  5  2  4  5  5 

Q 0  2  0  0  5  0  0  2  1     2  3  2  4  5  4  4  5  4  5 

H 0  3  0  0  2  1  4  2  0  3     2  3  4  4  3  5  4  3  5 

R 0  1  1  0  0  0  0  0  1  0  2     1  4  4  4  4  5  5  4 

K 0  3  4  1  3  4  1  2  6  7  3  3     4  4  4  3  5  5  5 

M 0  0  0  1  0  0  0  0  0  0  1  0  0     2  1  2  3  4  3 

I 0  2  1  0  2  1  1  0  0  0  0  0  0  0     1  4  2  3  3 

L 0  1  0  1  4  0  0  0  0  0  3  1  4  5  7     4  2  3  2 

V 2  4  2  0  9  3  0  1  5  0  0  0  3  0  1  7     2  3  3 

F 0  0  0  0  2  0  0  1  0  0  0  0  1  0  0 11  5     1  3 

Y 0  0  0  0  0  0  2  0  0  0  0  1  0  0  1  2  1  3     3 

W 0  0  1  0  1  0  0  0  0  0  0  0  0  0  0  1  1  1  0    


INSERTION-DELETIONS : 

 lengths:   1  2  4  6
 numbers:   5  2  2  2

total weight of history: 794   


ALIGNMENT OF SEQUENCES: 

V--LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF------DLSHGSAQVKGHGKKV  62  1
VH-LTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKV  67  2
G-VLTDVQVALVKSSFEEFNANIPKNTHRFFTLVLEIAPGAKDLFSFLK--GSSEVPQNNPDLQAHAGKV  67  4
G--LSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATV  68  3
   *       *                          *     *                    *   *


AD-ALTNAVAHVDD----MPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDK  127  1
LG-AFSDGLAHLDN----LKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQK  132  2
FKLTYEAAIQLEVNGAVASDATLKSLGSVHVSKGVVDAHFPVV-KEAILKTIK----EVVGDKWSEELNT  132  4
LT-ALGGILKKKGH----HEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNK  133  3
                         *   *  *                                     


FLASVSTVLTSKYR-------  141  1
VVAGVANALAHKYH-------  146  2
AWTIAYDELAIIIKKEMKDAA  153  4
ALELFRKDMASNYK-ELGFQG  153  3
                     

number of completely conserved sites: 9
Above line was the last line of test.ali

Comments:
  Not all paiwise distance are calculated.  The ones not calculated
appear as -1 in the distance matrix.
  The numbers after each line in the alignment is the number of the last
position in the sequence and the sequence number.
  When more than 10 sequences are analyzed a histogram of mutability will 
appear over the sequences.

Next line is the first line of test.tree, the second output file:
1 EMOGLOBIN ALPHA CHAIN - HUMAN AND CHIMPANZEES
2 EMOGLOBIN BETA CHAIN - HUMAN, CHIMPANZEES, AND GORILLA
3 YOGLOBIN - HUMAN
4 EGHEMOGLOBIN I - YELLOW LUPIN
((1:94,2:124)6:167,(4:262,3:117)5:29)7:0;

 CAUTION: This tree is rooted assuming a clock, which might not be justified.
 The exact branch lengths can be found in the nested
 parenthesis representation just above.

                   |--------------------------------------|                     
                   |                            |---------5---------|           
                   |                            |                   |           
                   |                            |                   |           
                   |                            |                   |           
         |---------6---------|                  |                   |           
         |                   |                  |                   3           
         |                   |                  |                               
         |                   |                  |                               
         |                   |                  |                               
         1                   |                  |                               
                             2                  4                               
                                                                                
                                                                                
Above line was the last line of test.tree

test.tree contains following information: four globins and the numbers that
represent them in the program and the resulting phylogeny.  The phylogeny says
that 1 and 2 are sisters, the edge leading to 1 is 94 long, the edge leading to
2 is 124 long, the name of the common ancestor is called 6 and has an edge of
length 167 leading to it. Analogous for 3 and 4.  7 is an arbitrary root for all
of them and naturally has a 0-length edge above it.

Comment: The unrooted tree resulting from this method has been rooted by 
assigning a root to the midpoint of the longest path found in the tree. 
Several factors can cause this rooting to be wrong: 

  Unequal rates of evolution.

  Parsimony will underestimate the length of long branches more severly
than the length of shorter branches, thus scewing the tree.  In the above
example, the true root should probably have been on the branch leading to
leghemoglobin.

  Groups of sequences with many members will seem to have experienced more 
evolution than groups with fewer members.  This factor is probably be of less
importance than the first two mentioned.

  You can probably make a more beatiful drawing yourself.   

--------------------------- OUTPUT TO SCREEN. -------------------------------

analysis of 4 protein sequences
gap penalty: 8 + k* 3
The program will reconstruct the phylogeny
The alignment will be written on t.a
The phylogeny will be written on t.t
finished reading parameters
finished reading sequences first time
finished reading sequences second time
calculated 4 pairwise distances of 6 possible
finished constructing distance tree using 2 comparisons
finished improving distance tree using 0 comparisons
finished making and improving parsimony tree using 3 comparisons
finished analysis

If the first two lines don't appear immediately on the screen something is
wrong, either the files were misnamed or the sequences didn't have the right
format.  The numbers appearing in the above example are naturally specific for
that example.

If the program cannot analyze the data, it should give a message about why not:

"serious alloproblems" means that the computer was not able to allocate 
sufficient dynamic memory. 

is making bigger array NMAXL 21000
This means that the dynamical programming array has been increased to 21000

is expanding sequence graphs totall 40000
This means that the memory for sequences (and graphs) can now hold 40 000
nucleotides or amino acids.  This increases as the computations progresses and
ancestral sequences are determined, and must be stored.
 
-------LIMITS IN THE PROGRAM-------------------------------------------

There are a few internal limits in the program:

  If more then 40 sequences are analyzed, the program will not draw the tree. 
This is determined by the constant PHYSIZE in typedef.h

  If more than 22 sequences are analyzed, the program will not write out a
distance matrix. This is determined by the constant MATSIZE in typedef.h
 
-------NEW FEATURES RELATIVE TO 90,MAY version-------------------------

It now reads signs that indicates uncertainty about which nucleotide or
amino acid is actually at a given position.  This seems to be especially
relevant for sequences obtained by a consensus of three sequences 
determined by PCR.

It used less memory and some bugs have been tracked down and corrected.


--------------------FUTURE DEVELOPMENTS---------------------------------

  :THIS PROGRAM

I am in the middle of adding following two features:

The requirement that sequences must be complete sequences is too restrictive,
especially as laboratories increasingly sequences large pieces of genomic
DNA.  This restriction will thus be removed.

An option will be included that allows genomic sequences, DNA with coding 
regions embedded, to be compared.  I am convinced that this will be of
increasing importance.  This has been completed.  It is just a question of
getting the program sufficiently userfriendly and debugged. One interesting
complication arising when allowing for DNA with coding readings is the
occurance of overlapping reading frames.  How do you for instance compare 
two pairs of proteins, that is coded for by the same DNA, or one protein with a
pair of proteins coded by the same DNA. This can be solved by a generalization
of the traditional dynamical programming algorithm.  The problem arising is
that when two proteins are coded for by the same DNA it is not possible to
make an insertion (or deletion) without also making it in the other. 

Besides this I will later do:

More information about the tree will be written out:

For every edge following is written:  DIST, means that this edge in the
tree was determined solely by distance considerations. If the edge was not 
determined by distance considerations, the weights of the three posssible
configurations will be written out, like (768,781,784).  This means that 
the configuration in the tree chosen was 768, while the weights of the two  
trees that could obtained by rearranging the four subtrees radiating from
edge concerned, were less parsimonious (781 and 784). The larger this
difference is the more sure the parsimony criteria chooses the 
present configuration around this edge.

It writes a tree with multiple events correction on branchlengths, which 
should lead to a better rooting of the tree.

It draws a tree, that is the best fit of a molecular clock tree to the 
above tree, which potentially could be usefull in determining, which 
brances have experienced an acceleration of molecular evolution.

  I intend to continue to improve this program as I become aware of better 
algorithms or come up with ideas or options that should be incorporated in 
the program.  Following is planned:

Incorporate an option for using weighting in the alignment and phylogeny
reconstruction.  The central idea is to use the observed mutability of a 
position to assign a weight to it.  The effect should be that highly conser-
ved position will be aligned with similar residues, which is frequently not the
case with present algorithms, that thus have serious difficulties in aligning
very distant molecules correctly.

Memory requirements of the program can be reduced and will be. There is no why
this program should not be able to run on PCs.  I will soon test it on PCs.

The treedrawing program can be improved.

An additional tree will be drawn, that has multiple events corrected branch-
lengths which should give a better rooting in certain cases.  Then you can 
choose the tree you prefer.

The program condense.c, that could simplify large phylogenies and was mentioned
in the articles on this program is not functional at present.  It will soon be.

  :OTHER PROGRAMS

Write a program that finds recombination events. The central idea is described
in a forthcoming article in Mathematical Biosciences.  I hope that this will be
very useful as present methods, that only finds one phylogeny for a set of mole-
cules, can fail seriously when used to analyze sequences that has experienced
recombinations or gene conversions in their evolutionary history. This has     
almost been completed. I sincerely hope to be able to distribute early in 
1991.  If you want it you should contact me then.

Write another program that analyzes a single string (DNA/Protein) for a dupli-
cations and sorts out in what order they have occurred.