User's Guide to

		      GRAIL    and     GENQUEST

		Sequence Analysis, Gene Assembly And
		     Sequence Comparison Systems

			  E-mail Servers

				 &

	    	     XGRAIL	and     XGENQUEST
		 (Version  1.2)       (Version 1.1)

		      Client-Server  Systems

			   (July, 1994)


			Informatics  Group
		   Oak Ridge National Laboratory
			Oak Ridge, Tennessee
			      U.S.A.


			   ----------
			   HIGHLIGHTS
                           ----------

GRAIL

	GRAILs (1, 1a, 2)		Protein Coding Regions

	GAP				Gene Modeling

	PROTEIN TRANSLATIONS

	FUNCTIONAL 			PolyA sites
	SITES				Pol II Promoters
					CpG Islands

	HUMAN REPETITIVE		
	DNA ELEMENTS
			
	GRAIL ANNOTATION REPORT


GENQUEST
					DATABASES		METHODS

	Database Searches 		Swiss-Prot		Fasta
	and Alignments			PDB			Blast
					Prosite			Smith-Waterman
					GSDB			BLIMPS
					BLOCKS			QuikSrch
					dbEST
					Human Repetitive DNA

									      
			TABLE OF CONTENTS
			-----------------


	GRAIL OVERVIEW


	GRAIL E-MAIL SERVER USER MANUAL


	GENQUEST OVERVIEW


	GENQUEST (Q) E-MAIL SERVER USER MANUAL


	XGRAIL CLIENT-SERVER SYSTEM USER MANUAL


	XGENQUEST CLIENT-SERVER SYSTEM USER MANUAL


	ACKNOWLEDGEMENTS


	SOFTWARE SUPPORT


	GRAIL PUBLICATIONS


	REFERENCES							      
									      
--------------
GRAIL OVERVIEW
--------------

GRAIL is a suite of tools designed to provide analysis and putative annotation
of DNA sequences both interactively and through the use of automated 
computation. The capabilities of GRAIL are available by several methods. These 
include an e-mail server at ORNL, which processes DNA sequence(s) contained in 
e-mail messages, and an interactive graphical X-based client-server system 
called XGRAIL, which supports a wide range of analysis tools, including gene 
modeling.

The current e-mail implementation of GRAIL provides analysis of protein coding
potential of a DNA sequence, and an option for protein sequence database 
searches of putative coding regions. 

GRAIL VERSIONS:

The coding recognition portion of the system uses a neural network which 
combines a series of coding prediction algorithms. There are three basic
versions of this neural network, GRAIL 1, GRAIL 1a and GRAIL 2. 

GRAIL 1 has been in place for about three years. It uses a neural network 
described in PNAS 88, 11261-11265, which recognizes coding potential within a 
fixed size (100 base) window. It evaluates coding potential without looking for
additional features (information such as splice junctions, etc).

GRAIL 1a is an updated version of GRAIL 1. It uses a fixed-length window to
locate the potential coding regions and then evaluates a number of discrete
candidates of different lengths around each potential coding region, using
information from the two 60-base regions adjacent to that coding region, to 
find the "best" boundaries for that coding region.

GRAIL 2 uses variable-length windows tailored to each potential exon candidate,
defined as an open reading frame bounded by a pair of start/donor, 
acceptor/donor or acceptor/stop sites. This scheme facilitates the use of more
genomic context information (splice junctions, translation starts, non-coding
scores of 60-base regions on either side of a putative exon) in the exon 
recognition process. GRAIL 2 is therefore not appropriate for sequences without
genomic context (when the regions adjacent to an exon are not present).

These changes have improved the overall performance compared to GRAIL 1,
particularly for short exons. 

All three systems have been trained to recognize coding regions in human DNA
sequences, although they also work well on a number of other organisms, 
particularly other mammals. 

[For convenience we use the term "exon" to refer to coding regions. Note that
non-coding exons, and non-coding portions of exons, will not be recognized by
the system.]
									      
GRAIL PERFORMANCE STATISTICS

GRAIL 1 typically finds about 90% of coding regions greater than 100 bases with 
performance falling off for shorter exons. GRAIL 1 has been tested on a set of
human genes containing 102kb of sequence. This set contained 70 coding exons and
the system identified 62 (89%) and assigned them all to the correct strand. Of
the eight missed, 6 were less than 100 bases long. In a larger test set, strand
assignment was 90-95% correct. The preferred reading frame assignment was
correct for 60 (95%) of these exons while the frame assignment for the other 
two had some ambiguity.

Of the predicted exons with a quality score of "excellent", all were actual
coding exons. Of predicted exons scoring "good", 69% were real, and of the
predicted exons with a score of "marginal", only 16% were real. Though this is
a rather limited test set, the results of this analysis give some guidance for
interpreting GRAIL 1 output.

GRAIL 1a performs much better than GRAIL 1 in finding true exons and eliminating
false positives. It is also better than GRAIL 1 in terms of finding the 
boundaries (edges) of coding regions. GRAIL 1a has been tested on a set of 137
sequences containing 954 exons. The system recognized 82% (787) of the exons in 
the set, with a false positive rate of 11%. 

Of the 954 exons in the set, 711 exons were greater than 100 bases long. The 
system recognized 95% (675) of these exons. The frame assignment was correct
virtually always (greater than 98% of the time).
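
As a quick check, the recognition rates quoted above for GRAIL 1 and GRAIL 1a
can be recomputed from the exon counts given in the text. The short Python
sketch below does this; the function name is ours and is purely illustrative.

```python
def percent_recognized(found, total):
    """Return the share of exons recognized, as a percentage."""
    return 100.0 * found / total


# Exon counts taken from the test sets described in the text.
print(round(percent_recognized(62, 70)))    # GRAIL 1, all exons: 89
print(round(percent_recognized(787, 954)))  # GRAIL 1a, all exons: 82
print(round(percent_recognized(675, 711)))  # GRAIL 1a, exons over 100 bases: 95
```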

GRAIL 2 finds about 91% of all coding regions, with a performance that is close
to being independent of exon size. Its false positive level is similar to, or
even slightly better than, that of GRAIL 1. GRAIL 2 has been tested on a set of
137 sequences containing 954 exons. The system recognized 91% (857) of the exons
in the set, with an apparent false positive rate of 8.6% (most of these were
outside the domain of the known genes and some may actually be real).

Of exons less than 100 bases long GRAIL 2 found 102 out of 122 or 84%. GRAIL 2
provides the best candidate for a given coding region in a manner which includes
splice junctions (or translation start/stop) at the candidate's edges, so the
user will note that the edges of the candidates in the initial and summary
tables correspond to putative edge signals. In the test set, about 61% of the
recognized exons had both edges exactly correct (the right splice junctions
picked) and about 96% had at least one edge correct. GRAIL 2 is perhaps better
than GRAIL 1 at estimating the true extent of an exon, and this additional
accuracy may help in experimental protocols such as those involving PCR.
									      
-------------------------------
GRAIL E-MAIL SERVER USER MANUAL
-------------------------------

The GRAIL e-mail server finds potential protein coding regions in anonymous DNA
sequences and provides a means of searching the translations of these regions
against protein and motif databases.

To have sequences analyzed by e-mail, send e-mail to: GRAIL@ornl.gov

Please note that:

(i)   GRAIL is case-insensitive,

(ii)  More than one sequence can be sent in an e-mail message,

(iii) The length of a sequence must be at least 100 bases (for GRAIL 1) and at
      most 100 kilo-bases, and

(iv)  The sequence must consist of letters A, C, G, T or U. U is converted to T.
      Any other character is converted to C. Blanks are ignored.
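
The input rules above can be applied locally before a sequence is mailed, so
that the user sees exactly what GRAIL will analyze. The following Python sketch
is one way to do this; the function names are hypothetical, and treating every
whitespace character (not only blanks) as ignorable is our assumption.

```python
def normalize_grail_sequence(raw):
    """Apply the GRAIL input rules to one raw sequence string."""
    bases = []
    for ch in raw.upper():        # GRAIL is case-insensitive
        if ch.isspace():
            continue              # blanks are ignored
        if ch == "U":
            ch = "T"              # U is converted to T
        elif ch not in "ACGT":
            ch = "C"              # any other character is converted to C
        bases.append(ch)
    return "".join(bases)


def check_grail_length(seq, min_len=100, max_len=100_000):
    """Return True if the length is within the limits stated above."""
    return min_len <= len(seq) <= max_len
```

For example, normalize_grail_sequence("acg u N") gives "ACGTC".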


The first line of the message MUST be in the following format: 

Sequences NUM_SEQ [-1a / -2] [-S / -E / -P / -p / -B / -b]

The word Sequences, followed by the number of sequences in the message, followed
by OPTIONAL switches:

   (a) one of -1a or -2, and

   (b) one of -S, -E, -P, -p, -B or -b.


The first line is followed by the sequences in the following format:

>sequence_name
sequence

A typical message is shown below:

Sequences  3  -2  -E

>seq1name
AAAATTTCGGG........


>seq2name
GGCTGTTCATG........


>seq3name
ATTGCAGACAG
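
A message in this layout can also be assembled by a small script. The Python
sketch below builds only the message body; the function and argument names are
hypothetical, and sending the text to GRAIL@ornl.gov is left to the user's own
mail program.

```python
def build_grail_message(sequences, version=None, search=None):
    """Build the body of a GRAIL e-mail request.

    sequences: list of (name, sequence) pairs.
    version:   None (GRAIL 1, the default), "1a" or "2".
    search:    None, or one of "S", "E", "P", "p", "B", "b".
    """
    if version not in (None, "1a", "2"):
        raise ValueError("version must be None, '1a' or '2'")
    if search not in (None, "S", "E", "P", "p", "B", "b"):
        raise ValueError("unknown search switch")

    # First line: the word Sequences, the count, then the optional switches.
    header = ["Sequences", str(len(sequences))]
    if version:
        header.append("-" + version)
    if search:
        header.append("-" + search)

    lines = [" ".join(header), ""]
    for name, seq in sequences:
        lines.append(">" + name)   # name line, then the sequence itself
        lines.append(seq)
        lines.append("")
    return "\n".join(lines)
```

Calling build_grail_message([("seq1name", "AAAATTTCGGG")], version="2",
search="E") produces a message whose first line is "Sequences 1 -2 -E".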
									      
OPTIONAL SWITCHES
-----------------

One of the following two:

-1a switch specifies that GRAIL 1a will be used for the analysis.

-2 switch specifies that GRAIL 2 will be used for the analysis. 

   The default is GRAIL 1. 

and one of the following six:

-S switch specifies that translations of all potential coding regions be
   searched against SwissProt using an implementation of the Smith-Waterman 
   algorithm on an Intel iPSC/860 parallel computer.

-E switch is the same as -S, except that only "excellent" potential coding
   regions are considered for the searches.

-P switch is the same as -E, except that the Prosite Database is searched
   instead of Swiss-Prot.

-p switch is the same as -P, except that abbreviated Prosite Database search
   output is returned.

-B switch is the same as -E, except that the Blast method is used instead of
   Smith-Waterman. The top 40 database hits are returned.

-b switch is the same as -B, except that the top 10 database hits are returned.

The database search hits provide an indication of homology between recognized
exons and existing proteins.

RETURN MESSAGE
--------------

For each sequence the following information will be returned:

1.  Initial Coding Scores: 

GRAIL 1 reports the score for the coding potential for each position analyzed on
each strand (the f-(forward) strand represents the sequence as received, and the
r-(reverse) strand represents the reverse complement).

These scores range from  0.0 to 1.0 and a score greater than 0.5 identifies a
region with protein encoding potential.  Non-coding regions often have a score
of 0.000. To reduce the output, only regions with scores of at least 0.01 are 
reported.

GRAIL 1a and GRAIL 2 use a somewhat more concise format appropriate for their
design and implementation. Instead of a position by position score, they report 
a table for the forward strand and a table for the reverse strand, which lists
potential exon candidates and their scores. 

Sometimes a single exon is perceived in both the forward and reverse direction,
and the issue of which is the coding strand is resolved in a later step
(described below).							      
									      
2.  Frame:

In calculating the coding potential, the system calculates the reading-frame
which is "preferred" in the window over which the calculation is done
(100 bases for GRAIL 1 and the exon candidate length for GRAIL 1a and GRAIL 2). 

In GRAIL 1 this information is returned for positions with scores over 0.5,
while in GRAIL 1a and GRAIL 2 each candidate exon has an associated frame.  

In GRAIL 1 the translation frame predicted is true for about 95% of true exons,
while in GRAIL 1a and GRAIL 2, it is true virtually always (greater than 98% of
the time).

3.  ORF:  

The limits between which the preferred frame is open are returned for windows
with scores over 0.5 (GRAIL 1) or for exon candidates (GRAIL 1a and GRAIL 2).

4. EXON Summary Table: 

The second part of the output is the system's interpretation of the raw data
(neural net outputs). This summary table provides the estimated limits of the
coding exon, the most likely strand for the exon with a probability for the 
correctness of the strand assignment, the preferred reading frame for the exon 
and a quality assessment.

An interesting phenomenon we have noted is that some exons seem to have coding
character on both strands, so be aware that strand assignments are not always 
correct, and it is sometimes useful to consider both strands as possible. 
Strand assignment is correct about 95% of the time in GRAIL 1 and greater than
98% of the time in GRAIL 1a and GRAIL 2.

Any exon with a quality score of "excellent" is worth further consideration.
									      
-----------------
GENQUEST OVERVIEW
-----------------

GENQUEST is an integrated sequence comparison server which allows users to make
use of a wide variety of sequence comparison methods and target databases,
through either e-mail or an X-based client server system, XGENQUEST. 
GENQUEST can also be transparently accessed from XGRAIL.
The purpose of the system is to allow rapid and sensitive comparison of DNA and
protein sequences to existing DNA and protein sequence databases.

The databases which can be accessed from the GENQUEST server include:

GSDB (Genome Sequence Database: a DNA sequence database satellite maintained at
ORNL and updated daily from the primary database at Los Alamos National
Laboratory),

SWISSPROT[1],

PROSITE[2] (a library of protein motifs),

PDB[4] (Protein Databank sequences of proteins with solved structures),

BLOCKS[9] (Protein motif database based on conserved blocks),

dbEST (Expressed Sequence Tag database), and

a library of human repetitive DNA sequences (from J. Jurka[3]).

GENQUEST uses a specialized parallel computing environment at Oak Ridge National
Laboratory and is supported and curated by a number of groups in the Genome
community.  

As new analysis tools become available, the modular nature of the GENQUEST
server will facilitate their implementation and broaden their accessibility to
the research community.

The GENQUEST server not only allows the user to access multiple databases but
also allows several databases to be queried from the same message. 

The GENQUEST server also supports a number of methods for database searching.
									      
----------------------------------
GENQUEST E-MAIL SERVER USER MANUAL
----------------------------------

GENQUEST can be accessed by sending e-mail to: Q@ornl.gov

Messages to GENQUEST begin with a set of keywords which specify the options to
be used in the search. Two keywords are mandatory: TYPE and SEQ. The remainder
are optional or have default settings. GENQUEST is case-insensitive.

EXAMPLE of a typical query:

TYPE DNA6
TARGET SwissProt
METHOD SW -g 13
MATRIX PAM120
SCORE 50
ALIGN 20
SEQ
ATCTATCGTCGAGCTGGTGTCTGTGCTAGTCCACAGACAGHCTCGCTATATATGCT
CGTTTTAAAGCTCGTATATATGCTCTCGCTAGTCCGATCGATGCTCGATCGCTAGTA
TCGTATGATTCTTG
END

This example translates the given DNA sequence in 6 frames and searches
SwissProt, using Smith-Waterman with a gap penalty of 13 and the PAM120 matrix,
and returns the top 50 matches and the top 20 alignments.
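
A query of this form can be generated by a small script. The Python sketch
below builds the message text from keyword values; the function and argument
names are hypothetical, the keyword order simply follows the examples in this
manual, and the 60-character line width is an arbitrary choice.

```python
def build_genquest_query(seq_type, sequence, targets=(), method=None,
                         matrix=None, score=None, align=None, comment=None,
                         width=60):
    """Build the body of a GENQUEST e-mail query from keyword values."""
    lines = ["TYPE " + seq_type]
    for target in targets:
        lines.append("TARGET " + target)   # one TARGET line per database
    if method:
        lines.append("METHOD " + method)
    if matrix:
        lines.append("MATRIX " + matrix)
    if score is not None:
        lines.append("SCORE %d" % score)
    if align is not None:
        lines.append("ALIGN %d" % align)
    if comment:
        lines.append("COMMENT " + comment)
    lines.append("SEQ")                    # the sequence starts on the next line
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    lines.append("END")
    return "\n".join(lines)
```

With seq_type="DNA6", targets=["SwissProt"], method="SW -g 13",
matrix="PAM120", score=50 and align=20, the keyword lines match the example
query shown above.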

KEYWORDS: The keywords and options supported by the server are listed below:

1] TYPE ( DNA / DNA6 / PROTEIN ): the type of sequence being submitted. 

PROTEIN specifies that the input is an amino acid sequence. 

DNA6 specifies that the input sequence is DNA and to be translated in all 6
reading frames for search against protein databases.

DNA specifies that the input sequence is DNA. It can be searched against DNA
target databases or, if a protein database is selected as target, it is
translated only in the frame of the first base in the sequence and searched
against that protein database.

The DNA6 option requires quite a long search time and is not recommended for
DNA sequences of more than 1000 to 2000 bases.
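
To make the difference between DNA and DNA6 concrete, the Python sketch below
translates a DNA sequence in all six reading frames using the standard genetic
code. It only illustrates the idea and is not the server's own code; the
function names and the f1..r3 frame labels are ours. The "f1" entry
corresponds to the single frame used with TYPE DNA against a protein database.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the reverse complement; unknown bases become N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(dna))


def translate(dna):
    """Translate from the first base; unknown codons become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame(dna):
    """Return translations of the 3 forward and 3 reverse frames."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    return {
        "f1": translate(dna), "f2": translate(dna[1:]), "f3": translate(dna[2:]),
        "r1": translate(rev), "r2": translate(rev[1:]), "r3": translate(rev[2:]),
    }
```

For the six-base sequence ATGGCC, frame f1 translates to "MA" and frame r1
(read from the reverse complement GGCCAT) translates to "GH".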

2] TARGET ( GSDB | REPETITIVE | dbEST | SWISSPROT | PDB | PROSITE | BLOCKS ): 
specifies the database to which the sequence will be compared. 

Multiple TARGET lines may be given to compare the sequence against more than
one database.

a) SWISSPROT: Swissprot protein sequence database (updated quarterly).  

b) GSDB: Genome Sequence Database, a daily updated DNA sequence database.

c) PDB[4]: Structure database, (Brookhaven) Protein Databank. Hits represent 
homologous proteins of known structure.
									      
d) PROSITE[2]: Protein motif library which can provide clues as to protein
function or classification.  

e) REPETITIVE: Comparison of DNA against a library of human repetitive DNA from
J. Jurka, which helps provide annotation of repetitive DNA elements.

f) BLOCKS[9]: Protein motif database based on conserved blocks. 
BLIMPS (Blocks IMproved Searcher) search tool is used for BLOCKS database 
searches.

g) DBEST: Expressed Sequence Tag database.

NOTE: The version of the database searched is listed in the results from 
GENQUEST.

3] METHOD ( SW / FASTA / BLAST / FLASH ): specifies the comparison algorithm to 
be used in the search.  The options are Smith-Waterman (SW) [5], FASTA [6], 
BLAST [7], and FLASH [11]. 

The default method is SW.

Exceptions: For BLOCKS and PROSITE databases, no method needs to be specified,
since special methods are used for searching those databases.

The defaults for FASTA and BLAST are the standard defaults used by these 
programs. Blast and FASTA options can also be set on this line. Descriptions for
these are available by sending "help fasta" or "help blast" e-mail to the
grail@ornl.gov address.

The gap penalty used in the SW program is set on this line using -g.  
For example, SW -g 10 sets the gap penalty to 10. The default is 13.

4] MATRIX ( PAM [n] / Blosum [m] ): specifies the matrix used for protein 
sequence comparison. 

[n] specifies any valid PAM matrix, viz. a multiple of ten, within the range
10 to 250. For example, PAM 250 [8].

[m] can be 62 or 80 [9]. The default is Blosum 62.  

These are not used for DNA-DNA comparison.
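
The matrix rules above are easy to check before a query is sent. The Python
sketch below is a minimal validator; the function name is hypothetical, and
accepting the name with or without a space (PAM120 as well as PAM 250) is our
assumption, based on the two spellings that appear in this manual.

```python
import re


def valid_matrix(spec):
    """Check a MATRIX value against the rules stated above."""
    m = re.fullmatch(r"(PAM|BLOSUM)\s*(\d+)", spec.strip().upper())
    if not m:
        return False
    family, number = m.group(1), int(m.group(2))
    if family == "PAM":
        # Any multiple of ten within the range 10 to 250.
        return 10 <= number <= 250 and number % 10 == 0
    # Blosum: only 62 or 80.
    return number in (62, 80)
```

For example, valid_matrix("PAM120") and valid_matrix("Blosum 62") are True,
while valid_matrix("PAM 255") is False.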

5] FILTER: specifies that repetitive DNA elements recognized in the query
sequence should be masked so as not to lead to unwanted matches against the DNA
sequence database. This filtering system uses a library of human repetitive
DNAs from J. Jurka. The default is no filtering.

Filtering a DNA query which is then translated and searched against the protein
databases also avoids spurious hits that can arise from the translation of
repetitive elements. The utility of such a filter is well documented [10].

6] SCORE num_score: specifies the number of hits to be reported. Default is 10.

7] ALIGN num_align [-g]: specifies the number of hits for which alignment
should be performed. Default is 10. Usually for proteins 10 to 200 is an
appropriate range.

The SCORE value should be greater than or equal to the ALIGN value.
The program normally does a local alignment; however, a global alignment may be
requested using -g on this line.

For example, ALIGN 10 -g returns global alignments of the top 10 hits.

The keywords SCORE and ALIGN apply only to method SW (parallel implementation
of Smith-Waterman).

8] COMMENT comment: specifies one line of text to be prepended to the return 
message from GENQUEST. 

9] SEQ 
   sequence
   ......
   END 

SEQ and END are keywords which specify where the sequence starts and ends in
the e-mail message. 

The sequence must begin on the line following the keyword SEQ (not on the same
line as SEQ).

The sequence can be either a standard single-letter protein sequence or a DNA
sequence. Each sequence line should be less than 512 characters long.

In DNA sequences, any characters other than A, C, G, T and U are converted to X
(and therefore will be filtered out). U is converted to T.

Blanks are ignored in DNA and Protein sequences.

ADDITIONAL EXAMPLES
-------------------
The examples below illustrate typical queries for various types of searches:

i) This example searches the given protein sequence against SwissProt, using
FASTA with default parameters and the default BLOSUM 62 matrix, and also
searches Prosite (using a special method).

TYPE Protein
TARGET SwissProt
TARGET Prosite
METHOD FASTA
COMMENT this is my protein sequence comparison run
SEQ
LYSEGRTAAGLVPPRTYILGREFWAAGLUTRYTHISPLEASE
END

ii) This example searches the given DNA sequence against GSDB and the
Repetitive DNA library (using the default method, SW).

TYPE DNA
TARGET GSDB
TARGET REPETITIVE
SEQ
ATAGATAAAGGGTGCTGTTTGGCGAAATATTGCTGCTGGCGCCGTAGATATATAG
CTGTGCTGTGATGTCGCTCGTAGATATAGCTAGTCTAGTCGATCG
END									      
									      
---------------------------------------
XGRAIL CLIENT-SERVER SYSTEM USER MANUAL
---------------------------------------

XGRAIL is a client-server implementation of a group of analysis tools for 
sequence exploration and gene discovery.  It allows the user to find protein
coding regions in anonymous DNA sequences, to assemble gene models, translate
part or all of these models, and search these translations against various
databases. 

Database searches of a region of a DNA sequence against various databases are
also supported. XGRAIL also provides information about GC content, and the
location of several types of functional sites (splice junctions, polyA sites,
Pol II promoters and CpG Islands) and a variety of human repetitive DNA
sequences.

All the information generated during the analysis of a DNA sequence can be saved
for future retrieval and further processing. 

Additionally, an annotation tool is provided within XGRAIL, which facilitates 
marking (annotating) items of significance to the user, and generating an 
annotation report which can then be saved to a file or printed.

Currently the client software has been tested on SPARCstations running Open 
Windows 3.0 and SunOS 4.1.3.

Connection of the user's machine to the Internet is required.		      
									      
OBTAINING AND INSTALLING XGRAIL (Version 1.2) CLIENT SOFTWARE
-------------------------------------------------------------

1. Create a subdirectory in which you wish to install XGRAIL (Version 1.2).

     % mkdir XGRAIL_1.2

   Go to that subdirectory

     % cd XGRAIL_1.2

2. Obtain the XGRAIL (version 1.2)  distribution by anonymous ftp, as follows:

     % ftp arthur.epm.ornl.gov (or ftp 128.219.9.76)

     Name: anonymous

     Password: [your internet address]

     ftp> cd pub/xgrail/sun/ver1.2

     ftp> binary

     ftp> get README

     ftp> get xgrail.sun.ver1.2.tar.Z

     ftp> quit

3. Extract the files from xgrail.sun.ver1.2.tar.Z

     % zcat xgrail.sun.ver1.2.tar.Z | tar xvf -

4. At this point, the following files should be in the XGRAIL_1.2 subdirectory:

     Manual.grail-genquest.July94  (Grail-Genquest User Manual)

     README

     testseqs                      (Subdirectory containing test sequences)

     xgrail_1.2

     xgrail.sun.ver1.2.tar.Z       (Can be deleted at this point)

5. You can start up the xgrail program:

   From the command line        % xgrail_1.2 &

   OR 

   From the file manager by double-clicking on the xgrail_1.2 icon.
									      
DESCRIPTION OF XGRAIL (Version 1.2):
------------------------------------

This section has been organized in the form of a step-by-step tutorial. The best
way to understand the operation and capabilities of XGRAIL is to read the 
following description while running XGRAIL with one of the sample DNA sequences 
provided with the software.

MAIN WINDOWS:

There are three main windows in XGRAIL:  the (top) XGRAIL window, the (middle)
DNA Sequence window and the (lower) ANALYSIS window.

When XGRAIL is started on the client machine, it first contacts the GRAIL server
to check for any informational messages. If there are any, they are retrieved
and displayed in a notice window.

On clearing this window, the empty XGRAIL window is displayed. Across the top of
the window is a menu bar with a number of buttons, menus and controls. 

A button can be selected by clicking on it with the left mouse button.

A menu is indicated by an inverted triangle. The menu options can be viewed by
holding down the right mouse button on the menu.

A menu option can be selected by holding down the right mouse button and moving
the cursor to the appropriate option, and then releasing the button.

Clicking with the left mouse button on a menu results in the selection of the
default (typically the first) menu option in the list.

Initially only the File Menu is enabled, since a sequence must be loaded before
any other actions can be taken.

MENUS IN XGRAIL WINDOW:

   FILE MENU (LOAD & SAVE):  The first step in using XGRAIL is to load a DNA 
sequence file into the system. Selecting the menu option Load pops up a sequence
directory window which displays subdirectories, sequence files (.seq) and XGRAIL
(Version 1.2) analysis files (.xgr.1.2).

A file or subdirectory can be selected by double clicking (left mouse) on the 
name. Alternatively, clicking (left mouse) on the file name and then clicking
on the Load button at the bottom of the directory window loads the file.  

If an analysis file (.xgr.1.2) is selected or a sequence (.seq) file is selected
and an analysis file (.xgr.1.2) exists for it, then the information from the
analysis file is read in and displayed.

If a .seq file is selected and no analysis file (.xgr.1.2) exists for it, then
the sequence is read from the file and sent to the GRAIL server for calculation
of coding probability, exon prediction and prediction of polyA functional
sites.

[For purposes of this discussion and on the XGRAIL Display, the term exon is
used interchangeably with coding region. Non-coding exons, or non-coding
portions of exons, are not currently recognized by the system.]
									      
Depending on the size of the sequence and the load on the GRAIL server, it may 
take a few seconds to a few minutes for the results to come back from GRAIL 
server.

At this point, the other menus and controls are enabled and the GRAIL analysis
is displayed in several windows:

XGRAIL WINDOW displays the GRAIL analysis of the query sequence, identifying
potential coding exons on the forward and reverse strands which are color coded
for quality with green = "excellent" (about 90% probable), blue = "good" (about
60% probable) and red = "marginal" (about 20% probable). Gene models are also 
represented in this window by a set of linked cyan bars. Several other features 
which will be described below are also displayed in this window. This window is 
initially 10kb wide and longer sequences can be fit into the window by using the
zoom feature. Dragging the zoom indicator with the left mouse button changes
the zoom.

DNA SEQUENCE WINDOW displays 100 bases of DNA sequence from both strands. The
position of this sequence is indicated by the double vertical green lines in the
central regions of the XGRAIL window.  The position of this blow-up region can
be moved by clicking at the desired location on the central horizontal band
(the gray-scale band showing GC content) of the XGRAIL window or by clicking the
arrows on either side of the DNA Sequence window (left mouse). This window also 
displays exons from the Exon Table as color coded horizontal bars and exons from 
gene models similarly in cyan. Translations of exons are also shown in the 
central region of this window (described later). Other features (PolyA sites, 
Promoter regions, CpG Islands, Repetitive DNA elements) are displayed as 
color-coded sequence characters.

ANALYSIS WINDOW: This window displays information about exons and gene
models found in the sequence, in three subwindows: 

Exon Table (leftmost) subwindow displays information for each of the exons found
by GRAIL: Strand (Forward or Reverse), reading frame, position of the exon on 
the sequence, limits between which the preferred reading frame is open, quality 
score, and the number of database searches done. 

Model Exon Table (central) subwindow displays information for each of the exons
in the currently selected gene model, assembled by GRAIL: reading frame,
position of the exon on the sequence, quality scores of translation start, 
acceptor and donor splice junctions used in building the gene model, and the 
number of database searches done.

A * in front of the first model exon score indicates that this score is for 
translation start, not acceptor junction, and an absence of * means that the 
assembly program did not find a suitable start site. A * and blank score after 
the last exon indicates a suitable stop codon has been found, while a numerical
score and absence of a * indicates that this is a donor junction, and no stop 
codon has been found. 

Gene Model Table (rightmost) subwindow displays information for each of the gene
models assembled by GRAIL: date of assembly, strand (Forward or Reverse), region
of the sequence considered in assembling the model, score, number of exons in 
each model and the number of database searches done.

At any time during the session, the user can save the current state of the
analysis to the analysis file by selecting the Save option from the File Menu.
(PLEASE NOTE that the previous analysis file for that sequence is overwritten.)
									      
GRAIL 1-1a-2 MENU:  Clicking on 1, 1a or 2 in this menu results in the display 
of information related to that version of GRAIL analysis. 
(GRAIL 2 is the default).

The difference between the three versions is as follows: 

GRAIL 1 recognizes coding potential without using other signals and is perhaps 
best suited for those cases when small fragments are to be evaluated or when 
genomic context is considered to be inappropriate (as in cDNA sequences). 

GRAIL 1a is an updated version of GRAIL 1. It first uses a fixed-length window 
to locate the potential coding regions and then evaluates a number of discrete
candidates of different lengths around each potential coding region, using
information from the two 60-base regions adjacent to that coding region, to find
the "best" boundaries for each such region. 

GRAIL 1a, like GRAIL 1, is more useful for non-genomic sequences (like cDNA 
sequences).

GRAIL 2 identifies exons by using signals such as splice junctions and other 
genomic context.  It is therefore best suited for analysis of genomic sequences.

Please note that: 
(a) Models of genes can be constructed only from GRAIL 2 exons.
(b) Database searches and protein translations can be done from any version.

   WINDOWS MENU: Clicking the right mouse button on the Windows menu displays a
list of additional windows: DNA Sequence, Analysis, Features, Annotations,
Range Markers, Sketch and Grail Publ windows.

Releasing the right mouse button on one of the options results in the display of
the corresponding popup window.

FEATURES WINDOW: This window displays the list of features (of the currently
selected feature type) found in the sequence by GRAIL: PolyA sites, Promoters, 
CpG Islands and Repetitive DNA elements. The feature type to be displayed can be
selected from a selection menu found on the left side of the window. A specific
feature item can be highlighted by clicking on its entry in the list. The item 
is highlighted in the XGRAIL and Features Windows.

All functional features supported by GRAIL are described later in the manual.

ANNOTATIONS WINDOW: This window displays items selected by the user for 
inclusion in an annotation report. An item can be selected for annotation by
clicking with the right mouse button on its entry in the relevant Table:
Exon Table, Model Exon Table or Gene Model Table in Analysis Window;
Feature Table (PolyA, Promoter, CpG Island or Repetitive DNA) in Features
Window; Database Search Table in Database Search Info Window.

The Annotation Tool is described in detail later in the manual.

RANGE MARKERS WINDOW: This window displays positions of the markers which
set the limits for various operations, viz. constructing a single gene model or
performing a database search for a region of the DNA sequence. The markers are
the blue arrows at the ends of the central region of the XGRAIL window which can
be pulled to any position along the sequence using the sliders on this window.
Alternatively the arrows can themselves be dragged on the main XGRAIL window. 
									      
SKETCH WINDOW:  This window is overlaid on XGRAIL window and displays the 
coding probability over the entire sequence and provides a reference for the 
user's location in the whole sequence. The red horizontal marker in the Sketch
window corresponds to the portion of the sequence displayed in the larger 
XGRAIL window.

GRAIL PUBL WINDOW: This window displays all GRAIL-related publications.

   FEATURES MENU: This menu toggles on and off the display of any of the feature
types in the XGRAIL and DNA Sequence windows, viz. PolyA sites, Promoters, CpG 
Islands or Repetitive DNA elements.

Clicking on an individual feature item in XGRAIL Window highlights it and the 
corresponding entry in the Features Window. 

   ASSEMBLE MENU: This menu is used to construct gene models within specified
regions of the sequence. The region for assembly is defined using the Range
Markers window (described earlier).

There are three options for Assembly: Auto Select, which allows the program to
pick the "best" model; Forward Strand, which assembles exons on the forward
strand; and Reverse Strand, which assembles exons on the reverse strand.

This version of the gene assembly program, GAP III, uses dynamic programming 
and heuristics, and takes only a few seconds to run.
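
The manual does not describe the internals of GAP III beyond "dynamic 
programming and heuristics". As a rough illustration only of how dynamic 
programming can pick the best-scoring chain of non-overlapping exons, here is a
minimal Python sketch. The function name, the (start, end, score) tuples and the
scores are hypothetical and are not taken from GRAIL; reading-frame consistency
and splice-site rules, which a real gene assembler must handle, are ignored.

```python
from bisect import bisect_right


def best_exon_chain(exons):
    """Pick the highest-scoring set of non-overlapping exons.

    exons: list of (start, end, score) with inclusive integer coordinates.
    Returns (total_score, chain) where chain is ordered along the sequence.
    This is a generic weighted-interval dynamic program, NOT GAP III itself.
    """
    exons = sorted(exons, key=lambda e: e[1])  # order by end position
    ends = [e[1] for e in exons]
    # best[i] holds the best (score, chain) using only the first i exons.
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # k = number of earlier exons that end strictly before this one starts
        k = bisect_right(ends, start - 1, 0, i)
        take = (best[k][0] + score, best[k][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]


if __name__ == "__main__":
    candidates = [(100, 200, 0.9), (150, 300, 0.5), (350, 400, 0.8)]
    total, chain = best_exon_chain(candidates)
    print(total, chain)
```

Sorting by end position lets each exon look up, with one binary search, the best
chain that finishes before it begins.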

The results of model construction can be viewed in the XGRAIL window as a 
series of linked cyan bars and in the DNA Sequence window as cyan bars. The 
details of the model are listed in the Model Exon Table and Gene Model Table.

Selection of Exons and Models: For a number of operations including translation
of individual exons or models, and database searches for individual exons or 
models, a particular exon or model must first be selected. 

Exon selection is done by clicking on the desired exon bar in the XGRAIL or DNA
Sequence window or by clicking on the corresponding row in the Exon Table, or 
Model Exon Table.  

A particular gene model can be selected by clicking on the corresponding row of
the Gene Model table.

   TRANSLATION MENU: This displays the translation of exons in the exon table, 
gene model exons, or entire gene models based on a choice in the Translation
submenu.

For exons in the exon table, a translation is provided in only the statistically
preferred reading frame (one frame for a given exon). This frame is listed in 
the Exon Table window. 

In GRAIL 2 and GRAIL 1a, the choice of this translation frame is correct more
than 98% of the time, while in GRAIL 1 it is correct about 95% of the time.

For gene models, the frame appropriate to the exon and model is used (frames 
listed in Model Exon table). Since the gene model is constructed in a manner 
which is reading frame consistent with the initial statistical estimates of 
frame, the frame used here is virtually always the same as in the original exon
table.									      
									      
The resulting translation appears in a Translation pop-up window.

The extent of the exons and their translations can also be viewed in the DNA
Sequence window, in the central horizontal area between the two DNA sequence 
strands. Yellow single letter protein translation is displayed when an exon in 
the exon table is selected. 

Selecting a gene model exon results in display of the translation in cyan, 
overlying the yellow translation from the same exon in the exon table. 

If there is a frame discrepancy at a given location both translations will 
appear simultaneously.
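
To make the idea of a frame-specific translation concrete, the following Python
sketch translates a DNA string in one chosen reading frame using the standard 
genetic code. It is an illustration, not GRAIL code: the function names and the
0-based frame offset are assumptions, and the way GRAIL numbers its frames is 
not specified here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement, for translating reverse-strand exons."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2).

    Stop codons are written as '*'; codons containing anything other than
    A, C, G, T are written as 'X'. A trailing partial codon is dropped.
    """
    s = dna.upper()[frame:]
    return "".join(
        CODON_TABLE.get(s[i:i + 3], "X") for i in range(0, len(s) - 2, 3)
    )


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))                      # forward strand, frame 0
    print(translate(reverse_complement("TTAGGCCAT")))  # reverse-strand exon
```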

   SEARCH DATABASE MENU: There are two options in the submenu:

GENQUEST SEARCH: allows the user to access the GENQUEST (Q) sequence comparison
server. A GENQUEST Search Options window comes up and displays all the available
options.

Here, a multitude of options are possible including search of exons, gene model
exons, and gene models, as well as other selected parts of the DNA sequence 
against SwissProt, Prosite, PDB (protein structure database), the Genome 
Sequence Database (GSDB), BLOCKS, dbEST and the repetitive DNA library, using a
number of algorithms.

Other details for these options are described in the GENQUEST manual.  The 
results of QuikSrch and GENQUEST searches are displayed in a pop-up window.

QUIKSRCH: searches selected translated exons or models against SwissProt. 
The choice of exon, gene model exon, or gene model is made through use of the 
QuikSrch submenu. This search uses a Fasta-like prescreen followed by a 
second optimization step based on the Smith-Waterman method. 
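
For readers unfamiliar with the Smith-Waterman method, the sketch below 
computes the best local alignment score between two strings. It is a minimal
illustration under simplifying assumptions: the match, mismatch and gap values
are arbitrary placeholders, whereas protein searches normally use a 
substitution matrix, and nothing here reflects the actual QuikSrch 
implementation or its prescreen step.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix, since only the score (not the alignment) is reported.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```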

   DB SEARCH INFO MENU: Tracks database searches and allows one to find and 
display previous database search results. The submenu allows selection of the 
GRAIL Exon, Model Exon, or Gene Model search list. The selected list is 
displayed in the Search Info pop-up window from which the results of a given 
search may be chosen for display. These results appear in a pop-up window which 
lists matches and the target database used for the search. Search results can be
deleted from the Search Info window. 

   ZOOM-TO-FIT BUTTON: Between the Db Search Info and Zoom Slider is a circular
button which, when selected, automatically fits the sequences within the XGRAIL 
window.

   ZOOM:  The Zoom slider allows for rescaling of the loaded sequence in the 
XGRAIL window. The default zoom value is 1 and corresponds to 10 kb per screen 
width.  The zoom can be changed by dragging the zoom slider.

   QUIT BUTTON:  Ends client-server interaction after giving the user the 
option of saving new analysis results and changes made during the session. The
current state of the analysis, including database searches, can be saved in a 
.xgr.1.2 file.
									      
DESCRIPTION OF FEATURES: GRAIL can find the following functional sites in a DNA
sequence:

   POLYA Site: The vertical cyan bars above and below the GC band of the XGRAIL 
window mark the positions of potential poly-A addition signals.

   PROMOTERS: Pol II Promoter regions are displayed as hollow, yellow rectangles
with a red vertical bar (representing 'TATA' location) above or below the GC 
band, in the XGRAIL window. The current version of promoter recognition 
software is trained to recognize only Pol II promoters having TATA-like 
elements.

   NOTE:  The Pol II Promoter recognition system [7] is a prototype. The current
system is trained to recognize only Pol II promoter regions with TATA-like
elements containing the subsequences TATA or ATA. The system detects about 60% 
of Pol II promoter regions with TATA-like elements. The false positive rate is 
approximately 1 per 7100 bases of DNA sequence. These statistics were 
calculated from annotated GSDB sequences. The true false positive rate may be 
lower, since some predictions counted as false may be real promoters that are
not annotated.

   CpG ISLANDS: CpG Islands are displayed as hollow, purple rectangles with 
vertical tabs superimposed over the GC band, in the XGRAIL window.
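
The list of changes later in this manual states that CpG Island detection is
based on the definition of Gardiner-Garden and Frommer (1987). That definition
is commonly summarized as a region of at least 200 bases with GC content above
50% and an observed/expected CpG ratio above 0.6. The Python sketch below 
applies those three thresholds to fixed-size windows. It is an illustration 
only: the function names are hypothetical, and how XGRAIL scans the sequence and
merges windows into islands is not described in this manual.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # number of CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq, window=200, min_gc=0.5, min_obs_exp=0.6):
    """Return 0-based start positions of windows meeting both thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(cpg_island_windows("CG" * 100))  # CpG-rich window qualifies
    print(cpg_island_windows("AT" * 100))  # no C or G, nothing reported
```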

   REPTTV DNA: An option for locating various repetitive DNA elements is 
provided and these elements are indicated by centrally located yellow hollow 
rectangles with vertical tabs and cyan arrow-heads indicating their orientation.
Analysis of repetitive DNAs requires detailed sequence comparison using Smith-
Waterman and may take some time especially for very long sequences or those 
with many repetitive elements (about 6 minutes for a 21kb sequence with 23 
hits).

Once the repetitive analysis is done this feature can be toggled on and off the
display like any other feature type using the Features menu. The human 
repetitive DNA annotations come from a library of 65 elements provided by 
J. Jurka. 

   GC CONTENT: is represented by gray shading in a central horizontal band in
the XGRAIL window. This band reflects the GC content of a sliding 50 base region
with white being high GC and black low GC.
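
The sliding-window calculation behind such a band can be sketched in Python as
follows. This is an assumption-laden illustration: the function name is 
hypothetical, the window advances one base at a time, and the manual does not 
say how XGRAIL positions the window or maps the values to gray levels.

```python
def gc_profile(seq, window=50):
    """Return the GC fraction of every full window of the given size.

    Element i of the result covers seq[i : i + window]. A running count
    is updated as the window slides, so each step costs constant time.
    """
    s = seq.upper()
    if len(s) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in s]
    count = sum(is_gc[:window])
    profile = [count / window]
    for i in range(window, len(s)):
        count += is_gc[i] - is_gc[i - window]  # add new base, drop oldest
        profile.append(count / window)
    return profile


if __name__ == "__main__":
    print(gc_profile("AT" * 25 + "G"))
```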

   ALTERNATE EXONS: In cases where exons overlap on both strands, GRAIL 1 & 2 
incorporate a strand-determination algorithm to determine the more likely 
strand. The rejected exon's coding probability is nonetheless displayed by 
default. Its display can be toggled on or off using this option.
									      
DESCRIPTION OF ANNOTATION TOOL:

Select Menu: Only one type of item is displayed at a time. The type of item to
be displayed can be selected from this menu in the Annotation window. 

User-Input: Selecting the "User-Input" option from this menu displays a window
with fields which can be filled in by the user. Unlike all other items, 
User-Input is always a part of the annotation.

Sequence: Selecting the "Sequence" option from this menu displays the first and
last 250 bases of the DNA sequence. The entire sequence will nevertheless be 
included in the annotation report. 

Grail Publications: Selecting "Grail Publ" option from this menu displays the 
list of GRAIL-related publications.

Grab, Ungrab: All features of the feature type currently selected in the 
annotation menu can be brought into the annotation report by clicking on the 
"Grab" button. Similarly, they can all be "deannotated" by clicking on the 
"Ungrab" button in this window.

Protein Translations: To include the protein translation of Exons, Gene Model 
Exons or Gene Models, click with the right mouse button on its entry in the 
annotation window, when that particular item type is being displayed. A (T) is
displayed to the left of that entry, indicating that the protein translation of
that exon (or Gene model) will be included in the annotation report.

Incl/All: The user can select the item types to be included in the annotation 
report by checking the boxes next to the corresponding Menu options (under the 
Incl column), and then selecting the "Incl" option (from Incl-All menu). The 
Incl option allows the user to include only the item types of interest in the
annotation report. Selecting "All" (from Incl-All menu) overrides the
checkmarks, and includes all the annotated items from all item types in the 
annotation report.

Print/Save: The annotation report can be printed by clicking on the "Print" 
button or saved to a file by clicking on the "Save" button.

Annotation File: The annotation report is saved in a file, which is stored in 
the same directory as the sequence file. The annotation report file name 
consists of the sequence file name, appended with .subset.anno.1.2 (for "Incl" 
option) or .full.anno.1.2 (for "All" option), followed by the current date
(e.g. humactga.seq.subset.anno.1.2.07_26_1994).
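
The naming rule above can be expressed as a short Python helper, which may be 
useful for scripts that need to locate a saved report. The function name is 
hypothetical, and the MM_DD_YYYY date layout and the "." separator before the
date are inferred from the single example given above.

```python
from datetime import date


def annotation_report_name(sequence_file, include_all, when):
    """Build the annotation report file name for a sequence file.

    include_all=True corresponds to the "All" option (.full), and
    include_all=False to the "Incl" option (.subset).
    """
    kind = "full" if include_all else "subset"
    return f"{sequence_file}.{kind}.anno.1.2.{when.strftime('%m_%d_%Y')}"


if __name__ == "__main__":
    print(annotation_report_name("humactga.seq", False, date(1994, 7, 26)))
```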


NOTES:          

i)   Windows can be moved by grabbing and dragging their edges.  
ii)  Popup Windows can be removed by  "unpinning".  
iii) Some windows can be resized by grabbing and dragging the corners.
iv)  Selecting the File button allows another sequence to be chosen for    
     analysis.
v)   Any popup window can be dismissed by clicking on the pushpin in the upper
     left corner of the window and popped up again using the Windows menu.
									      
CHANGES IN XGRAIL (Version 1.2)
-------------------------------

1. Message from Grail Staff: (On startup, if any)

   When XGRAIL (version 1.2) is started up, it first tries to get messages
   from the server, if any. This provides a mechanism for Grail Staff to inform
   the user of any new versions, or any relevant information regarding GRAIL.

2. The program now ignores digits, and can therefore read sequence files with 
   sequence base numbers.

3. Features Highlighting: 

   A feature can be highlighted by clicking on its graphical representation
   (in XGRAIL or DNA SEQUENCE window) or on its entry in FEATURES window.

4. Features Window:

   This window displays a list of all features of the selected feature type.

5. DNA sequence database search on reverse strand: 

   In version 1.1, database searches on a DNA sequence were limited to the
   forward strand. In this version, searches can also be performed on the
   reverse strand.

6. Alt Exons Toggle:

   In cases where exons on both strands overlap, GRAIL 1 and 2 use a strand-
   determination algorithm to determine the more likely strand. The rejected
   exon's coding probability is nonetheless displayed by default. Its display
   can be toggled on or off using the "Alt Exons" option under the "Features"
   menu in the XGRAIL window.

7. CpG Islands:

   This version of XGRAIL incorporates an algorithm to determine CpG Islands in
   a DNA sequence. The algorithm is based on the definition of CpG Islands by
   Gardiner-Garden and Frommer (J. Mol. Biol. 196:261-282, 1987). 

8. Annotation:

   A new annotation tool is incorporated in this version of XGRAIL. It allows
   the user to mark items of interest, generated in the process of analysis,
   to be included in an annotation report. The annotation report can then be
   saved to a file or printed.

9. INCOMPATIBILITY with previous XGRAIL (Versions 1 & 1.1) analysis files (.xgr
   and .xgr.1.1):

   The analysis is saved in an analysis file. The name of the analysis file
   consists of .xgr.1.2 appended to the sequence file name.

   Since several analysis algorithms have been altered (& improved), the older
   analysis files (.xgr & .xgr.1.1) are no longer supported by xgrail_1.2.

   You can still access the old analysis files, using the previous versions,
   viz. xgrail (or xgrail_1.1, respectively).
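
Item 2 above notes that digits in a sequence file are ignored, so files that 
carry base numbers can be read directly. The effect can be sketched in Python as
below. The function name is hypothetical, and dropping whitespace along with the
digits is an assumption; the manual itself mentions only digits.

```python
def strip_base_numbers(text):
    """Remove digits and whitespace from raw sequence text.

    All other characters, including letter case, are left untouched.
    """
    return "".join(
        ch for ch in text if not ch.isdigit() and not ch.isspace()
    )


if __name__ == "__main__":
    numbered = "       1 gaattcctga gtcattagga\n      21 acgt\n"
    print(strip_base_numbers(numbered))
```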
									      
------------------------------------------
XGENQUEST CLIENT-SERVER SYSTEM USER MANUAL
------------------------------------------

XGENQUEST is a client-server implementation of the integrated sequence
comparison system.

Currently, the client software has been tested on SPARCstations running
OpenWindows 3.0 and SunOS 4.1.3.

Connection of the user's machine to the Internet is required.


Differences between XGENQUEST & GENQUEST E-mail Access
------------------------------------------------------

The GENQUEST server expects the query to be in a specified format, described in
the GENQUEST (Q) E-MAIL SERVER USER MANUAL above.

XGENQUEST client software formats the query based on the options selected by the
user, thus relieving the user from that responsibility.

XGENQUEST allows only a single database target to be specified in a query,
whereas an e-mail query can specify multiple database targets.

XGENQUEST does not support the IBM FLASH method of sequence comparison.


File Management in XGENQUEST
----------------------------

XGENQUEST allows the user to browse the filesystem and displays all filenames
with extensions .seq (for DNA sequences) and .prt (for protein sequences), and
subdirectory names. When the user selects a file for loading (by double-
clicking on the filename), the sequence is displayed in a pop up window.

File Format: XGENQUEST expects the .seq and .prt files to be in FASTA format;
please refer to examples included in the software distribution.
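
As a reminder of what FASTA format looks like (a header line beginning with 
">" followed by one or more lines of sequence), here is a minimal Python reader.
It is a sketch under the usual FASTA conventions, with a hypothetical function 
name; the example files in the distribution remain the authoritative reference
for what XGENQUEST accepts.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Blank lines are skipped, sequence lines are concatenated, and any
    text appearing before the first '>' header is ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    print(parse_fasta(">seq1 test\nACGT\nTTAA\n>seq2\nGG\n"))
```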

The user can save searches to the disk. The searches are saved in individual
files (the filename for a search file is the name of the sequence file,
appended with .gqr, and search number, e.g. humvpnp.seq.gqr1). The user can
select a search file (using the browser), to be displayed in a popup window.
The user can also delete a search file, using the Delete Search button in this
popup window.
									      
OBTAINING AND INSTALLING XGENQUEST (Version 1.1) CLIENT SOFTWARE
----------------------------------------------------------------

1. Create a subdirectory in which you wish to install XGENQUEST (Version 1.1).

     % mkdir XGENQUEST

   Go to that subdirectory

     % cd XGENQUEST

2. Obtain the XGENQUEST (ver 1.1) distribution by anonymous ftp, as follows:

     % ftp arthur.epm.ornl.gov (or ftp 128.219.9.76)

     Name: anonymous
     Password: [your internet address]

     ftp> cd pub/xgenQuest/sun/ver1.1
     ftp> binary
     ftp> get README
     ftp> get xgenQuest.sun.ver1.1.tar.Z
     ftp> quit

3. Extract the files from xgenQuest.sun.ver1.1.tar.Z

     % zcat xgenQuest.sun.ver1.1.tar.Z | tar xvf -


4. At this point, the XGENQUEST subdirectory should contain the following files:

     Manual.grail-genquest.July94   (Grail-Genquest User Manual)
     README
     testseqs                       (Subdirectory containing test sequences)
     xgenQuest_1.1
     xgenQuest.sun.ver1.1.tar.Z     (Can be deleted at this point)

5. You can start up the xgenQuest_1.1 program:

   From the command line        % xgenQuest_1.1

   OR

   From the file manager by double-clicking on the xgenQuest_1.1 icon.
									      
----------------
ACKNOWLEDGEMENTS
----------------

GRAIL Research and Development is supported by the Office of Health and
Environmental Research, United States Department of Energy under contract No.
DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc.

DATABASES
---------
We thank the administrators of the following databases:

SWISS-PROT (Bairoch and Boeckmann, 1992)
PDB (Brookhaven National Laboratory)
PROSITE (Bairoch, 1993)
GSDB (Bilofsky and Burks, 1988)
BLOCKS/BLIMPS (Henikoff and Henikoff, 1991)
DBEST (Boguski et al., 1993)
HUMAN REPETITIVE DNA (Jurka, 1990; Jurka, Walichiewicz and Milosavljevic, 1992;
Jurka et al., 1993)

METHODS
-------
We thank the authors of the following methods:

FASTA (Pearson and Lipman, 1988)
BLAST (Altschul et al., 1990)
Smith-Waterman (Smith and Waterman, 1981)

SERVERS
-------
We thank IBM T. J. Watson Research Center for the use of their dFLASH server.

----------------
SOFTWARE SUPPORT
----------------

A copy of this Manual can be obtained by sending a message to GRAIL@ornl.gov or
Q@ornl.gov with the word HELP on the subject line or on the first text line. 

Questions, Suggestions and Help:

If you have any questions or suggestions, or need further help with any GRAIL
system, please send an e-mail to the GRAILMAIL@ornl.gov address.
									      
------------------
GRAIL PUBLICATIONS
------------------

1.   E. C. Uberbacher and R. J. Mural, "Locating protein-coding regions in 
     human DNA sequences by a multiple sensor-neural network approach," 
     Proc. Natl. Acad. Sci. USA, vol. 88, pp. 11261-11265 (December 1991).

2.   R. J. Mural, J. R. Einstein, X. Guan, R. C. Mann and E. C. Uberbacher, 
     "An Artificial Intelligence Approach to DNA Sequence Feature Recognition,"
     TIBTECH, Vol 10 (Jan-Feb 1992).

3.   X. Guan, R. J. Mural, J. R. Einstein, R. C. Mann and E. C. Uberbacher, 
     "GRAIL: An Integrated Artificial Intelligence System for Gene Recognition 
     and Interpretation," Proc., The Eighth IEEE Conference on AI Applications,
     pp. 9-13 (1992).  

4.   E. C. Uberbacher, J. R. Einstein, X. Guan, R. J. Mural, "Gene Recognition
     and Assembly in the GRAIL system: Progress and Challenges," Proceedings of
     the Second International Conference on Bioinformatics, Supercomputing, and
     Complex Genome Analysis, eds. Lim, H. A., Fickett, J. W., Cantor, C. R. 
     and Robbins, R. J. (World Sci., USA), pp. 465-476 (June 1992).

5.   Y. Xu, R. J. Mural, M. B. Shah and E. C. Uberbacher, "Recognizing Exons in
     Genomic Sequence Using GRAIL II," Genetic Engineering: Principles and 
     Methods, Jane Setlow (Ed.), Plenum Press, Vol 15 (June 1994). (In press)

6.   Y. Xu, J. R. Einstein, R. J. Mural, M. B. Shah and E. C. Uberbacher, "An
     Improved System for Exon Recognition and Gene Modeling in Human DNA
     Sequences", Proceedings of The 2nd International Conference on Intelligent
     Systems for Molecular Biology, AAAI Press (August 1994). (In press)

7.   Y. Xu, R. J. Mural and E. C. Uberbacher, "Constructing Gene Models from
     Accurately-predicted Exons: An Application of Dynamic Programming," 
     CABIOS (In press).

8.   S. Matis, R. J. Mural, M. B. Shah and E. C. Uberbacher, "An Artificial 
     Intelligence Method for Locating Promoters in Human DNA Sequences," 
     To be submitted to Nucleic Acids Research.

9.   M. B. Shah, X. Guan, J. R. Einstein, S. Matis, Y. Xu, R. J. Mural and 
     E. C. Uberbacher, "User's Guide to GRAIL and GENQUEST (Sequence Analysis,
     Gene Assembly And Sequence Comparison Systems) E-mail Servers and XGRAIL
     (Version 1.2) and XGENQUEST (Version 1.1) Client-Server Systems," 
     Available by anonymous ftp to arthur.epm.ornl.gov (128.219.9.76) from 
     directory pub/xgrail or pub/xgenQuest as file Manual.grail-genquest.July94 
     (July 1994).
									     
----------
REFERENCES
----------

[1]     Bairoch, A. and B. Boeckmann. 1992. Nucl. Acids Res., 20: 2019-2022.

[2]     Bairoch, A. 1993. Nucl. Acids Res., 21: 3097-3103.

[3]     Jurka, J., Walichiewicz, J. and A. Milosavljevic. 1992. J. Mol. Evol.
	35: 286-291.

[4]     Abola, E. E., Bernstein, F. C., Bryant, S. H., Koetzle, T. F. and
	J. Weng. 1987. Protein data bank.  pp. 107-132 in "Crystallographic
	Databases-Information Content, Software Systems, Scientific
	Applications," F. H. Allen, G. Bergerhoff and R. Sievers, eds. Data
	Commission of the International Union of Crystallography, Cambridge.

[5]     Smith, T. F., and M. Waterman. 1981. Advan. Appl. Math. 2: 482-489.

[6]     Pearson, W. R. and D. J. Lipman. 1988. Proc. Natl. Acad. Sci. USA, 85:
 	2444-2448.

[7]     Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and D. J. Lipman.
	1990. J. Mol. Biol., 215: 403-410.

[8]     Dayhoff, M. O., Schwartz, R. M. and B. C. Orcutt. 1978. In "Atlas of
	Protein Sequences and Structure," (Dayhoff, M. O. ed) Vol. 5, Suppl. 3,
 	pp. 345-352.  Nat. Biomed. Res. Found., Washington, D. C.

[9]     Henikoff, S. and Henikoff, J. G.  1992.  Proc. Natl. Acad. Sci. USA,
	89: 10915-10919.

[10]    Claverie, J-M. and States, D. J.  1993.  Computers Chem., 17: 191-201.

[11]    Califano, A. and Rigoutsos, I.  1993.  In: "Proceedings of the First
	International Conference on Intelligent Systems for Molecular Biology"
 	July, 1993, Bethesda, MD.