File: README

*|***************************************************************************|*
*|                                                                           |*
*|  Programs: phd2seqfasta and phd2qualfasta                                 |*
*|  Version: 0.960108                                                        |*
*|                                                                           |*
*|  Copyright (C) 1995-1997 by Phil Green and Brent Ewing.                   |*
*|  All rights reserved.                                                     |*
*|                                                                           |*
*|  This software is a beta-test version of the phd2fasta                    |*
*|  program set.                                                             |*
*|  It should not be redistributed or used for any commercial                |*
*|  purpose, including commercially funded sequencing, without               |*
*|  written permission from the author and the University of                 |*
*|  Washington.                                                              |*
*|                                                                           |*
*|  This software is provided ``AS IS'' and any express or                   |*
*|  implied warranties, including, but not limited to, the                   |*
*|  implied warranties of merchantability and fitness for a                  |*
*|  particular purpose, are disclaimed. In no event shall                    |*
*|  the authors or the University of Washington be liable for                |*
*|  any direct, indirect, incidental, special, exemplary, or                 |*
*|  consequential damages (including, but not limited to,                    |*
*|  procurement of substitute goods or services; loss of use,                |*
*|  data, or profits; or business interruption) however caused               |*
*|  and on any theory of liability, whether in contract, strict              |*
*|  liability, or tort (including negligence or otherwise)                   |*
*|  arising in any way out of the use of this software, even                 |*
*|  if advised of the possibility of such damage.                            |*
*|                                                                           |*
*|***************************************************************************|*

This document discusses the programs "phd2seqfasta" and
"phd2qualfasta". You can avoid dealing directly with these programs
by using the perl script "phred_and_phrap.perl". See the
documentation supplied with "consed".


Program: phd2seqfasta

The program "phd2seqfasta" reads PHD files, extracts the sequences,
and creates a single FASTA file containing the sequences. The PHD
files may be created either by Phil Green's base calling program
"phred" or by the sequence editing program "consed".
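
The following Python sketch illustrates the kind of processing
"phd2seqfasta" performs, including the highest-version selection
described in the next paragraph. It is not the program's actual
implementation. It assumes the usual "phred" PHD layout (a
BEGIN_SEQUENCE line carrying the read name, and one "base quality
position" line per called base between BEGIN_DNA and END_DNA). The
function names, the FASTA header format, and the 50-column line
wrapping are hypothetical choices made for illustration only.

```python
import os
import re

# Assumed naming rule from this README: "<read>.phd.<version number>".
PHD_NAME = re.compile(r"^(?P<read>.+)\.phd\.(?P<version>\d+)$")


def latest_phd_files(phd_dir):
    """Map each read name to the path of its highest numbered PHD file."""
    latest = {}
    for filename in os.listdir(phd_dir):
        match = PHD_NAME.match(filename)
        if match is None:
            continue  # not a PHD file
        read = match.group("read")
        version = int(match.group("version"))  # numeric, so 10 > 2
        if read not in latest or version > latest[read][0]:
            latest[read] = (version, os.path.join(phd_dir, filename))
    return {read: path for read, (version, path) in latest.items()}


def read_phd_bases(path):
    """Return (read_name, bases) parsed from one PHD file."""
    name, bases, in_dna = None, [], False
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            if fields[0] == "BEGIN_SEQUENCE" and len(fields) > 1:
                name = fields[1]
            elif fields[0] == "BEGIN_DNA":
                in_dna = True
            elif fields[0] == "END_DNA":
                in_dna = False
            elif in_dna:
                bases.append(fields[0])  # first column is the called base
    return name, "".join(bases)


def write_seq_fasta(phd_dir, fasta_path, width=50):
    """Write one FASTA record per read, using its newest PHD file."""
    with open(fasta_path, "w") as out:
        for read, path in sorted(latest_phd_files(phd_dir).items()):
            name, bases = read_phd_bases(path)
            # Header format is an assumption: ">" followed by the read name.
            out.write(">%s\n" % (name or read))
            for start in range(0, len(bases), width):
                out.write(bases[start:start + width] + "\n")
```

Under these assumptions, write_seq_fasta("phd_dir", "cosmid_fasta")
would play the role of "% phd2seqfasta phd_dir cosmid_fasta". A
quality-file variant would collect the second column of each DNA line
instead of the first.
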

Each read may be represented by one or more versions of a PHD file in
the same directory, as follows. Initially, "phred" creates the first
version of a PHD file, whose name ends with ".phd.1". Subsequently,
"consed" may write edited versions of the PHD file, whose names end
with a digit that indicates the version number; for example, ".phd.2".

The program "phd2seqfasta" finds the highest numbered version of each
PHD file in the directory specified on the command line, extracts the
sequence from each file, and builds a single FASTA file in which each
sequence begins with a FASTA header followed by the sequence of called
bases. The name of the output sequence FASTA file is specified as the
second command line parameter.

Usage:

  % phd2seqfasta <PHD directory> <sequence FASTA file>

  <PHD directory>        name of the directory containing the PHD
                         files to process.
  <sequence FASTA file>  name of the output sequence FASTA file.


Program: phd2qualfasta

The program "phd2qualfasta" reads PHD files, extracts the quality
data, and creates a single FASTA file containing the quality data.
The PHD files may be created either by Phil Green's base calling
program "phred" or by the sequence editing program "consed".

The program "phd2qualfasta" finds the highest numbered version of each
PHD file in the directory specified on the command line, extracts the
quality information from each file, and builds a single FASTA file in
which the quality data for each sequence begins with a FASTA header
followed by a list of the quality scores. The name of the output
quality FASTA file is specified as the second command line parameter.

Usage:

  % phd2qualfasta <PHD directory> <quality FASTA file>

  <PHD directory>        name of the directory containing the PHD
                         files to process.
  <quality FASTA file>   name of the quality FASTA output file.


Notes:

1. The sequence and quality FASTA files are used as input to Phil
   Green's sequence comparison program called "cross_match" and his
   sequence assembly program called "phrap". An example of the data
   flow begins with the transfer of the chromat files from the ABI
   Macintosh to a UNIX workstation.
   Subsequently, the bases are called using "phred", the vector
   sequence is screened out using "cross_match", and the reads are
   assembled using "phrap". A more detailed summary of the data flow
   consists of the following steps:

   o create a directory called "chromat_dir" for the chromat files
     and a directory called "phd_dir" for the PHD files on the UNIX
     workstation where further processing will occur, using the
     command

       % mkdir chromat_dir phd_dir

   o transfer chromat files from the ABI Macintosh to the directory
     named "chromat_dir" using the Macintosh program "fetch".

   o run "phred" in the directory above "chromat_dir" and store the
     PHD files that "phred" creates in the directory "phd_dir" using
     the command

       % phred -id chromat_dir -pd phd_dir

   o run phd2seqfasta with the command line parameters

       % phd2seqfasta phd_dir cosmid_fasta

     to create a sequence FASTA file called "cosmid_fasta", which can
     be read by "cross_match" and "phrap".

   o run phd2qualfasta with the command line parameters

       % phd2qualfasta phd_dir cosmid_fasta.qual

     to create a quality FASTA file, which can be read by
     "cross_match" and "phrap".

   o run "cross_match" to screen vector sequence out of the sequence
     FASTA file using the command

       % cross_match cosmid_fasta vector.seq -minmatch 12 \
           -penalty -2 -minscore 20 -screen > screen.out

     to create the vector screened FASTA file named
     "cosmid_fasta.screen".

   o run "phrap" to assemble the reads into contigs using the command

       % phrap cosmid_fasta.screen -ace > phrap.out

   A significant improvement in assembly is gained by using "phrap"
   with the base quality evaluation performed by "phred".

2. The program "consed" allows one to examine the assembled sequence,
   to examine the individual reads as they were aligned by "phrap" in
   the assembly, to examine the trace data corresponding to any base
   used by "phrap" in the assembly, and to edit the base calls.
"consed" requires a ".ace" file generated by "phrap" during the assembly and the PHD files and the ABI chromat files for the reads used in the assembly. 3. "cross_match" and "phrap" are available from Phil Green Phil Green Department of Molecular Biotechnology University of Washington Box 357730 Seattle, WA 98195-7730 phg@u.washington.edu "consed" is available from David Gordon David Gordon Department of Molecular Biotechnology University of Washington Box 352145 Seattle, WA 98195-7730 gordon@mbt.washington.edu "phred", "phd2seqfasta", and "phd2qualfasta" are available from Brent Ewing Brent Ewing Department of Molecular Biotechnology University of Washington Box 357730 Seattle, WA 98195-7730 bge@u.washington.edu End: README