The Applications (programs)

 

Introduction

The programs are listed in alphabetical order, divided into four sections. Look at the individual applications or go to the GROUPS page.

Applications in the current release

If you have comments about any of the programs, mail the EMBOSS open discussion list at emboss@emboss.open-bio.org.
Bug reports should be sent to the EMBOSS Support Team at emboss-bug@emboss.open-bio.org.

Program name Author(s) Description
aaindexextract RFCGR Extract data from AAINDEX
abiview RFCGR Reads an ABI file and displays the trace
acdc Sanger Tests definition files for any EMBOSS application.
antigenic RFCGR Finds antigenic sites in proteins
backtranseq RFCGR Back translate a protein sequence
banana Sanger Bending and Curvature Plot in B-DNA
biosed RFCGR Replace or delete sequence sections
btwisted RFCGR Calculates the twisting in a B-DNA sequence
cai RFCGR CAI codon usage statistic
chaos Sanger Create a chaos plot for a sequence.
charge RFCGR Protein charge plot
checktrans EBI ORF property statistics
chips RFCGR Codon usage statistics
cirdna Norway Draws circular maps of DNA constructs
codcmp RFCGR Codon usage table comparison
coderet RFCGR Extract CDS, mRNA and translations from feature tables
compseq RFCGR Counts the composition of dimer/trimer/etc words in a sequence
cons RFCGR Creates a consensus from multiple alignments
cpgplot RFCGR Plot CpG rich areas
cpgreport RFCGR Reports CpG rich regions
cusp RFCGR Create a codon usage table
cutgextract HGMP Extract data from CUTG
cutseq RFCGR Removes a specified section from a sequence.
dan RFCGR Plot melting temperatures for DNA.
dbiblast Sanger Database indexing for BLAST 1 and 2 indexed databases
dbifasta RFCGR Index a fasta database
dbiflat Sanger Database indexing for flat file databases
dbigcg Sanger Database indexing for GCG formatted databases
dbxfasta RFCGR Database b+tree indexing for fasta file databases
dbxflat RFCGR Database b+tree indexing for flat file databases
dbxgcg RFCGR Database b+tree indexing for GCG formatted databases
degapseq RFCGR Removes gap characters from sequences
descseq RFCGR Alter the name or description of a sequence.
diffseq RFCGR Find differences between nearly identical sequences
digest RFCGR Protein proteolytic enzyme or reagent cleavage digest
distmat RFCGR Creates a distance matrix from multiple alignments
dotmatcher Sanger Produces a dotplot of two sequences.
dotpath RFCGR Displays a non-overlapping wordmatch dotplot of two sequences
dottup Sanger DNA sequence dot plot
dreg Sanger Regular expression search of a nucleotide sequence
einverted Sanger Finds DNA inverted repeats
embossdata RFCGR Finds or fetches the data files read in by the EMBOSS programs
embossversion RFCGR Writes the current EMBOSS version number
emowse RFCGR Protein identification by mass spectrometry
emma RFCGR Multiple alignment program
entret RFCGR Reads and writes (returns) flatfile entries
epestfind Austria Finds PEST motifs as potential proteolytic cleavage sites
eprimer3 RFCGR Picks PCR primers and hybridization oligos
equicktandem Sanger Finds tandem repeats
est2genome Sanger Align EST and genomic DNA sequences
etandem Sanger Looks for tandem repeats in a nucleotide sequence.
extractfeat RFCGR Extract features from a sequence
extractseq RFCGR Extract regions from a sequence.
findkm RFCGR Calculates Km and Vmax for an enzyme reaction
freak RFCGR Residue/base frequency table or plot
fuzznuc RFCGR Nucleic acid pattern search
fuzzpro RFCGR Protein pattern search
fuzztran RFCGR Protein pattern search after translation
garnier EBI Predicts protein secondary structure
geecee Sanger Calculates the fractional GC content of nucleic acid sequences
getorf RFCGR Finds and extracts open reading frames (ORFs)
helixturnhelix RFCGR Finds nucleic acid binding domains.
hmoment RFCGR Hydrophobic moment calculation
iep RFCGR Calculates the isoelectric point of a protein
infoalign RFCGR Information on a multiple sequence alignment
infoseq RFCGR Displays some simple information about sequences
isochore Sanger Plots isochores in large DNA sequences
jembossctl RFCGR Jemboss Authentication Control
lindna Norway Draws linear maps of DNA constructs
listor RFCGR Writes a list file of the logical OR of two sets of sequences
marscan RFCGR Finds MAR/SAR sites in nucleic sequences
maskfeat RFCGR Mask off features of a sequence
maskseq RFCGR Mask off regions of a sequence.
matcher Sanger Local alignment of two sequences
megamerger RFCGR Merge two large overlapping nucleic acid sequences
merger RFCGR Merge two overlapping sequences
msbar RFCGR Mutate sequence beyond all recognition
mwcontam RFCGR Shows molwts that match across a set of files
mwfilter RFCGR Filter noisy molwts from mass spec output
needle RFCGR Needleman-Wunsch global alignment.
newcpgreport EBI Report CpG rich areas
newcpgseek EBI Reports CpG rich regions
newseq RFCGR Type in a short new sequence.
noreturn RFCGR Removes carriage return from ASCII files
notseq RFCGR Excludes a set of sequences and writes out the remaining ones
nthseq RFCGR Writes one sequence from a multiple set of sequences
octanol Sanger Displays protein hydropathy
oddcomp Norway Finds protein sequence regions with a biased composition.
palindrome RFCGR Looks for inverted repeats in a nucleotide sequence.
pasteseq RFCGR Insert one sequence into another.
patmatdb RFCGR Matching a Prosite motif against a Protein Sequence Database.
patmatmotifs RFCGR Compares a protein sequence to the PROSITE motif database.
pepcoil RFCGR Predicts coiled coil regions
pepinfo RFCGR Plots simple amino acid properties in parallel
pepnet RFCGR Protein helical net plot
pepstats RFCGR Protein statistics
pepwheel RFCGR Shows protein sequences as helices
pepwindow Sanger Displays protein hydropathy
pepwindowall Sanger Displays protein hydropathy of a set of sequences
plotcon RFCGR Plots the quality of conservation of a sequence alignment
plotorf RFCGR Plot potential open reading frames
polydot Sanger Multiple dotplot
preg Sanger Regular expression search of a protein sequence
prettyplot Sanger Displays aligned sequences, with colouring and boxing.
prettyseq RFCGR Output sequence with translated ranges
primersearch RFCGR Searches DNA sequences for matches with primer pairs
printsextract RFCGR Preprocesses the PRINTS database for use with the program PSCAN
profit RFCGR Scan a sequence or database with a matrix or profile
prophecy RFCGR Creates matrices/profiles from multiple alignments
prophet RFCGR Gapped alignment for profiles
prosextract RFCGR Extracts ID, AC, and PA lines from the PROSITE motif database.
pscan RFCGR Locates fingerprints (multiple motif features) in a protein sequence.
psiphi RFCGR Calculates phi and psi torsion angles from cleaned EMBOSS-style protein co-ordinate file
rebaseextract RFCGR Extract data from REBASE
recoder RFCGR Find and remove restriction sites but maintain the same translation
redata RFCGR Isoschizomers, references and Suppliers for Restriction Enzymes
remap RFCGR Display a sequence with restriction cut sites, translation etc.
restover Sloan-Kettering Cancer Center Finds restriction enzymes that produce a specific overhang
restrict RFCGR Finds Restriction Enzyme Cleavage Sites
revseq RFCGR Reverse and complement a sequence.
seealso RFCGR Finds programs sharing group names
seqmatchall Sanger Does an all-against-all comparison of a set of sequences
seqret Sanger Reads and writes (returns) a sequence.
seqretsplit RFCGR Reads and writes (returns) sequences in individual files
showdb RFCGR Displays information on the currently available databases
showalign RFCGR Display a multiple sequence alignment
showfeat RFCGR Show features of a sequence.
showorf RFCGR Pretty output of DNA translations
showseq RFCGR Display a sequence with features, translation etc
shuffleseq RFCGR Shuffles a set of sequences maintaining composition
sigcleave RFCGR Predicts signal peptide cleavage sites
silent RFCGR Silent mutation restriction enzyme scan
sirna RFCGR Finds siRNA duplexes in mRNA
sixpack LION Display a DNA sequence with 6-frame translation and ORFs
skipseq RFCGR Reads and writes (returns) sequences, skipping the first few
splitter RFCGR Split a sequence into (overlapping) smaller sequences.
stretcher Sanger Global alignment of two sequences.
stssearch Sanger Searches a DNA database for matches with a set of STS primers
supermatcher Sanger Finds a match of a large sequence against one or more sequences
syco RFCGR Synonymous codon usage Gribskov statistic plot
tcode RFCGR Fickett TESTCODE statistic to identify protein-coding DNA
textsearch RFCGR Search sequence documentation text. SRS and Entrez are faster!
tfextract RFCGR Extract data from TRANSFAC
tfm RFCGR Displays a program's help documentation manual
tfscan RFCGR Scans DNA sequences for transcription factors.
tmap Sanger Predict transmembrane proteins
tranalign RFCGR Align nucleic coding regions given the aligned proteins
transeq RFCGR Translates nucleic acid sequences.
trimest RFCGR Trim poly-A tails off EST sequences
trimseq RFCGR Trim ambiguous bits off the ends of sequences
twofeat RFCGR Finds neighbouring pairs of features in sequences
union LION Reads sequence fragments and builds one sequence
vectorstrip RFCGR Strips out DNA between a pair of vector sequences
water RFCGR Smith-Waterman local alignment.
whichdb RFCGR Search all databases for an entry
wobble RFCGR Wobble base plot
wordcount Sanger Counts words of a specified size in a DNA sequence.
wordmatch Sanger Finds all exact matches of a given size between 2 sequences
wossname RFCGR Finds programs by keywords in their one-line documentation.
yank LION Reads a range from a sequence, appends the full USA to a list file

EMBASSY

EMBOSS is GPL licensed. The libraries are under the Lesser GPL (LGPL).
Programs included from third parties that have their own licensing terms are kept apart under the EMBASSY grouping.

This allows the EMBOSS libraries to link to other software, and only requires that software to have an LGPL-compatible licence. Phylip, for example, fits this model.

To the user, however, they look exactly like EMBOSS applications.

EMBASSY - PHYLIP

The PHYLIP programs in this EMBASSY package were ported from release 3.572.

PHYLIP 3.61 has been converted as PHYLIPNEW and was released with EMBOSS 3.0.0 as a beta version.

The first release was named PHYLIP. We have not yet decided on the final name. See below for more details.

EMBASSY - PHYLIPNEW

The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.61 (August 2004).

The PHYLIPNEW versions of these programs all have the prefix "f" to distinguish them from the original programs. Although we take care to check that the EMBOSS versions will give the same results as the original programs, we recommend that if the results are used for publication you should check that you get the same results with both for your specific inputs.
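The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper names embassy_name and original_name are hypothetical and are not part of EMBOSS or PHYLIP, and some entries in the table below (for example fseqbootall) may not correspond one-to-one to an original PHYLIP program.

```python
def embassy_name(original: str, prefix: str = "f") -> str:
    """Return the prefixed EMBASSY name for an original program name.

    Hypothetical helper: it only applies the naming convention and does
    not check that the resulting program is actually installed.
    """
    return prefix + original.lower()


def original_name(embassy: str, prefix: str = "f") -> str:
    """Strip the EMBASSY prefix to recover the original program name."""
    if not embassy.startswith(prefix):
        raise ValueError(f"{embassy!r} does not start with prefix {prefix!r}")
    return embassy[len(prefix):]


if __name__ == "__main__":
    # A few original PHYLIP names whose prefixed forms appear in the table.
    for name in ("dnaml", "neighbor", "protdist"):
        print(name, "->", embassy_name(name))
```

The prefix is a parameter, so the same sketch covers other EMBASSY packages that use a different letter.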

Program name Author(s) Description
fclique Joe Felsenstein Largest clique program
fconsense Joe Felsenstein Majority-rule and strict consensus tree
fcontml Joe Felsenstein Continuous character Maximum Likelihood method
fcontrast Joe Felsenstein Continuous character Contrasts
fdiscboot Joe Felsenstein Bootstrapped discrete sites algorithm
fdnacomp Joe Felsenstein DNA compatibility algorithm
fdnadist Joe Felsenstein Nucleic acid sequence Distance Matrix program
fdnainvar Joe Felsenstein Nucleic acid sequence Invariants method
fdnaml Joe Felsenstein Estimates phylogenies from nucleic acid sequence Maximum Likelihood
fdnamlk Joe Felsenstein Estimates phylogenies from nucleic acid sequence Maximum Likelihood with molecular clock
fdnamove Joe Felsenstein Interactive DNA parsimony
fdnapars Joe Felsenstein DNA parsimony algorithm
fdnapenny Joe Felsenstein Penny algorithm for DNA
fdollop Joe Felsenstein Dollo and polymorphism parsimony algorithm
fdolmove Joe Felsenstein Interactive Dollo and Polymorphism Parsimony
fdolpenny Joe Felsenstein Penny algorithm Dollo or polymorphism
fdrawgram Joe Felsenstein Plots a cladogram- or phenogram-like rooted tree diagram
fdrawtree Joe Felsenstein Plots an unrooted tree diagram
ffactor Joe Felsenstein Multistate to binary recoding program
ffitch Joe Felsenstein Fitch-Margoliash and Least-Squares Distance Methods
ffreqboot Joe Felsenstein Bootstrapped sequences algorithm
fgendist Joe Felsenstein Compute genetic distances from gene frequencies
fkitsch Joe Felsenstein Fitch-Margoliash method with contemporary tips
fmix Joe Felsenstein Mixed parsimony algorithm
fmove Joe Felsenstein Interactive mixed method parsimony
fneighbor Joe Felsenstein Phylogenies from distance matrix by N-J or UPGMA method
fpars Joe Felsenstein Discrete character parsimony
fpenny Joe Felsenstein Penny algorithm, branch-and-bound to find all most parsimonious trees
fproml Joe Felsenstein Protein maximum Likelihood program
fpromlk Joe Felsenstein Protein maximum Likelihood program with molecular clock
fprotdist Joe Felsenstein Protein distance algorithm
fprotpars Joe Felsenstein Protein parsimony algorithm
frestboot Joe Felsenstein Bootstrapped sequences algorithm
frestdist Joe Felsenstein Compute distance matrix from restriction sites or fragments
frestml Joe Felsenstein Restriction site maximum Likelihood method
fretree Joe Felsenstein Interactive tree rearrangement
fseqboot Joe Felsenstein Bootstrapped sequences algorithm
fseqbootall Joe Felsenstein Bootstrapped sequences algorithm
ftreedist Joe Felsenstein Distances between trees
ftreedistpair Joe Felsenstein Distances between trees

EMBASSY - DOMAINATRIX

The DOMAINATRIX programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a work in progress.

Program name Author(s) Description
scopparse RFCGR Reads raw SCOP classification files and writes a DCF file.
cathparse RFCGR Reads raw CATH classification files and writes a DCF file.
domainreso RFCGR Removes low resolution domains from a DCF file.
domainseqs RFCGR Adds sequence records to a DCF file.
domainnr RFCGR Removes redundant domains from a DCF file. The file must contain domain sequence information which can be added by using DOMAINSEQS.
domainsse RFCGR Adds secondary structure records to a DCF file.
ssematch RFCGR Searches a DCF file for secondary structure matches. The file must contain domain secondary structure information which can be added by using DOMAINSSE.

EMBASSY - DOMALIGN

The DOMALIGN programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a work in progress.

Program name Author(s) Description
domainrep RFCGR Reorder DCF file so that the representative structure of each user-specified node is given first.
domainalign RFCGR Generates structure-based sequence alignments for nodes in a DCF file.
seqalign RFCGR Reads a DAF file and a DHF and writes a DAF file extended with the hits.
allversusall RFCGR Does an all-versus-all global alignment for each set of sequences in an input directory and writes files of sequence similarity values.

EMBASSY - DOMSEARCH

The DOMSEARCH programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a work in progress.

Program name Author(s) Description
seqsearch RFCGR Generate DHF files of database hits (sequences) from a DAF file (or other file of sequences) by using PSI-BLAST.
seqfraggle RFCGR Removes fragments from DHF files (or other files of sequences).
seqsort RFCGR Reads DHF files of database hits (sequences) and removes hits of ambiguous classification.
seqnr RFCGR Removes redundancy from DHF files (or other files of sequences).
seqwords RFCGR Generates DHF files of database hits (sequences) from Swissprot matching keywords from a keywords file.

EMBASSY - SIGNATURE

The SIGNATURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a work in progress.

Program name Author(s) Description
siggen RFCGR Generates a sparse protein signature from an alignment and residue contact data.
sigscan RFCGR Generates a DHF of hits (sequences) from scanning a signature against a sequence database.
libgen RFCGR Generates various types of discriminator for each alignment in a directory.
libscan RFCGR Generates hits (sequences in a domain hits file) from searches of various types of discriminator (HMMs, profiles etc) against a sequence database. Or generates hits from screening sequences against a library of such discriminators.
rocon RFCGR Reads a DHF file of hits (sequences of unknown structural classification) and a DHF file of validation sequences (known classification) and writes a "hits file" for the hits, which are classified and rank-ordered on the basis of score.
rocplot RFCGR A generic and flexible tool for interpretation and graphical display of the performance of predictive methods using Receiver Operator Characteristic (ROC) analysis.
matgen3d RFCGR Generates a 3D-1D scoring matrix from CCF files (clean coordinate files).
siggenlig RFCGR Generates ligand-binding signatures from a CON file (contacts file) of residue-ligand contacts.
sigscanlig RFCGR Generates a LHF (ligand hits file) of hits (sequences) from scanning a sequence against a library of ligand-binding signatures.

EMBASSY - STRUCTURE

The STRUCTURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a work in progress.

Program name Author(s) Description
pdbparse RFCGR Parses PDB files and writes CCF files for proteins.
pdbplus RFCGR Add records for residue solvent accessibility and secondary structure to a CCF file.
domainer RFCGR Reads CCF files for proteins and writes CCF files for domains in a DCF file.
contacts RFCGR Reads CCF files and writes CON files of intra-chain residue-residue contact data.
interface RFCGR Reads protein CCF files and writes CON files of inter-chain residue-residue contact data.
sites RFCGR Reads CCF files and writes CON files of residue-ligand contact data for domains in a DCF file.
hetparse RFCGR Converts raw dictionary of heterogen groups to EMBL-like format.
pdbtosp RFCGR Convert raw swissprot:PDB equivalence file to EMBL-like format.

EMBASSY - RECONSTRUCT

Reconstruct is a research project by Damian Counsell at the Rosalind Franklin Centre for Genomics Research in Hinxton, UK.

Program name Author(s) Description
alignrunner Damian Counsell, RFCGR Runs alignment program on all sequence pairs in a directory
comparator Damian Counsell, RFCGR Compare contact scores of two sequence alignments
contactalign Damian Counsell, RFCGR EMBOSS implementation of Damian Counsell's 2.5-D alignment algorithm
contactcount Damian Counsell, RFCGR Count specific versus non-specific contacts
degapseqrunner Damian Counsell, RFCGR Runs degapseq program on all sequence pairs in a directory
nawalign Damian Counsell, RFCGR Damian Counsell's implementation for protein sequences of the Needleman and Wunsch alignment algorithm
nawalignrunner Damian Counsell, RFCGR Runs nawalign alignment program on all sequence pairs in a directory
needlerunner Damian Counsell, RFCGR Runs needle alignment program on all sequence pairs in a directory
scorecmapdir Damian Counsell, RFCGR Contact scores for cleaned protein chain contact files
scorer Damian Counsell, RFCGR Scores accuracy of protein-protein sequence alignment against gold standard structure-structure alignment
scorerrunner Damian Counsell, RFCGR Runs scorer to compare ordered pairs of substituted seqs in two directories and write the scores to a third
substitute Damian Counsell, RFCGR Substitutes matches from first (query) sequence of two aligned sequences in a trace into second (template sequence)
substituterunner Damian Counsell, RFCGR Runs substitute on directory of traces and writes substituted sequences to another

EMBASSY - HMMER

The HMMER programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.1.1 (converted in August 2001).

The HMMER versions of these programs all have the prefix "e" to distinguish them from the original programs. Although we take care to check that the EMBOSS versions will give the same results as the original programs, we recommend that if the results are used for publication you should check that you get the same results with both for your specific inputs.
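One way to carry out the recommended check is to save the output of the original program and of its EMBOSS conversion for the same input, then compare the two texts. The Python sketch below assumes both outputs are plain text; the function names are hypothetical, and the normalisation (ignoring trailing whitespace and blank lines) is an assumption. Real output may also contain headers, program names or dates that differ harmlessly, so any reported difference should be inspected by eye.

```python
import difflib
from typing import List


def normalise(text: str) -> List[str]:
    """Drop blank lines and trailing whitespace so layout noise is ignored."""
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def diff_results(original_output: str, emboss_output: str) -> List[str]:
    """Return unified-diff lines between two outputs (empty if they agree)."""
    return list(
        difflib.unified_diff(
            normalise(original_output),
            normalise(emboss_output),
            fromfile="original",
            tofile="emboss",
            lineterm="",
        )
    )


def same_results(original_output: str, emboss_output: str) -> bool:
    """True when the two outputs agree after normalisation."""
    return not diff_results(original_output, emboss_output)


if __name__ == "__main__":
    # Toy strings standing in for the saved output of the two programs.
    a = "score 12.5\nhit   seqA\n\n"
    b = "score 12.5\nhit   seqA   \n"
    print(same_results(a, b))
```

In practice the two strings would be read from the files written by each program; the file names depend on how the programs were run.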

Program name Author(s) Description
ealistat Sean Eddy Statistics for multiple alignment files
ehmmalign Sean Eddy Align sequences with an HMM
ehmmbuild Sean Eddy Build HMM
ehmmcalibrate Sean Eddy Calibrate a hidden Markov model
ehmmconvert Sean Eddy Convert between HMM formats
ehmmemit Sean Eddy Extract HMM sequences
ehmmfetch Sean Eddy Extract HMM from a database
ehmmindex Sean Eddy Index an HMM database
ehmmpfam Sean Eddy Align single sequence with an HMM
ehmmsearch Sean Eddy Search sequence database with an HMM

EMBASSY - OTHERS

Program name Author(s) Description
emnu RFCGR Simple menu of EMBOSS applications
esim4 Liliana Florea Align an mRNA to a genomic DNA sequence
meme Timothy Bailey Motif detection
mse Sanger Conversion of Will Gilbert's MSE editor
topo Sanger Conversion of Susan Jean Johns' TOPO
crystalball EBI Answers every drug discovery question you have about this sequence