HMMER Manual                                          hmmpfam(1)

NAME
     hmmpfam - search one or more sequences against an HMM
     database

SYNOPSIS
     hmmpfam [options] hmmfile seqfile

DESCRIPTION
     hmmpfam reads a sequence file seqfile and compares each
     sequence in it, one at a time, against all the HMMs in
     hmmfile, looking for significantly similar sequence matches.

     hmmfile is looked for first in the current working
     directory, then in a directory named by the environment
     variable HMMERDB. This lets administrators install HMM
     libraries such as Pfam in a common location.

     There is a separate output report for each sequence in
     seqfile. This report consists of three sections: a ranked
     list of the best scoring HMMs, a list of the best scoring
     domains in order of their occurrence in the sequence, and
     alignments for all the best scoring domains. A sequence
     score may be higher than a domain score for the same
     sequence if there is more than one domain in the sequence;
     the sequence score takes into account all the domains. All
     hits that pass the -E and -T cutoffs are shown in the first
     list, then every domain found in this list is shown in the
     second list of domain hits. If desired, E-value and bit
     score thresholds may also be applied to the domain list
     using the --domE and --domT options.

OPTIONS
     -h
          Print brief help; includes version number and summary
          of all options, including expert options.

     -n
          Specify that models and sequence are nucleic acid, not
          protein. Other HMMER programs autodetect this, but
          because of the order in which hmmpfam accesses data,
          it can't reliably determine the correct "alphabet" by
          itself.

     -A <n>
          Limits the alignment output to the <n> best scoring
          domains. -A0 shuts off the alignment output and can be
          used to reduce the size of output files.
HMMER @RELEASE@ Last change: @RELEASEDATE@ 1

     -E <x>
          Set the E-value cutoff for the per-sequence ranked hit
          list to <x>, where <x> is a positive real number. The
          default is 10.0. Hits with E-values better than (less
          than) this threshold will be shown.

     -T <x>
          Set the bit score cutoff for the per-sequence ranked
          hit list to <x>, where <x> is a real number. The
          default is negative infinity; by default, the
          threshold is controlled by E-value and not by bit
          score. Hits with bit scores better than (greater than)
          this threshold will be shown.

     -Z <n>
          Calculate the E-value scores as if we had seen a
          sequence database of <n> sequences. The default is
          arbitrarily set to 59021, the size of Swissprot 34.

EXPERT OPTIONS
     --acc
          Report HMM accessions instead of names in the output
          reports. Useful for high-throughput annotation, where
          the data are being parsed for storage in a relational
          database.

     --compat
          Use the output format of HMMER 2.1.1, the 1998-2001
          public release; provided so 2.1.1 parsers don't have
          to be rewritten.

     --cpu <n>
          Sets the maximum number of CPUs that the program will
          run on. The default is to use all CPUs in the machine.
          Overrides the HMMER_NCPU environment variable. Only
          affects threaded versions of HMMER (the default on
          most systems).

     --cut_ga
          Use Pfam GA (gathering threshold) score cutoffs.
          Equivalent to --globT <GA1> --domT <GA2>, but the GA1
          and GA2 cutoffs are read from each HMM in hmmfile
          individually. hmmbuild puts these cutoffs there if the
          alignment file was annotated in a Pfam-friendly
          alignment format (extended SELEX or Stockholm format)
          and the optional GA annotation line was present. If
          these cutoffs are not set in the HMM file, --cut_ga
          doesn't work.

     --cut_tc
          Use Pfam TC (trusted cutoff) score cutoffs. Equivalent
          to --globT <TC1> --domT <TC2>, but the TC1 and TC2
          cutoffs are read from each HMM in hmmfile individually.
          hmmbuild puts these cutoffs there if the alignment
          file was annotated in a Pfam-friendly alignment format
          (extended SELEX or Stockholm format) and the optional
          TC annotation line was present. If these cutoffs are
          not set in the HMM file, --cut_tc doesn't work.

     --cut_nc
          Use Pfam NC (noise cutoff) score cutoffs. Equivalent
          to --globT <NC1> --domT <NC2>, but the NC1 and NC2
          cutoffs are read from each HMM in hmmfile individually.
          hmmbuild puts these cutoffs there if the alignment
          file was annotated in a Pfam-friendly alignment format
          (extended SELEX or Stockholm format) and the optional
          NC annotation line was present. If these cutoffs are
          not set in the HMM file, --cut_nc doesn't work.

     --domE <x>
          Set the E-value cutoff for the per-domain ranked hit
          list to <x>, where <x> is a positive real number. The
          default is infinity; by default, all domains in the
          sequences that passed the first threshold will be
          reported in the second list, so that the number of
          domains reported in the per-sequence list is
          consistent with the number that appear in the
          per-domain list.

     --domT <x>
          Set the bit score cutoff for the per-domain ranked hit
          list to <x>, where <x> is a real number. The default
          is negative infinity; by default, all domains in the
          sequences that passed the first threshold will be
          reported in the second list, so that the number of
          domains reported in the per-sequence list is
          consistent with the number that appear in the
          per-domain list.

          Important note: only one domain in a sequence is
          absolutely controlled by this parameter, or by --domE.
          The second and subsequent domains in a sequence have a
          de facto bit score threshold of 0 because of the
          details of how HMMER works.
          HMMER requires at least one pass through the main
          model per sequence; to do more than one pass (more
          than one domain) the multidomain alignment must have a
          better score than the single domain alignment, and
          hence the extra domains must contribute positive
          score. See the User's Guide for more detail.

     --forward
          Use the Forward algorithm instead of the Viterbi
          algorithm to determine the per-sequence scores.
          Per-domain scores are still determined by the Viterbi
          algorithm. Some have argued that Forward is a more
          sensitive algorithm for detecting remote sequence
          homologues; my experiments with HMMER have not
          confirmed this, however.

     --informat <s>
          Assert that the input seqfile is in format <s>; do not
          run Babelfish format autodetection. This increases the
          reliability of the program somewhat, because the
          Babelfish can make mistakes; particularly recommended
          for unattended, high-throughput runs of HMMER. Valid
          format strings include FASTA, GENBANK, EMBL, GCG, PIR,
          STOCKHOLM, SELEX, MSF, CLUSTAL, and PHYLIP. See the
          User's Guide for a complete list.

     --null2
          Turn off the post hoc second null model. By default,
          each alignment is rescored by a postprocessing step
          that takes into account possible biased composition in
          either the HMM or the target sequence. This is almost
          essential in database searches, especially with local
          alignment models. There is a very small chance that
          this postprocessing might remove real matches, and in
          these cases --null2 may improve sensitivity at the
          expense of reducing specificity by letting biased
          composition hits through.

     --pvm
          Run on a Parallel Virtual Machine (PVM). The PVM must
          already be running. The client program hmmpfam-pvm
          must be installed on all the PVM nodes. The HMM
          database hmmfile and an associated GSI index file
          hmmfile.gsi must also be installed on all the PVM
          nodes. (The GSI index is produced by the program
          hmmindex.)
          Because the PVM implementation is I/O bound, it is
          highly recommended that each node have a local copy of
          hmmfile rather than NFS mounting a shared copy.
          Optional PVM support must have been compiled into
          HMMER for --pvm to function.

     --xnu
          Turn on XNU filtering of target protein sequences. Has
          no effect on nucleic acid sequences. In trial
          experiments, --xnu appears to perform less well than
          the default post hoc null2 model.

SEE ALSO
     Master man page, with full list of and guide to the
     individual man pages: see hmmer(1).

     A User guide and tutorial came with the distribution:
     Userguide.ps [Postscript] and/or Userguide.pdf [PDF].

     Finally, all documentation is also available online via
     WWW: http://hmmer.wustl.edu/

AUTHOR
     This software and documentation is:
     @COPYRIGHT@
     HMMER - Biological sequence analysis with profile HMMs
     Copyright (C) 1992-1999 Washington University School of
     Medicine
     All Rights Reserved

     This source code is distributed under the terms of the GNU
     General Public License. See the files COPYING and LICENSE
     for details.

     See the file COPYING in your distribution for complete
     details.

     Sean Eddy
     HHMI/Dept. of Genetics
     Washington Univ. School of Medicine
     4566 Scott Ave.
     St Louis, MO 63110 USA
     Phone: 1-314-362-7666
     FAX  : 1-314-362-7855
     Email: eddy@genetics.wustl.edu
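
EXAMPLES
     The two-stage reporting described in DESCRIPTION and under
     -E, -T, --domE, and --domT can be illustrated with a small
     Python sketch. This is not HMMER's implementation. The
     names Domain, SeqHit, and report are hypothetical, the
     scores and E-values are made up, and treating each cutoff
     as a strict inequality follows the wording of this page
     ("less than", "greater than") rather than the source code.
     The de facto threshold of 0 on second and subsequent
     domains is not modeled.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Domain:
    model: str      # HMM that matched (hypothetical name)
    start: int      # start position of the domain in the sequence
    score: float    # per-domain bit score
    evalue: float   # per-domain E-value


@dataclass
class SeqHit:
    model: str      # HMM that matched (hypothetical name)
    score: float    # per-sequence bit score (all domains together)
    evalue: float   # per-sequence E-value
    domains: list = field(default_factory=list)


def report(hits, E=10.0, T=-math.inf, domE=math.inf, domT=-math.inf):
    """Apply the per-sequence cutoffs, then the per-domain cutoffs.

    Defaults mirror the defaults documented on this page.
    Assumption: a value exactly equal to a cutoff is not reported.
    """
    # First list: hits passing -E and -T, ranked by bit score.
    seq_list = sorted(
        (h for h in hits if h.evalue < E and h.score > T),
        key=lambda h: h.score,
        reverse=True,
    )
    # Second list: domains of those hits that pass --domE and --domT,
    # in order of occurrence in the sequence.
    dom_list = sorted(
        (d for h in seq_list for d in h.domains
         if d.evalue < domE and d.score > domT),
        key=lambda d: d.start,
    )
    return seq_list, dom_list


# Made-up results for one query sequence against three models.
hits = [
    SeqHit("modelA", 120.0, 1e-30, [
        Domain("modelA", 10, 70.0, 1e-18),
        Domain("modelA", 200, 50.0, 1e-12),
    ]),
    SeqHit("modelB", 5.0, 2.0, [Domain("modelB", 90, 5.0, 2.0)]),
    SeqHit("modelC", -20.0, 50.0, [Domain("modelC", 40, -20.0, 50.0)]),
]

if __name__ == "__main__":
    seqs, doms = report(hits)
    print([h.model for h in seqs])
    print([(d.model, d.start) for d in doms])
```

     With the default cutoffs, every domain of a reported hit
     appears in the second list, so the two lists stay
     consistent. Passing a finite domT or domE to report()
     trims the second list only. In the made-up data, modelA's
     sequence score is higher than either of its domain scores
     because it accounts for both domains.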