The complete list of command line options supported by
WU-BLAST 2.0 is provided in the
tables below.
The information presented here
comprises their definitive description.
This information should be considered valid only
for the current (most recent) version of the software.
If you find an inconsistency between the advertised
behavior and the actual behavior of the software,
first be sure you are using the latest version,
as indicated by the date of the latest release shown at
http://blast.wustl.edu.
If the inconsistency persists after upgrading,
please report it to
If you wish to continue using an older version of the software
instead of upgrading,
please consult the copy of parameters.html
that came bundled with that version;
it may be more accurate for your purposes than the on-line documentation.
For most of the options, a logical
diagram
indicates where each has its effect.
When this web page cannot be conveniently accessed,
terse descriptions of most items may be obtained by entering the relevant
BLAST program name on the command line without any arguments.
A copy of this parameters.html
web page is bundled
with the licensed software, as well.
Where differences arise between the bundled file and the on-line version,
they may be due to differences in the software at the time of release.
The most recent version of the page you are viewing is located
here.
A PDF version of the page is available
here.
Command line options for the obsolete WU- and NCBI-BLAST version 1.4, first released in 1994, often apply unchanged to WU-BLAST 2.0, which provides a high degree of upward compatibility. Although BLAST 1.4 is now many years old, anyone interested in it (e.g., to reproduce prior results) should see the BLAST 1.4 manual page in PDF format.
Aside from the first two command line arguments (database name and query filename), which are required, the WU-BLAST search programs support a flexible syntax for command line options and parameters. Parsing of the command line is generally case-insensitive. A leading hyphen (-) on option names is unnecessary but may improve readability. Parameter values can optionally be introduced with an equals sign (=). Hyphens and equals signs may be combined, and their use need not be consistent throughout a given command line. Large integer values can be specified in floating point notation (e.g., 1e9 instead of 1000000000). For parameters with single-letter names, neither a hyphen nor an equals sign is necessary.
The basic command line syntax is:
<program> <database> <query> [options...]
where <program> is one of blastp, blastn, blastx, tblastn and tblastx;
<database> is the name of the database to search
(previously formatted with xdformat);
<query> is the name of a file containing one or more query
sequences in FASTA format;
and [options...] is a list of zero or more command line options
and parameter settings.
As examples of the command line flexibility available, each of the following command lines is valid, and all are equivalent:
blastp nr myquery.aa v=10 b=100 filter=seg e=1e-10 nogaps
blastp nr myquery.aa V=10 B=100 filter=seg E=1e-10 nogaps
blastp nr myquery.aa -V=10 -B=100 -filter=seg -E=1e-10 -nogaps
blastp nr myquery.aa -V10 -B100 -filter seg -E1e-10 -nogaps
blastp nr myquery.aa -V10 -B100 -filter "seg" -E1e-10 -nogaps
blastp nr myquery.aa -V 10 -B 100 -filter seg -E 1e-10 -nogaps
blastp nr myquery.aa V 10 B 100 filter seg E 1e-10 nogaps
blastp nr myquery.aa -v10 B=100 FILTER=seg -e=1e-10 -nogaps
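The equivalence of these forms can be illustrated with a short Python sketch that normalizes option tokens into name/value pairs. This is not WU-BLAST's actual parser; the sets VALUE_OPTIONS and FLAG_OPTIONS are hypothetical subsets chosen only to cover the example command lines above, and values are kept as strings.

```python
# Hypothetical subsets of option names, chosen only for this example.
VALUE_OPTIONS = {"b", "e", "v", "filter"}   # options that take a value
FLAG_OPTIONS = {"nogaps"}                   # options that stand alone


def parse_options(tokens):
    """Normalize option tokens, ignoring case, leading hyphens and '='."""
    params, flags = {}, set()
    i = 0
    while i < len(tokens):
        name, sep, value = tokens[i].lstrip("-").partition("=")
        key = name.lower()
        if not sep and key in FLAG_OPTIONS:
            flags.add(key)                  # nogaps, -nogaps
        elif key in VALUE_OPTIONS:
            if not sep:                     # value is the next token: "V 10"
                i += 1
                value = tokens[i]
            params[key] = value             # "V=10", "-filter=seg", "V 10"
        elif not sep and key[:1] in VALUE_OPTIONS:
            params[key[0]] = name[1:]       # single-letter name, value glued on: "-V10"
        else:
            raise ValueError(f"unrecognized option: {tokens[i]}")
        i += 1
    return params, flags
```

Applied to the option tokens of any of the eight command lines (after shell quoting has been removed, e.g. with shlex.split), the sketch yields the same parameters and the same flags.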
Option | Description |
altscore= "score_spec" |
alter individual scores or entire rows or columns
of scores in a scoring matrix,
without editing the scoring matrix file itself.
Score_spec is a quoted character string
consisting of three components, each separated by white space:
(1) a letter in the query sequence alphabet;
(2) a letter in the subject sequence alphabet;
(3) the new pairwise score to be assigned to the alignment
of these two letters.
If the query (subject) letter is specified
as the special word any,
the altered score will be assigned
to the entire column (row) of the scoring matrix.
If the indicated score is the special word min (max),
the new assigned score will be the minimum (maximum) score observed
in the matrix.
If the score is given as na,
the alignment of the indicated letters will not be allowed,
effectively assigning to them an infinite negative score.
Multiple altscore options can be specified
on a given command line.
As an example of the option's use, to assign an alignment score
of zero (0) to the presence of a stop codon
in either the query or database sequence,
these two specifications can be used together:
altscore="* any 0" altscore="any * 0" .
See also: matrix ,
M and N .
|
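As an illustration of the altscore semantics described above, the following Python sketch applies score_spec strings to a scoring matrix held as a dict of dicts, with rows indexed by query letters and columns by subject letters. The function apply_altscore is hypothetical, and two details are assumptions not stated in this documentation: min and max are taken from the unaltered matrix, and na is modeled as negative infinity.

```python
def apply_altscore(matrix, specs):
    """Return a copy of matrix with altscore-style specs applied.

    matrix[q][s] is the score for query letter q aligned with subject letter s.
    """
    values = [v for row in matrix.values() for v in row.values()]
    lowest, highest = min(values), max(values)  # assumption: from the unaltered matrix
    rows = list(matrix)                          # query letters
    cols = list(next(iter(matrix.values())))     # subject letters
    new = {q: dict(row) for q, row in matrix.items()}
    for spec in specs:
        q_letter, s_letter, score = spec.split()
        if score == "min":
            value = lowest
        elif score == "max":
            value = highest
        elif score == "na":
            value = float("-inf")  # assumption: "not allowed" as an infinite penalty
        else:
            value = int(score)
        targets_q = rows if q_letter == "any" else [q_letter]  # any query letter: whole column
        targets_s = cols if s_letter == "any" else [s_letter]  # any subject letter: whole row
        for q in targets_q:
            for s in targets_s:
                new[q][s] = value
    return new
```

With a two-letter matrix over A and *, the specifications "* any 0" and "any * 0" zero every entry involving the stop codon while leaving the A/A score unchanged.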
B= <b> |
set the maximum number of database sequences for which any alignments will be reported to b.
The default limit is 250.
The maximum number of alignments that may be saved and reported per
database sequence is governed by other parameters.
See also: V ,
hspmax ,
gspmax ,
spoutmax and
noseqs .
|
bottom |
used to restrict the search of a nucleotide sequence
to the bottom (-) strand.
In the TBLASTX search mode, where both query and subject
are nucleotide sequences, the bottom option only affects
the query sequence.
See also: top ,
dbtop ,
dbbottom and
qframe .
|
C= <gcid> |
use the indicated genetic code to translate the query
sequence in the BLASTX and TBLASTX search modes.
gcid is a numerical identifier for the desired code.
A list of the genetic codes and their
identifiers is displayed if C=list is specified
on an otherwise syntactically correct command line.
(Example: blastx foo foo c=list ).
In the TBLASTN search mode, the C parameter can be
substituted for the dbgcode parameter.
The available genetic codes are:
1. Standard (the default genetic code)
2. Vertebrate Mitochondrial
3. Yeast Mitochondrial
4. Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma
5. Invertebrate Mitochondrial
6. Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
9. Echinoderm Mitochondrial
10. Euplotid Nuclear
11. Bacterial and Plant Plastid
12. Alternative Yeast Nuclear
13. Ascidian Mitochondrial
14. Flatworm Mitochondrial
15. Blepharisma Macronuclear
16. Chlorophycean Mitochondrial
21. Trematode Mitochondrial
22. Scenedesmus obliquus mitochondrial
23. Thraustochytrium mitochondrial
1001. Codon2004
Specify the desired genetic code by its number.
The Codon2004 code provides preliminary support
for a draft alphabet for working precisely
with each of the 64 possible codons,
rather than mapping the codons to the usual 20 common amino acids.
Scoring matrix files to use the Codon2004 alphabet
with a translated query sequence in BLASTX should be
placed in a subdirectory named ca ,
located parallel to the usual aa and nt
subdirectories of the matrix directory.
For use in TBLASTN searches, the scoring matrix should reside
in an ac subdirectory;
and for TBLASTX searches, the subdirectory should be cc .
(Notice the use of the letter “c” for the codon alphabet,
the letter “a” for the amino acid alphabet,
and the query-subject ordering of the two letters to create
the subdirectory name).
For “codon-ized” scoring matrices derived from the BLOCKS database
and appropriate for use “as is” with TBLASTX,
please go
here.
For more information about the Codon2004 alphabet, please see
this.
See also: dbgcode .
|
cdb |
force nucleotide sequence databases to be searched in their compressed form.
This option is only effective in the BLASTN search mode for word lengths ≥ 7.
Users should generally avoid specifying this option themselves,
letting the software decide when to employ this search strategy.
See also: ucdb .
|
compat1.3 |
perform a BLAST version 1.3-style search (no gaps and significance estimated using Poisson statistics),
but with bug fixes, performance enhancements and new options available.
See also: compat1.4 .
|
compat1.4 |
perform a BLAST version 1.4-style search (no gaps in the alignments),
but with bug fixes, performance enhancements and new options available.
See also: compat1.3 .
|
consistency |
turn off the determination of “consistent” sets of HSPs, effectively lumping all HSPs found for a given database sequence into one set. Use of this option also disables a combinatorial adjustment that is otherwise made to the Sum and Poisson statistics to account for the consistent arrangement of the HSPs out of all possible relative arrangements. This option has no effect if Sum or Poisson statistics are not being used. |
cpus= <n> |
request that n processors or threads be employed for the search.
The default behavior is to employ as many threads as there are
processors in the computer system
(to a maximum of 4 threads for BLASTN searches).
This default may be altered by setting a specific value for cpus
in a system-wide file named /etc/sysblast ;
see the sysblast.sample example file included in
WU-BLAST 2.0 software distributions for further information.
NOTE:
Memory consumption increases linearly with the number of threads;
the actual number of threads employed may be automatically reduced
by the software if memory resources are seen to be limiting.
|
ctxfactor= <c> |
set the “context factor” that is
used as a Bonferroni correction in the statistics to c,
to account for the number of contexts searched.
Each distinct reading frame-to-reading frame or strand-to-strand combination
between query and subject sequences constitutes one “context”.
Thus, one context exists in a BLASTP search,
as many as two contexts (because of the two distinct strand combinations) exist in a BLASTN search,
up to 6 contexts (one for each reading frame) exist in a BLASTX or TBLASTN search,
and up to 6x6 = 36 contexts exist in a TBLASTX search.
The maximum default value for ctxfactor then is 1 for BLASTP,
2 for BLASTN, 6 for BLASTX and TBLASTN,
and 36 for TBLASTX.
Restricting a search to a single strand of the query and/or database
reduces the number of contexts accordingly for that search.
More accurately, however,
the contribution of any given context to the default value
for ctxfactor is the fraction of residues in the query
(or reading frame of the query) that are unambiguous (up to a maximum value of 1.0).
(N.B. this fraction is computed after any optional filtering
has been applied to the query).
The default ctxfactor is then merely the sum of these fractions for every context involved in the search.
The software should normally be allowed to set the value of this parameter itself, unless the user has a compelling reason to change it. One rationale for explicitly setting a value for ctxfactor
might be to ensure a constant value is used in the statistics
across multiple searches,
where the results from the searches need to be examined
and compared for their statistical significance on a common basis.
|
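The default ctxfactor rule can be sketched in Python as a sum of per-context unambiguous fractions. The helper names are hypothetical, and the code assumes that only A, C, G and T (or the 20 standard amino acid letters) count as unambiguous; the query passed in should be the already-filtered sequence. Only the two BLASTN strand contexts are built here; translated reading frames are omitted.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
NUCLEOTIDES = frozenset("ACGT")                    # assumption: the unambiguous letters
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")    # assumption: the unambiguous letters


def unambiguous_fraction(seq, alphabet):
    """Fraction of residues in seq that are unambiguous letters of alphabet."""
    if not seq:
        return 0.0
    return sum(1 for c in seq.upper() if c in alphabet) / len(seq)


def default_ctxfactor(contexts, alphabet):
    """Sum over contexts of each context's unambiguous fraction, capped at 1.0."""
    return sum(min(1.0, unambiguous_fraction(s, alphabet)) for s in contexts)


def blastn_contexts(query):
    """Both strands of a (filtered) nucleotide query: two contexts."""
    return [query, query.translate(COMPLEMENT)[::-1]]
```

For a filtered BLASTN query ACGTNNACGT, each strand is 80% unambiguous, so the sketch gives a ctxfactor of 1.6 rather than the maximum of 2.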
dbbottom |
used to restrict the search to the bottom (-) strand of all database sequences.
See also: dbtop ,
top ,
bottom and
qframe .
|
dbchunks= <nchunks> |
establishes the granularity of the database, as it is divided into
slices for assignment to individual threads,
to make more efficient use of all CPUs when multiple CPUs
are employed for a given search.
Higher values are appropriate when the database contains relatively
few sequences and/or when the sequences vary greatly in length,
composition or content (e.g., genomic contigs).
Lower values are appropriate when the database contains many
sequences of comparable length
(e.g., the EST division of GenBank).
The minimum assignable value is the number of threads employed,
but this setting is ill-advised;
the optimal value for any given search type is likely to be
a large multiple of the number of threads employed
(although it need not be an exact multiple).
When searching mammalian genomic contigs, a good value may be 1000.
The default value is 500.
Users generally need not be concerned with this parameter. |
dbgcode= <gcid> |
use the indicated genetic code to translate database
sequences in the TBLASTN and TBLASTX search modes.
gcid is a numerical identifier for the desired code.
A list of the genetic codes and their
identifiers is displayed if dbgcode=list is specified
on an otherwise syntactically correct command line.
(Example: tblastn foo foo dbgcode=list ).
See also: C .
|
dbrecmax= <last_record> |
search the database until last_record,
where database records are numbered starting with 1.
By default, databases are searched completely.
If last_record is greater than the actual
number of records in the database, the database is simply
searched until its end.
It is an error for the requested last_record to be
less than the first record requested to be searched in the database.
Records in virtual databases are numbered with respect to the
entire virtual database.
See also: dbrecmin .
|
dbrecmin= <first_record> |
search the database beginning at first_record,
where database records are numbered starting with 1.
By default, databases are searched completely.
It is an error for the requested first_record to be
greater than the last record requested to be searched in the database
(re: the dbrecmax parameter)
or to point beyond the end of the database.
Records in virtual databases are numbered with respect to the
entire virtual database.
See also: dbrecmax .
|
dbslice= m/n
dbslice= a-b/n |
at run time, for expressions of the form m/n, logically divide the database into n equivalent-sized slices and search only the mth slice, where 1 ≤ m ≤ n ≤ 100000. Alternatively, for expressions of the form a-b/n, search slices a through b (inclusive), where 1 ≤ a ≤ b ≤ n. Slice size is determined solely by the number of sequence records contained within and is not a function of sequence length. This can produce significant disparities in the workload associated with different slices, which may be alleviated by randomizing the order of sequences in the database before formatting it for BLAST. In distributed computing environments, when the same large database is to be searched repeatedly, overall throughput will likely benefit from consistently assigning the same slice(s) to the same client nodes for each search; the improved efficiency results from the file caching that operating systems typically perform when the database files are first read from disk or over a network. Logically breaking the database into slices at run time means that each client node need only have enough unused memory to cache its assigned slice(s), not the entire database, and that the database need not be physically divided and reformatted into many smaller sub-databases whenever the number of available client nodes changes. |
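The m/n and a-b/n forms of dbslice can be sketched in Python as an even split by record count. The exact rounding WU-BLAST uses at slice boundaries is not documented here, so the integer partition below is an assumption, as are the function names.

```python
def slice_bounds(nrecords, m, n):
    """1-based (first, last) records of slice m out of n, split by record count only."""
    if not 1 <= m <= n <= 100000:
        raise ValueError("require 1 <= m <= n <= 100000")
    # Assumption: integer partition; a slice is empty (first > last) if nrecords < n.
    first = (m - 1) * nrecords // n + 1
    last = m * nrecords // n
    return first, last


def dbslice_records(spec, nrecords):
    """Record range covered by a dbslice expression of the form 'm/n' or 'a-b/n'."""
    slices, n = spec.split("/")
    n = int(n)
    a, _, b = slices.partition("-")
    a = int(a)
    b = int(b) if b else a
    if not 1 <= a <= b <= n:
        raise ValueError("require 1 <= a <= b <= n")
    return slice_bounds(nrecords, a, n)[0], slice_bounds(nrecords, b, n)[1]
```

With 10 records and n=3, the sketch assigns records 1-3, 4-6 and 7-10 to the three slices, and 1-2/3 covers records 1 through 6.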
dbtop |
used to restrict the search to the top (+) strand of all database sequences.
See also: dbbottom ,
top ,
bottom and
qframe .
|
E= <e> |
set the expectation threshold for reporting database hits to e.
A database sequence will only be reported if an ascribed E-value
for at least one of its alignments (or groups of alignments)
is ≤ E .
Lower E-values are more significant (less likely to occur by chance).
The default threshold is E=10 ,
such that if the search algorithm exhibited 100% sensitivity
and the statistics applied perfectly to the sequences being studied,
results involving 10 database sequences would be reported merely by chance.
See also: S .
|
E2= <e> |
set the expectation threshold for saving ungapped HSPs to e.
In the initial, ungapped alignment phase of a search,
individual HSPs will only be saved for further use
if their score is ≥ S2 ,
where the default value of S2 is computed from E2 .
The default value for E2 varies between BLAST search modes;
the resultant value for S2 will depend on the scoring system, as well.
If both E2 and S2 are specified on the command line,
the one corresponding to the more restrictive (higher) score threshold
will be used.
See also: gapE2 ,
S2 and
gapS2 .
|
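How an expectation threshold such as E2 maps to a score threshold such as S2 can be sketched with the standard Karlin-Altschul relation $E = K m n e^{-\lambda S}$, where m and n are the query and database lengths. WU-BLAST's own length corrections and per-mode defaults are not described here, so the functions below, and the illustrative λ and K values one might pass to them, are assumptions; the max() reflects the rule that the more restrictive (higher) threshold wins when both E2 and S2 are given.

```python
import math


def score_for_evalue(e, lam, k, m, n):
    """Smallest integer score S with K*m*n*exp(-lam*S) <= e (assumed Karlin-Altschul form)."""
    return math.ceil(math.log(k * m * n / e) / lam)


def save_threshold(lam, k, m, n, e2=None, s2=None):
    """Score threshold for saving HSPs when E2 and/or S2 are given.

    If both are given, the more restrictive (higher) score is used.
    """
    candidates = []
    if e2 is not None:
        candidates.append(score_for_evalue(e2, lam, k, m, n))
    if s2 is not None:
        candidates.append(s2)
    if not candidates:
        raise ValueError("give e2, s2 or both")
    return max(candidates)
```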
echofilter |
display the query sequence in the BLAST report, after all hard masks have been applied.
See also: filter and lcfilter .
|
endgetenv |
ignore any subsequent getenv options found on the command line during left-to-right parsing.
See also: endputenv ,
getenv and putenv .
|
endputenv |
for security in WWW server installations, where the command line may sometimes be left open to users,
ignore any subsequent putenv options found on the command line during left-to-right parsing.
See also: endgetenv ,
getenv and putenv .
|
errors |
suppress all ERROR messages. These messages should rarely, if ever, arise and indicate severe conditions (typically internal software bugs) that should be given immediate attention. When they do arise, parsers may break.
If any ERRORs arise with this option in effect, the number SUPPRESSED will be reported
at the end of the search.
|
evalues |
report E-values (expectations) instead of P-values (probabilities) in the initial one-line descriptions section of output.
See also: pvalues .
|
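Under the usual Poisson assumption of BLAST statistics, an E-value and a P-value are related by $P = 1 - e^{-E}$, so the two agree closely when E is small. A minimal Python sketch (the function name is hypothetical):

```python
import math


def pvalue_from_evalue(e):
    """P = 1 - exp(-E): probability of at least one chance hit with expectation E."""
    return -math.expm1(-e)  # expm1 keeps precision when E is tiny
```

For example, E = 0.16 corresponds to P ≈ 0.148, while E = 10 corresponds to P ≈ 0.99995.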
filter= <filter> |
“hard mask” the query sequence using the specified filter.
The filter program may alter the sequence in composition but not in length.
For protein-level searches (BLASTP, BLASTX, TBLASTN and TBLASTX), the supported filter programs include:
seg and xnu .
For nucleotide-level (BLASTN) searches, supported filter programs include:
dust and seg .
If multiple filter specifications are made on the command line, their results are logically OR-ed.
filter=none causes any earlier specifications (to the left) on the command line to be ignored.
NOTE: By default, no filtering is performed. Arbitrary user-defined filter programs can be utilized, if their input and output are sequences in FASTA/Pearson format and if input/output are tied to stdin/stdout. The location of filter programs is governed by the BLASTFILTER
environment variable, which can be set to a colon-delimited list of directories that the BLAST programs will successively examine to find filters.
See also: wordmask ,
lcfilter ,
lcmask and
echofilter .
|
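A user-defined filter only needs to read FASTA from stdin and write FASTA with sequences of unchanged length to stdout. The Python sketch below is a hypothetical example: it hard-masks runs of eight or more identical letters with N, a rule chosen only for illustration. How WU-BLAST names or invokes a filter found via BLASTFILTER, and whether it passes any arguments, is not covered here and is left as an assumption.

```python
import sys


def mask_homopolymers(seq, min_run=8, mask_char="N"):
    """Replace runs of >= min_run identical letters with mask_char (length unchanged)."""
    out, i = [], 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j].upper() == seq[i].upper():
            j += 1
        run = seq[i:j]
        out.append(mask_char * len(run) if len(run) >= min_run else run)
        i = j
    return "".join(out)


def read_fasta(lines):
    """Yield (defline, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def main(width=60):
    # Hypothetical filter program: FASTA in on stdin, masked FASTA out on stdout.
    for header, seq in read_fasta(sys.stdin):
        print(header)
        masked = mask_homopolymers(seq)
        for start in range(0, len(masked), width):
            print(masked[start:start + width])


if __name__ == "__main__":
    main()
```

Because masking replaces letters one for one, every output sequence keeps its original length, as the filter option requires.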
gapall |
effectively generate a gapped alignment for every ungapped HSP found (up to hspmax ). This is the default behavior.
See also: gapE .
|
gapdecayrate= <r> |
define r to be the common ratio of the terms in a geometric progression used in altering probabilities as a function of the number of Poisson events involved (typically the number of “consistent” HSPs in a set), according to a method suggested by Phil Green. An initial Poisson probability for n HSPs is weighted by the quantity $T_n$, which is itself the reciprocal of the nth term in the progression $t_n = (1-r)\,r^{n-1}$, i.e., $T_n = 1/t_n$. The default value for r is 0.5, such that the default weights are successively $T_1 = 2$, $T_2 = 4$, $T_3 = 8$, $T_4 = 16$, and so on. These weights provide a conservative Bonferroni correction to the probabilities, in case multiple trials are performed in determining which set of HSPs yields the lowest P-value for a given database sequence. Because the geometric progression contains an infinite number of terms, it can satisfy the need for any number of tests (and weights), even when this number is unknown prior to the search. |
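The gapdecayrate weights can be reproduced with a short Python sketch (function names hypothetical). Because the terms $t_n$ sum to 1 over all n, weighting the nth test by $T_n = 1/t_n$ keeps the overall correction conservative no matter how many tests are made.

```python
def decay_weights(r=0.5, count=4):
    """Weights T_n = 1 / t_n for n = 1..count, where t_n = (1 - r) * r**(n - 1)."""
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    return [1.0 / ((1 - r) * r ** (n - 1)) for n in range(1, count + 1)]


def weighted_probability(p, n, r=0.5):
    """Weight an initial Poisson probability p for a set of n HSPs by T_n.

    No capping at 1 is applied in this sketch.
    """
    return p * decay_weights(r, n)[-1]
```

For instance, with the default r = 0.5, an initial probability of 0.01 for a set of 3 HSPs is weighted by $T_3 = 8$, giving 0.08.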
gapE= <gapE> |
generate gapped alignments for all HSPs between sequences whose expected frequency of chance occurrence is ≤ gapE.
The default value is gapE=infinity; i.e., gapall is in effect.
See also: gapall .
|
gapE2= <e> |
set the E-value for saving gapped HSPs to e.
In the secondary, gapped alignment phase of a search,
individual gapped HSPs will only be saved for further use
if their score is ≥ gapS2 ,
where the default gapS2 is computed from gapE2 .
The default value for gapE2 varies between BLAST search modes;
the resultant gapS2 will depend on the scoring system, as well.
If both gapE2 and gapS2 are specified on the command line,
the one corresponding to the more restrictive (higher) score threshold
will be used.
See also: gapS2 ,
E2 and
S2 .
|
gapH= <h> |
set the value of the relative entropy, H, used in evaluating the statistical significance of gapped alignment scores.
See also: H .
|
gapK= <k> |
set the value of the extreme value statistics K parameter
(Karlin and Altschul, 1990)
used in evaluating the significance of gapped alignment scores.
Useful when precomputed values are unavailable in the internal tables for the chosen scoring matrix and gap penalty combination.
See also: K .
|
gapL= <lambda> |
use lambda for the value of the
λ parameter in the extreme value statistics
used to evaluate the significance of gapped alignment scores
(Altschul and Gish, 1996).
Useful when precomputed values are unavailable in the internal tables for the chosen scoring matrix and gap penalty combination.
See also: L .
|
gaps |
produce gapped alignments (the default behavior),
negating the effect of any previously specified nogaps option.
See also: nogaps and gapall .
|
gapS2= <s> |
set the score threshold for saving gapped HSPs to s.
In the secondary, gapped alignment phase of a search,
individual gapped HSPs will only be saved for further use
if their score is ≥ gapS2 .
The default score threshold is computed from gapE2
and will depend on the scoring system.
If both gapE2 and gapS2 are specified on the command line,
the one corresponding to the more restrictive (higher) score threshold
will be used.
See also: gapE2 ,
E2 and
S2 .
|
gapW= <gapW> |
set the window width (or band width) within which gapped alignments are computed by dynamic programming (default is gapW=32 for protein comparisons, gapW=16 for BLASTN). Note: gapW is the full bandwidth, not the half-width. |
gapX= <x> |
set the drop-off score for gapped alignment extensions to x.
Gapped extension of ungapped HSPs found
between query and subject sequences
continues until the cumulative alignment score deteriorates
from the maximum value seen thus far by a quantity gapX or more.
The default value for gapX is the score associated with 10 bits
of significance ($2^{-10} < 10^{-3}$ probability) for protein-level searches or 20 bits
of significance ($2^{-20} < 10^{-6}$ probability)
for nucleotide-level (BLASTN) searches.
Higher values for gapX will increase sensitivity at the expense
of run time.
See also: X and
gapW .
|
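The drop-off rule behind gapX can be sketched in Python. For brevity, this applies the rule to a simple ungapped, rightward-only extension rather than to the banded dynamic programming used for gapped extension; the function names and the +5/-4 scoring are illustrative assumptions.

```python
def xdrop_extend(query, subject, qstart, sstart, score, xdrop):
    """Extend rightward from (qstart, sstart), stopping on an X-drop.

    Returns (best_score, length) of the best-scoring prefix seen before the
    running score fell xdrop or more below that best.
    """
    best = best_len = running = 0
    k = 0
    while qstart + k < len(query) and sstart + k < len(subject):
        running += score(query[qstart + k], subject[sstart + k])
        k += 1
        if running > best:
            best, best_len = running, k
        elif best - running >= xdrop:
            break  # score has deteriorated by xdrop or more: stop extending
    return best, best_len


def match_mismatch(a, b):
    # +5/-4 match/mismatch scores, as in the BLASTN default described under matrix
    return 5 if a == b else -4
```

With query AAAATTTTAAAA and subject AAAAGGGGAAAA, a drop-off of 10 stops inside the mismatches and keeps the 4-residue prefix scoring 20, while a drop-off of 100 extends through them and reaches 24 over all 12 residues, showing how a larger drop-off trades run time for sensitivity.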
getenv= "NAME" |
display the value of the environment variable named NAME. This may be useful for verifying that the settings of environment variables
on a web server or in an analysis pipeline have been propagated all the way to the BLAST search program.
See also: endgetenv ,
putenv and endputenv .
|
gi |
report NCBI “gi” (GenInfo) identifiers for sequences,
when present in sequence definition lines.
Normally these identifiers are suppressed from output,
but they represent one of the best, stable identifiers available
for the GenBank/EMBL/DDBJ databases
(with ACCESSION.VERSION being the other stable identifier).
|
globalexit |
when processing a file containing multiple query sequences,
if any of them encounters a FATAL error,
then after all queries have been processed,
append the line "EXIT CODE 12" to the output and provide a testable
exit status of 12;
if the exit status is 0 or if the last line of output is not "EXIT CODE 12", then it can be assumed that all queries succeeded.
To determine whether all queries succeeded without this option,
the output would need to be scanned for instances of EXIT CODE
with a non-zero argument.
With the globalexit option, scanning of the output
is only necessary when one wishes to identify the specific query (or queries)
that failed and what the individual reason codes were.
See also: haltonfatal .
|
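Checking results under globalexit could be scripted along these lines in Python. The layout of EXIT CODE lines is assumed here to be the words EXIT CODE followed by a number on a line of its own, and the helper names are hypothetical.

```python
import re

# Assumption: each EXIT CODE line holds just the phrase and a number.
EXIT_CODE_LINE = re.compile(r"^EXIT CODE (\d+)[ \t]*$", re.MULTILINE)


def all_queries_succeeded(returncode, output):
    """With globalexit: success unless exit status 12 and a final 'EXIT CODE 12' line."""
    lines = output.rstrip("\n").splitlines()
    last = lines[-1].strip() if lines else ""
    return returncode == 0 or last != "EXIT CODE 12"


def nonzero_exit_codes(output):
    """Scan the whole report for EXIT CODE lines with a non-zero argument."""
    return [int(code) for code in EXIT_CODE_LINE.findall(output) if int(code) != 0]
```

The return code would typically come from running the search with Python's subprocess.run; nonzero_exit_codes is only needed when the failing queries and their individual reason codes matter.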
golfraction= <g> |
maximum fractional length of overlap, g, of two gapped alignments for them to be considered independent and mutually “consistent” and their joint (Sum or Poisson) probability to be computed. The default value is 0.125 (maximum 12.5% of the length from either end of either HSP).
For any given pair of HSPs, the more restrictive of golfraction
and golmax is used.
See also: golmax ,
olfraction ,
and
olmax .
|
golmax= <len> |
set the maximum permitted length of overlap (in residues), len, of two gapped alignments for their joint (Sum or Poisson) probability to be computed. The default is unlimited length, with the maximum extent of overlap being governed only by the golfraction parameter.
See also: golfraction ,
olfraction ,
and
olmax .
|
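The combined effect of golfraction and golmax can be sketched in Python. Applying the fraction to the shorter of the two alignments is an interpretation of the phrase “either end of either HSP” rather than a documented rule, and the function name is hypothetical; golmax=None stands for the unlimited default.

```python
def overlap_permitted(len1, len2, overlap, golfraction=0.125, golmax=None):
    """True if two gapped alignments of lengths len1 and len2 may overlap by `overlap` residues."""
    # Assumption: the fractional limit must hold for both HSPs, i.e. the shorter one.
    limit = golfraction * min(len1, len2)
    if golmax is not None:
        limit = min(limit, golmax)  # the more restrictive limit applies
    return overlap <= limit
```

For alignments of 100 and 80 residues, the default golfraction permits an overlap of up to 10 residues; adding golmax=5 tightens that to 5.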
gspmax= <gspmax> |
establish gspmax as the maximum number of GSPs (gapped HSPs)
to report per subject sequence or pairwise sequence comparison.
If more than gspmax GSPs are found,
only the best-scoring GSPs are retained for subsequent processing and reporting.
The setting of gspmax will have no effect
if the nogaps option is specified or
if the setting of hspmax is more restrictive.
The default value for gspmax is 0, which implies no limit.
See also: hspmax , spoutmax .
NOTE: the B and V options limit the number
of subject sequences for which any results whatsoever are reported,
regardless of the number of HSPs or GSPs found.
|
H= <h> |
use h for the value of the relative entropy, H,
when computing the statistics of ungapped alignments.
NOTE: In BLAST 1.4 and earlier, the H option was used to invoke the display of a histogram of search results; this functionality is no longer supported.
See also: gapH .
|
haltonfatal |
when processing a file containing multiple query sequences, use this option to
halt further processing at the first occurrence of a FATAL error.
Processing will otherwise resume with the next query sequence
when a FATAL error arises.
See also: globalexit .
|
hitdist= <hitdist> |
invoke a 2-hit BLAST algorithm similar to (but more sensitive and efficient than) that of
Altschul et al. (1997),
in which an ungapped extension is seeded only when two word hits occur on the same diagonal within <hitdist> residues of each other.
Altschul et al. (1997)
use the equivalent of hitdist=40
in the BLASTP, BLASTX, TBLASTN and TBLASTX search modes.
In WU BLASTN, setting hitdist= W and wink= W,
where W is the word length, is akin to using double-length words generated
on W-mer boundaries.
NOTE: In protein-level comparisons, for best sensitivity (or the best sensitivity for the amount of memory used), 2-hit BLAST is not recommended. See also: wink .
|
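A simplified Python sketch of 2-hit seeding follows. It takes word hits as (query position, subject position) pairs and reports a hit as a seed when the previous hit on the same diagonal lies within hitdist residues of it. Refinements such as ignoring overlapping hits, and WU-BLAST's own bookkeeping, are not modeled, and the function name is hypothetical.

```python
def two_hit_seeds(hits, hitdist):
    """Return the word hits that would seed an ungapped extension under a 2-hit rule.

    hits: iterable of (query_pos, subject_pos) word-hit start positions.
    A hit qualifies when the previous hit on the same diagonal
    (subject_pos - query_pos) lies within hitdist residues before it.
    """
    last_on_diagonal = {}
    seeds = []
    for q, s in sorted(hits):
        diagonal = s - q
        prev_q = last_on_diagonal.get(diagonal)
        if prev_q is not None and 0 < q - prev_q <= hitdist:
            seeds.append((q, s))
        last_on_diagonal[diagonal] = q
    return seeds
```

With hitdist=40, hits at query positions 0 and 5 on the same diagonal trigger an extension, whereas a third hit on that diagonal 55 residues further along does not.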
hspmax= <hspmax> |
establishes hspmax as the maximum number of ungapped HSPs
that will be saved per subject sequence or pairwise sequence comparison.
Saved HSPs are then fed to the gapped alignment phase of the program
or are statistically evaluated
if gapped alignments are not to be performed.
If more than hspmax HSPs are found,
only the best-scoring HSPs are retained for subsequent processing.
The default value is 1000; a value of 0 signifies no limit. See also: gspmax and
spoutmax .
NOTE: This usage of hspmax is subtly,
but importantly,
different from the parameter's classical interpretation,
wherein all ungapped HSPs that satisfied the S2 score threshold
were saved; hspmax merely limited
the number of HSPs (gapped or ungapped) that would be reported.
The new interpretation was instituted to provide
vastly improved speed on large problems,
while imparting no effect on small problems
and many medium-sized problems.
The new behavior can help guard against horrendously slow searches
resulting from an inadvertent omission of a low-complexity filter.
Adverse effects on sensitivity may be obtained, however,
if every HSP is sacred.
To restore classical behavior, specify hspmax=0 .
As a compromise between sensitivity and speed, set a higher
value than the default.
NOTE: the B and V options limit the number
of database or subject sequences for which any results are reported,
regardless of the number of HSPs or GSPs found.
|
hspsepQmax= <d> |
maximum allowed separation along the query sequence between two HSPs (gapped or ungapped) that will be clustered into a “consistent” set. Distance is measured here in units of residues at the level of the actual sequence comparison, i.e., in nucleotides for BLASTN and in amino acids (or codons) for all other search modes. This option is useful for improving the statistical power to discriminate clusters that have potential biological interest from random background clusters, when the query sequence is significantly longer than the features of interest. Without this restriction, HSPs may be linked that arise from very distant portions of the query sequence. Depending on the specific search performed, distant links may be desirable, but often a reasonable setting for this parameter might be the expected maximum length of an intron. A distance restriction not only avoids clustering HSPs that would be widely separated but also improves the statistics of those HSPs that can still be clustered. |
hspsepSmax= <d> |
maximum allowed separation along the subject (database) sequence between two HSPs (gapped or ungapped) that will be clustered into a consistent set. Distance is measured here in units of residues at the level of the actual sequence comparison, i.e., in nucleotides for BLASTN and in amino acids (or codons) for all other search modes. This option is useful for improving the statistical power to discriminate clusters that have potential biological interest from random background clusters, when the database contains sequences significantly longer than the features of interest. Without this restriction, HSPs may be linked that arise from very distant portions of a subject sequence. Depending on the specific search performed, distant links may be desirable, but often a reasonable setting for this parameter might be the expected maximum length of an intron. A distance restriction not only avoids clustering HSPs that would be widely separated but also improves the statistics of those HSPs that can still be clustered. |
K= <k> |
set the value for extreme value statistics K parameter
(Karlin and Altschul, 1990)
used in computing the statistics of ungapped alignments.
See also: gapK .
|
kap |
use basic Karlin and Altschul (1990) statistics on individual alignment scores (i.e., do not evaluate the joint probability of multiple consistent HSP scores, such as with Poisson or Karlin and Altschul (1993) “Sum” statistics); in order to be reported, each HSP must pass the significance test on its own; these basic statistics are an option in all search modes. |
L= <lambda> |
use lambda for the value of the
λ parameter in the extreme value statistics
(Karlin and Altschul, 1990)
used in computing the statistics of ungapped alignments.
See also: gapL .
|
lcfilter |
replace any lower case letters in the input query sequence
with the appropriate ambiguity code for “any” residue
(N for nucleotide sequences; X for protein sequences).
See also: lcmask ,
filter ,
wordmask and
echofilter .
|
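The hard masking performed by lcfilter amounts to a character substitution. A minimal Python sketch (the function name is hypothetical):

```python
def lcfilter(seq, nucleotide=True):
    """Hard-mask lower-case letters: N for nucleotide queries, X for protein queries."""
    mask = "N" if nucleotide else "X"
    return "".join(mask if c.islower() else c for c in seq)
```

Unlike lcmask, the lower-case letters are actually replaced, so the original residues are no longer available for alignment.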
lcmask |
when generating the neighborhood word list for the query sequence,
do not process any portions of the query that were represented
in lower case letters in the input file.
Lower case letters in the query sequence remain unchanged
by this “soft masking” procedure and can therefore participate in alignments
seeded by word hits that occur in flanking regions.
See also: lcfilter ,
wordmask ,
filter ,
maskextra and
echofilter .
|
links |
report consistent link information for each alignment, indicating the set of “consistent” alignments used in joint statistical
significance calculations.
Links information appears on its own line for each HSP
and begins with the keyword Links .
Each HSP involving the query and a given subject sequence
is numbered from 1 to n,
where n is the total number of HSPs reported
for the pair of sequences.
When the links option is specified,
the current HSP number is enclosed in parentheses.
For example, the links information for an HSP might look like the following, where the HSP number 1 enclosed in parentheses indicates that this information accompanied the first HSP reported for the given subject sequence. It is evident in this example that a total of at least 8 HSPs were reported for the subject sequence (re: the 8 in the links list), but only 3 consistent HSPs (numbers 8, 2 and 1, in that order) were involved in obtaining the Sum statistics P-value of 0.15.
 Score = 72 (30.4 bits), Expect = 0.16, Sum P(3) = 0.15
 Identities = 41/174 (23%), Positives = 74/174 (42%)
 Links = 8-2-(1)
NOTE: While all link lists describe sets of consistent HSPs, unless one of the topcomboN
or topcomboE options is used,
only the list reported for HSPs in the most significant set
for each subject sequence is guaranteed to represent the
precise set of HSPs for which the joint statistics were computed;
all other link lists often do correctly describe the set of HSPs
involved but may have one or more missing or extraneous HSPs.
See also: hspsepQmax ,
hspsepSmax ,
topcomboE and
topcomboN .
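A script that post-processes BLAST output might split a links value into its parts. The sketch below assumes the hyphen-separated format shown in the example above, with exactly one parenthesized entry marking the current HSP. The helper `parse_links` is hypothetical.

```python
def parse_links(links: str) -> tuple[list[int], int]:
    """Split a Links value such as '8-2-(1)' into the ordered HSP numbers
    and the number of the HSP that the line accompanies (the parenthesized one)."""
    members = []
    current = None
    for token in links.split("-"):
        token = token.strip()
        if token.startswith("(") and token.endswith(")"):
            current = int(token[1:-1])
            members.append(current)
        else:
            members.append(int(token))
    if current is None:
        raise ValueError("no parenthesized HSP number in links list")
    return members, current

members, current = parse_links("8-2-(1)")
print(members, current)   # [8, 2, 1] 1
```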
|
M= <m> |
set the positive reward score for matching nucleotides in the BLASTN
search mode to m, with default value +5.
For compatibility with earlier versions of BLAST, in search modes other than BLASTN, the M option is synonymous with the
matrix option.
To provide a fully specified scoring matrix to BLASTN,
the matrix option itself must be used.
See also: N ,
matrix and
altscore .
|
maskextra= <extra> |
soft mask an additional extra letters
on each side of regions that are soft masked by the
lcmask and wordmask options.
This reduces the incidence of high scoring alignments
in low-complexity regions that would be
initiated by spurious word hits
in otherwise unmasked flanking regions.
See also: wordmask ,
lcmask and
lcfilter .
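The effect of maskextra on lcmask regions can be sketched as interval widening. This sketch assumes 0-based, half-open coordinates and that widened regions which touch or overlap are merged; WU-BLAST's internal representation may differ. Both function names are hypothetical.

```python
def lowercase_regions(query: str) -> list[tuple[int, int]]:
    """Half-open [start, end) intervals of lower-case letters, 0-based."""
    regions, start = [], None
    for i, c in enumerate(query):
        if c.islower() and start is None:
            start = i
        elif not c.islower() and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(query)))
    return regions

def extend_regions(regions, extra: int, length: int) -> list[tuple[int, int]]:
    """Widen each region by `extra` letters per side, clip to the sequence,
    and merge regions that come to touch or overlap."""
    widened = sorted((max(0, s - extra), min(length, e + extra)) for s, e in regions)
    merged: list[tuple[int, int]] = []
    for start, end in widened:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

q = "ACGTACGTacgtacgtACGTACGT"
print(lowercase_regions(q))                             # [(8, 16)]
print(extend_regions(lowercase_regions(q), 3, len(q)))  # [(5, 19)]
```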
|
matrix= <name> |
use the 2-dimensional matrix named name to score residue pairs
in gapped and ungapped alignments.
The default matrix for protein-level searches is BLOSUM62
(Henikoff and Henikoff, 1992).
For BLASTN searches,
the default scoring matrix is computed
dynamically from a +5/-4 match/mismatch scoring system
which can be altered using the M and N parameters.
BLASTN can also use fully specified scoring matrices
of the user's own design,
by providing the name of the matrix with the matrix option.
After unpacking the software, see the matrix/nt subdirectory
for some examples of nucleotide scoring matrices.
NOTE: matrices need not be symmetric about their major diagonal. Rows of a matrix correspond to query letters and columns to subject letters.
See also: altscore ,
M and N .
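The default BLASTN scoring system described above can be pictured as a table built from the M and N values. This sketch covers only A, C, G and T; how WU-BLAST scores ambiguity codes is not modeled. Keys are (query letter, subject letter) pairs, so an asymmetric matrix could be stored the same way. The function name is hypothetical.

```python
NUCLEOTIDES = "ACGT"

def match_mismatch_matrix(m: int = 5, n: int = -4) -> dict[tuple[str, str], int]:
    """Query-by-subject score table from a match reward m and a
    mismatch penalty n (defaults mirror the +5/-4 BLASTN values)."""
    return {(q, s): (m if q == s else n) for q in NUCLEOTIDES for s in NUCLEOTIDES}

matrix = match_mismatch_matrix()
print(matrix[("A", "A")], matrix[("A", "G")])  # 5 -4
```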
|
mformat= <m>[,outfile] |
used to select an output format by numerical identifier, m, and optionally
the name of the file where the output should be written, outfile.
Multiple formats may be chosen for simultaneous output during a single search,
as long as a different outfile is indicated for each format.
If no outfile is specified, either standard output
(stdout )
or the setting of the O option (if set) is used.
At most one mformat specification on a given command line
may lack an outfile.
If outfile contains any white space (e.g., blanks or tabs),
the entire token should be enclosed in quotes, to prevent command line interpreters
from breaking it into separate arguments.
The various output formats available are displayed if mformat=list is specified
on an otherwise syntactically correct command line.
(Example: blastp foo foo mformat=list ).
Setting mformat=0 clears any mformat
specification(s) appearing to the left on the command line.
Depending on the output format, some command line options cause additional elements to appear; these options include: topcomboN , topcomboE and links .
The available choices for m and their associated formats are:
list output this list and halt;
*Formats that are subject to change or removal without notice.
See also: msgstyle ,
O and
xmlcompact .
|
mmio |
turns off the use of memory-mapped I/O when reading database files.
Use of this option will usually slow the search, particularly when multiple processors are being used, but it serves both to demonstrate the effectiveness of this form of I/O and to validate the associated I/O routines. Note that no special daemon or support programs (such as the old memfile program) are required to take full advantage of memory-mapped I/O.
When running 32-bit versions of the BLAST software,
the mmio option might free up important virtual address
space for use as working storage or heap memory.
For the vast majority of users, this option should never be used. |
msgstyle= <n> |
used to select by numerical identifier, n, the style of informational messages to produce (e.g., NOTEs and WARNINGs).
The available choices for n and their associated styles are:
0 => line-wrapped (default)
1 => single-line with the query sequence identifier embedded (if available) |
N= <n> |
set the negative penalty score for mismatching nucleotides
in the BLASTN search mode to n, with default value -4.
See also: M ,
matrix , and
altscore .
|
nogaps |
do not create gapped alignments, in essence reverting to WU-BLAST 1.4 behavior.
See also: gaps and gapall .
|
nonnegok |
Do not abort processing with a FATAL error when the expected score
is non-negative.
Formally, for Karlin-Dembo-Altschul statistics to apply to the
evaluation of the alignment scores found during a search,
the expected score for a sequence having the same residue composition
as the query must be negative, but this condition does not always
hold with unusual scoring matrices or query sequences.
This option allows the search to proceed
even under these unusual conditions.
See also: novalidctxok and shortqueryok .
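The condition described above can be checked by computing the expected score per aligned pair when both letters are drawn independently from the query's own composition. This is the textbook quantity; the exact computation inside WU-BLAST (for example, per context or with other background frequencies) is not modeled here. The function names are hypothetical.

```python
from collections import Counter

def expected_score(query: str, score) -> float:
    """Sum over letter pairs of p(a) * p(b) * score(a, b), using the
    query's own residue frequencies for both letters."""
    counts = Counter(query)
    total = sum(counts.values())
    freqs = {a: c / total for a, c in counts.items()}
    return sum(freqs[a] * freqs[b] * score(a, b) for a in freqs for b in freqs)

def plus5_minus4(a: str, b: str) -> int:
    return 5 if a == b else -4

print(expected_score("ACGT" * 25, plus5_minus4))           # -1.75
print(expected_score("A" * 90 + "C" * 10, plus5_minus4))   # positive: a case for nonnegok
```

A strongly biased composition, as in the second call, makes matches so frequent that the expected score turns positive.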
|
nosegs |
do not segment the query sequence on hyphens (-).
By default, hyphens in the query sequence create insurmountable
barriers for sequence alignment.
As an example of where this feature is useful,
multiple contigs may be concatenated together into one sequence
with a hyphen separating each contig;
no alignment will then extend beyond a contig boundary.
CAUTION: do not confuse this option with the similarly named noseqs option.
|
noseqs |
produce abbreviated output by omitting the sequence alignments.
The result is often correctly interpretable by parsers of normal output.
CAUTION: do not confuse this option with the similarly named nosegs option.
|
notes |
suppress all NOTE messages. Important recommendations from the software may be missed if this option is used.
If any NOTEs arise while this option is in effect, the number SUPPRESSED will be reported at the end of the search.
See also: warnings .
|
novalidctxok |
do not treat it as a FATAL error when none of the “contexts”
(e.g., strands or reading frames) of the query are valid.
A valid context is one in which the threshold score for saving
alignments can be achieved under ideal circumstances (typically
if an alignment of 100% identity were to be found).
See also: nonnegok and shortqueryok .
|
nwlen= <len> |
generate neighborhood words (or seed words) starting from
the beginning of the query sequence (or from the location specified
with the nwstart parameter) and continuing
for the distance len or to the end of the sequence,
whichever comes first.
While this parameter can be used to restrict the region in which
word hits occur for seeding ungapped alignments (and indirectly gapped alignments),
it does not restrict alignments from extending beyond this region.
See also: nwstart .
|
nwstart= <start> |
generate neighborhood words (or seed words) starting from
coordinate position start in the query sequence and continuing
to the end of the sequence (or for the distance specified with the nwlen parameter).
While this parameter can be used to restrict the region in which
word hits occur for seeding ungapped alignments (and indirectly gapped alignments),
it does not restrict alignments from extending beyond this region.
See also: nwlen .
|
O= <outfile> |
output results to the file named outfile instead of standard output (stdout ).
|
olfraction= <f> |
set the maximum fractional length of overlap, f, of two ungapped alignments
for them to be considered independent and mutually “consistent” and their joint (Sum or Poisson) probability to be computed.
The default f is 0.1 (maximum 10% of the length from either end of either HSP).
For any given pair of HSPs, the more restrictive of olfraction
and olmax is used.
See also: golfraction ,
golmax ,
and
olmax .
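The rule that the more restrictive of olfraction and olmax applies can be sketched as taking a minimum. This sketch assumes the fractional limit is measured against the shorter of the two HSPs, which is the most restrictive reading of "either end of either HSP". The function names are hypothetical.

```python
import math

def max_overlap(len1: int, len2: int, olfraction: float = 0.1,
                olmax: float = math.inf) -> float:
    """Largest overlap (in residues) tolerated between two HSPs:
    the more restrictive of the fractional and absolute limits."""
    fractional_limit = olfraction * min(len1, len2)   # assumption: shorter HSP
    return min(fractional_limit, olmax)

def consistent(len1: int, len2: int, overlap: int, olfraction: float = 0.1,
               olmax: float = math.inf) -> bool:
    return overlap <= max_overlap(len1, len2, olfraction, olmax)

print(max_overlap(200, 50))            # 5.0
print(max_overlap(200, 50, olmax=3))   # 3
```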
|
olmax= <len> |
set the maximum permitted length of overlap (in residues), len, of two ungapped alignments
for their joint (Sum or Poisson) probability to be computed.
The default is unlimited length, with the maximum extent of overlap being governed only
by the olfraction parameter.
See also: golfraction ,
golmax ,
and
olfraction .
|
pingpong |
Perform additional work to help ensure that the alignments produced are locally optimal. This option typically adds 3-10% to the execution time and usually does not affect the results; only rarely is an alignment and its associated score improved by the additional time consumed. |
poissonp |
use Poisson statistics (Karlin and Altschul, 1990) to compute joint P-values of consistent sets of alignments; Poisson statistics are an option in all search modes. |
postsw |
perform full Smith-Waterman alignment of sequences and re-rank the database matches accordingly prior to output (currently supported in BLASTP only). |
progress= <s> |
provide an indication that the search is alive by outputting an asterisk (“*”) every s seconds during a search,
if some other indication of activity has not been provided in the meantime.
Such “keepalive” indicators may be useful when the software
is invoked over a network connection.
The default behavior
(obtained with progress=0 )
is only to report the actual progress made through the database,
using periods (“.”) and reports of percentages.
|
prune |
do not prune HSP lists, but instead report all HSPs, even
those that were not involved
in satisfying the statistical significance threshold necessary
for reporting the database sequence.
NOTE: When the default Sum statistics are used,
the normal pruning activity is robust;
when Poisson statistics are used,
some HSPs may get through the pruning process and be reported
that were not involved
in satisfying the statistical significance threshold.
See also: span ,
span1 and
span2 .
|
putenv= "NAME=VALUE" |
set the environment variable named NAME to the value VALUE in the local environment of the BLAST search program.
See also: endgetenv ,
endputenv and getenv .
|
pvalues |
report P-values (the default) in the initial one-line descriptions section of output.
See also: evalues .
|
Q= <q> |
set the penalty for a gap of length one to q (default Q=9 for proteins; Q=10 for BLASTN).
See also: R . |
qframe= <f> |
search with the query sequence translated in the single reading frame f.
This parameter is useful for speeding up a search and improving both the
biological and statistical significance of the findings,
when the reading frame of a translation product in the query
is known in advance,
such as when the query sequence entails a complete ORF.
Reading frames on the top (plus) strand of the query are numbered 1, 2, 3;
reading frames on the bottom (minus) strand are numbered -1, -2, -3.
See also: top ,
bottom ,
dbtop and
dbbottom .
|
Qoffset= <i> |
adjust all query sequence coordinates in the output by the fixed quantity i (default 0). |
qrecmax= <n> |
in a multi-sequence query file, end database searches with the query sequence numbered n. |
qrecmin= <m> |
in a multi-sequence query file, start database searches using the query sequence numbered m. Records are numbered starting with 1. |
qres |
treat as a FATAL error when the query sequence contains any invalid residue codes.
By default, WARNING s are issued for invalid residue codes,
which are then skipped.
|
qtype |
treat as a FATAL error if the query sequence appears from its letter composition to be of the wrong type (peptide or nucleotide).
|
R= <r> |
set the per-residue penalty for extending a gap to r (default R=2 for proteins; R=10 for BLASTN).
See also: Q .
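Read together, the Q and R descriptions define an affine gap penalty: Q for a gap of length one, plus R for each additional residue. The sketch below shows that arithmetic; the function name is hypothetical and the defaults are the protein values quoted above.

```python
def gap_penalty(length: int, q: int = 9, r: int = 2) -> int:
    """Penalty for a gap of `length` residues: q covers the first residue,
    r is charged for each residue beyond the first."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return q + (length - 1) * r

print(gap_penalty(1))               # 9
print(gap_penalty(5))               # 17
print(gap_penalty(5, q=10, r=10))   # 50 (BLASTN defaults)
```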
|
restest |
causes the Bonferroni corrections used in computing
statistical significance to depend
upon the length in residues of each database sequence
relative to the total number of residues in the database.
restest
is the default database-size correction method
in the BLASTN, TBLASTN, and TBLASTX search modes.
See also: seqtest .
|
S= <s> |
set the score-equivalence threshold for reporting database hits to s.
Hits for a database sequence will only be reported
if the statistical significance ascribed to one of its similar
regions (or groups of similar regions)
is at least as high as that of a single alignment with score
S .
By default, S is not actually used;
all findings are compared against the E threshold.
E and S are interchangeable, however,
through standard Karlin-Dembo-Altschul statistics,
with any setting of S implying
an expectation threshold of its own
($E = KNe^{-\lambda S}$).
If both E and S are specified
on the command line, the one corresponding to the more restrictive (lower)
E is used.
See also: E ,
gapS2 ,
and
S2 .
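The interchangeability of E and S can be illustrated by converting in both directions with $E = KNe^{-\lambda S}$ and keeping the lower E when both are given. The values of K, λ and N below are hypothetical round numbers chosen for illustration, not parameters of any particular scoring matrix or database. The function names are also hypothetical.

```python
import math

def evalue_from_score(s: float, k: float, lam: float, n: float) -> float:
    """E = K * N * exp(-lambda * S)."""
    return k * n * math.exp(-lam * s)

def score_from_evalue(e: float, k: float, lam: float, n: float) -> float:
    """Invert E = K * N * exp(-lambda * S) to get the equivalent score S."""
    return math.log(k * n / e) / lam

def threshold_evalue(k, lam, n, e=None, s=None) -> float:
    """When both E and S are given, keep the more restrictive (lower) E."""
    candidates = []
    if e is not None:
        candidates.append(e)
    if s is not None:
        candidates.append(evalue_from_score(s, k, lam, n))
    if not candidates:
        raise ValueError("specify E, S, or both")
    return min(candidates)

K, LAM, N = 0.1, 0.3, 1e8   # hypothetical illustrative values
print(evalue_from_score(60, K, LAM, N))            # about 0.152
print(score_from_evalue(10.0, K, LAM, N))          # about 46.05
print(threshold_evalue(K, LAM, N, e=10.0, s=60))   # the S-derived E wins here
```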
|
S2= <s> |
set the score threshold for saving ungapped HSPs to s.
In the initial, ungapped alignment phase of a search,
individual HSPs will only be saved for further use
if their score is ≥S2 .
The default score threshold is computed from E2
and will depend on the scoring system.
If both E2 and S2 are specified on the command line,
the one corresponding to the more restrictive (higher) score threshold
will be used.
See also: gapS2 ,
E2 and
gapE2 .
|
seqtest |
causes the Bonferroni corrections used in computing
statistical significance to depend upon
the number of sequences in the database.
seqtest is the default database-size correction method
in the BLASTP and BLASTX search modes.
NOTE: For backward compatibility with legacy BLAST software, in all search modes, including BLASTP and BLASTX, if the Z option is specified,
Z is expected to be expressed in units of residues,
unless seqtest is also specified.
See also: restest and
Z .
|
shortqueryok |
do not treat it as a FATAL error when the query sequence is
shorter than the BLAST algorithm word length.
See also: novalidctxok and
nonnegok . |
Soffset= <i> |
adjust all subject sequence coordinates in the output by the fixed quantity i (default 0). |
sort_by_count |
sort database sequences from highest to lowest by the number of HSPs identified.
Multiple sort_by* options may be specified and take precedence in the order specified.
|
sort_by_highscore |
sort database sequences from highest to lowest by the highest HSP score found.
Multiple sort_by* options may be specified and take precedence in the order specified.
|
sort_by_pvalue |
sort database sequences from lowest to highest by their best P-value.
Multiple sort_by* options may be specified
and take precedence in the order specified.
sort_by_pvalue is the default primary sort key.
|
sort_by_subjectlength |
sort database sequences from longest to shortest.
Multiple sort_by* options may be specified and take precedence in the order specified.
|
sort_by_totalscore |
sort database sequences from highest to lowest by the sum total score of all HSPs found.
Multiple sort_by* options may be specified and take precedence in the order specified.
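Combining several sort_by* options amounts to a multi-key sort in which earlier options take precedence. The sketch below assumes that only the options given are used as keys and that P-value alone is used when none are given. The `Hit` record and its fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    pvalue: float
    high_score: int
    total_score: int
    hsp_count: int
    length: int

# Ascending for P-value, descending (negated) for every other criterion.
KEYS = {
    "sort_by_pvalue": lambda h: h.pvalue,
    "sort_by_highscore": lambda h: -h.high_score,
    "sort_by_totalscore": lambda h: -h.total_score,
    "sort_by_count": lambda h: -h.hsp_count,
    "sort_by_subjectlength": lambda h: -h.length,
}

def sort_hits(hits, options):
    """Sort by the first option, break ties with the next, and so on."""
    keys = [KEYS[o] for o in options] or [KEYS["sort_by_pvalue"]]
    return sorted(hits, key=lambda h: tuple(k(h) for k in keys))

hits = [Hit("a", 1e-5, 50, 120, 3, 400),
        Hit("b", 1e-5, 80, 80, 1, 300),
        Hit("c", 1e-9, 60, 60, 1, 900)]
print([h.name for h in sort_hits(hits, ["sort_by_pvalue", "sort_by_highscore"])])  # ['c', 'b', 'a']
```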
|
span |
retain HSPs (ungapped or gapped) regardless of whether they
span or are spanned by any other HSP.
When this option is specified, memory requirements may increase
dramatically to accommodate an increased number of HSPs that must
be tracked, particularly when the sequences being compared
contain short periodicity repeats and low complexity regions.
See also: span1 and span2 .
|
span1 |
discard an HSP (ungapped or gapped) when it spans or is spanned by
another HSP along either the query or the subject sequence (or both).
When a pair of such HSPs is found, the one with the lower score
is discarded;
if their scores are equal, the longer, less information-dense HSP is discarded.
See also: span and span2 .
|
span2 |
discard an HSP (ungapped or gapped) when it spans or is spanned by
another HSP along both the query and subject sequences.
When a pair of such HSPs is found, the one with the lower score
is discarded; if their scores are equal, the longer, less information-dense HSP is discarded.
span2 is the default behavior.
See also: span and span1 .
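The difference between span, span1 and span2 can be sketched as a pairwise containment test. This sketch assumes that "spans" means one coordinate interval contains the other, and that HSP length is measured along the query when breaking ties. The simple pairwise loop and all names are hypothetical and are not WU-BLAST's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HSP:
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: int

    @property
    def length(self) -> int:
        return self.q_end - self.q_start   # assumption: query extent

def contains_or_within(a0, a1, b0, b1) -> bool:
    """True if [a0, a1] spans [b0, b1] or is spanned by it."""
    return (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)

def spanning(a: HSP, b: HSP, mode: str) -> bool:
    on_query = contains_or_within(a.q_start, a.q_end, b.q_start, b.q_end)
    on_subject = contains_or_within(a.s_start, a.s_end, b.s_start, b.s_end)
    return (on_query or on_subject) if mode == "span1" else (on_query and on_subject)

def loser(a: HSP, b: HSP) -> HSP:
    """Discard the lower-scoring HSP; on a tie, discard the longer one."""
    if a.score != b.score:
        return a if a.score < b.score else b
    return a if a.length > b.length else b

def filter_hsps(hsps, mode="span2"):
    if mode == "span":
        return list(hsps)   # keep everything
    dropped = [False] * len(hsps)
    for i in range(len(hsps)):
        for j in range(i + 1, len(hsps)):
            if dropped[i] or dropped[j] or not spanning(hsps[i], hsps[j], mode):
                continue
            if loser(hsps[i], hsps[j]) is hsps[i]:
                dropped[i] = True
            else:
                dropped[j] = True
    return [h for h, d in zip(hsps, dropped) if not d]

big = HSP(0, 100, 0, 100, 50)
cross = HSP(10, 40, 500, 530, 30)   # spanned on the query only
print(len(filter_hsps([big, cross], "span2")), len(filter_hsps([big, cross], "span1")))  # 2 1
```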
|
spoutmax= <spoutmax> |
establishes spoutmax
as the maximum number of segment pairs to report
in program output per subject sequence or pairwise comparison,
independent of the number of HSPs or GSPs actually found and evaluated.
If more than spoutmax segment pairs are found,
the segment pairs are sorted by the criteria in effect
for the search and only the first spoutmax
segment pairs will be reported.
The setting of spoutmax will have no effect
if either hspmax or gspmax
is more restrictive.
The default value for spoutmax is 0,
which signifies no limit.
See also: hspmax and
gspmax .
|
stats |
gather a variety of statistics about the search (e.g., the number of word hits in each reading frame, the highest score observed, etc.) and report them in the output. Use of this option marginally impacts search speed. |
sump |
use Sum statistics (Karlin and Altschul, 1993) to compute joint P-values of consistent sets of alignments; the use of Sum statistics is the default behavior in all search modes. |
T= <t> |
set the neighborhood word score threshold for the ungapped BLAST algorithm to t.
For a given word of length W in the query sequence,
its neighborhood words are defined as the set of words
that have scores ≥ T when aligned with it.
Neighborhood words become the seed words used to find ungapped alignments
by the BLAST algorithm.
Lower values for T tend to yield a larger neighborhood,
more potential seed words, and improved sensitivity for lower scoring alignments,
but at the expense of increased memory use and run time.
Higher values for T will yield a smaller (possibly empty) neighborhood word list and faster execution, at the expense of reduced sensitivity.
The default T varies with the scoring matrix, word length, and
between search modes.
For improved sensitivity and to obtain behavior that better satisfies user expectations,
identical words are included with neighborhood words in the list
of potential seeds,
if their score is positive but happens to be less than T .
No neighborhood words (only exactly matching words) are used by default in the BLASTN search mode; however, neighborhood words can be used even by BLASTN if a value for T is specified on the BLASTN command line.
CAUTION: for the long word lengths typically employed
with BLASTN, the memory required
for neighborhood words can easily be prohibitive and may only be
practical for shorter sequences.
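The neighborhood rule, including the special case for identical words, can be illustrated by brute-force enumeration. This sketch uses a toy +5/-4 nucleotide scoring scheme and W=3 purely for illustration. Exhaustive enumeration grows exponentially with W, so practical implementations must avoid it. The function name is hypothetical.

```python
from itertools import product

def neighborhood(word: str, alphabet: str, score, t: int) -> set[str]:
    """All same-length words scoring >= t against `word`, plus `word`
    itself when its self-score is positive but below t."""
    def word_score(a: str, b: str) -> int:
        return sum(score(x, y) for x, y in zip(a, b))

    words = set()
    for letters in product(alphabet, repeat=len(word)):
        candidate = "".join(letters)
        if word_score(word, candidate) >= t:
            words.add(candidate)
    if 0 < word_score(word, word) < t:
        words.add(word)
    return words

def toy(a: str, b: str) -> int:
    return 5 if a == b else -4

print(len(neighborhood("ACG", "ACGT", toy, 6)))   # 10
print(neighborhood("ACG", "ACGT", toy, 20))       # {'ACG'}: self-score 15 < T but positive
```

Raising T from 6 to 20 shrinks the list from ten words to the identical word alone, which is kept only because of the positive-score rule described above.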
|
top |
used to restrict the search of a nucleotide sequence
to the top (+) strand.
In the TBLASTX search mode, where both query and subject
are nucleotide sequences, the top option only affects
the query sequence.
See also: bottom ,
dbtop ,
dbbottom and
qframe .
|
topcomboE= <Eratio> |
Eratio is the maximum ratio of Ecurrent/Ebest for which
the current “topcombo” group of consistent (colinear) local alignments will be reported
for a given database sequence.
The "best" group is reported in the output as "Group = 1"
and tends to be the most statistically significant.
The default behavior is to impose no limit on this ratio, in which case all topcombo groups satisfying E are reported (up to a maximum of topcomboN groups, if specified).
See also: links and
topcomboN .
|
topcomboN= <n> |
report at most n “topcombo” groups of consistent (colinear) local alignments (HSPs).
Each local alignment is allowed to be a member of only one group.
Use of this option causes the addition of a "Group = #" indicator
in the output for each HSP.
Groups of HSPs tend to be assembled in decreasing order of statistical
significance.
Members of the most significant group thus tend to be reported
with "Group = 1".
See also: links and
topcomboE .
|
ucdb |
force nucleotide sequence databases to be searched in their uncompressed form,
with any-and-all ambiguity codes in place.
This option is only effective in the BLASTN search mode for word lengths ≥ 7.
Users should generally avoid specifying this option themselves,
letting the software decide when to employ this search strategy.
This option can increase sensitivity when ambiguity codes
are present in database sequences,
at the expense of memory and possibly speed.
Searching the uncompressed database is the only available behavior
for word lengths < 7.
This option offers improved sensitivity only when searching databases in XDF format that contain ambiguity codes.
The option is accepted by the software but offers no improvement in sensitivity for databases in the earlier BLAST 1.4 database format.
See also: cdb .
|
V= <v> |
set the maximum number of one-line descriptions of significant
database sequences to report in the first section of program output to v.
The default limit is 500.
See also: B .
|
W= <w> |
set the seed word length for the ungapped BLAST algorithm to w. The default word length for protein-level searches is 3 amino acids; for BLASTN searches, the default length is 11 nucleotides. Shorter word lengths may increase sensitivity, at the expense of increased run time. In all search modes, the acceptable range of word lengths is 1 ≤ w ≤ 1024. |
warnings |
suppress all WARNING messages.
CAUTION: important advisories may be missed if this option is used; however, if any WARNING situations should arise,
the number SUPPRESSED will be reported at the end of the search.
See also: notes .
|
wink= <wink> |
generate word hits at every winkth residue position along the query,
where the default wink=1 produces neighborhood words at every position.
For best sensitivity, wink should not be adjusted.
Wink settings greater than 1 are best used to find identical or nearly identical sequences more rapidly.
When used in conjunction with the hitdist option
to obtain the highest search speed, care should be taken that desirable alignments are not precluded
by these parameters.
The wink parameter is only available in the licensed 2.0 software.
NOTE: When using BLASTN to search compressed nucleotide sequence databases in their compressed form, an increase in speed (and concomitant decrease in sensitivity) will not be observed unless wink is set to a value
greater than the compression ratio, which is usually 4.
CAUTION: With versions of BLASTN prior to [15-Oct-2004], similarity of any length and even 100% identity can be missed when searching compressed nucleotide sequence databases in their compressed form, if wink is set to an even integer value.
This is simply due to the likelihood of a phase mismatch between the
compressed form of the query and the database sequence.
Assigning odd values to wink can avoid such phase mismatches.
The best solution, though, is to update to a newer version of BLASTN.
Versions of WU BLAST dated [15-Oct-2004] and later automatically
avoid the phase mismatch problem, so users need not be concerned.
If you are using the wink option with BLASTN
and are not running a more recent version, please update!
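The arithmetic behind the advice to use odd wink values can be sketched by checking which phases, modulo a compression ratio of 4, the sampled query positions reach. This sketch assumes that a phase is simply position mod 4. It illustrates the counting argument only and does not model WU-BLAST's compressed search. Both function names are hypothetical.

```python
def wink_positions(query_length: int, wink: int = 1) -> list[int]:
    """0-based query positions at which word hits are generated."""
    return list(range(0, query_length, wink))

def phases_covered(positions, ratio: int = 4) -> set[int]:
    """Residues of the sampled positions modulo the compression ratio."""
    return {p % ratio for p in positions}

for wink in (1, 2, 3, 4, 5):
    print(wink, sorted(phases_covered(wink_positions(100, wink))))
# 1 [0, 1, 2, 3]
# 2 [0, 2]
# 3 [0, 1, 2, 3]
# 4 [0]
# 5 [0, 1, 2, 3]
```

Odd values share no factor with 4, so they eventually visit every phase; even values skip some phases entirely.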
|
wordmask= <filter> |
“soft mask” the query sequence using the indicated filter.
A copy of the query sequence is passed through the filter program
and any letters converted by it to ambiguity codes
are skipped during neighborhood word or seed word generation.
Unlike the filter option,
the query sequence itself remains unaltered and available for alignment.
Usage of the wordmask parameter is otherwise identical to that of filter ,
with the same set of filtering methods available for use.
See also: filter ,
lcmask ,
lcfilter and
maskextra .
|
wstrict |
when searching a nucleotide database sequence
that contains one or more ambiguous residues,
require that every ungapped alignment found during the initial, ungapped phase of a search
actually contain an identical word hit (in the usual case of BLASTN usage)
or neighborhood word hit (in the case of TBLASTN and TBLASTX).
The wstrict option has no effect whatsoever on BLASTX
and has no effect on BLASTP when gapped alignments (the default)
are to be produced.
When ungapped alignments are the desired end product from BLASTP
(i.e., the -nogaps option is specified),
wstrict will prevent the software from exhaustively
searching diagonals that are found to contain HSPs in an effort
to find other HSPs that would not be seeded by neighborhood word hits.
|
X= <x> |
set the drop-off score for the ungapped BLAST algorithm to x.
Ungapped extension of initial neighborhood word hits or seed word hits
between the query and subject sequences
continues until the cumulative alignment score deteriorates
from the maximum value seen thus far during the extension by a quantity X or more.
The default value for X is the score associated with 10 bits
of significance ($2^{-10} < 10^{-3}$ probability) for protein-level searches or 20 bits
of significance ($2^{-20} < 10^{-6}$ probability)
for nucleotide-level (BLASTN) searches.
Higher values for X will increase sensitivity at the expense
of run time, but with both typically diminishing rapidly in their rate of change.
See also: gapX .
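The X-drop rule can be sketched as a one-directional ungapped extension, together with the conversion of a drop-off given in bits to a raw score (bits × ln 2 / λ). This sketch extends only to the right of the seed, although a full extension also proceeds leftward. The λ value depends on the scoring system. All names are hypothetical.

```python
import math

def xdrop_from_bits(bits: float, lam: float) -> float:
    """Raw-score drop-off equivalent to `bits` of significance."""
    return bits * math.log(2) / lam

def extend_right(query, subject, qi, si, score, x):
    """Ungapped extension to the right from query[qi] / subject[si].
    Stops once the running score falls x or more below the best seen;
    returns (best_score, residues_in_best_extension)."""
    running = best = best_len = 0
    k = 0
    while qi + k < len(query) and si + k < len(subject):
        running += score(query[qi + k], subject[si + k])
        k += 1
        if running > best:
            best, best_len = running, k
        elif best - running >= x:
            break
    return best, best_len

def toy(a: str, b: str) -> int:
    return 5 if a == b else -4

print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0, toy, 10))   # (40, 8)
```

Here the extension stops after three consecutive mismatches, when the running score has fallen 12 below its maximum.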
|
xmlcompact |
omit newline and white space characters normally reported between
entities in XML documents produced with mformat=7 .
Their purpose is merely to improve the human readability of a document
when using XML-ignorant viewers, but these characters often comprise
a substantial fraction of the bytes in a document and are completely
extraneous for the purposes of automated parsing and viewing
with XML-aware software.
See also: mformat .
|
Y= <y> |
set the effective length of the querY sequence (in units of residues) used in statistical significance calculations to y. |
Z= <z> |
set the effective size of the database (databaZe) used in statistical significance calculations to z.
Unless overridden by the seqtest option,
the unit of measure is residues.
If seqtest is specified,
the unit of measure is sequences.
See also: restest and
seqtest .
|
Last modified: 2005-09-13
Copyright © 2004-2005 by Warren R. Gish, Saint Louis, Missouri 63108 USA. All rights reserved.