competition update
language_model/srilm-1.7.3/man/cat1/segment-nbest.1
@@ -0,0 +1,171 @@
segment-nbest(1)            General Commands Manual           segment-nbest(1)

NAME
       segment-nbest - rescore and segment N-best lists using hidden segment
       N-gram model

SYNOPSIS
       segment-nbest [ -help ] option ... nbest-file-list ...

DESCRIPTION
       segment-nbest processes a series of consecutive N-best lists from a
       speech recognizer and applies a hidden segment N-gram language model to
       them. The language model is a standard backoff N-gram model in ARPA
       ngram-format(5) modeling sentence segmentation using the boundary tags
       <s> and </s>. The program reads in all N-best lists and outputs the
       hypotheses that have the highest aggregate (combined acoustic and
       language model) score. Hypothesized sentence boundaries are marked by
       <s> tags.

OPTIONS
       Each filename argument can be an ASCII file, or a compressed file (name
       ending in .Z or .gz), or ``-'' to indicate stdin/stdout.

       -help  Print option summary.

       -version
              Print version information.

       -order n
              Set the maximal N-gram order to be used, by default 3. NOTE:
              The order of the model is not set automatically when a model
              file is read, so the same file can be used at various orders.

       -debug level
              Set the debugging output level (0 means no debugging output).
              Debugging messages are sent to stderr.

       -lm file
              Read the N-gram model from file.

       -tolower
              Map all vocabulary to lowercase. Useful if case conventions for
              N-best lists and language model differ.

       -mix-lm file
              Read a second, standard N-gram model for interpolation purposes.

       -lambda weight
              Set the weight of the main model when interpolating with
              -mix-lm. Default value is 0.5.
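
The -lambda weighting can be read as a linear mixture of the two models' word probabilities. The following is a minimal Python sketch under that assumption; the function name `interpolate` and the sample probabilities are illustrative and not part of SRILM.

```python
import math


def interpolate(p_main: float, p_mix: float, lam: float = 0.5) -> float:
    """Linear mixture: lam weights the main model, (1 - lam) the -mix-lm model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_main + (1.0 - lam) * p_mix


# Hypothetical conditional probabilities for one word in one context.
p = interpolate(p_main=0.02, p_mix=0.10, lam=0.5)
log10_p = math.log10(p)  # converted to a base-10 log probability
```

With the default weight of 0.5 both models contribute equally; a weight of 1.0 reduces to the main model alone.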

       -bayes length
              Interpolate the second and the main model using posterior
              probabilities for local N-gram-contexts of length length. The
              -lambda value is used as a prior mixture weight in this case.

       -bayes-scale scale
              Set the exponential scale factor on the context likelihood in
              conjunction with the -bayes function. Default value is 1.0.
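
One plausible reading of the -bayes and -bayes-scale descriptions is that each model's mixture weight is its prior (from -lambda) multiplied by the likelihood it assigns to the local context, raised to the scale factor, and then normalized. The Python sketch below illustrates that reading only; the function names and the likelihood values are hypothetical, and the actual computation inside the tool may differ in detail.

```python
from typing import Tuple


def bayes_weights(lik_main: float, lik_mix: float,
                  prior_main: float = 0.5, scale: float = 1.0) -> Tuple[float, float]:
    """Posterior mixture weights from context likelihoods (assumed formula)."""
    a = prior_main * lik_main ** scale
    b = (1.0 - prior_main) * lik_mix ** scale
    total = a + b
    if total == 0.0:
        # Neither model supports the context: fall back to the prior weights.
        return prior_main, 1.0 - prior_main
    return a / total, b / total


def bayes_interpolate(p_main: float, p_mix: float,
                      lik_main: float, lik_mix: float,
                      prior_main: float = 0.5, scale: float = 1.0) -> float:
    """Mix the two word probabilities using the context-dependent weights."""
    w_main, w_mix = bayes_weights(lik_main, lik_mix, prior_main, scale)
    return w_main * p_main + w_mix * p_mix
```

Under this reading, a scale of 0 makes the context likelihoods irrelevant and the mixture falls back to the fixed -lambda weighting.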

       -nbest-files list
              Specifies a list of N-best files. The file list should contain
              a list of filenames, one per line, each corresponding to an
              N-best file in one of the formats described in nbest-format(5).
              The N-best files should correspond to consecutive speech
              waveforms in the order listed.

       -fb-rescore
              Perform forward-backward rescoring. This generates new N-best
              lists as output whose LM scores reflect the posterior
              probability of each hypothesis. The default is to perform
              Viterbi rescoring and output only the best combined hypothesis.

       -write-nbest-dir dir
              Write rescored N-best lists to directory dir instead of to
              stdout. The filenames from the input are preserved.

       -max-nbest n
              Limits the number of hypotheses read from each N-best list to
              the first n.

       -max-rescore m
              Only choose among the top m hypotheses of each list (after
              reordering hypotheses, see -no-reorder below). This is an
              effective way to limit the quadratic computation of the Viterbi
              or forward/backward dynamic programming.
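
To see why the computation is quadratic in m, consider a simplified Viterbi search that picks one hypothesis per N-best list. The Python sketch below is an illustration under strong assumptions: `link_score` is a hypothetical stand-in for the LM score across a list boundary, each list is assumed to be sorted best-first already, and hidden segment boundaries inside hypotheses are not modeled.

```python
from typing import Callable, List, Sequence, Tuple

Words = Tuple[str, ...]
Hyp = Tuple[Words, float]  # (words, local log score)


def viterbi_nbest(lists: Sequence[Sequence[Hyp]],
                  link_score: Callable[[Words, Words], float],
                  max_rescore: int) -> List[Words]:
    """Pick the best chain of hypotheses, one per list, among the top m each."""
    pruned = [list(hyps[:max_rescore]) for hyps in lists]
    if not pruned or any(not hyps for hyps in pruned):
        return []
    best = [score for _, score in pruned[0]]
    back: List[List[int]] = []
    for prev, cur in zip(pruned, pruned[1:]):
        new_best, pointers = [], []
        for words, local in cur:
            # Every current hypothesis is compared with every predecessor,
            # so each pair of adjacent lists costs about m * m evaluations.
            cands = [best[i] + link_score(prev_words, words)
                     for i, (prev_words, _) in enumerate(prev)]
            i_best = max(range(len(cands)), key=cands.__getitem__)
            new_best.append(cands[i_best] + local)
            pointers.append(i_best)
        best = new_best
        back.append(pointers)
    # Trace the best path back from the final list.
    idx = max(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [pruned[t][i][0] for t, i in enumerate(path)]
```

Halving m cuts the work per pair of adjacent lists to roughly a quarter, at the risk of pruning a hypothesis that would have won once cross-boundary context is taken into account.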

       -no-reorder
              Do not reorder the hypotheses before limiting the computation to
              the top m (see -max-rescore). By default the hypotheses will
              first be sorted according to the acoustic and language model
              scores recorded in the N-best lists.

       -rescore-lmw weight
              Specifies the language model weight to be used in combining
              acoustic and language model scores to select the best
              hypotheses.

       -rescore-wtw weight
              Specifies the word transition weight to be used in selecting the
              best hypotheses.
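
The two rescoring weights are commonly combined with the acoustic score in a log-linear fashion: acoustic score plus the LM weight times the LM score plus the word transition weight times the number of words. The Python sketch below assumes that form; the `Hypothesis` class, the function names, and the weight values are illustrative and not taken from the tool.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    words: List[str]
    acoustic: float  # acoustic log score
    lm: float        # language model log score


def combined_score(h: Hypothesis, lmw: float, wtw: float) -> float:
    """Assumed log-linear combination of acoustic, LM and word-count terms."""
    return h.acoustic + lmw * h.lm + wtw * len(h.words)


def best_hypothesis(hyps: Sequence[Hypothesis], lmw: float, wtw: float) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h, lmw, wtw))


# Hypothetical scores: a larger LM weight favors the hypothesis the LM prefers.
h1 = Hypothesis(["good", "morning"], acoustic=-100.0, lm=-10.0)
h2 = Hypothesis(["good", "more", "ning"], acoustic=-98.0, lm=-12.0)
winner = best_hypothesis([h1, h2], lmw=8.0, wtw=0.0)
```

A negative word transition weight acts as a per-word penalty and so favors shorter hypotheses.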

       -noise noise-tag
              Designate noise-tag as a vocabulary item that is to be ignored
              by the LM. (This is typically used to identify a noise marker.)

       -noise-vocab file
              Read several noise tags from file, instead of, or in addition
              to, the single noise tag specified by -noise.
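
Conceptually, noise tags are dropped from a hypothesis before the LM sees it. The Python sketch below shows that idea only; it assumes the noise vocabulary file holds whitespace-separated tags, and the helper names and the `[noise]` and `[laugh]` tags are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def load_noise_tags(lines: Iterable[str], single_tag: Optional[str] = None) -> Set[str]:
    """Collect tags from a vocabulary listing, plus an optional single tag."""
    tags = {token for line in lines for token in line.split()}
    if single_tag is not None:
        tags.add(single_tag)
    return tags


def strip_noise(words: Iterable[str], noise_tags: Set[str]) -> List[str]:
    """Return the words the LM would score, with noise tags removed."""
    return [w for w in words if w not in noise_tags]


noise = load_noise_tags(["[noise]\n", "[laugh]\n"], single_tag="[cough]")
scored = strip_noise(["hello", "[noise]", "world", "[cough]"], noise)
```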

       -decipher-lm model-file
              Designates the N-gram backoff model (typically a bigram) that
              was used by the Decipher(TM) recognizer in computing composite
              scores. Used to compute acoustic scores from the composite
              scores if the N-best lists are in "NBestList1.0" format.

       -decipher-lmw weight
              Specifies the language model weight used by the recognizer.
              Used to compute acoustic scores from the composite scores.

       -decipher-wtw weight
              Specifies the word transition weight used by the recognizer.
              Used to compute acoustic scores from the composite scores.
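
Recovering an acoustic score from a composite score amounts to subtracting the weighted LM and word transition contributions. The Python sketch below assumes the composite score is a plain log-linear sum in a single log scale; any scale conversion the recognizer's format may require is deliberately left out, and all names and numbers are illustrative.

```python
def acoustic_from_composite(composite: float, lm_score: float, num_words: int,
                            lmw: float, wtw: float) -> float:
    """Undo an assumed composite = acoustic + lmw * lm + wtw * num_words."""
    return composite - lmw * lm_score - wtw * num_words


# Hypothetical values: the LM score would come from the -decipher-lm model.
acoustic = acoustic_from_composite(composite=-186.0, lm_score=-10.0,
                                   num_words=2, lmw=8.0, wtw=-3.0)
```

Once the acoustic score is isolated, it can be recombined with a different language model and different weights during rescoring.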

       -stag string
              Use string to mark segment boundaries in the output. Default is
              the start-of-sentence symbol defined in the language model
              (<s>).

       -bias b
              Make a segment boundary a priori more likely by a factor of b.
              If b is 0, the dynamic programming algorithm is restricted to
              never consider hidden sentence boundaries; this is useful when
              segment-nbest is used merely for its ability to apply the LM
              across N-best boundaries.
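
In the log domain, a multiplicative bias b becomes an additive term log(b), and b = 0 becomes minus infinity, which rules the boundary out entirely. The Python sketch below illustrates this for a single boundary decision; how the factor enters the tool's dynamic program is not spelled out here, so the function names and the comparison are assumptions.

```python
import math


def boundary_log_bias(b: float) -> float:
    """Additive log-domain term for a multiplicative boundary bias b."""
    if b < 0:
        raise ValueError("bias must be non-negative")
    return -math.inf if b == 0 else math.log(b)


def prefers_boundary(log_p_boundary: float, log_p_no_boundary: float,
                     b: float = 1.0) -> bool:
    """True if the biased boundary path outscores the no-boundary path."""
    return log_p_boundary + boundary_log_bias(b) > log_p_no_boundary
```

With b = 1 the decision is left to the language model alone, and values above 1 tip borderline cases toward a boundary.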

       -start-tag string
              Insert a tag string at the front of every N-best hypothesis read
              in.

       -end-tag string
              Insert a tag string at the end of every N-best hypothesis read
              in. This and the previous option are useful if the LM marks
              acoustic waveform boundaries with a special tag.

       segment-nbest will also process any command line arguments following
       the options as lists of N-best lists, as with the -nbest-files option.
       Each nbest-file-list will be processed in turn, with individual output
       delimited by a line of the form
               <nbestfile nbest-file-list>
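
When several file lists are given, the combined output can be split back into per-list sections by scanning for the delimiter lines. The Python sketch below assumes each delimiter precedes the output it labels and that no hypothesis line begins with `<nbestfile `; the function name and the sample lines are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Matches a delimiter line such as "<nbestfile lists/a.list>".
DELIM = re.compile(r"^<nbestfile\s+(.+)>$")


def split_output(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group output lines under the nbest-file-list named by each delimiter."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        text = line.strip()
        match = DELIM.match(text)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(text)
    return sections


sample = [
    "<nbestfile a.list>\n",
    "<s> hello world\n",
    "<nbestfile b.list>\n",
    "<s> good <s> morning\n",
]
by_list = split_output(sample)
```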

SEE ALSO
       ngram-count(1), segment(1), ngram-format(5), nbest-format(5).
       A. Stolcke, ``Modeling Linguistic Segment and Turn Boundaries for
       N-best Rescoring of Spontaneous Speech,'' Proc. Eurospeech, 2779-2782,
       1997.

BUGS
       N-gram models of arbitrary order can be used, but the context at the
       beginning of a hypothesis never extends beyond the words from the
       preceding N-best list.

AUTHOR
       Andreas Stolcke <stolcke@icsi.berkeley.edu>
       Copyright (c) 1997-2004 SRI International

SRILM Tools              $Date: 2019/09/09 22:35:37 $         segment-nbest(1)