competition update

2025-07-02 12:18:09 -07:00
parent 9e17716a4a
commit 77dbcf868f
2615 changed files with 1648116 additions and 125 deletions
--- a/language_model/srilm-1.7.3/man/cat1/segment-nbest.1
+++ b/language_model/srilm-1.7.3/man/cat1/segment-nbest.1
@@ -0,0 +1,171 @@
+segment-nbest(1)            General Commands Manual           segment-nbest(1)
+
+
+
+NAME
+       segment-nbest  -  rescore and segment N-best lists using hidden segment
+       N-gram model
+
+SYNOPSIS
+       segment-nbest [ -help ] option ... nbest-file-list ...
+
+DESCRIPTION
+       segment-nbest processes a series of consecutive  N-best  lists  from  a
+       speech recognizer and applies a hidden segment N-gram language model to
+       them.  The language model is a standard backoff N-gram  model  in  ARPA
+       ngram-format(5)  modeling sentence segmentation using the boundary tags
+       <s> and </s>.  The program reads in all N-best lists  and  outputs  the
+       hypotheses  that have the highest aggregate (combined acoustic and lan-
+       guage model) score.  Hypothesized sentence boundaries are marked by <s>
+       tags.
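
The "aggregate score" mentioned above can be illustrated with a short Python sketch. It assumes log-domain scores and the log-linear combination commonly used for N-best rescoring (acoustic score, plus LM score times an LM weight, plus word count times a word transition weight). The man page does not spell out the formula, so the formula, the names `Hypothesis`, `aggregate_score`, and `best_hypothesis`, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One N-best entry (hypothetical structure, not the SRILM file format)."""
    words: List[str]
    acoustic: float  # log acoustic score
    lm: float        # log language model score


def aggregate_score(hyp: Hypothesis, lmw: float, wtw: float) -> float:
    # Assumed log-linear combination: acoustic + lmw * LM + wtw * word count.
    return hyp.acoustic + lmw * hyp.lm + wtw * len(hyp.words)


def best_hypothesis(nbest: List[Hypothesis], lmw: float = 8.0,
                    wtw: float = 0.0) -> Hypothesis:
    # Pick the entry with the highest aggregate score within a single list.
    return max(nbest, key=lambda h: aggregate_score(h, lmw, wtw))


nbest = [
    Hypothesis(["hello", "world"], acoustic=-100.0, lm=-5.0),
    Hypothesis(["hello", "word"], acoustic=-98.0, lm=-6.0),
]
top = best_hypothesis(nbest, lmw=8.0, wtw=0.0)
```

This sketch scores one list in isolation. The tool described here instead chooses hypotheses jointly across consecutive N-best lists, so the LM context and hidden segment boundaries can carry over from one list to the next.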
+
+OPTIONS
+       Each filename argument can be an ASCII file, or a compressed file (name
+       ending in .Z or .gz), or ``-'' to indicate stdin/stdout.
+
+       -help  Print option summary.
+
+       -version
+              Print version information.
+
+       -order n
+              Set the maximal N-gram order to be used, by  default  3.   NOTE:
+              The  order  of  the  model is not set automatically when a model
+              file is read, so the same file can be used at various orders.
+
+       -debug level
+              Set the debugging output level (0 means  no  debugging  output).
+              Debugging messages are sent to stderr.
+
+       -lm file
+              Read the N-gram model from file.
+
+       -tolower
+              Map all vocabulary to lowercase.  Useful if case conventions for
+              N-best lists and language model differ.
+
+       -mix-lm file
+              Read a second, standard N-gram model for interpolation purposes.
+
+       -lambda weight
+              Set the weight of the main model when interpolating  with  -mix-lm.
+              Default value is 0.5.
+
+       -bayes length
+              Interpolate the second and the main model using posterior
+              probabilities  for  local  N-gram-contexts  of  length  length.   The
+              -lambda value is used as a prior mixture weight in this case.
+
+       -bayes-scale scale
+              Set  the  exponential  scale factor on the context likelihood in
+              conjunction with the -bayes function.  Default value is 1.0.
+
+       -nbest-files list
+              Specifies a list of N-best files.  The file list should  contain
+              a  list  of filenames, one per line, each corresponding to an
+              N-best file in one of the formats  described  in  nbest-format(5).
+              The  N-best  files should correspond to consecutive speech wave-
+              forms in the order listed.
+
+       -fb-rescore
+              Perform forward-backward rescoring.  This generates  new  N-best
+              lists  as output whose LM scores reflect the posterior probabil-
+              ity of each hypothesis.   The  default  is  to  perform  Viterbi
+              rescoring and output only the best combined hypothesis.
+
+       -write-nbest-dir dir
+              Write  rescored N-best lists to directory dir instead of to
+              stdout.  The filenames from the input are preserved.
+
+       -max-nbest n
+              Limits the number of hypotheses read from each  N-best  list  to
+              the first n.
+
+       -max-rescore m
+              Only  choose  among  the  top  m  hypotheses of each list (after
+              reordering hypotheses, see below).  This is an effective way  to
+              limit  the quadratic computation of the Viterbi or forward/back-
+              ward dynamic programming.
+
+       -no-reorder
+              Do not reorder the hypotheses before limiting the computation to
+              the  top  m.   By  default  the  hypotheses will first be sorted
+              according to the acoustic and language model scores recorded  in
+              the N-best lists.
+
+       -rescore-lmw weight
+              Specifies  the  language  model  weight  to  be used in combining
+              acoustic and language model scores to select the  best  hypothe-
+              ses.
+
+       -rescore-wtw weight
+              Specifies the word transition weight to be used in selecting the
+              best hypotheses.
+
+       -noise noise-tag
+              Designate noise-tag as a vocabulary item that is to  be  ignored
+              by the LM.  (This is typically used to identify a noise marker.)
+
+       -noise-vocab file
+              Read  several  noise  tags from file, instead of, or in addition
+              to, the single noise tag specified by -noise.
+
+       -decipher-lm model-file
+              Designates the N-gram backoff model (typically  a  bigram)  that
+              was  used  by the Decipher(TM) recognizer in computing composite
+              scores.  Used to compute  acoustic  scores  from  the  composite
+              scores if the N-best lists are in "NBestList1.0" format.
+
+       -decipher-lmw weight
+              Specifies  the  language  model  weight  used by the recognizer.
+              Used to compute acoustic scores from the composite scores.
+
+       -decipher-wtw weight
+              Specifies the word transition weight  used  by  the  recognizer.
+              Used to compute acoustic scores from the composite scores.
+
+       -stag string
+              Use string to mark segment boundaries in the output.  Default is
+              the start-of-sentence  symbol  defined  in  the  language  model
+              (<s>).
+
+       -bias b
+              Make  a  segment boundary a priori more likely by a factor of b.
+              If b is 0, the dynamic programming algorithm is restricted to never
+              consider  hidden  sentence  boundaries; this is useful when
+              segment-nbest is used merely for its ability to apply the LM across
+              N-best boundaries.
+
+       -start-tag string
+              Insert a tag string at the front of every N-best hypothesis read
+              in.
+
+       -end-tag string
+              Insert a tag string at the end of every N-best  hypothesis  read
+              in.   This  and  the  previous option are useful if the LM marks
+              acoustic waveform boundaries with a special tag.
+
+       segment-nbest will also process any command  line  arguments  following
+       the  options as lists of N-best lists, as with the -nbest-files option.
+       Each nbest-file-list will be processed in turn, with individual  output
+       delimited by a line of the form
+            <nbestfile nbest-file-list>
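
As an illustration of the -mix-lm and -lambda options described above, the following Python sketch shows plain linear interpolation of two model probabilities, with the -lambda weight applied to the main model. It assumes base-10 log probabilities, as stored in ARPA-format model files. The function name `interpolate_logprob` and the sample values are hypothetical, and the context-dependent weighting selected by -bayes is not modeled.

```python
import math


def interpolate_logprob(logp_main: float, logp_mix: float,
                        lam: float = 0.5) -> float:
    """Linearly interpolate two log10 probabilities.

    lam is the weight of the main model; the second (mix) model
    receives the remaining weight 1 - lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Interpolate in the probability domain, then return to log10.
    p = lam * 10.0 ** logp_main + (1.0 - lam) * 10.0 ** logp_mix
    return math.log10(p)


# Main model assigns 0.2, mix model assigns 0.4, equal weights.
mixed = interpolate_logprob(math.log10(0.2), math.log10(0.4), lam=0.5)
```

With equal weights the result is the log of the average of the two probabilities. Setting `lam` to 1.0 reproduces the main model alone.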
+
+SEE ALSO
+       ngram-count(1), segment(1), ngram-format(5), nbest-format(5).
+       A.  Stolcke,  ``Modeling  Linguistic Segment and Turn Boundaries for
+       N-best Rescoring of Spontaneous Speech,''  Proc.  Eurospeech,  2779-2782,
+       1997.
+
+BUGS
+       N-gram  models  of  arbitrary order can be used, but the context at the
+       beginning of a hypothesis never extends beyond the words from the  pre-
+       ceding N-best list.
+
+AUTHOR
+       Andreas Stolcke <stolcke@icsi.berkeley.edu>
+       Copyright (c) 1997-2004 SRI International
+
+
+
+SRILM Tools              $Date: 2019/09/09 22:35:37 $         segment-nbest(1)