156 lines
		
	
	
		
			5.6 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
		
		
			
		
	
	
			156 lines
		
	
	
		
			5.6 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
|   | <! $Id: nbest-pron-score.1,v 1.8 2019/09/09 22:35:37 stolcke Exp $> | ||
|  | <HTML> | ||
|  | <HEADER> | ||
|  | <TITLE>nbest-pron-score</TITLE> | ||
|  | <BODY> | ||
|  | <H1>nbest-pron-score</H1> | ||
|  | <H2> NAME </H2> | ||
|  | nbest-pron-score - score pronunciations and pauses in N-best hypotheses | ||
|  | <H2> SYNOPSIS </H2> | ||
|  | <B>nbest-pron-score</B> [ <B>-help</B> ] <I>option</I> ... | ||
|  | </PRE> | ||
|  | <H2> DESCRIPTION </H2> | ||
|  | <B> nbest-pron-score </B> | ||
|  | reads N-best lists and computes log probability scores for the pronunciations | ||
|  | and pauses contained in them. | ||
|  | Pronunciation scoring requires that the N-best lists | ||
|  | contain phone backtraces in "NBestList2.0" | ||
|  | <A HREF="nbest-format.5.html">nbest-format(5)</A>. | ||
|  | <P> | ||
|  | Pronunciation scores are computed from the probabilities in a dictionary. | ||
|  | Pauses are binned into three length classes (none, short, long) and  | ||
|  | scored according to a trigram language model that conditions the pause length | ||
|  | on the left and right neighboring words, in that order (so that bigram | ||
|  | backoff uses the left neighbor only). | ||
|  | <H2> OPTIONS </H2> | ||
|  | <P> | ||
|  | Each filename argument can be an ASCII file, or a  | ||
|  | compressed file (name ending in .Z or .gz), or ``-'' to indicate | ||
|  | stdin/stdout. | ||
|  | <DL> | ||
|  | <DT><B> -help </B> | ||
|  | <DD> | ||
|  | Print option summary. | ||
|  | <DT><B> -version </B> | ||
|  | <DD> | ||
|  | Print version information. | ||
|  | <DT><B>-debug</B><I> level</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Controls the amount of output (the higher the | ||
|  | <I>level</I>,<I></I><I></I><I></I> | ||
|  | the more). | ||
|  | <DT><B> -tolower </B> | ||
|  | <DD> | ||
|  | Map all vocabulary to lowercase. | ||
|  | Useful if case conventions for text/counts and language model differ. | ||
|  | <DT><B> -multiwords </B> | ||
|  | <DD> | ||
|  | Deal with N-best lists containing multiwords joined by underscores. | ||
|  | This only affects pause scoring: if a word adjacent to a pause is  | ||
|  | a multiword and is not in the vocabulary of the pause LM, then it is split | ||
|  | and only the component closest to the pause is conditioned on. | ||
|  | <DT><B>-multi-char</B><I> C</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Character used to delimit component words in multiwords | ||
|  | (an underscore character by default). | ||
|  | <DT><B>-nbest</B><I> file</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Score the N-best hypothese in  | ||
|  | <I>file</I>.<I></I><I></I><I></I> | ||
|  | <DT><B>-rescore</B><I> file</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Same as  | ||
|  | <B>-nbest</B>.<B></B><B></B><B></B> | ||
|  | <DT><B>-nbest-files</B><I> file</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Process all N-best list filenames listed in  | ||
|  | <I>file</I>.<I></I><I></I><I></I> | ||
|  | <DT><B>-max-nbest</B><I> n</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Limits the number of hypotheses read from an N-best list. | ||
|  | Only the first | ||
|  | <I> n </I> | ||
|  | hypotheses are processed. | ||
|  | <DT><B>-dictionary</B><I> file</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Enable pronunciation scoring, using the pronunciation dictionary  | ||
|  | <I>file</I>.<I></I><I></I><I></I> | ||
|  | Each line contains a pronunciation in the format | ||
|  | <PRE> | ||
|  | 	<I>word</I> [<I>p</I>] <I>phone</I> ... | ||
|  | </PRE> | ||
|  | The optional value  | ||
|  | <I> p </I> | ||
|  | is the pronunciation probability. | ||
|  | If the second field in a line is not a number the pronunciation is assumed | ||
|  | to have probability one. | ||
|  | <DT><B> -intlogs </B> | ||
|  | <DD> | ||
|  | Interpret probabilities in the dictionary as intlog-scaled log probabilities | ||
|  | (as used in the SRI Decipher(TM) system), rather than straight probabilities. | ||
|  | <DT><B>-pause-lm</B><I> file</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Enable pause scoring, using the pause LM in | ||
|  | <I>file</I>.<I></I><I></I><I></I> | ||
|  | <DT><B>-no-pause</B><I> tag</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | The word used to represent the absence of a pause in the pause LM. | ||
|  | <DT><B>-short-pause</B><I> tag</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | The word used to represent a short pause in the pause LM. | ||
|  | <DT><B>-long-pause</B><I> tag</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | The word used to represent a long pause in the pause LM. | ||
|  | <DT><B>-min-pause-dur</B><I> T</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | The minimum duration, in seconds, for a non-speech region to be considered | ||
|  | a (short) pause. | ||
|  | <DT><B>-long-pause-dur</B><I> T</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | The duration, in second, above which a non-speech region is considered a | ||
|  | "long" pause. | ||
|  | </DD> | ||
|  | </DL> | ||
|  | <P> | ||
|  | The default values for pause tags and duration thresholds are printed by the | ||
|  | <B> -help </B> | ||
|  | option. | ||
|  | <DL> | ||
|  | <DT><B>-pron-score-dir</B><I> dir</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Write pronunciation scores to | ||
|  | <I>dir</I><I></I><I></I><I></I> | ||
|  | when processing multiple N-best lists, | ||
|  | using output filenames derived from the input files. | ||
|  | <DT><B>-pause-score-dir</B><I> dir</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Write pause scores to | ||
|  | <I>dir</I><I></I><I></I><I></I> | ||
|  | when processing multiple N-best lists, | ||
|  | using output filenames derived from the input files. | ||
|  | <DT><B>-pause-score-weight</B><I> W</I><B></B><I></I><B></B><I></I><B></B> | ||
|  | <DD> | ||
|  | Add pause LM scores to the pronunciation scores after multiplying them | ||
|  | by  | ||
|  | <I>W</I>.<I></I><I></I><I></I> | ||
|  | This creates a single weighted combination of both models. | ||
|  | Pause scores can still be output separately by specifying  | ||
|  | <B>-pause-score-dir</B>.<B></B><B></B><B></B> | ||
|  | </DD> | ||
|  | </DL> | ||
|  | <H2> SEE ALSO </H2> | ||
|  | <A HREF="nbest-format.5.html">nbest-format(5)</A>, <A HREF="nbest-scripts.1.html">nbest-scripts(1)</A>, <A HREF="nbest-optimize.1.html">nbest-optimize(1)</A>, <A HREF="ngram.1.html">ngram(1)</A>. | ||
|  | <BR> | ||
|  | D. Vergyri, A. Stolcke, V. R. R. Gadde, L. Ferrer, & E. Shriberg, | ||
|  | ``Prosodic Knowledge Sources for Automatic Speech Recognition''. | ||
|  | <I>Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing</I>, | ||
|  | Hong Kong, April 2003. | ||
|  | <H2> BUGS </H2> | ||
|  | The binning of pause lengths into three classes should be generalized. | ||
|  | <H2> AUTHOR </H2> | ||
|  | Andreas Stolcke <stolcke@icsi.berkeley.edu>. | ||
|  | <BR> | ||
|  | Copyright (c) 2002-2008 SRI International | ||
|  | </BODY> | ||
|  | </HTML> |