<! $Id: classes-format.5,v 1.3 2007/12/19 22:08:05 stolcke Exp $>
<HTML>
<HEADER>
<TITLE>classes-format</TITLE>
<BODY>
<H1>classes-format</H1>
<H2> NAME </H2>
classes-format - File format for word class definitions
<H2> SYNOPSIS </H2>
<PRE>
<I>class</I> [<I>p</I>] <I>word1</I> <I>word2</I> ...
</PRE>
<H2> DESCRIPTION </H2>
Various programs dealing with word classes use this format to define
the possible expansions of classes and their respective probabilities.
Each expansion appears on a separate line as in
the synopsis, where
<I> class </I>
names a word class,
<I> p </I>
gives the probability for the class expansion, and
<I> word1 word2 ... </I>
defines the word string that the class expands to.
If
<I> p </I>
is omitted, it is assumed to be 1.
(All expansion probabilities for a given class should sum to one.
The software does not necessarily enforce this, and probabilities that
do not sum to one would lead to improper models.)
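<P>
As an illustration only, the line layout described above could be read with
a few lines of Python. This is a minimal sketch, not the parser used by the
programs themselves: the function name <I>parse_class_line</I> and the sample
class names are hypothetical, and the rule that the second field counts as
<I>p</I> whenever it parses as a finite number is an assumption.

```python
import math
from typing import List, Optional, Tuple


def _as_probability(token: str) -> Optional[float]:
    """Return token as a finite float, or None if it is not one."""
    try:
        value = float(token)
    except ValueError:
        return None
    # Tokens such as "nan" or "inf" parse as floats; treat them as words.
    return value if math.isfinite(value) else None


def parse_class_line(line: str) -> Tuple[str, float, List[str]]:
    """Split one line into (class, p, words).

    Assumption: the second field is taken as p whenever it parses as a
    finite number; otherwise p defaults to 1 and the field is a word.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"need a class name and at least one word: {line!r}")
    cls, rest = fields[0], fields[1:]
    p = _as_probability(rest[0])
    if p is None:
        p = 1.0          # p omitted
    else:
        rest = rest[1:]  # drop the probability field
    if not rest:
        raise ValueError(f"expansion has no words: {line!r}")
    return cls, p, rest


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from any real class file.
    print(parse_class_line("DAY 0.5 monday"))
    print(parse_class_line("CITY new york"))
```

Under this assumption, a word that happens to look like a number cannot be
the first word of an expansion unless <I>p</I> is given explicitly.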
<P>
Note that the concept of word class here is generalized to include
``multi-words'', or phrases consisting of more than one word.
All expansions must have at least one word.
Certain models might impose more restrictive formats.
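<P>
Since the sum-to-one condition is not necessarily enforced, it can be useful
to check a set of expansions before use. The following is a minimal sketch
under stated assumptions: the function name <I>check_expansions</I>, the
(class, p, words) tuple layout, and the tolerance of 1e-6 are all
hypothetical choices made for this example, not part of the format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory form of one line: (class, p, words).
Expansion = Tuple[str, float, List[str]]


def check_expansions(expansions: Iterable[Expansion],
                     tol: float = 1e-6) -> List[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems: List[str] = []
    totals: Dict[str, float] = defaultdict(float)
    for cls, p, words in expansions:
        # Every expansion must have at least one word.
        if not words:
            problems.append(f"{cls}: expansion has no words")
        totals[cls] += p
    # Probabilities of all expansions of a class should sum to one.
    for cls, total in totals.items():
        if abs(total - 1.0) > tol:
            problems.append(f"{cls}: probabilities sum to {total:g}, not 1")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [("DAY", 0.5, ["monday"]), ("DAY", 0.25, ["tuesday"])]
    for message in check_expansions(sample):
        print(message)
```

The check only reports problems and leaves the decision to renormalize or
reject the input to the caller.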
<H2> SEE ALSO </H2>
<A HREF="ngram.1.html">ngram(1)</A>, <A HREF="ngram-class.1.html">ngram-class(1)</A>, <A HREF="disambig.1.html">disambig(1)</A>, <A HREF="training-scripts.1.html">training-scripts(1)</A>, <A HREF="pfsg-scripts.5.html">pfsg-scripts(5)</A>.
<H2> AUTHOR </H2>
Andreas Stolcke &lt;stolcke@speech.sri.com&gt;.
<BR>
Copyright 1999 SRI International
</BODY>
</HTML>