.\" $Id: wlat-format.5,v 1.10 2019/02/06 09:53:12 stolcke Exp $
.TH wlat-format 5 "$Date: 2019/02/06 09:53:12 $" "SRILM File Formats"
.SH NAME
wlat-format \- File format for SRILM word posterior lattices
.SH SYNOPSIS
Word lattices:
.nf
\fBversion 2\fP
\fBname\fP \fIs\fP
\fBinitial\fP \fIi\fP
\fBfinal\fP \fIf\fP
\fBnode\fP \fIn\fP \fIw\fP \fIa\fP \fIp\fP \fIn1\fP \fIp1\fP \fIn2\fP \fIp2\fP ...
\&...
.fi
.PP
Word meshes (confusion networks):
.nf
\fBname\fP \fIs\fP
\fBnumaligns\fP \fIN\fP
\fBposterior\fP \fIP\fP
\fBalign\fP \fIa\fP \fIw1\fP \fIp1\fP \fIw2\fP \fIp2\fP ...
\fBreference\fP \fIa\fP \fIw\fP
\fBhyps\fP \fIa\fP \fIw\fP \fIh1\fP \fIh2\fP ...
\fBinfo\fP \fIa\fP \fIw\fP \fIstart\fP \fIdur\fP \fIascore\fP \fIgscore\fP \fIphones\fP \fIphonedurs\fP
\fBtime\fP \fIa\fP \fIt\fP
\&...
.fi
.SH DESCRIPTION
Word posterior lattices and meshes are lattices generated by aligning
N-best hypotheses with
.BR nbest-lattice (1),
or by aligning PFSG or HTK lattices with
.BR lattice-tool (1).
They compactly encode possible word hypothesis sequences and their
posterior probabilities.
(Word meshes have become generally known as ``confusion networks'' or
``sausages.'')
.PP
A word lattice is a partially ordered directed graph with nodes representing
word hypotheses.
Nodes are identified by non-negative integers.
The file format specifies the initial node
.IR i ,
the final node
.IR f ,
and any number of additional nodes
.IR n .
For each node
.I n
the following associated information is given on the same line:
the word identity
.I w
(the string ``NULL'' is used with initial and final nodes),
the alignment position
.I a
(identical values in this field identify hypotheses that occur at the
same position),
and the word posterior probability
.IR p .
Following these values, zero or more transitions to successor nodes
are specified, each given by the node index
.I ni
and the transition posterior probability
.IR pi .
In a properly normalized word lattice the transition posteriors
.I pi
sum to the node posterior
.IR p .
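.PP
As an illustration only, the following Python sketch reads a single
\fBnode\fP line as laid out above and checks the normalization property.
The function names, the dictionary layout, the sample line, and the
tolerance are assumptions made for this example and are not part of SRILM;
the sketch also assumes that alignment positions are integers and treats
a node without successors as trivially normalized.
.nf
```python
def parse_node_line(line):
    """Parse one 'node n w a p n1 p1 n2 p2 ...' line into a dict."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "node":
        raise ValueError("not a node line: " + line)
    rest = fields[5:]
    if len(rest) % 2 != 0:
        raise ValueError("unpaired transition field: " + line)
    # Successors are stored as (node index ni, transition posterior pi).
    transitions = []
    for i in range(0, len(rest), 2):
        transitions.append((int(rest[i]), float(rest[i + 1])))
    return {
        "index": int(fields[1]),        # node index n
        "word": fields[2],              # word identity w, or "NULL"
        "align": int(fields[3]),        # alignment position a (assumed integer)
        "posterior": float(fields[4]),  # word posterior p
        "transitions": transitions,
    }


def is_normalized(node, tol=1e-6):
    """True if the transition posteriors sum to the node posterior."""
    if not node["transitions"]:
        # Assumption: a node without successors has nothing to check.
        return True
    total = sum(p for _, p in node["transitions"])
    return abs(total - node["posterior"]) <= tol


# Made-up sample line, not taken from a real lattice file.
example = parse_node_line("node 3 hello 1 0.75 4 0.5 5 0.25")
print(example["transitions"], is_normalized(example))
```
.fi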
.PP
Word meshes represent a more constrained lattice format in which
word hypotheses are in a total order.
A mesh contains a number of alignment positions, and a set of
mutually exclusive word hypotheses in each position (the ``confusion sets'').
The word mesh represents all sentence hypotheses that can be
generated by freely combining word hypotheses at each position.
The file format specifies the number of alignment positions
.I N
and the total posterior probability mass
.I P
contained in the lattice,
followed by one or more confusion set specifications.
For each alignment position
.IR a ,
the hypothesized words
.I wi
and their posterior probabilities
.I pi
are listed in alternation.
The pseudo-word string
.B *DELETE*
represents an empty hypothesis.
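.PP
The confusion set layout can be illustrated with a short Python sketch
that parses \fBalign\fP lines and picks the highest-posterior word at each
position, dropping \fB*DELETE*\fP.
This is a sketch under stated assumptions, not SRILM code: the function
names and the sample mesh are invented for the example, and positions are
assumed to be integers numbered in sentence order.
.nf
```python
DELETE = "*DELETE*"


def parse_align_line(line):
    """Parse 'align a w1 p1 w2 p2 ...' into (a, {word: posterior})."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "align":
        raise ValueError("not an align line: " + line)
    rest = fields[2:]
    if len(rest) % 2 != 0:
        raise ValueError("unpaired word/posterior field: " + line)
    words = {}
    for i in range(0, len(rest), 2):
        words[rest[i]] = float(rest[i + 1])
    return int(fields[1]), words


def best_path(align_lines):
    """Pick the highest-posterior word per position, in position order."""
    sets = dict(parse_align_line(l) for l in align_lines)
    result = []
    for pos in sorted(sets):
        word = max(sets[pos], key=sets[pos].get)
        if word != DELETE:      # the empty hypothesis contributes no word
            result.append(word)
    return result


# Made-up mesh fragment for illustration.
mesh = [
    "align 0 the 0.9 a 0.1",
    "align 1 *DELETE* 0.6 big 0.4",
    "align 2 cat 0.7 cap 0.3",
]
print(best_path(mesh))
```
.fi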
.PP
Optionally, the word mesh format encodes additional information about
the hypothesis alignment from which it resulted.
The keyword
.B reference
specifies the correct word
.I w
that was aligned at position
.IR a .
The keyword
.B hyps
is used to list the sentence hypotheses of which a certain word
hypothesis was a part.
The word hypothesis is identified by an alignment position
.I a
and the word string
.IR w ,
and is followed by the integer IDs
.I hi
(typically, the N-best ranks)
of the associated sentence hypotheses.
.PP
As another optional element, the word mesh can contain word-level acoustic and
temporal information,
following the keyword
.BR info ,
the alignment position
.IR a ,
and the word identity
.IR w .
This information is derived by
.BR nbest-lattice (1)
from word- and phone-level backtraces of N-best
hypotheses (as represented in Decipher NBestList2.0 format).
The details of this information are defined in the SRILM class
.B NBestWordInfo
and are subject to change, but currently include the following fields.
.IR start :
word start time (in seconds from the beginning of the waveform);
.IR dur :
word duration (in seconds);
.IR ascore :
acoustic model likelihood (log base 10);
.IR gscore :
grammar (LM and pronunciation) score (log base 10);
.IR phones :
sequence of phones in word (separated by colons);
.IR phonedurs :
sequence of phone durations (in numbers of frames, separated by colons).
When word meshes are derived from HTK format lattices, the pronunciation field
consists of the HTK phone alignment information, which encodes both
phone sequence and durations; the phone duration field in turn is used
to encode the duration model scores, if present.
.B Note:
The encoded information pertains to the word hypothesis with the highest
posterior probability among all hypotheses of the same word aligned
to a given word mesh position.
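.PP
The field order of an \fBinfo\fP line can be illustrated with the Python
sketch below.
It assumes exactly the eight values shown in the SYNOPSIS, separated by
whitespace; since the contents are subject to change, this is not a
definitive reader.
The function name and the sample line are invented for the example.
The colon-separated fields are kept as strings because their contents
differ for meshes derived from HTK lattices.
.nf
```python
def parse_info_line(line):
    """Split 'info a w start dur ascore gscore phones phonedurs'."""
    fields = line.split()
    if len(fields) != 9 or fields[0] != "info":
        raise ValueError("unexpected info line: " + line)
    return {
        "align": int(fields[1]),             # alignment position a
        "word": fields[2],                   # word identity w
        "start": float(fields[3]),           # seconds from waveform start
        "dur": float(fields[4]),             # seconds
        "ascore": float(fields[5]),          # log base 10
        "gscore": float(fields[6]),          # log base 10
        "phones": fields[7].split(":"),      # kept as strings
        "phonedurs": fields[8].split(":"),   # kept as strings
    }


# Made-up sample line for illustration.
info = parse_info_line("info 2 cat 1.25 0.5 -1234.5 -6.75 k:ae:t 8:14:8")
print(info["word"], info["phones"], info["phonedurs"])
```
.fi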
.PP
The
.B time
keyword is used for debugging purposes and encodes the estimated timestamp
.I t
of an alignment position
.I a
when the input contains backtrace information.
It is ignored when reading in word meshes.
.PP
Both formats optionally encode the associated utterance IDs in the
.B name
field.
Word lattices and meshes can be converted to PFSG format using
the script
.BR wlat-to-pfsg .
.SH "SEE ALSO"
nbest-lattice(1), lattice-tool(1),
pfsg-scripts(1), pfsg-format(5), nbest-format(5).
.br
L. Mangu, E. Brill, & A. Stolcke, ``Finding consensus in speech recognition:
word error minimization and other applications of confusion networks,''
\fIComputer Speech and Language\fP 14(4), 373-400, 2000.
.SH BUGS
Detailed alignment and acoustic information is so far only implemented
for word meshes, although conceptually it would apply equally to word lattices.
.SH AUTHOR
Andreas Stolcke <andreas.stolcke@microsoft.com>
.br
Copyright 2001-2011 SRI International
.br
Copyright 2011-2019 Microsoft Corp.