Version History

0.90	29 Jun 95	first working code, n-gram models only

0.91	02 Aug 95	snapshot for fosler@icsi, minor bug fixes

0.92	13 Aug 95	added BayesMix, VarNgram LMs

0.93	27 Aug 95	included all LM95 code

0.94	13 Oct 95
	* new directory structure mirroring DECIPHER layout.
	* man pages added
	* added support for Decipher N-best list rescoring
	* added Null LM
	* added new utility scripts
	* bug fixes

0.95	08 Sep 96	as of WS96
	* added Trellis class, disambig program
	* added support for pause tokens (-pau-) in sentences
	  (these are ignored for sentence prob computation)
	* added -tolower mapping
	* added word reversal
	* made Ngram model reading much faster (optimized floating point parsing)
	* added template class for ngram count tries (to use either integer or
	  float count value)
	* added optional noise tag skipping
	* added SkipNgram model
	* added Witten-Bell backoff
	* ported to native Sun and SGI C++ compilers (see doc/c++porting-notes)
	* suppress log10(0.0) warnings

0.96	05 Jun 97
	* Honor -gtNmin parameter even when discounting of higher counts
	  is effectively disabled.  (Allows building maximum likelihood LMs
	  smoothed only by low-count ngram elimination.)
	* Ignore pauses and noise in nbest-lattice alignments (also added
	  -noise option).
	* ngram now supports mixtures of up to 6 ngram models.
	* added HiddenSNgram LM.
	* warn about multiple uses of '-' file for input or output
	* zio now handles incomplete reading of compressed file without error
	* Fixed interaction between deletion and iterations
	* Fixed handling of OOVs in cache model
	* Fixed decipher N-best rescoring: we now duplicate even the
	  roundoff errors incurred by bytelogs.  Also added -decipher flag
	  to ngram to allow replication of recognizer LM scores.
	  Also, takes into account that Decipher (incorrectly) applies WTW
	  even to pauses.
	* Enhanced decipher-rescore script to deal with NBestList2.0 format,
	  with -bytelog and -nodecipherlm options.
	* Added tools to convert bigram and trigram backoff LMs into
	  Decipher PFSG format (pfsg-from-ngram).
	* Enable DecipherNgram models of order higher than bigram
	  (ngram -decipher-order flag).  Default is still bigram.
	* Fixed bug that caused float command line arguments to be parsed
	  incorrectly on SunOS4 systems (missing declaration in system header).

0.97 	30 Aug 97	as of WS97
	* New programs: segment and segment-nbest (moved here from
	  development code).
	* Made low-level NgramLM access functions public
	  (findProb, findBOW, insertProb, insertBOW).
	* Fixed nbest-lattice to use normalized posterior word
	  probabilities in lattice.
	* NBest, nbest-lattice: added N-best error computation.
	* WordLattice, nbest-lattice: added lattice error computation.
	* WordLattice: base all alignments on edit distance costs defined
	  in WordAlign.h.
	* contextID() now also returns length of context used.
	  Added contextID() implementations for NullLM and BayesMix.
	* Fixed contextID() for Ngram: don't truncate context if BOW = 1.
	* Fixed SArray, LHash to avoid assignment operator on remove().
	* Fixed add-ppls, subtract-ppls to handle -ppl -debug 2 output.
	* Lots of memory management fixes.
	* SArrayIter and LHashIter now work even while underlying object is
	  being moved (as when containing data structure is enlarged).
	* Added HTK Lattice tool interface (htk/ directory).
	* Made Trellis into a template class.
	* Allow arbitrary n-gram orders with disambig(1).
	* Added forward-backward decoding and posterior probability computation
	  to disambig(1).
	* Added disambig -lmw and -mapw options.
	* Added HMMofNGrams model (ngram -hmm option).
	* VocabMap reader now warns about duplicate entries

0.98	18 April 98
	* Allow ngram to disable Decipher LM backoff hack, for rescoring
	  new exact lattices (ngram -decipher-nobackoff).
	* N-best list vocabulary is now always expanded dynamically
	  (no more OOVs in N-best lists).
	* Added wrapper script for nbest-lattice to compute N-best error rate
	  (nbest-error).
	* Skip ngrams exceeding model order when reading.
	* Fixed memory bug in generateSentence().
	* Changed libmisc to work with Tcl version > 7.
	* Compute word error correctly for empty N-best list.
	* Added ngram pruning based on model perplexity change
	  (ngram-count -prune and ngram -prune).
	* Old ngram -prune option renamed -varprune.
	* New lattice word error minimization (nbest-lattice -lattice-wer).
	* Fixed ngram -gen bug due to omissions in SunOS4 header files.
	* merge-batch-counts removes merged source files
	* Added ngram -prune-lowprobs function to do the work of
	  remove-lowprob-ngrams, but much faster and using less memory.
	* Added support for new Decipher NBestList2.0 format.
	* Added word error count and posterior probability fields to NBestHyp
	  structure.
	* Added optional factor argument to countSentence() (convenient
	  to compute fractional sufficient statistics for alternative
	  training methods).
	* Don't make special symbols (<s>, </s>, <unk>) member of SubVocab
	  by default.
	* Ported to gcc 2.8.1 .

0.99	31 July 1999
	* Added hidden-ngram (word-boundary tagger).
	* Removed line length limit for File object.
	* Added disambig -continuous flag.
	* Fixed backward computation in disambig (again).
	* Generalized compute-best-mix to N > 2 models
	* Added AdaptiveMix LM class
	* Added nbest-mix utility (interpolation of N-best posteriors)
	* Added ngram -unk flag to handle open-class LMs
	* Added disambig and hidden-ngram -text-map option
	* Script enhancements:
	  - New script to convert nbest-lattice word graphs to PFSG
	    (wlat-to-pfsg)
	  - Added switches to include probabilities in wlat-to-dot and
	    pfsg-to-dot output.
	  - Conversion to/from AT&T FSM format: fsm-to-pfsg and pfsg-to-fsm
	* ngram -rescore and associated scripts no longer set a hyp
	  probability to zero if it contains OOVs. Instead, the probability
	  is computed ignoring those words (more useful in practice).
	  A warning is output as always.
	* Added ngram-count -float-counts option.
	* Added build support for Linux/i686 platform.

1.00 	8 June 2000
	* Added ClassNgram class and ngram -classes option.
	* Capability to convert class ngrams into word ngrams.
	* New program ngram-class for automatic word class induction.
	* Fixed interaction of ngram -mix-lm -bayes with non-standard n-grams:
	  can now build an interpolation of the non-standard (hidden-event,
	  class-based, etc.) n-gram with the additional, standard n-grams.
	* Replaced LM.noiseTag with LM.noiseVocab (list of noise tags to
	  be ignored).  Tools now take -noise-vocab option (as well as -noise
	  for backward compatibility).
	* Made ngram -counts work for non-n-gram models.
	* Added nbest-lattice -posterior-{amw,lmw,wtw} options to compute
	  word posteriors with different weightings from the one used in
	  hypothesis ranking.  Also added -deletion-bias flag for explicit
	  control of del/ins errors (-use-mesh mode only).
	* NBest rescoring methods now have optional acoustic model weight
	  (defaulting to 1 as before).
	* New class RefList (list of reference transcripts).
	* New class NBestSet (set of N-Best lists).
	* NBest, NBestSet, and nbest-lattice optionally split multiwords into
	  their components on reading (-multiwords option).
	* New nbest-optimize tool for finding near-optimal score combination
	  weights for word error minimizing N-best rescoring.
	* New anti-ngram program, for computing posterior-weighted N-gram
	  counts from N-best lists.
	* New nbest-rover script allows ROVER-style combination of hypotheses
	  from multiple N-best lists.
	* New rescore-decipher -norescore option, to reformat N-best lists
	  without LM rescoring.
	* Fixed bugs related to missing <s> and </s> in change-lm-vocab and
	  make-ngram-pfsg.
	* Significant speedups in LMs involving dynamic programming
	  (HiddenNgram, DFNgram, HMMofNgrams) when interpolating with other
	  models or running in "ngram -debug 2" mode.
	* Allow absolute discounting on fractional counts, for more
	  effective construction of models from fractional counts.
	* Added ngram-merge -float-counts option, and allow "-" (stdin) as
	  input file.
	* ngram-count ensures <s> unigram (with prob 0) is defined to avoid
	  breaking other programs.
	* Added make-abs-discount script to compute absolute discounting
	  constants from Good-Turing statistics.
	* compute-sclite and compare-sclite now take -multiwords option to
	  split compound words prior to scoring.
	* Changed option handling so that unsigned option arguments are forced
	  to be non-negative.
	* Added Map2 (2D Map) class to libdstruct.
	* Much better string hash function (borrowed from Tcl).
	* New man pages: training-scripts(1), lm-scripts(1), ppl-scripts(1),
	  pfsg-scripts(1), nbest-scripts(1), lm-format(5), classes-format(5),
	  pfsg-format(5), nbest-format(5).

1.0.1	12 July 2000

	Functionality:

	* wordError() and nbest-lattice -dump-errors now also output the
	  location of deletions in the alignment (NOTE: possible code
	  incompatibility).
	* New reverse-ngram-counts script.

	Bug fixes:

	* Workarounds for shortcomings in Linux gcc, math library, and linker.
	* make-ngram-pfsg: don't ignore bigram states with zero BOW (bugfix).
	* nbest-rover: fixed problem with handling of + lines.

1.1	21 May 2001

	Functionality:

	* HiddenNgram class generalized to deal with disfluency-type events
	  that manipulate the N-gram context.
	* rescore-reweight script now accepts additional score directories
	  (and associated score weights) for combination of an arbitrary number
	  of knowledge sources.
	* Enhanced rescore-decipher functionality:
	  - Option -lm-only to produce output containing LM scores only
	  - Option -pretty to perform word mapping on the fly.
	  - Warn about and handle LM scores that are NaN.
	* New class VocabMultiMap, implementing dictionary-style mappings of
	  words to strings from another vocabulary.
	* Added support for pronunciation-based word alignments in
	  WordMesh and nbest-lattice -use-mesh .
	* Added nbest-lattice -keep-noise option to preserve pauses and noises
	  in alignments.
	* Support for multiwords:
	  - make-multiword-pfsg expands PFSGs to use multiwords (using
	    AT&T FSM tools).
	  - multi-ngram expands N-gram LM to include multiwords.
	* Added support for Decipher Intlog scaled log probabilities.
	* Added ngram -seed option to initialize random sentence generation
	  (contributed by Eric Fosler).
	* New add-pauses-to-pfsg pause= and version= options to allow
	  generation of Nuance-compatible PFSGs (see man page for details).
	* The NBest class and scripts handle NBestList2.0 format containing
	  phone and/or state backtraces (by ignoring them).
	* Added Amoeba search option to nbest-optimize (contributed by
	  Dimitra Vergyri).
	* Added standard 1-best optimization mode to nbest-optimize.
	* wlat-to-pfsg script now also processes confusion networks output by
	  nbest-lattice -use-mesh .

	Bug fixes:

	* ngram -decipher-nobackoff now applies to the -lm ngram as well if
	  option -decipher is also specified.
	* ngram -expand-classes no longer dumps core when handling
	  "context-free" class expansions (though those aren't supported).
	* gawk path in scripts is now adjusted prior to installation
	  (/usr/bin/gawk for Linux, /usr/local/bin/gawk elsewhere).
	* Fixed numerical problems in nbest-rover/nbest-posteriors.
	* ngram-count -float-counts behaved differently from equivalent
	  integer-count estimation;  both integer and float counts now use
	  the same estimation code.
	* Reduced memory requirements of nbest-optimize by about 25%.
	* Minor changes for gcc-2.95.3.

1.1.1	20 July 2001

	Functionality:

	* WordMesh: new interface to record reference word string in alignment.
	* nbest-lattice: confusion networks can now record reference words
	  if specified with -reference, and are preserved by -write/-read.
	* replace-words-with-classes now has option to process ngram count
	  files (have_counts=1).
	* merge-nbest: new utility to merge N-best hyps from multiple lists.
	* wlat-stats: new utility to compute statistics of word posterior
	  lattices.

	Bug fixes:

	* GT discounting: fixed anomaly due to different floating point
	  precision on x86 platforms.
	* anti-ngram(1): documented options previously omitted.
	* WordMesh: reading/writing of confusion networks now preserves
	  total posterior mass.
	* Changed the hypothesis alignment order in nbest-optimize to be
	  more compatible with decoding in nbest-lattice: first align nbest
	  hyps in order of decreasing (initial) scores, then align reference.
	  nbest-optimize -no-reorder keeps the old behavior (with references
	  anchoring the alignment).  All scores and initial lambdas are now
	  used to compute initial posterior hyp probabilities to guide the
	  hypothesis alignment; thus, it now makes sense to restart an
	  optimization with partially optimized weights to revise the
	  alignments.
	* nbest-optimize now warns about missing or incomplete score files.
	* Fixed a memory access error in nbest-optimize -1best.
	* Fixed weight normalization in nbest-optimize when first element is 0.
	* Miscellaneous fixes for compile under RH Linux 7.0.

1.2	20 November 2001

	Functionality:

	* nbest-lattice -dictionary allows word alignments to be guided by
	  dictionary pronunciations.
	* nbest-lattice -use-mesh -record-hyps records the rank of N-best hyps
	  contributing to each word hypothesis in the confusion network.
	* nbest-lattice -no-rescore and -decipher-format options make it
	  more convenient as an N-best format conversion tool.
	* VocabDistance: new class and subclasses to represent distance metrics
	  (e.g., phonetic distance) over vocabularies.
	* WordMesh: output word hyps in order of decreasing posteriors.
	* WordMesh: reading/writing of confusion networks now includes hyp IDs
	  from alignment.
	* NBest/MultiAlign/WordMesh: support for keeping extra word-level
	  information (NBestWordInfo).
	* nbest-lattice: unified single and multiple file processing.
	  New option -write-dir to write multiple output lattices.
	  New option -refs to supply multiple references.
	  Options -nbest-errors and -lattice-errors are replaced by
	  switches -nbest-error/-lattice-error, in conjunction with
	  -references/-refs.  Outputs are now prefixed by utterance IDs
	  when processing multiple files.
	* nbest-lattice -nbest-backtrace enables processing of backtrace
	  information from N-best lists; combined with -use-mesh this produces
	  sausages that contain word-level scores and alignment information,
	  as well as phone backtraces (see new wlat-format(5) man page).
	* wlat-stats script now also computes error statistics when processing
	  confusion networks with references.
	* nbest-rover now handles N-best lists in Decipher format.
	* hidden-ngram and disambig: new option -fw-only to use only forward
	  probabilities for posterior computation.
	* rescore-decipher -filter option to apply textual rewriting filters
	  to hypotheses before rescoring.
	* segment-nbest -write-nbest-dir option for dumping rescored N-best
	  lists to a directory instead of to stdout.
	* segment-nbest -start-tag and -end-tag options to insert tags at
	  margins of N-best hyps.

	Bug fixes:

	* WordMesh: computation of deletion costs using a dictionary distance
	  was completely bogus (only affected undocumented nbest-lattice
	  -dictionary option).
	* nbest-lattice: correctly process -nbest-files using -dictionary in
	  alignment.
	* nbest-rover: fixed to work on Linux
	* hidden-ngram: don't abort when an event posterior is 0.
	* hidden-ngram: avoid abort when *noevent* occurs in -hidden-vocab list.
	* segment-nbest: now correctly uses ngram contexts longer than trigram.
	* segment-nbest: optimized -bias 0 case by disallowing sentence
	  boundary states altogether.
	* multi-ngram -prune-unseen-ngrams prevents insertion of multiword
	  N-grams whose component N-grams were not in the original model.
	* ngram: fixed computation of mixture lambda for second LM when three
	  or more models are interpolated.
	* nbest-posterior (and thus nbest-rover) no longer split multiwords by
	  themselves.  To split multiwords with nbest-rover, append the
	  -multiwords option to the argument list, which is passed on to
	  nbest-lattice to achieve the desired effect.
	* ngram -renorm now applies BEFORE class expansion or pruning of
	  model (in case input model is unnormalized).
	* make-nbest-pfsg bug involving transition into final node fixed.
	* Minor script changes to avoid warnings with gawk 3.1.0.

1.3	11 February 2002

	Functionality:

	* Trellis class, disambig and hidden-ngram tools: added support for
	  N-best decoding (contributed by Anand Venkataraman).

	* MultiwordLM wrapper LM class as a convenient way to split multiwords
	  prior to LM evaluation.

	* New MultiwordVocab class to support MultiwordLM.

	* Added ngram -multiwords option (based on MultiwordLM wrapper).

	* Added support for Chen & Goodman's Modified Kneser-Ney smoothing
	  and interpolated backoff estimates.  See ngram-count options
	  -kndiscount[1-6], -kn[1-6], and -interpolate[1-6].

	* New library and tool for lattice manipulation: lattice-tool.

	* New nbest-mix -set-am-scores and -set-lm-scores options. These allow
	setting either the AM or the LM scores in the N-best output to simulate
	the combined posteriors, while preserving the other scores.

	* Added some regression tests (test/ subdirectory).

	* Support for Windows via CYGWIN porting layer (MACHINE_TYPE=cygwin).
	See doc/README.windows for details.
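
	Illustration (not part of SRILM): the modified Kneser-Ney discounts
	listed under Functionality above are commonly derived from
	counts-of-counts with the closed-form estimates given by Chen &
	Goodman.  The Python sketch below shows that computation only; the
	function name is hypothetical, it assumes that n1..n4 (the number of
	N-grams seen exactly 1..4 times) are all positive, and it is not
	claimed to match the ngram-count code.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Return (D1, D2, D3plus) from counts-of-counts n1..n4.

    Closed-form estimates (Chen & Goodman):
        Y   = n1 / (n1 + 2*n2)
        D1  = 1 - 2*Y*n2/n1
        D2  = 2 - 3*Y*n3/n2
        D3+ = 3 - 4*Y*n4/n3
    """
    if min(n1, n2, n3, n4) <= 0:
        # The estimates are undefined when a count-of-count is missing.
        raise ValueError("all counts-of-counts must be positive")
    y = n1 / (n1 + 2.0 * n2)
    d1 = 1.0 - 2.0 * y * n2 / n1
    d2 = 2.0 - 3.0 * y * n3 / n2
    d3plus = 3.0 - 4.0 * y * n4 / n3
    return d1, d2, d3plus
```

	D1 applies to N-grams seen once, D2 to those seen twice, and D3+ to
	all higher counts.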

	Bug fixes:

	* Trellis: deallocate old trellis nodes on demand in init(), rather
	  than preemptively in clear().  Greatly speeds up forward computation
	  for trellis-based LMs (e.g., ClassNgram).

	* Textstats: fix to handle zero denominator in ppl computation.

	* disambig: fixed off-by-one error indexing into trellis.

	* Miscellaneous small fixes for compilation and operation under Windows
	(using the CYGWIN environment).

	Warning: See doc/README.x86 about a gcc compiler bug that might
	affect you on Intel platforms.

1.3.1	25 June 2002

	Functionality:

	* nbest-optimize -write-rover-control option conveniently dumps a
	control file for nbest-rover that encodes the optimized parameters.
	* New regression tests for nbest-rover (i.e., nbest-lattice) and
	nbest-optimize.
	* nbest-posteriors, combine-acoustic-scores now all handle and
	preserve Decipher N-best formats.  This allows nbest-rover to
	generate sausages with backtrace information if input N-best lists
	contain it (using -nbest-backtrace option).
	* New tool nbest-pron-score for computing pronunciation and pause LM
	scores from N-best hypotheses.
	* Added disambig -totals option to compute total string probabilities
	(same as in hidden-ngram).
	* reverse-lm: simple filter to reverse a bigram backoff LM.
	* lattice-tool -collapse-same-words reduces lattices by merging all
	nodes with identical words (but also creates new paths in lattice).
	* nbest-lattice -prime-with-refs option uses reference strings
	to improve sausage alignment.
	* compute-best-sentence-mix: new script to optimize sentence-level
	interpolation of LMs.
	* nbest-lattice -lattice-files option to align multiple word lattices;
	currently only works with -use-mesh (sausages).
	* hidden-ngram now supports mixture and class N-gram LMs.
	* New class SimpleClassNgram, a more efficient implementation of
	ClassNgram where each word is assumed to belong to at most one
	class and class expansions are exactly one word long.
	Enabled by -simple-classes switch in ngram, lattice-tool, and
	hidden-ngram.
	* ngram -counts now handles escaped input lines and LM state change
	directives embedded in the input.
	* New tool nbest-pron-score for scoring pronunciations and pauses in
	N-best hypotheses.
	* NgramStats::parseNgram() new function to parse N-gram counts from
	a character string.
	* LM::pplCountsFile() new function to evaluate LM on counts read from
	a file.

	Bug fixes:

	* make-ngram-pfsg is no longer limited to trigram models.
	* Avoid NaN values in disambig and hidden-ngram, in cases where lmw or
	mapw are zero and the corresponding log probabilities are -Infinity.
	* Avoid numerical problems in N-best posterior computation by using
	AddLogP() to compute normalizer.
	* anti-ngram no longer requires -refs argument with -all-ngrams.
	* Fixed bug removing noise from N-best lists with backtrace.
	* Code fixes for clean compiles with gcc 3.x.
	* nbest-rover more efficient by using a single invocation of
	nbest-lattice for all input N-best lists.
	* ClassNgram: fixed handling of words that appear as members of a class
	with zero probability, or have zero membership probability.
	* nbest-lattice -record-hyps now outputs hyp ids according to the
	original N-best order, rather than the sorted one.
	* make-hiddens-lm now gives proper unigram probability to hidden-S tag.
	* Compute acoustic scores in Decipher N-best-2 format by subtracting
	token LM scores from total score.  This deals correctly with cases where
	the total scores have been adjusted by summing merged hyps, and are no
	longer the sum of all AC and LM word scores.
	* Gawk scripts that test for alphabetic or lowercase characters are
	more portable and handle non-ascii and multibyte characters.
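
	Illustration (not part of SRILM): the N-best posterior fix above
	computes the normalizer by adding probabilities in the log domain.
	The Python sketch below shows the general technique.  It uses
	natural logarithms for simplicity, and add_log() and
	log_posteriors() are hypothetical names; the signature and log base
	of SRILM's AddLogP() are not reproduced here.

```python
import math

LOG_ZERO = float("-inf")  # log(0)


def add_log(x, y):
    """Return log(exp(x) + exp(y)) without leaving the log domain."""
    if x < y:
        x, y = y, x  # make x the larger of the two
    if y == LOG_ZERO:
        return x  # adding zero probability changes nothing
    return x + math.log1p(math.exp(y - x))


def log_posteriors(log_scores):
    """Normalize log hypothesis scores into log posterior probabilities."""
    norm = LOG_ZERO
    for score in log_scores:
        norm = add_log(norm, score)
    if norm == LOG_ZERO:
        raise ValueError("all hypotheses have zero probability")
    return [score - norm for score in log_scores]
```

	Exponentiating scores such as -1000 directly would underflow to 0
	and make the normalizer vanish; the log-domain sum avoids this.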

	The package now includes a paper on SRILM, to appear in ICSLP-2002,
	that gives an overview of the software and its design (doc/paper.ps).

1.3.2	3 September 2002

	New functionality:

	* Added ngram and ngram-count -nonevents option to specify a
	subset of words that are to be non-events, i.e., tokens that can only
	occur in contexts (such as <s>).
	* Extended ngram-count discounting options for up to 9-grams.
	* Added support in Vocab and Ngram classes for processing meta-counts
	(counts-of-counts).
	* Added ngram-count -meta-tag and -kn-counts-modified options to
	support make-big-lm.
	* Added ngram-count -read-with-mincounts flag to suppress counts
	below cutoff thresholds at reading time.  This dramatically lowers
	memory consumption, and speeds up make-big-lm operation (which used
	to use a gawk script for the same purpose).
	* Added option to specify vocabulary to add-pauses-to-pfsg for cases
	where heuristics fail.
	* lattice-tool can now handle arbitrary order LMs for expanding
	lattices.  The old trigram expansion algorithm is still available
	with -old-expansion; the compact trigram algorithm is unchanged with
	-compact-expansion.
	* To better support lattice expansion, two new functions have been
	added to the LM interface: contextID() takes an optional word
	argument, to compute the context needed to predict a specific word,
	and contextBOW() is a new interface to compute the backoff weight
	associated with truncating a history.
	* Added makefile support to generate executable versions that use
	"compact" data structures.  See item 9 in INSTALL for details, and
	doc/time-space-tradeoff for a simple benchmark result.

	Bug fixes:

	* Convert pseudo-log(0) value (-99) in DARPA backoff models back to
	true log(0) on reading.  This ensures that non-event words in the
	input are treated as zeroprobs (by the perplexity computation and
	otherwise).
	* Avoid NaN floating point results in N-best rescoring and
	nbest-optimize, by handling 0 * log(0) more carefully.
	* Handle -Inf AM and LM scores in SRILM N-best format.
	* make-big-lm was reworked to support KN in addition to GT discounting.
	Warning: the modified lower-order counts for KN are created using
	merge-batch-counts and can get almost as big as the original counts.
	Beware of the additional disk space and run time requirement!
	* Clear out old parameters before reading or estimating N-gram models.
	* Reading in new class definitions into ClassNgram object now deletes
	old definitions (unless classes file is empty).
	* Destructors for Ngram and ClassNgram now free N-gram and class
	definition memory.
	* nbest-pron-score: avoid core dump when pronunciation information is
	missing from N-best list.
	* make-ngram-pfsg: fixed generation of unigram PFSGs.
	* Avoid use of toupper() in add-pauses-to-pfsg.
	* Handle ngram-count -order 0 and print warning.
	* Avoid using zcat in scripts since it behaves differently on different
	systems and depending on PATH setting.
	* nbest-lattice and nbest-optimize no longer strip a filename part
	following '.' to derive utterance ids; only known file suffixes
	are removed.
	* Fixed bugs in member declarations that were preventing TaggedVocab,
	TaggedNgramStats, and StopNgramStats from working correctly.
	* compute-sclite now ignores utterances with a reference of
	"ignore_time_segment_in_scoring", consistent with NIST STM scoring.
	* Vocab.h now defines SArray_compareKey() for strings over VocabIndex,
	allowing use as keys in sorted arrays.
	* ClassNgram now uses the processed words as the context after an OOV.
	This works better when the input contains context cue tags.
	* i386-solaris platform was not being detected by machine-type script.
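
	Illustration (not part of SRILM): two of the fixes above concern
	log(0) handling, namely mapping the -99 placeholder back to a true
	log(0) on reading and avoiding NaN from 0 * log(0).  The Python
	sketch below shows both ideas; the function names and the exact
	comparison against -99 are assumptions made for this example, not
	a description of the SRILM code.

```python
import math

PSEUDO_LOG_ZERO = -99.0      # placeholder for log(0) in backoff model files
LOG_ZERO = float("-inf")     # true log(0)


def read_log_prob(text):
    """Parse a log probability field, mapping the placeholder to log(0)."""
    value = float(text)
    return LOG_ZERO if value == PSEUDO_LOG_ZERO else value


def weighted_score(weight, log_prob):
    """Return weight * log_prob, defining 0 * log(0) as 0 instead of NaN."""
    if weight == 0.0:
        return 0.0
    return weight * log_prob
```

	In IEEE floating point, 0.0 * -inf evaluates to NaN, which is why
	the zero-weight case is handled before multiplying.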

1.3.3	2 March 2003

	New functionality:

	* Increased maximum number of interpolated LMs in ngram, hidden-ngram,
	and lattice-tool to 10.
	* ngram now computes static interpolation (N-gram merging) of up to 10
	input LMs (consistent with handling of dynamic interpolation).
	* ngram and lattice-tool -limit-vocab option limits LM reading to
	those parameters that pertain to words specified by -vocab.
	The LM::read() function got an optional second argument for this
	purpose.
	ngram -limit-vocab -renorm now effectively does the same as the
	change-lm-vocab script.  However, the main purpose of -limit-vocab
	is to save memory by discarding N-grams that are not relevant to a
	test set.
	* rescore-decipher -limit-vocab precomputes the vocabulary used by
	N-best lists and invokes ngram -limit-vocab to allow rescoring with
	very large models on machines with little memory.
	* Ngram::mixProbs() now has version that destructively merges an Ngram
	into an existing model.  ngram -mix-lm now uses this version, instead
	of the old, non-destructive one, thereby achieving considerable time
	and space savings (only two models, rather than 3, have to be kept in
	memory at a time).
	* ngram-count and ngram -map-unk option, to change the "unknown" word
	token string.
	* compute-sclite, compare-sclite now understand multiple -S options to
	specify intersections of several utterance subsets for scoring.
	* make-batch-counts now ignores lines in input file list that start
	with # (allowing comments in the file list).
	* Added replace-words-with-classes partial=1 option to prevent
	multi-word replacements that include multiple whitespace characters
	(i.e., "a b" is only replaced with a single space between the words).
	* New LM script: sort-lm, reorders N-grams lexicographically, as
	required by some other software (e.g., Sphinx3, pointed out by
	Mikko Kurimo <mikkok@james.hut.fi>).
	* New training script: reverse-text, reverses word order in text file.
	* New pfsg script: pfsg-vocab, extracts vocabulary used in PFSGs.

	Bug fixes:

	* disambig and hidden-ngram -keep-unk now also causes LM to be
	treated as open-vocabulary.
	* HiddenNgram class (debug level 2) was omitting the event after
	the last word from the Viterbi backtrace.
	* ngram -expand-classes was including -pau- word in expanded LM.
	* Made backoff computation in Ngram::wordProbBO() more efficient,
	avoiding multiple lookups in the context trie.  Gives about a 30%
	speedup in ngram -debug 3 -ppl.
	* ngram -lm reading is faster by about 8% due to a code optimization.
	* ngram-count -order 2 -kndiscount3 no longer aborts with an error.
	The -order option effectively limits the discounting parameters
	computed, so that the model order can be changed without having to
	adjust the smoothing options.
	* make-big-lm -trust-totals option is ignored with KN discounting,
	since the two don't work well together.
	* make-big-lm now checks that input counts files are not stdin.
	* Reading N-best lists in Decipher format now sets the number-of-words
	score, so that weight rescoring, optimization etc. can use them.
	* ngram-count normalizes the N-gram probabilities for a context to 1
	if the backoff distribution for that context has probability mass 0.
	The latter can happen e.g. if all N-grams for a context have been
	observed and received discounted probabilities.  The fix ensures that
	the overall distribution is normalized in this case.
	* rescore-reweight now accepts Decipher N-best lists.
	* nbest-posteriors and nbest-rover now handle Decipher version 2
	N-best lists better (allowing LM and WT weights to be applied).
	* Initialize locale in all top-level programs.  disambig, hidden-ngram,
	segment, and segment-nbest were missing it, causing potential problems
	with non-ASCII characters.
	* nbest-lattice -write-vocab option to find vocabulary used in N-best
	list.
	* nbest-pron-score now uses idFromFilename() function to avoid
	over-truncating filenames when inferring sentence ids.
	* Added more strippable filename suffixes in idFromFilename() function.
	* NBest: correctly read in phone backtraces that are time-reversed.
	* compute-oov-rate ignores -pau- tokens.
	* Various N-best scripts now process input directories containing links
	(rather than plain files) correctly.
	* Lattice class takes care to limit range of intlog transition
	probabilities in PFSG output, so as to avoid overflow when converting
	to bytelog scale.
	* make-ngram-pfsg removes temporary file (now placed in /tmp) even
	when killed by signal.
	* Hidden-event and DF N-gram models are documented in detail in ngram
	man page.
	* Test suite result comparisons against reference output now use a
	script that ignores small numerical discrepancies, so as to produce
	fewer false alarms.
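
	Illustration (not part of SRILM): the ngram-count normalization fix
	above covers contexts whose backoff distribution has zero mass.
	The Python sketch below shows one way to express that rule.  The
	function name, the dictionary representation, and the convention of
	returning a backoff weight of 0 in the renormalized case are all
	assumptions made for this example.

```python
def finish_context(explicit, lower_order, eps=1e-12):
    """Return (probs, backoff_weight) for one N-gram context.

    explicit:    word -> discounted probability observed in this context
    lower_order: word -> probability under the backed-off (shorter) context
    """
    seen_mass = sum(explicit.values())
    if seen_mass <= 0.0:
        raise ValueError("context has no probability mass")
    # Mass the lower-order model gives to words NOT listed in this context.
    unseen_mass = sum(p for w, p in lower_order.items() if w not in explicit)
    if unseen_mass > eps:
        # Usual case: left-over mass is spread over the unseen words.
        return dict(explicit), (1.0 - seen_mass) / unseen_mass
    # Backoff distribution is empty: rescale explicit probabilities to 1.
    scale = 1.0 / seen_mass
    return {w: p * scale for w, p in explicit.items()}, 0.0
```

	Without the rescaling branch, the mass removed by discounting would
	have nowhere to go and the context would sum to less than 1.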

	Portability:

	* Compiles under MacOS X (MACHINE_TYPE=macosx), thanks to help from
	wooters@icsi.berkeley.edu and jean-philippe.demoulin@enst.fr.

1.4	14 February 2004

	New functionality:

	* Added support for factored language models, developed by Katrin
	Kirchhoff and Jeff Bilmes, and implemented by Jeff Bilmes.
	A new library, libflm.a, and two new tools, fngram-count and fngram
	are built in the flm/ directory.  A conference paper and a technical
	report are included as documentation in flm/doc/.  Questions and bug
	reports should be directed to bilmes@ee.washington.edu.
	FLM support has also been integrated into some of the standard
	tools (ngram and hidden-ngram) and is enabled by the -factored option.

	* Added support in lattice-tool to read/write and rescore HTK lattices.
	See lattice-tool man page for details.
	* The lattice expansion algorithm for general LMs now preserves
	pause and null nodes.  Consequently, lattice-tool no longer eliminates
	pause and null nodes prior to applying this algorithm, unless
	-no-pause or -compact-pause was specified.
	* Implemented a new algorithm to build word meshes (confusion networks,
	sausages) from lattices, that is faster than the original Mangu et al.
	method.  lattice-tool -posterior-decode uses this to extract 1-best
	word hypotheses, and lattice-tool -write-mesh allows writing of
	sausages to file.
	* The "compact" lattice expansion algorithm that uses backoff nodes
	(described in Weng et al. 1998) has been generalized to handle
	LMs of arbitrary order.  As before, this algorithm is triggered by
	lattice-tool -compact-expansion.  (To get the old version, which
	handles only trigrams and produces non-identical results, use
	lattice-tool -compact-expansion -old-expansion.)
	* lattice-tool -density allows pruning of lattices to a specified
	density (in addition to the posterior threshold).
	* lattice-tool -multi-char option allows designating characters other
	than underscore as multiword delimiters.
	* Added a "LatticeLM" class that emulates a language model using the
	transition probabilities in a lattice.  This is useful for debugging
	and comparing the probabilities assigned by lattices to corresponding
	LM probabilities.  A new option lattice-tool -ppl makes use of this
	class (analogous to ngram -ppl).
	* lattice-tool lattice algebra operations (or, concatenate) can now
	be applied to multiple input lattices, always using the same lattice
	as second operand.

	* ngram has enhanced N-best rescoring functionality, allowing
	multiple input lists to be rescored (-nbest-files, -write-nbest-dir,
	-decipher-nbest, -no-reorder, -split-multiwords).
	* rescore-decipher -fast enables a rescoring mode that uses only the
	built-in functions of ngram and therefore runs much faster.
	* New option ngram -rescore-ngram to recompute the probabilities in
	an N-gram model using an arbitrary other LM.

	* Added original (unmodified) Kneser-Ney discounting (ngram-count
	-ukndiscountN options). Contributed by Jeff Bilmes.
	* New disambig -classes option to read vocabulary maps in
	classes-format(5).
	* New disambig -write-counts option to output word/class substitution
	bigram counts (useful to reestimate class membership probabilities).
	* nbest-pron-score -pause-score-weight creates weighted combination
	of pronunciation and pause LM scores.
	* compute-sclite -noperiods option to delete periods from hyps
	for scoring purposes.
	* New script empty-sentence-lm to modify existing LM to allow
	the empty sentence with a given probability.
	* compute-sclite handles CTM files in RT-03 format.
	* ngram-class -debug 2 prints the initial word-to-class assignments,
	so that the entire class tree can be reconstructed from the output.
	* RefList class has option to read and look up reference words without
	associated ID strings (indexed by integers).
	* Enhanced WordMesh and WordLattice classes to have an optional
	"name" field, used to record utterance ids.
	* New select-vocab command to implement likelihood-optimizing
	vocabulary selection from multiple corpora.  Contributed by
	Anand Venkataraman and Wen Wang. See man page for details.

	Bug fixes:

	* ngram avoids reading classes file multiple times if -limit-vocab
	is not being used (otherwise it is unavoidable, and will lead to
	errors if the reading is from stdin).
	* Fixed some bugs in compare-sclite and compute-sclite.
	* Modified ngram and compute-best-mix so that the latter works
	with ngram -counts output.  ngram -counts now outputs the count
	values != 1 for each N-gram so that compute-best-mix can take them
	into account in the optimization.
	* rescore-reweight and nbest-rover were not handling Decipher N-best
	lists correctly when additional score directories are given.
	* nbest-rover -wer disables use of nbest-lattice -use-mesh option,
	so nbest-rover can be used for old-style word error minimization
	(or even 1-best rescoring, by also specifying -max-rescore 1).
	* lattice-tool -ref-file and -ref-list were being ignored when
	processing only a single input lattice.  Fixed so that lattice error
	can now be computed with either -input-lattice or -input-lattice-list.
	* Enhanced MultiwordLM class with new contextID() and contextBOW()
	versions that better reflect the backoff behavior of the wrapped LM
	class.  Makes it much more efficient to use the lattice-tool -multiword
	option, i.e., expand a multiword lattice with a non-multiword LM.
	* rescore-decipher -pretty had a bug that caused mapping to be applied
	to the score fields as well, potentially corrupting the format.
	* Fixed bugs in mixture lambda computation (ngram, hidden-ngram,
	lattice-tool), triggered by more than one lambda being zero, or using
	more than 5 mixtures.
	* lattice-tool algebra operations used to crash if operand lattices
	contained NULL nodes.
	* Non-compressed files ending in .gz can now be read successfully.
	* Catch a possible 0/0 problem in the Good-Turing discount estimator.
	* Fixed memory management for strings returned by TaggedVocab::getWord()
	thereby avoiding garbled results.
	* lattice-tool -pre-reduce-iterate and -post-reduce-iterate arguments
	were not being used to control number of lattice reduction iterations.
	* Fixed an uninitialized memory bug that could produce random results
	in posterior probability computation (and hence in lattice pruning).
	* Fixed a bug in lattice pruning triggered by unnormalized posteriors
	greater than 1.

	Portability:

	* Fixed some problems compiling with gcc-3.2.2; eliminated compile-
	time warnings about division by zero in constant definitions.
	* Rewrote some code to work around limitations and warnings in the
	Intel C++ compiler.  (In return, got compiled code that runs 10-20%
	faster!)  For processor-specific optimizations, use
		make MACHINE_TYPE=i686-p4	.
	* Fixed some script problems that surfaced in latest gawk version.
	* Fixed some problems compiling with Tcl/Tk-8.4.1.
	* FreeBSD support (contributed by Zhang Le <ejoy@peoplemail.com.cn>).
	* Updated Nuance-related features in PFSG scripts and man page.
	* Note: Integration of FLM support required some changes to the
	Vocab and Ngram class interface.  In particular, several member
	variables (e.g., Boolean Vocab::unkIndex) have been replaced by virtual
	member functions that return references to the variables (e.g.,
	Boolean &Vocab::unkIndex()).  This requires, albeit trivial, changes
	to any client code that accesses these variables.

1.4.1	9 May 2004

	Functionality:

	* New option lattice-tool -htk-quotes to enable the HTK quoting
	mechanism that allows whitespace and non-printable characters to be
	used in word labels.  (This is disabled by default since other SRILM
	tools don't allow such word strings.)
	* New option lattice-tool -add-refs to add a path corresponding to
	the reference word string to each lattice.
	* New option ngram -counts-entropy to compute entropy (log probabilities
	weighted by joint N-gram probability) from counts.
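
	As a rough illustration of the quantity described for ngram
	-counts-entropy, the following Python sketch sums log probabilities
	weighted by joint N-gram probabilities.  The function name, the table
	layout, the sign convention, and the toy numbers are assumptions made
	for this example; it is not SRILM code.

```python
import math

def weighted_log_prob_sum(joint, cond):
    """Return -sum over (h, w) of p(h, w) * log10 p(w | h).

    joint: maps (history, word) -> joint probability p(h, w)
    cond:  maps (history, word) -> conditional probability p(w | h)
    Both tables are hypothetical inputs for this sketch.
    """
    return -sum(p * math.log10(cond[ng]) for ng, p in joint.items() if p > 0.0)

# Toy example: history "a" is followed by "b" or "c" with equal probability.
joint = {("a", "b"): 0.5, ("a", "c"): 0.5}
cond = {("a", "b"): 0.5, ("a", "c"): 0.5}
entropy = weighted_log_prob_sum(joint, cond)
```

	In this toy case the result equals log10(2), i.e. one binary choice
	per word.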

	Bugs fixed:

	* nbest-lattice could core dump if references were not supplied.
	* FLM/ProductVocab: fixed problems with mapping of <s> and </s> to
	factored form.
	* Lattice algebra operations (or, concatenate) now preserve HTK link
	information and lattice names.
	* Fixed LM::contextProb() handling of <s> and other non-event tokens.
	This also allowed Ngram::computeContextProb() to be eliminated.
	* LatticeFollowIter iterator no longer takes lookahead parameter --
	lookahead is unlimited and cycles are avoided by keeping a table of
	visited nodes.  This also greatly speeds up lattice expansion in
	some cases.
	* Detect negative discounts in modified Kneser-Ney method, arising
	from non-monotonic counts-of-counts.
	* Fixed various debugging output messages in the Lattice class.
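
	The negative-discount condition mentioned above for the modified
	Kneser-Ney method can be sketched as follows.  The formulas are the
	standard Chen and Goodman estimates from counts-of-counts n1..n4;
	that SRILM computes them in exactly this form is an assumption, and
	the function names and toy counts are hypothetical.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Return (D1, D2, D3+) computed from counts-of-counts n1..n4."""
    if min(n1, n2, n3) <= 0:
        raise ValueError("counts-of-counts n1..n3 must be positive")
    y = n1 / (n1 + 2.0 * n2)
    d1 = 1.0 - 2.0 * y * n2 / n1
    d2 = 2.0 - 3.0 * y * n3 / n2
    d3 = 3.0 - 4.0 * y * n4 / n3
    return d1, d2, d3

def negative_discounts(discounts):
    """Return the 1-based positions of discounts that came out negative."""
    return [i + 1 for i, d in enumerate(discounts) if d < 0.0]

# Monotonically decreasing counts-of-counts give usable discounts.
ok = modified_kn_discounts(1000, 400, 200, 120)
# Non-monotonic counts-of-counts (n4 > n3) drive D3+ below zero.
bad = modified_kn_discounts(100, 50, 10, 40)
```

	Here negative_discounts(bad) flags the third discount, which is the
	kind of condition an estimator would want to report rather than use.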

	Portability:

	* Matthias Thomae <thomae@ei.tum.de> found that make-ngram-pfsg
	(and probably other gawk scripts) may not work correctly with recent
	versions of gawk unless the environment is set to LC_NUMERIC=C.

1.4.2	19 October 2004

	Functionality:

	* lattice-tool -factored option to handle factored LMs (analogous
	to ngram and hidden-ngram).
	* lattice-tool -nbest-decode generates N-best lists from lattices
	(contributed by Dustin Hillard, University of Washington).
	* lattice-tool -output-ctm option to generate CTM-formatted 1-best
	output, either with -viterbi-decode or with -posterior-decode.
	Of course this requires HTK input lattices containing timemarks.
	* Added version of WordMesh::minimizeWordError() that returns acoustic
	information in a NBestWordInfo array, to support the above.
	* lattice-tool -insert-pause option to insert optional pause nodes in
	lattices.
	* lattice-tool -unk will map unknown words to <unk> instead of
	automatically augmenting the vocabulary (the -map-unk option allows
	the mapping of unknown words to be customized).
	* lattice-tool -acoustic-mesh records word times, scores, and phone
	alignments when confusion networks are built.
	* lattice-tool -ignore-vocab option to define the set of words that
	are ignored in LM processing (like pause nodes).
	* lattice-tool -write-ngrams option to compute expected N-gram counts
	from lattices.
	* HTK lattices now support up to three "extra" score fields (x1..x3),
	which can be used to rescore hypotheses with arbitrary non-standard
	knowledge sources.
	* Added support for the "s" key in HTK lattices (used to encode
	state alignment info).
	* anti-ngram -min-count option to prune N-grams with expected frequency
	below specified threshold.
	* ngram -adapt-marginals and related options to trigger use of
	unigram marginals adaptation, following Kneser et al. (Eurospeech 97).
	* New LM class AdaptMarginals to support the above.
	* nbest-lattice and lattice-tool -hidden-vocab option allows specifying
	a subvocabulary that should not be aligned with regular words when
	building confusion networks.
	* New VocabDistance subclass SubvocabDistance, to support the above.
	* nbest-optimize -combine-linear and -non-negative options, useful to
	optimize linear combinations of posterior probability scores.

	Bugs fixed:

	* lattice-tool: Avoid disconnecting lattice in density pruning.
	* Utility script installation was not working for Cygwin hosts.
	* ProductNgram::contextID() now returns hash code of context used,
	instead of zero, and limits context-used length to order-1.
	* HTK lattice output was omitting wdpenalty value.
	* Improved collision-prone hash function for VocabIndex arrays.
	* Documented order of operations in lattice-tool(1).
	* Fixed excessive /tmp space usage in nbest-rover script, so as to
	avoid frequent incomplete output with large N-best data as a result
	of running out of disk space.
	* Fixed bug in compute-sclite that would garble STM references without
	the optional 6th field.
	* Fixed bug in Trie::insert(), which would always set foundP = true,
	even if a new entry was created.
	* Preserve Lattice::limitIntlogs flags in lattice algebra operations.
	* Use sorted node map iteration in lattice-tool expansion algorithms,
	so that results are not subject to pseudo-random hash table ordering.
	* HTK lattice output no longer has more nodes/links than input
	(provided -no-htk-nulls, -htk-scores-on-nodes, or -htk-words-on-nodes
	are NOT used).
	* Take default lattice name from input filename, rather than output
	filename (which may not be defined), however:
	* The embedded names of output lattices from binary lattice operations
	are derived from the output file name.
	* Fixed bug in reading of word meshes (confusion networks) introduced
	in release 1.4.
	* Fixed a bug in alignments of multiple confusion networks, affecting
	cases where the inputs have posterior masses != 1.

1.4.3	3 December 2004

	Functionality:

	* Increased the number of extra scores supported in HTK lattices
	(x1, x2, ... x9).
	* lattice-tool -nbest-viterbi option to use Viterbi N-best algorithm,
	which uses less memory (contributed by Jing Zheng).
	* Added nbest-lattice -output-ctm analogous to lattice-tool.
	* Make -output-ctm output word posteriors in the confidence field.
	* Extend the meaning of the nbest-lattice -max-rescore option so that,
	in lattice mode, it limits the number of hypotheses that are aligned.
	(The meaning of -max-rescore was previously only defined in N-best
	rescoring mode).
	* Added -version option to all top-level programs.

	Bug fixes:

	* Improved efficiency and duplicate elimination in A-star N-best
	generation (contributed by Jing Zheng).
	* Worked around a problem with gawk scripts in Linux handling of
	/dev/stderr device which can cause a file to be truncated if stderr is
	redirected to it.
	* MultiAlign::addWords() was not preserving NBestWordInfo.

	Other:

	* Various small code changes for compilation with gcc 3.4.3.
	* Maintenance scripts moved to $SRILM/sbin/.
	* Support for commercial releases excluding third-party code
	contributions.

1.4.4	6 May 2005

	Functionality:

	* ngram-count now allows use of -wbdiscount, -kndiscount, etc.,
	without a specified N-gram order, to set the default discounting
	method for all N-gram orders.  As before, this can be overridden by
	-wbdiscount[1-9], -kndiscount[1-9], etc., for specific N-gram
	lengths (suggested by Anand).
	* lattice-tool -keep-pause has additional side-effects if used with
	-nonevents and -ignore-vocab (making pauses behave like regular words).
	* lattice-tool -dictionary-align option triggers use of dictionary
	pronunciations for word mesh alignment (contributed by Dustin Hillard).
	* New option lattice-tool -nbest-duplicates allows control over the
	number of duplicate word hypotheses to output (from Dustin Hillard).
	* Update to the FLM tools from Kevin Duh, to make fngram-count use the
	-vocab option to limit the vocabulary of the estimated model.
	* Added nbest-optimize -hidden-vocab option to constrain the alignment
	of a subvocabulary (analogous to nbest-lattice -hidden-vocab).
	* wlat-stats computes the posterior expected number of words in the
	input lattice.

	Bug fixes:

	* ngram -unk maps unknown words in N-best hyps to <unk> instead of
	adding them to the vocabulary.
	* lattice-tool: Don't punt when encountering a NULL word node with
	pronunciation, output a warning instead.
	* lattice-tool -nbest-decode now uses a double-ended heap data
	structure, and -nbest-max-stack drops hypotheses from the bottom
	of the heap instead of the top (contributed by Dustin Hillard).
	* lattice-tool -nbest-decode now does more thorough duplicate removal
	(not just adjacent duplicates are removed).
	* lattice-tool no longer gives an error if input lattice has posteriors
	specified on nodes (even though they are effectively ignored).
	* select-vocab: miscellaneous bug fixes from Anand.
	* nbest-lattice: fixed various bugs with -nbest-backtrace option.
	* compute-sclite: work around bug in csrfilt.sh -dh affecting waveform
	names containing hyphens.
	* Minor tweaks for MacOSX build.

1.4.5	28 August 2005

	Functionality:

	* ngram -debug 0 -ppl now outputs statistics for each input section
	delimited by escape lines, in addition to overall results (based on
	a modification by Dustin Hillard).  ngram -debug 1 and higher behave as
	before.
	* ngram -loglinear-mix implements log-linear mixture LMs.
	* LoglinearMix: new class to support the above.
	* VocabMap: added remove(.) method to remove all entries for given
	source word.
	* WordMesh: added wordColumn() function to return confusion set at
	given position (contributed by Dustin).
	* Lattice: added readMesh() function to read in confusion networks
	(from Dustin).
	* lattice-tool -read-mesh allows handling of input in confusion
	network format (from Dustin).
	* nbest-optimize -1best-first implements a heuristic strategy whereby
	the relative score weights are first optimized in -1best mode, followed
	by full optimization together with posterior scale.
	* nbest-optimize -max-time forces search to time out if new best
	weights aren't found within a certain number of seconds.
	* New script combine-rover-controls to merge multiple nbest-rover
	control files for system combination.
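
	For the log-linear mixture LMs enabled by ngram -loglinear-mix above,
	the basic idea is that component probabilities are multiplied after
	being raised to their weights, and the result is renormalized over the
	vocabulary.  The Python sketch below shows this for a single context;
	the function name and toy distributions are assumptions for
	illustration and do not reflect the LoglinearMix class interface.

```python
import math

def loglinear_mix(dists, weights):
    """Combine distributions over the same words.

    p(w) is proportional to the product over i of p_i(w) ** lambda_i.
    All component probabilities are assumed to be strictly positive.
    """
    unnorm = {}
    for w in dists[0]:
        log_score = sum(lam * math.log(d[w]) for d, lam in zip(dists, weights))
        unnorm[w] = math.exp(log_score)
    z = sum(unnorm.values())          # normalizer over the whole vocabulary
    return {w: s / z for w, s in unnorm.items()}

p1 = {"a": 0.5, "b": 0.5}
p2 = {"a": 0.9, "b": 0.1}
mixed = loglinear_mix([p1, p2], [0.5, 0.5])
```

	The normalizer depends on the context, so it has to be recomputed for
	each history, which is what makes such mixtures costlier than linear
	interpolation.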

	Bug fixes:

	* disambig clears old map entries when encountering a duplicate
	definition for a source word.
	* nbest-optimize: posterior scaling of fixed weights was broken.
	* WordMesh, nbest-lattice: do better error checking on reading
	confusion network files, handle numalign and posterior specs out of
	order.
	* lattice-tool had a bug in the handling of HTK format lattices that
	do not contain an explicit specification of initial/final nodes.
	* Added proper copy constructors and assignment operators for
	Array, SArray, and LHash classes.  This in turn makes the copy
	constructor for NgramLM and other classes work properly.
	(Assignment still doesn't work for some higher-level classes because
	of reference (&) variable members.)
	* Fixed minor bug in the ngram -skipoovs implementation, found by
	Alexandre Patry.

	Portability:

	* Port to win32-mingw platform (by Jing Zheng).  Doesn't support
	compressed file i/o, or the -max-time options in nbest-optimize and
	lattice-tool.
	* Minor tweaks for compilation with gcc-4.0.1.
	* Renamed HTKLink class to HTKWordInfo, which is more appropriate and
	avoids a naming conflict with SRI's Decipher software.

1.4.6	20 January 2006

	Functionality:

	* Added support for reading/writing files compressed with bzip2
	(file suffix .bz2).  Requires that the bzip2/bunzip2 binaries be
	installed.

	Bug fixes:

	* Lattice class now creates completely empty lattices (no nodes).
	This avoids having to first remove a node when reading an actual
	lattice.  Empty lattices can be output, but not read (because at
	least an initial/final node has to be defined).
	* lattice-tool -ignore-vocab was not being used in conjunction with
	-viterbi-decode, -posterior-decode, -collapse-same-words, and lattice
	error computation.  Words to be ignored are now treated the same as
	-noise-vocab in those operations.
	* Fixed a bug in lattice expansion whereby backoff weights were
	dropped at NULL nodes (problem noticed by Teemu Hirsimaki).
	* Fixed bug in reading of node-specific posterior probabilities
	in word meshes.
	* Fixed a bug in lattice-tool -read-mesh, which was not creating
	sentence initial/final tags on initial/final lattice nodes.
	* Fixed a bug in the LatticeFollowIter class that could cause incorrect
	results in LatticeLM (lattice-tool -ppl).
	* When outputting PFSG lattices in HTK format, map PFSG weights to
	HTK acoustic scores.  (But, as before, LM rescoring discards input
	PFSG weights and causes the probabilities to be output as LM scores.)
	* Scale wdpenalty values specified in lattice according to log-base.
	Also, scale -htk-wdpenalty specified on command line according to
	-htk-logbase (or default 10).
	* Correctly handle HTK score output with -htk-logbase 0.

	Portability:

	* Added workaround for compilers that don't support arrays of
	non-constant size (such as SunStudio and Visual C++). On these
	systems, Array will be used instead.

	* Added a new compilation option "_s" that triggers use of 2-byte
	integers for vocabulary indices and counts.  With compilers that
	implement __attribute__((packed)) correctly, this causes N-gram counts
	to use 1/3 less memory than in the default option, at the cost of some
	limitations in functionality.  First, only vocabularies of up to 64k
	words may be used.  Second, only up to 32k counts exceeding 32k may be
	stored.
	The latter is typically not a problem because in most natural data
	the number of very frequent words is small.
	Unfortunately, gcc does not currently handle __attribute__((packed))
	correctly, but Intel's icc does.

	* Tested on Linux for PowerPC-64bit.

	* Tested on Linux for x86_64, using gcc.

	* Minor tweaks for Intel icc 8.0.

	* Tested on Solaris-x86 using Sun Studio 11 compiler.
	Compilation still generates lots of warnings, but the resulting
	binaries work correctly.

	* Ported to Microsoft Visual C 7.0 (by Jing Zheng).
	See doc/README.windows-msvc.

	* gcc versions older than 3.4.3 are no longer supported, though
	they might still work.

1.5.0	31 July 2006

	Functionality:

	* Added support for a binary data format for N-gram backoff models
	which speeds up the reading of model files by a factor of 2
	for full models, and by an order of magnitude if -limit-vocab is used.
	Note that the binary format is machine architecture dependent.
	See the ngram -write-bin-lm option (contributed by Jing Zheng).

	* disambig now supports Bayesian or standard interpolation of up to
	10 LMs, just like ngram and hidden-ngram.

	* Added disambig -factored option to support factored hidden tag LMs.

	* Added disambig -escape option to pass information unprocessed to
	the output, similar to hidden-ngram.

	* New utility script: split-tagged-ngrams, see training-scripts(1)
	man page.

	* New function Vocab::checkWords() for more efficient implementation
	of the ngram -limit-vocab functionality.

	* Modified compute-sclite to support scoring of overlapped speech
	with asclite program.

	* New NgramCountLM class implementing a mixture of count-based
	maximum-likelihood estimators (aka deleted interpolation aka
	Jelinek-Mercer smoothing).

	* ngram-count and ngram -count-lm options to implement deleted
	estimation and evaluation of NgramCountLM models.
	This option is also supported by hidden-ngram, disambig, and
	lattice-tool.
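
	The following Python sketch illustrates the general idea behind a
	mixture of count-based maximum-likelihood estimators (Jelinek-Mercer
	smoothing), as named in the NgramCountLM entry above.  It uses a
	single interpolation weight for all orders and a uniform fallback;
	both choices, along with the function name and toy counts, are
	simplifying assumptions and not a description of how NgramCountLM
	parameterizes or estimates its weights.

```python
def jm_prob(word, history, counts, lam, vocab_size):
    """Recursively interpolate ML estimates from unigram up to full history.

    counts: maps word tuples to counts; counts[()] is the total token count
    lam:    weight given to the ML estimate at each order (hypothetical:
            one shared value instead of per-order or per-count weights)
    """
    prob = 1.0 / vocab_size                    # lowest level: uniform
    for start in range(len(history), -1, -1):  # shortest context first
        ctx = tuple(history[start:])
        ctx_count = counts.get(ctx, 0)
        if ctx_count > 0:                      # unseen context: keep lower order
            ml = counts.get(ctx + (word,), 0) / ctx_count
            prob = lam * ml + (1.0 - lam) * prob
    return prob

# Toy counts over the vocabulary {"a", "b"}.
counts = {(): 4, ("a",): 2, ("b",): 2, ("a", "b"): 2, ("b", "a"): 2}
p_b = jm_prob("b", ("a",), counts, 0.5, 2)
p_a = jm_prob("a", ("a",), counts, 0.5, 2)
```

	Because every level is a proper distribution, the interpolated
	values for a given history sum to one.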

	* Added support for ngram counts stored in an indexed directory
	structure, based on a format developed by Thorsten Brants for data
	delivered to LDC by Google.  This data format can be used in
	conjunction with the NgramCountLM class, and may be generated
	from standard ngram count files using the make-google-ngrams script
	(see training-scripts(1)).

	* Added NgramStats::clear() function.

	* Added the limitVocab option to the NgramStats::read() function.
	In conjunction with NgramCountLM, this allows use of arbitrarily
	large N-gram statistics on limited test sets.

	* Added ngram-count -limit-vocab option.

	* Added hidden-ngram -vocab and -limit-vocab options.
	Possible incompatibility: the -hidden-vocab wordlist must not contain
	the *noevent* word; it is added implicitly.

	* Added lattice-tool -write-vocab option to extract vocabulary from
	lattice files.

	* Added lattice-tool -init-mesh option to align lattice to preexisting
	confusion network.

	* Added an interface for vocabulary aliasing (name mapping) to
	the Vocab class, and the option -vocab-aliases to the programs
	disambig, hidden-ngram, lattice-tool, nbest-lattice,
	ngram-count, and ngram.  This allows direct use of LMs with
	slightly mismatched vocabularies relative to some test data.
	Also, added handling of the -vocab-aliases option to the
	rescore-decipher script, so that large name mapping files can
	be subsetted when -limit-vocab is in effect (so that only the
	relevant portions of an LM are loaded).

	* disambig now automatically limits LM reading to the words found in
	the map file (suggested by Jing Zheng).

	* hidden-ngram -bayes and -bayes-length options added to give more
	control over interpolation.

	* The default count type is now "unsigned long" instead of
	"unsigned int".  This makes no difference on 32-bit platforms,
	but on 64-bit platforms it allows the handling of data upwards of
	4.3 billion tokens (which would cause integer overflow on 32-bit
	machines).
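
	The 4.3 billion figure above corresponds to 2^32, the number of
	distinct values of a 32-bit unsigned integer.  The Python sketch below
	emulates fixed-width wrap-around to show what happens to a count that
	exceeds it; the helper name and the corpus size are hypothetical.

```python
UINT32_MAX = 2**32 - 1          # largest value of a 32-bit unsigned counter

def add_wrapping(count, increment, bits):
    """Add with wrap-around, as fixed-width unsigned arithmetic does."""
    return (count + increment) % (2**bits)

total_tokens = 5_000_000_000    # hypothetical corpus size above 2^32
as_32bit = add_wrapping(0, total_tokens, 32)   # silently wraps around
as_64bit = add_wrapping(0, total_tokens, 64)   # stored exactly
```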

	* For 32-bit platforms, added a compile option "_l", which triggers
	use of 64-bit "long long" integers for count storage.
	This uses the XCount class to avoid needing extra memory for count
	storage, assuming that large count values will be sparse.

	Bug fixes:

	* Fixed a bug in the handling of -mix-lm[789] options in ngram,
	hidden-ngram and lattice-tool.  (With the -bayes option in effect,
	the -mix-lm6 argument was used for -mix-lm[789].)

	* Fixed memory management in the XCount implementation, which was
	giving incorrect results when compiling with OPTION=_s.

	* disambig no longer adds <s> and </s> tokens if input already
	contains them (consistent with ngram).

	* lattice-tool -read-mesh was broken in the previous release, now
	works again.

	* lattice-tool -density-prune and -nodes-prune now work without
	-posterior-prune being specified.

	* The -debug option was being ignored with ngram -null .

	* Fixed a bug in Vocab::remove(VocabString) that could be triggered by
	interactions between ngram -vocab and -vocab-aliases .

	* Tweaks to MACHINE_TYPE=msvc compilation.  Updated documentation in
	doc/README.windows-cygwin and doc/README.windows-msvc.

	* Tweaked compiler flags for Solaris to handle files larger than 2^31.

	* Prevent possible NaN probabilities in ClassNgram.

	* Fixed a problem in make-ngram-pfsg triggered by a word named "BO".

	* Support long int key values in data structures.

	* rescore-decipher -filter option now works correctly in conjunction
	with -limit-vocab.

1.5.1	20 November 2006

	Functionality:

	* ngram-count -write-binary is a new option to create binary count
	files, which load much faster.  They are recognized automatically by
	ngram-count -read, and can be used in count-based LMs.

	* Revised binary backoff LM format (ngram -write-bin-lm) to use only
	a single data file and be machine-independent and somewhat more
	compact.  Reading the 1.5.0 binary format is still supported, but not
	writing it.

	* Added lattice-tool -bayes and -bayes-scale options for compatibility
	with ngram and other programs.

	* New lattice-tool -write-ngram-index option to generate an index of
	N-gram occurrences in a lattice.

	* New lattice-tool -multiword-dictionary option enables accurate
	handling of acoustic information (timestamps, pronunciations) when the
	-split-multiwords option is used (contributed by Dustin Hillard).

	* New nbest-optimize -insertion-weight and -word-weights options to
	implement weighted forms of word error optimization.

	* New option make-ngram-pfsg no_empty_bo=1 to disallow an empty (null)
	path through the PFSG via the unigram backoff.

	* New script get-unigram-probs to extract unigram probabilities from
	an LM file.

	Bug fixes:

	* Enabled large-file (64bit offsets) handling for Linux 32bit
	compilation.

	* Fixed utility and test scripts to support platforms that don't
	support compressed file I/O.  Check test/README for instructions.

	* Fixed bug in compute-sclite that could lead to failure if
	waveform names contain hyphens, or sort differently after mapping to
	lowercase.

	* Fixed another bug in compute-sclite that was preventing
	compare-sclite from working.

	* Fixed a typo-bug in Ngram::estimate that could cause problems in
	handling discounting errors, but in practice seems to have been
	harmless (from Federico Cesari).

	* Improved MSVC portability:
	  - fixed header file usage
	  - enabled binary file i/o for binary LMs
	  - fixed miscellaneous compiler warnings
	  - simplified build (see doc/README.windows-msvc)
	  - workaround in WordMesh.cc to avoid a compiler bug (from
	    Federico Cesari).

	* Fixed win32 (Windows gcc, not cygwin) build.

1.5.2	6 March 2007

	Functionality:

	* Support binary LM formats (based on Ngram binary format) for most
	LM classes.

	* New lattice-tool -htk-logzero option to set a dummy score to
	replace zero scores found in HTK lattices.

	Bug fixes:

	* Make sure Google ngrams can be read in both compressed and
	uncompressed format if platform supports both.

	* Make sure the file pointer is updated when reading binary Ngram LM.
	This enables reading multiple LMs from one file, and avoids errors
	reading binary class-LMs.

	* Avoid NaN values when a lattice score is infinity and the
	corresponding scale factor is 0 (the score is ignored in that case).

	* Avoid degenerate decoding results if lattice hypotheses contain
	-infinity scores. (Effectively, -infinity is replaced by a large
	negative log score, thus allowing the decoder to rank hypotheses based
	on their non-infinity components.)
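
	The two score-handling fixes above can be illustrated with a small
	Python sketch: an infinite score times a zero scale yields NaN unless
	the term is skipped, and replacing -infinity by a large finite value
	lets hypotheses still be ranked by their remaining scores.  The floor
	constant and function name are assumptions for this example, not the
	values or code used by lattice-tool.

```python
import math

LOG_FLOOR = -1.0e10   # hypothetical stand-in for "a large negative log score"

def combine_scores(scores, scales):
    """Weighted sum of log scores with the two safeguards described above."""
    total = 0.0
    for score, scale in zip(scores, scales):
        if scale == 0.0:
            continue                 # score is ignored; avoids inf * 0 = NaN
        if score == -math.inf:
            score = LOG_FLOOR        # keep the finite components comparable
        total += scale * score
    return total

naive = -math.inf * 0.0                                      # NaN
hyp_a = combine_scores([-math.inf, -12.0], [1.0, 1.0])
hyp_b = combine_scores([-math.inf, -15.0], [1.0, 1.0])
ignored = combine_scores([-math.inf, -12.0], [0.0, 1.0])
```

	Without the floor, hyp_a and hyp_b would both be -infinity and could
	not be ranked against each other.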

	* Updated lattice-tool man page to clarify the interaction of
	LM rescoring and lattice decoding.

	Portability:

	* Added configuration for Solaris amd64 platform with
	Sun C compiler (amd64-solaris_spro).

	* Updated instructions for MSVC build (see doc/README.windows-msvc),
	based on input from Mike Frandsen.
	Merge MSVC .manifest files into binary before installation.

1.5.3	28 July 2007

	Functionality:

	* New ngram-count -write-binary-lm option to output LM in binary format
	(avoids the need to dump ascii format first, and then convert to
	binary using ngram tool).

	* New make-google-ngrams yahoo=1 option to read Yahoo ngram corpus
	(which needs to be sorted first, however).

	* New make-big-lm -ngram-filter option to pipe input counts through
	an arbitrary filter program (e.g., for format conversion).

	* The make-kn-discount utility will now try to estimate missing
	counts-of-counts based on their global statistics, using an empirical
	law: 		log f(k) - log f(k+1) = C / k for some constant C.
	Note this functionality is not implemented in the C++ code for KN
	discounting.  Therefore, it is only available when building LMs with
	make-big-lm.
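
	The empirical law quoted above, log f(k) - log f(k+1) = C / k, implies
	f(k+1) = f(k) * exp(-C / k).  The Python sketch below fills in missing
	counts-of-counts on that basis.  How make-kn-discount actually
	estimates C is not stated here; averaging over adjacent known pairs is
	an assumption of this example, as are the function names and data.

```python
import math

def estimate_c(cofc):
    """Average k * (log f(k) - log f(k+1)) over adjacent known entries."""
    estimates = [k * (math.log(cofc[k]) - math.log(cofc[k + 1]))
                 for k in sorted(cofc) if k + 1 in cofc]
    if not estimates:
        raise ValueError("need at least one adjacent pair f(k), f(k+1)")
    return sum(estimates) / len(estimates)

def fill_missing(cofc, max_k):
    """Extrapolate missing f(k+1) from f(k) using the law above."""
    c = estimate_c(cofc)
    filled = dict(cofc)
    for k in range(1, max_k):
        if k in filled and k + 1 not in filled:
            filled[k + 1] = filled[k] * math.exp(-c / k)
    return filled

# Toy counts-of-counts that follow the law exactly with C = 1.
known = {1: 1000.0, 2: 1000.0 * math.exp(-1.0)}
c_hat = estimate_c(known)
filled = fill_missing(known, 4)
```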

	* New scripts tolower-ngram-counts and uniq-ngram-counts to help
	manipulate counts files.

	* New option ngram-count -write-vocab-index (for debugging).

	* Vocab.h: Increased maxWordLength constant from 256 to 1024.

	* Trie class can now initialize root node size with optional constructor
	argument (similar to other container classes).

	* LHash and SArray classes have a new function to preallocate space
	following construction (but before any data is inserted).

	* The platform "i686-p4" has been renamed "i686-icc" (Linux x86 with
	Intel compiler) for consistency.

	Bugs:

	* Fixed a buffer overrun problem triggered by nbest rescoring of
	empty hypotheses.

	* Fixed problem in compute-sclite with extraction of speaker labels
	from ctm files.

	* NBest class (affecting nbest-pron-score): strip Decipher-specific
	phone diacritic labels separated by underscores from pronunciation
	strings.

	* Fixed memory leak in Trie::removeTrie().  This was causing a leak
	in NgramLM deallocation.

	* Fixed a performance bug which caused the building of unigram
	hash tables to have quadratic time complexity (due to an unfortunate
	interaction between hash table iterators and hash functions).

	* Made make-big-lm detect missing -read option and print usage message.
	Also, it now handles the degenerate case of -kndiscount with -order 1.

	* Workaround for icc compiler error: optimization disabled for some
	files when using MACHINE_TYPE=i686-m64-icc.

1.5.4	2 November 2007

	Functionality:

	* New option ngram-count -addsmooth for additive smoothing.
	A corresponding new discounting subclass "AddSmooth" is defined in
	Discount.h.
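
	Additive smoothing adds a constant to every N-gram count before
	normalizing.  The Python sketch below shows the textbook form,
	p(w | h) = (c(h, w) + delta) / (c(h) + delta * V); whether the
	AddSmooth class normalizes in exactly this way is an assumption, and
	the function name and toy counts are hypothetical.

```python
def add_smooth_prob(word, context, counts, delta, vocab_size):
    """Additively smoothed estimate of p(word | context)."""
    num = counts.get(context + (word,), 0) + delta
    den = counts.get(context, 0) + delta * vocab_size
    return num / den

# Toy counts: "the" was seen 3 times, followed by "cat" twice and "dog" once.
counts = {("the",): 3, ("the", "cat"): 2, ("the", "dog"): 1}
vocab = ["cat", "dog", "bird"]
probs = {w: add_smooth_prob(w, ("the",), counts, 1.0, len(vocab)) for w in vocab}
```

	The unseen word "bird" receives probability 1/6 here instead of zero,
	and the three probabilities still sum to one.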

	* New option ngram -server-port to start a "probability server"
	(based on a contribution by Elad Dinur).

	* WordLattice: print lattice name in warning messages.

	* lattice-tool -keep-unk option to preserve labels of OOV words in
	LM rescoring (currently works only for HTK lattices).

	* New option nbest-optimize -anti-refs and -anti-ref-weight to
	decorrelate errors with another set of hypotheses.

	* New support in nbest-optimize for BLEU optimization and Powell search
	(from Jing Zheng).

	* New option ngram-class -save-maxclasses to start the saving of
	intermediate results when a specified number of classes is reached
	(suggested by Shlomo Wavrow and Mats Svenson).

	Bugs:

	* Fixed incorrect reference output for test "nbest-rover-acoustic".

	* Fixed a possible problem with tests "ngram-class" and
	"ngram-count-lm-limit-vocab" in non-C locales.

	* nbest-lattice: Avoid aligning reference words with -dump-errors or
	-wer, which would cause crash because no lattice is being generated
	internally.

	* make-batch-counts, merge-batch-counts: be more portable by dynamically
	finding the right options to use with xargs.

	* add-pauses-to-pfsg: Avoid using a regular expression construct that
	causes a gawk error in UTF-8 locales.  However, to ensure this works
	correctly a gawk version of 3.1.5 should be used. See note in
	doc/README.linux.  If the test "make-ngram-pfsg" fails a workaround is
	to set LANG=C or LANG=en_US and avoid UTF-8.

	* Fixed an uninitialized member variable in the unary constructor for
	class File, which was causing garbage to be returned on the first
	getline().

	* common/Makefile.machine.macos: Updated Tcl linking instructions
	(from Chuck Wooters).

	* Makefile: exit immediately if any of the subdirectories result in
	build errors.

1.5.5	6 November 2007

	Bug fixes:

	* Fixed Makefile problem in binaries depending on libraries that was
	preventing executables being generated on some platforms.

	* Fixed a compilation problem with MSVC for nbest-optimize.

	* Use MSVC _getpid() in ngram -generate random seed initialization.

1.5.6	2 January 2008

	Functionality:

	* New ngram -use-server option to run the client side of a network LM
	server as implemented by ngram -server-port.  Optionally, probabilities
	may be cached in the client (option -cache-served-ngrams).
	Mixtures of one or more network and file-based LMs are also possible.

	* Likewise, disambig, hidden-ngram, and lattice-tool understand the
	-use-server option.

	* New LMClient class to implement the above (a stub LM subclass that
	queries a server for LM probabilities).

	* ngram -server-port now behaves like a true server daemon: it handles
	multiple simultaneous or sequential clients, and never exits (unless
	killed).  The number of simultaneous clients may be limited with the
	-server-maxclients option.

	* Support for 7-zip compressed files (suggested by Alexy Khrabrov).

	* lattice-tool -split-multiwords will now print a warning message
	about multiwords that were not split because their LM probability was
	non-zero.

	* LoglinearMix LM class supports n-way mixtures directly, giving more
	efficient implementation for n > 2 than recursive object construction
	in ngram (contributed by Tanel Alumae).

	Bug fixes:

	* MultiwordLM now implicitly adds all words to the vocabulary, so that
	previously unseen multiwords get split.  This has the side effect that
	OOVs will appear as zeroprob words.

	Documentation:

	* The doc/FAQ file has been expanded and reformatted as a man page.
	It can be viewed with "man srilm-faq" or online at
	http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.html .
	The major content additions are questions about the build
	process, how to build a "Google N-gram LM", smoothing issues,
	and OOV-handling (the latter by Deniz Yuret).  Corrections and
	additions to this document are most welcome!

	* A new manual page ngram-discount(7) gives a detailed overview of
	smoothing methods found in SRILM (contributed by Deniz Yuret).

	* The conversion of man pages to html has been enhanced to better
	handle code samples and nested itemized lists.

1.5.7	14 October 2008

	Functionality:

	* make-big-lm -text option allows building of LMs that only contain
	N-gram contexts that are needed for a given test set, thus saving
	space.

	* ngram-count -intersect option allows reading of counts to be
	restricted to an N-gram subset.

	* NgramStats added a Boolean switch "intersect" and a method
	setCount(), used for implementing the above.

	* Allow changing the character used to compound multiwords, using the
	new option -multi-char with ngram, anti-ngram, nbest-lattice,
	nbest-optimize, nbest-pron-score, and several of the nbest-scripts.

	* New options -no-sos and -no-eos for ngram-count and ngram tools,
	to control the insertion of <s> and </s> tokens around sentences.

	* New lattice-tool -no-expansion option to decode a lattice with a
	new LM without first expanding the lattice (contributed by Jing Zheng).

	* New CachedMem mix-in class to implement a caching memory allocator
	(contributed by Jing Zheng).

	* Added lattice-tool -print-sent-tags option to preserve <s> and </s>
	tags in lattice output format, instead of mapping them to null nodes.

	Documentation:

	* Added redirecting http links to non-SRILM program documentation
	in manual pages.

	Portability:

	* Removed SRI-specific paths etc. from common/Makefile.machine.* .
	Added a mechanism that allows site-specific customizations to be
	recorded in common/Makefile.site.$MACHINE_TYPE to override definitions
	in common/Makefile.machine.$MACHINE_TYPE, without a need to change the
	latter.

	Bug fixes:

	* Always output the elements of binary count files and ngram LMs
	in index-sorted order (same as the _c program version).  This avoids
	poor performance when reading the data back in.

	* Fixed LMClient.h so it compiles on win32 and msvc platforms (even
	though it still doesn't do anything, since Unix sockets are not
	supported).

	* Process ngram-count -writeN options after applying count smoothing,
	so that the effect of any count modifications (e.g., by KN) is seen,
	and consistent with the -write option.

	* Fixed the timestamps on initial and final nodes of lattice-tool
	-operation or (bug found by gaojie@hccl.ioa.ac.cn).

	* NgramLM: Handle cases where interpolated discounting leaves no
	backoff probability mass.

	* AdaptiveMarginals: Now handles words that are added after LM was
	created. This can happen in N-best rescoring and would previously
	cause an assertion failure.

	* Fixed bugs in IntervalHeap memory allocation, which could
	cause problems in N-best generation from lattices (from Jing Zheng).

	* Set LC_NUMERIC=C in make-big-lm to avoid problems with non-C
	locales for gawk scripts that compute discounting parameters.

1.5.8	10 May 2009

	Functionality:

	* merge-batch-counts -float-counts option for merging of fractional
	counts.

	* compare-sclite now includes statistical significance computation
	based on a matched-pair Sign test.

	* Added a Perl tool to compute the cumulative binomial distribution,
	contributed by Brett Kessler and David Gelbart.

	* Don't output LM server banner message for ngram -use-server -debug 0.

	* The LM::generateSentence() function now takes an optional argument to
	specify a sentence prefix that is to be used to condition subsequent
	word generation (suggested by Alexy Khrabrov).  The default is to
	condition on <s> as before, or an empty context if no start-of-sentence
	tag is defined.

	* A new option ngram -gen-prefixes to read conditioning prefixes
	from a file, and generate random sentences based on them.

	* New options in nbest-optimize that modify -print-hyps output so that
	only unique hypotheses are included (-print-unique-hyps), and to print
	the original ranks of hypotheses (-print-old-ranks) (from Jing Zheng).

	* The -version option reports whether support for compressed files
	is available.

	* Added merge-batch-counts -l option to control how many files to merge
	in each iteration.

	Bug fixes:

	* ngram-count, NgramLM: disable the Doug Paul smoothing hack (add one
	to denominator when smoothing results in 0 backoff mass) in contexts
	where the entire vocabulary has been observed.

	* nbest-optimize fixes to the -minimum-bleu-reference functionality
	(from Jing Zheng).

	* Fixed nbest-optimize bug that was causing incorrect log output with
	gcc 4.x.

	* Output vocabulary index map in binary ngram count and LM format
	in numerical index order.  This avoids a performance bug whereby
	reading the data structures back into _c binary version could take
	a long time due to inefficient insertion order.

	* Fix ngram -counts with -use-server (from Ergun Bicici).

	* Fixed memory allocation bug in FLM tag vocabulary handling that could
	lead to crash when interpolating several FLMs.

	* Rewrote make-batch-counts scripts to
	  - avoid problems with limits on command line length
	  - support systems that don't have compressed file I/O.

	* Modified merge-batch-counts script to
	  - ensure that unmerged files are always merged in the next iteration,
	  to avoid file size imbalance (suggested by Alex Marin)
	  - support systems that don't have compressed file I/O.

	* Fixed a portability issue with Intel icc version 7.0.

	* compute-sclite fixed to invoke csrfilt.sh script with -t option.

1.5.9	24 August 2009

	Functionality:

	* Added ngram-count -text-has-weights option to scale counts on a
	per-sentence basis.

	* LMStats::countString() and NgramStats::countSentence() methods
	generalized to take optional weight string argument (to support the
	above change).

	* Added compile-time option to generate position-independent code
	(make MAKE_PIC=yes, see INSTALL file).

	* Added support for xz-compressed files (.xz files offer better
	compression than .gz at the expense of time and memory).
	The xz tool has to be installed separately (http://tukaani.org/xz).

	Bug fixes:

	* wlat-to-pfsg generates NULL output labels for initial/final nodes
	with sentence start/end tags (because PFSGs encode those implicitly).

	* TaggedVocab: check and report if number of tags/words exceeds max.
	Make number of bits allocated for tags/words proportional to
	word size. Parse word/tag strings such that last (not the first)
	slash (/) character is treated as the delimiter.

	* Documented the lattice-tool -ngrams-time-tolerance option that had
	been previously implemented but omitted from the man page.

1.5.10	7 Jan 2010

	Functionality:

	* New option ngram -float-counts to allow the -counts option to
	process fractional counts.

	* The LM::pplCountsFile() and LM::countsProb() functions have been templatized
	(as a function of count type), and the TextStats class now uses double
	float counts, all in support of the above change.

	* New option lattice-tool -word-posteriors-for-sentences for computing
	word posteriors based on confusion networks (contributed by Jing Zheng).

	* lattice-tool now performs confusion network decoding and ngram
	computation AFTER rescoring or expansion with LMs.  Therefore the two
	operations can be combined in a single run where previously two
	invocations were necessary.

	* Added fsm-to-pfsg map_epsilon= option, to translate FSM <eps> symbols
	to another label.

	* New script filter-event-counts to preprocess a count file for use
	with ngram -counts .

	* lattice-tool continues processing when one of the lattices specified
	with -in-lattice-list cannot be opened.

	* Regression tests have been moved to module subdirectories
	(lm/test, flm/test, lattice/test) and can now be run from the
	top-level with "make test".  Decompression of data files for platforms
	that don't support compressed file I/O is now automatic.

	Documentation:

	* Added new FAQ items covering handling of OOVs and zeroprob words,
	based on input from Nitin Madnani.

	* Correction to the man page description of the ngram -count-order
	option:  It limits the maximal order of processed ngrams.

	* Corrected and updated ordered list of processing steps in
	lattice-tool man page.

	Bug fixes:

	* Use double precision to record log probs in TextStats object.

	* Workaround for a deficiency in Intel's 7.00 C++ compiler.

	* lattice-tool was not handling PFSG lattices in (1best or N-best)
	decoding with a LM.

	* lattice-tool will exit with a non-zero status if any of the lattice
	operations fail.

	* Fixed some format string/argument mismatches that could bite on
	64-bit platforms.

	* Updated usage of sort with key specification to conform to latest
	POSIX standard.  The old syntax was no longer working with recent
	GNU sort versions.

1.5.11	16 June 2010

	Functionality:

	* New program "maxalloc" to find the maximum amount of memory that
	can be allocated by a user process in the current environment.
	May be useful to debug out-of-memory conditions.

	Bug fixes:

	* Avoid deleting low-posterior null tokens when aligning lattices into
	word meshes.

	* Map explicit start/end-of-sentence tags in HTK lattices to null,
	since they are already implicitly attached to the start/end nodes
	of the lattice (LM scoring gives anomalous results on repeated tags).

	* option.[ch]: fixed declaration issues to avoid compiler warnings.

	* Moved man page for the option library functions to misc/doc.

	Bug fixes:

	* Fixes to compile cleanly with gcc -Wall -Wno-unused-variable
	-Wno-uninitialized.
	* Fixed a problem with gcc-4.4 compiles.
	* Fixed a problem with macro definition of fseeko() and ftello().
	* Fixed a problem with the lm/ngram-count-wb-subset test, which could
	fail after the test data is uncompressed.
	* Use gzip -d to read gzipped files, avoids shell wrapper overhead.

1.5.12	20 Jan 2011

	Functionality:

	* Enable lattice-tool -old-decoding if -nbest-duplicates is specified
	(and warn about it).
	* Support make-big-lm -wbdiscount option.
	* New option ngram -prune-history-lm, for specifying a separate LM that
	computes the history marginal probabilities needed for N-gram pruning
	purposes.  Inspired by C. Chelba et al., "Study on Interaction Between
	Entropy Pruning and Kneser-Ney Smoothing", Proc. Interspeech-2010.
	* Added optional limitVocab argument to VocabMultiMap::read() function.
	This is now used by lattice-tool -limit-vocab to avoid reading parts of
	the dictionary that are not used in the input.
	* Added an option -zeroprob-word to ngram and lattice-tool. It
	specifies a word that should be used as a replacement if the current
	word has probability zero.  This is different from -map-unk which only
	applies to OOV words and actually replaces the word label in the output
	lattice, if any.
	* Added new wrapper LM class NonzeroLM, to implement the above.

	Portability:

	* New MACHINE_TYPE values for Android-ARM platform: android-armeabi and
	android-armeabi-v7a (from Mike Frandsen).
	* Deleted the htk directory from distribution; it was obsolete and not
	documented.

	Bug fixes:

	* Prob.h: guard against under/overflow in intlog and bytelog
	conversions.
	* Replaced gunzip with gzip -d in all scripts (for efficiency).
	* Better option checking in make-big-lm, disallowing mixing of
	discounting methods and use of discounting flags that are not supported.
	* Undefine max() macro in Trellis.h to avoid conflict with some system
	header files.
	* Better support for recent MSVC versions in
	common/Makefile.machine.msvc (from Mike Frandsen).
	* add-pauses-to-pfsg: prevent existing pause nodes from being processed.

1.6.0	8 December 2011

	Functionality:

	* Added lattice-tool -loglinear-mix option.
	* Add platform-independent strtok_r() function, and replaced all
	instances of strtok().
	  Eventual goal is thread safety and re-entrance.
	* Modified File object to allow I/O to/from strings as well as files.
	* Modified code for reading and writing HTK lattices and NBest lists to
	enable I/O to/from strings as well as files, for in-memory processing.
	* Added special-purpose malloc/free implementation for SArray and LHash
	data structures, to reduce overhead for small allocation chunks.  Also
	added some allocation statistics reporting (enabled by ngram -memuse
	-debug 1).
	* Added the metadb config file lookup tool.
	* Cumulative binomial script (cumbin) command accepts optional 3rd
	argument to set p parameter.

	Bug fixes:

	* Correctly handle lattice-tool -use-server when generating nbest lists
	(server-based LM was previously ignored).
	* lattice-tool -split-multiwords no longer splits words appearing in
	-ignore-vocab.
	* lattice-tool allowed to operate on HTK lattices containing unrecognized
	header fields (but warn about them).
	* Updated reference output for many build platforms to avoid spurious
	test failures.
	* Avoid abnormal backoff weights when lower-order probabilities sum to
	almost one.
	* Avoid test failures for merge-batch-counts and make-ngram-pfsg due to
	locale differences.
	* Fix maxalloc for 64bit systems where "long" is still 32 bits.

	Building:

	* Added Microsoft Visual Studio 2005 projects, see
	doc/README.windows-msvc-visual-studio for more information.
	* Added new Makefile targets superclean and pristine to return
	SRILM to pre-build state.
	* Add Makefiles for MACHINE_TYPE macosx-m32 and macosx-m64 to
	allow explicit 32- or 64-bit compilation on MacOS X 10.6.  Updated
	GAWK location to allow tests to succeed.
	* Replaced various C-shell helper scripts in sbin/ with Bourne-shell
	versions, for greater portability.
	* New MACHINE_TYPE=msvc64 for 64bit builds with Visual Studio.

	Documentation:

	* Added doc/asru2011-srilm.pdf, a paper describing SRILM updates since
	2002.  Old ICSLP paper renamed to doc/icslp2002-srilm.pdf .

1.7.0	23 December 2012

	Functionality:

	* ngram -codebook option for reading of Ngram LMs with quantized parameters
	(contributed by Microsoft).
	* ngram -msweb-lm option for obtaining LM probabilities from the Microsoft
	Web N-gram service (web-ngram.research.microsoft.com). You need to obtain
	a user ID to use this service, see man ngram for details (contributed by
	Microsoft).
	* Added support for dictionary-induced word distance metrics to
	nbest-optimize (-dictionary option).
	* Added support for matrix-defined word distance metrics to
	nbest-optimize (-distances option).
	* ngram -debug 4 -ppl outputs ranking statistics (number of times correct
	word was in top 1, 5, 10), as well as quadratic and absolute loss averages
	(based on code from Omid Madani).
	* nbest-optimize accepts n-best list in SRInterp format and generates
	SRInterp format rover-control file (weights file), when -srinterp-format
	is specified.
	* nbest-optimize accepts SRInterp counts file that contains BLEU and TER
	counts info.
	* lattice-tool -read-mesh will try to preserve acoustic information
	(times, scores, pronunciations) if they are encoded in the input confusion
	network.
	* Support reading of text files in UTF-8 and UTF-16 encodings.  All string
	data is internally represented, and output, as ASCII/UTF-8 (contributed
	by Microsoft).
	This feature uses the iconv library.  Support for this feature can be
	disabled by compiling with "NO_ICONV=anything" on the make command line.

	Portability:

	* Ported LM client/server code to Winsock API (native socket library in
	Windows), enabling this functionality for mingw and MSVC platforms
	(contributed by Microsoft).
	* Let machine-type script return 64bit platform names for Linux and Solaris
	x86 when appropriate.  This implies that 64bit binaries are built by
	default on machines that support them.
	* Array.h tweak for clang compiler (from kutlak.roman@gmail.com).
	* Work around a namespace problem in C++11 (from kutlak.roman@gmail.com).
	* Use size_t for hash codes to ensure word width matches pointer type.
	* Fixes for mingw32 build, using Windows APIs for sockets and UTF
	conversion (contributed by Microsoft).
	* Support for 64bit mingw build (MACHINE_TYPE=win64).
	* Updates for MacOSX (MACHINE_TYPE=macosx, thanks to Chuck Wooters).
	* Deal with nonportability of isfinite() and isnan().
	* Changes for thread-safety (by Kyle McIntyre). See doc/README-THREADS
	for details.
	- Modified the remove() methods in various container classes to return
	Boolean instead of a pointer to the removed element.  The removed element
	can be gotten with an optional reference argument. This eliminates the
	need for a global static variable.
	- Use STL sort() instead of qsort() in LHash and SArray sorted iterations.
	- Replaced all static variables with thread-local storage via the TLSWrapper
	class, requiring the pthread library. This is available on most platforms,
	but can be disabled at compile-time with -DNO_TLS.

	Bug fixes:

	* NgramLM backoff computation fixed to avoid spurious insertion of nonzero
	unigram probabilities and non-unity backoff weights (resulting from
	numerator/denominator values below Prob_Epsilon).
	* lattice-tool does a better job inferring the lattice basename from the
	UTTERANCE string embedded in HTK lattices.
	* Trellis class: use a secondary sorting criterion to make N-best output
	deterministic.
	* WordMesh class: use posterior word probability to decide which acoustic
	information to keep when merging hyps, instead of duration-normalized
	acoustic scores as before.  This leads to fewer words with out-of-order
	timestamps when extracting one-best from confusion networks.
	* fix-ctm script: Check for out-of-order word timestamps and adjust them
	minimally as needed to produce a monotonic sequence, as required for
	CTM sorting.
	* Fixed bug in NgramCountLM estimation procedure reported by ariya@jhu.edu.
	* Allow ngram -hidden-vocab to read hidden event properties described in
	man page.
	* Fixed bug in ngram -hidden-vocab -write-lm output.
	* Avoid crash when ngram -hidden-not -ppl is used with debug level 2.
	* Fixed (very rare) bug by which ngram -prune might remove all ngrams
	sharing a common context.
	* Improved ngram -prune-lowprobs by also removing backoff weights that
	have become useless (suggested by Arlo Faria).
	* Check for successful search for HTK lattice start/end nodes, if not
	explicitly specified (reported by nshmyrev@yandex.ru).
	* Handle infinity scores in lattice rescoring, and catch NaN scores when
	reading HTK lattices.
	* make-kn-discounts checks for negative discount values and reports
	error if appropriate.
	* nbest-optimize accepts combined BLEU and error rate objective via switch
	-error-bleu-ratio R (R specifies the error rate weight).
	* lattice-tool -timeout option now uses sigsetjmp/siglongjmp to handle
	timeout alarms.  This is necessary in Linux-compatible (including cygwin)
	systems to handle alarms repeatedly.
	* Fixed a bug reading NBestList2.0 format without phone information (led
	to malformed confusion network output).
	* Fixed a bug in Ngram::contextID() that was causing incorrect expansion
	of lattices with pruned backoff models.
	* Fixed a bug in the lattice-tool -keep-unk implementation that was
	sometimes allowing an OOV word label to be output as <unk>.
	* Removed some pseudo-randomness in ngram-class so that results are more
	invariant to OPTION setting and platform properties.
	* Avoid differences due to machine arithmetic in word mesh alignment,
	making confusion network building and posterior decoding more stable
	across platforms.
	* Exclude metatags when writing out the vocabulary of binary Ngram LMs.
	* Fixed some missing dependencies in Visual Studio solution file.
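
	Illustration for the fix-ctm entry above: a minimal Python sketch of one
	way to turn out-of-order word start times into a monotonic sequence.  The
	function name and the rule used here (raise each time to the running
	maximum) are assumptions made for illustration, not the script's actual
	code.

```python
def make_monotonic(start_times):
    """Return a copy of start_times in which no value precedes its predecessor.

    Assumed policy: a time that goes backwards is raised to the latest time
    seen so far; times that are already in order are left unchanged.
    """
    adjusted = []
    latest = float("-inf")
    for t in start_times:
        latest = max(latest, t)   # never move backwards in time
        adjusted.append(latest)
    return adjusted


if __name__ == "__main__":
    # The third word starts before the second one; only it is adjusted.
    print(make_monotonic([0.10, 0.42, 0.40, 0.75]))
```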

1.7.1	4 June 2014

	* Updated INSTALL, Copyright.  Added ACKNOWLEDGEMENTS.

	Functionality:

	* Integrated the maximum entropy extension by Tanel Alumae, described
	at http://www.phon.ioc.ee/~tanela/srilm-me/ .
	Please cite Tanel's paper (copied here in doc/is2010-maxent.pdf) if you
	use this functionality in your research.
	* Enable LM server to process multiple commands in a single message
	(separated by newlines).  This capability was never documented; it
	existed in the first implementation, which used read/write system calls,
	but was lost when we switched to recv/send calls.
	* Generalized the BayesMix LM class to allow an arbitrary number of
	mixture components, similar to LoglinearMix.
	* Added the ngram -context-priors option to read context-dependent
	mixture weight priors from a file.
	* Added the ngram -read-mix-lms option to read the list of interpolated
	LMs, weights and options from a file, specified by the -lm option.
	* Use zlib for I/O from/to gzipped files. Benefits are: (a) works with
	native Windows binaries, (b) avoids subprocess, (c) allows reading
	(though still not writing) of gzipped binary LM and count files.
	* ngram-count -gtNmin options accept floating point values for more
	flexibility with LM estimation from fractional counts.
	* Added lattice-tool -set-lattice-names option to preserve input
	filenames inside lattices.
	* New script replace-unk-words, for replacing OOV words relative to
	a vocabulary with <unk> tag.
	* Added new lattice-tool options -hyp-list -hyp-file -hyp2-list
	-hyp2-file -add-hyps to add ASR hypotheses into word mesh (confusion
	network). The added options are similar to -ref-list -ref-file -add-refs,
	except that the added hypothesized words will not be indicated as
	reference words in the word mesh.
	* Added a function in WordMesh to compute slot-to-slot alignment
	between two confusion networks.
	* Added ngram-class option to limit number of words per class (from
	seppo.enarvi@aalto.fi).

	Portability:

	* Added support for 64bit cygwin builds (MACHINE_TYPE=cygwin64).

	Bug fixes:

	* ngram -rescore-ngram was not setting the handling of special word
	tokens (<s>, </s>) if the rescored LM was being evaluated in the same
	run.
	* ngram-count -skip needs to read counts one order higher than specified
	by -order .
	* SkipNgram will now try to reestimate the discounting parameters from
	expected counts on each EM iteration (but fall back on initial parameters
	if that fails, e.g., for discounting methods that cannot handle float
	counts).
	* SubVocab instances' handling of metatags and nonevent words is now
	tied to the base Vocab instance.
	* Avoid anomalies in random word generation due to nonzero probabilities
	for nonwords.
	* Cleaned-up select-vocab script from Anand Venkataraman.  Now works
	with perl 5.12 and gives consistent results on different platforms.
	Added a test case.
	* Fixed removeTrie() bug that was leading to memory leak in Ngram
	destructor.
	* Fixed bug in LHash iterator that led to potential double enumeration
	of items after deletions, and could affect Ngram pruning results.
	* Allow number of ngrams in ARPA LM to exceed 2^31. (Vocabulary size
	is still limited to 2^32.)
	* Initialize key and data objects in SArray and LHash containers after
	allocation.
	* Pass Trellis state parameters by reference to avoid copying of
	potentially complex objects.
	* Fixed memory access error in Ngram::clear() for order-1 models.
	* Fixed a problem handling null string states in Trellis.
	* Fix to preserve double precision in NBest acoustic and LM scores.
	* Fixed an error concerning the use of -gtNmin options in the srilm-faq(7)
	man page pointed out by dugast@systran.fr.
	* If a lattice-tool input lattice is a word mesh, avoid calling
	alignLattice() since the input is already a word mesh.
	* Fixes to reading/writing of quantization codebook files.
	* Fixed header comment and test program for Map2::remove().

1.7.2	9 November 2016

	Functionality:

	* Added interfaces to Lattice and WordMesh that allow external programs
	to map sausage nodes to their original lattice nodes.
	* New VocabDistance subclass StemDistance, comparing words only based on
	their stems.
	* New lattice-tool option -stem-dist triggers StemDistance use in
	confusion network alignments, including -add-hyps and -add-refs processing.
	* Add optional support for keyword spotting (in Lattice.h and
	LatticeIndex.cc) when writing a 1-gram index.
	* Added new File field NBestOptions::nbestRttm2; if it exists, write
	(an approximation to) the NBestList2.0 format output.
	* Added simple Trellis pruning based on relative thresholding of forward
	probabilities (Trellis::prune()).
	* make-big-lm now understands the -ukndiscount option. The make-kn-discounts
	helper script has an option to compute unmodified KN discounts.
	* The -version option now reports the compiler version used.
	* Added ngram-count -write-text option to test conversion of UTF-16 files
	to ASCII/UTF-8.
	* Added ngram -text-has-weights option to allow weighting sentences in ppl
	computation.
	* Added scripts nbest-words and compute-sclite-nbest for conveniently
	computing nbest-optimize -errors information using sclite.
	* Added the nbest-optimize -xval-files option to support cross-validation.
	* Added script search-rover-combo for searching for best combination among
	a list of systems.
	* Added confidence value fields to NBestWordInfo class.
	* Added check to compute-best-mix to warn about word label mismatches between
	input files.

	Portability:

	* Honor TMPDIR environment variable in various scripts.
	* Miscellaneous MacOSX fixes.
	* Include BSD rand48 functions so that random sentence generation gives same
	result on all platforms.

	Bug fixes:

	* Avoid leaky backoff by mapping very small probability sums to 0 in BOW
	computation.  Otherwise unseen ngrams may end up with nonzero probabilities
	in unsmoothed LMs.
	* Fixed compare-ppls, compute-best-mix, compute-best-sentence-mix, ppl-from-log
	to recognize the MSVC representation of -infinity.
	* Fixed a bug in the handling of zero prefix probabilities in ClassNgram,
	HiddenNgram and HMMofNgrams.
	* Fixed a memory allocation bug that caused the ngram-count-maxent test
	to crash.
	* Fixes to lattice-tool rttm nbest output.
	* Fix for possible endless loop in lattice-tool -posterior-prune due to
	limited float precision (from Seppo Enarvi).
	* Fixed a problem with declarations of Map_noKeyP() that take reference
	arguments and were missing "const"; was causing crash in segment tool.
	* Workaround for what looks like an optimizer bug in gcc >= 4.9 that can
	cause ngram -prune to core dump.
	* Output TextStats quantities (sentence/word counts, log probs, perplexities),
	model parameters, nbest and lattices scores, and other quantities with full
	precision so as to avoid loss of information.
	* nbest-optimize -1best now outputs a rover-control file that simulates
	Viterbi decoding (by using a small posterior scale).
	* nbest-optimize -errors now tolerates varying number of reference words
	for the same sentence.  This can arise from sclite references with alternate
	word strings.
	* Fixed a stupid bug in uniform-classes.gawk script.
	* Allow combine-rover-controls to merge control files with the same systems
	in them, adding their weights.
	* Updated zlib to version 1.2.8.  This fixes a bug whereby gzipped output files
	could end up with zero size (instead of a legal gzipped file that results in a
	zero-length file when decompressed).
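
	Illustration for the "leaky backoff" entry above: a minimal Python sketch
	of a backoff weight (BOW) computation that maps a very small left-over
	probability mass to 0.  The function name, argument layout, and threshold
	value are assumptions made for illustration; they are not taken from the
	SRILM sources.

```python
def backoff_weight(seen_probs, seen_lower_probs, epsilon=1e-6):
    """Backoff weight for one context (hypothetical helper).

    seen_probs:       p(w | context) for the words observed in this context
    seen_lower_probs: lower-order probabilities of those same words
    epsilon:          assumed threshold below which left-over mass counts as 0
    """
    left_over = 1.0 - sum(seen_probs)
    if left_over < epsilon:
        # Roundoff can leave a tiny positive remainder; treating it as 0
        # keeps unseen ngrams from receiving probability in an unsmoothed LM.
        return 0.0
    lower_left_over = 1.0 - sum(seen_lower_probs)
    if lower_left_over < epsilon:
        raise ValueError("no lower-order mass available to back off to")
    return left_over / lower_left_over


if __name__ == "__main__":
    # Ten probabilities of 0.1 do not sum to exactly 1.0 in floating point.
    print(backoff_weight([0.1] * 10, [0.05] * 10))
    print(backoff_weight([0.4, 0.4], [0.25, 0.25]))
```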

1.7.3	9 September 2019

	Functionality:

	* Added nbest-oov-counts script to generate OOV counts for nbest hypotheses.
	* Added a simple mechanism for weight tying in nbest-rover control files.  A
	system weight of = indicates that it should be tied to the previously listed
	system.  This is useful for reducing the number of free parameters when
	searching for good system combinations (search-rover-combo).
	* Add Map_noKey() and Map_noKeyP() for unsigned long long type, to enable use
	with size_t on Windows MSVC.
	* Output from -version now includes compile-time options.
	* Added option ngram -minbackoff to fix up models that have unnormalized
	probabilities or that are not smoothed.
	* Added option ngram -unk-probs to override unknown word probabilities.
	* Added nbest-optimize-args-from-rover-control script, convenient for
	extracting initialization parameters for nbest-optimize from existing
	nbest-rover control file.
	* Added ngram-count -text-has-weights-last option to allow text input with
	count values at ends of lines.
	* Added nbest-rover -missing-nbest option to treat missing nbest lists as if
	an empty hypothesis (no words) had been output, rather than simply skipping
	that nbest list.
	* Added nbest-lattice -time-penalty option, implementing a soft constraint
	on time stamps (when present) during confusion network building and alignment.
	* Added nbest-lattice -average-times option, to average word times instead
	of picking the timing of the highest posterior hypothesis.
	* Added nbest-lattice -suppress-vocab option to disallow certain words in
	posterior decoding.
	* New script concat-sausages for chaining word confusion networks together.
	* Added nbest-lattice -dump-lattice-alignments option to output mappings
	between sausage positions and alignment costs.
	* Updated Android build for 64-bit development for armv8 using NDK r20 and clang.
	This almost certainly breaks the 32-bit build for armv7.  The last known good 32-bit
	build is in common/Makefile.core.android.r11c, last built using NDK r11c.  To use this,
	copy Makefile.core.android.r11c to Makefile.core.android.  See doc/README.android.

	Bug fixes:

	* Added a new tool nbest-rover-helper that combines the functions of the
	combine-acoustic-scores and nbest-posteriors scripts, doing these computations
	in double precision and faster. nbest-rover now uses this tool (except when
	certain options like -nbest-backtrace are used).
	* nbest-rover strips DOS end-of-line CR characters from the control file, so
	they no longer mess up the parsing of the file.
	* Rationalize the way ties are broken when decoding word confusion networks.
	The word with the lowest internal index is now preferred (and the *DELETE* token
	always comes before all other words), unless the new nbest-lattice option
	-random-tie-break is given.  The output order of alternative word hypotheses
	to sausage files is always by probability rank first, then by internal index.
	* The reverse-ngram-counts script now replaces <s> with </s> and vice-versa,
	as required for training reverse-direction LMs, and consistent with reverse-text.
	* Handle comment lines starting with '##' and empty lines in nbest-rover control
	files the same way as in File::getline(), i.e., ignore them.
	* Fixed the syntax for the nbest-optimize -dynamic-random-series options (now
	starts with single dash, as described in man page).
	* Don't let compute-best-mix complain about word mismatches if <unk> is involved.
	* Cast input to isspace() to (unsigned char) to guarantee input is non-negative.
	* Fixed memory management problems in MEModel.
	* Work around a bug in zlib's gzprintf() printing of very long %s arguments; was
	causing long word strings not to be output into .gz files.
	* Removed word string length limit.
	* Removed limit on total line length in outputting ngram count files.
	* Zlib updated to version 1.2.11.
	* nbest-posteriors ensures that bytelog scores are output in fixed-point format.
	* Allow floating point values when parsing bytelog scores in nbest lists.
	* More robustness to word sausage input files that have missing data for some
	positions.
	* Fixed a performance bug when nbest-rover is invoked with -output-ctm option.
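
	Illustration for the tie-breaking entry above: a minimal Python sketch of
	picking the best word in one confusion network slot when posteriors are
	tied.  The data layout (plain dictionaries) and the function name are
	assumptions made for illustration; only the ordering rule (*DELETE* first,
	then lowest internal index) follows the entry.

```python
DELETE = "*DELETE*"


def best_word(slot, index_of):
    """Pick the top hypothesis in one slot (hypothetical helper).

    slot:     dict mapping word -> posterior probability
    index_of: dict mapping word -> internal vocabulary index
    """
    def sort_key(word):
        # Highest posterior first; among ties *DELETE* precedes every word,
        # and remaining ties go to the lowest internal index.
        return (-slot[word], word != DELETE, index_of.get(word, -1))

    return min(slot, key=sort_key)


if __name__ == "__main__":
    indices = {"cat": 3, "cap": 7}
    print(best_word({"cap": 0.4, "cat": 0.4, DELETE: 0.2}, indices))
    print(best_word({"cat": 0.5, DELETE: 0.5}, indices))
```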

$Date: 2019/09/09 23:09:32 $