Here is a brief guide on how to build the SRI LM tools and associated
libraries.

1 - Unpack.  This should give a top-level directory with the subdirectories
    listed in README, as well as a few documentation files and a Makefile.

    For an overview of SRILM, see the paper in doc/paper.ps.
    For reference information, look in man/html.

2 - Set the SRILM variable in the top-level Makefile to point to this
    top-level directory (an absolute path).

3 - You need a Linux (i686/x86_64), Mac OSX, Solaris (i386 or amd64),
    SunOS (Sparc or i386), IRIX 5.x, Alpha OSF, or CYGWIN platform to
    compile out of the box.  For other OS/cpu combinations you will have to
    modify the sbin/machine-type script to detect (and name) the platform
    type, and create a file common/Makefile.machine. that defines
    platform-dependent makefile variables.
    As a workaround, the MACHINE_TYPE variable can also be set on the make
    command line

        make MACHINE_TYPE=foo ...

    in which case no changes to sbin/machine-type are needed.
    Some platform-specific notes may be found in the doc/README.* files.

    Even on the known platforms you might have to modify variables defined in
    common/Makefile.machine. Candidates for changes are:

    CC, CXX: choose the compiler or compiler version.  For example, you
        might have to specify a directory path to the compiler driver.
    PIC_FLAG: define if your compiler uses something other than -fPIC to
        generate position-independent code.  In particular, define this
        to be empty if your compiler does not generate PIC, or does so
        by default.
    DEMANGLE_FILTER: if the "c++filt" program is not installed on your
        system, set this variable to empty.
    TCL_INCLUDE, TCL_LIBRARY: set to whatever is needed to find the Tcl
        header files and library.  If Tcl is not available, set NO_TCL=X
        and leave the above variables empty.
    NO_ICONV: set this variable to anything to turn off 16-bit Unicode
        support and linking with the iconv library.

    It is recommended that you record changes to platform-dependent
    variables in common/Makefile.site.
    and leave Makefile.machine. unchanged.  That makes it easier to upgrade
    SRILM to future releases (just copy common/Makefile.site. to a new
    installation).

4 - You need the following free third-party software to build SRILM:

    - gcc version 3.4.3 or higher, or Microsoft Visual Studio
      (older versions might work as well, but are no longer supported).
      SRILM is occasionally tested with other compilers; see the
      portability notes in the CHANGES file.
    - Optionally, the libLBFGS optimization library, if you want to build
      maximum entropy models.  If so, build and install libLBFGS separately
      and set HAVE_LIBLBFGS=1 in the platform-specific makefile (see above).
    - GNU make
    - An iconv library, such as the GNU implementation, unless libiconv is
      already part of your C library.
    - John Ousterhout's Tcl toolkit, version 7.3 or higher (this is currently
      used only for some test programs, but is needed for the build to go
      through without manual intervention).
    - Additional platform-dependent prerequisites are mentioned in the
      platform-specific doc/README.* files, e.g., doc/README.windows-cygwin.

    The following tools are needed at runtime only:

    - GNU awk (gawk), to interpret many of the utility scripts
    - gzip, to read/write compressed files
    - bzip2, to read/write .bz2 compressed files (optional)
    - p7zip, to read/write .7z compressed files (optional)
    - xz, to read/write .xz compressed files (optional)

    For Windows, you will need the CYGWIN UNIX compatibility environment,
    which includes all of the above.  The MinGW and Visual C platforms will
    also work, but with some loss of functionality.
    See doc/README.windows-* for more information.

    Links to these packages can be found on the SRILM download page
    (http://www.speech.sri.com/projects/srilm/download.html).

5 - In the top-level directory, run

        gnumake World
    or
        make World      (if the GNU version is the system default)

    This will create the directories

        bin/
        lib/
        include/

    build everything, and install public commands, libraries, and headers
    in these directories.
    Binaries are actually installed in subdirectories indicating the platform
    type.  To create binaries for a platform that is not the default on
    your system, use make MACHINE_TYPE=xxx, e.g.,

        make MACHINE_TYPE=i686-m64 World    # 64-bit binaries for Linux
        make MACHINE_TYPE=msvc World        # MS Visual C++ on Windows

6 - The result of the above should be a fair number of .h and .cc files in
    include/, libraries in lib/$MACHINE_TYPE, and programs in
    bin/$MACHINE_TYPE.

    In your shell, set the following environment variables:

        PATH       add $SRILM/bin/$MACHINE_TYPE and $SRILM/bin
        MANPATH    add $SRILM/man

7 - To test the compiled tools, run

        gnumake test

    from the top-level directory.  This exercises the most important (though
    not all) functionality in SRILM and compares the results to reference
    outputs.

    If discrepancies are reported, examine the output files in the
    test/output directory of the affected $SRILM subdirectory (lm, flm,
    lattice) and compare them to the corresponding files in the same
    subdirectory's test/reference directory.

8 - After a successful build, clean up the source directories of object and
    binary files that are no longer needed:

        gnumake cleanest

9 - (Optional) To build versions of the libraries and executables that are
    optimized for space rather than speed, run

        gnumake World OPTION=_c
        gnumake cleanest OPTION=_c

    The libraries will appear in ${SRILM}/lib/${MACHINE_TYPE}_c, with
    executables in ${SRILM}/bin/${MACHINE_TYPE}_c.  The data structures in
    these versions use sorted arrays rather than hash tables, which wastes
    less memory but is also somewhat slower.  The directory suffix "_c"
    stands for "compact".

    Other versions of the binaries can be built in a similar manner.
    The compile options currently supported are

        OPTION=_c    "compact" data structures
        OPTION=_s    "short" count representation
        OPTION=_l    "long long" count representation
        OPTION=_g    debuggable, non-optimized code
        OPTION=_p    profiling executables

    In addition, if libraries with position-independent code are needed, add
    MAKE_PIC=yes to the make command line.
    This may incur a slight performance penalty but is necessary for certain
    software projects that link against SRILM libs.

10 - Recent versions of gawk may not perform correct floating-point
    arithmetic unless either LC_NUMERIC=C or LC_ALL=C is set in the
    environment.  This affects many of the scripts in utils/.

11 - Be sure to let me know if I left something out.

Andreas Stolcke
stolcke@speech.sri.com
$Date: 2014-03-24 17:57:28 $
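
As an illustration of the environment settings described in steps 6 and 10,
here is a minimal shell sketch.  The function name srilm_env and the example
values /opt/srilm and i686-m64 are assumptions made for illustration and are
not part of SRILM; substitute your own top-level directory and platform type.
The sketch also assumes that neither value contains spaces.

```shell
#!/bin/sh
# Sketch only: srilm_env is a hypothetical helper, not part of SRILM.
# It prints shell commands that apply the settings from steps 6 and 10.
#   $1 = absolute path of the SRILM top-level directory
#   $2 = platform type, i.e. the subdirectory name used under bin/ and lib/
srilm_env() {
    if [ "$#" -ne 2 ]; then
        echo "usage: srilm_env SRILM_DIR MACHINE_TYPE" >&2
        return 1
    fi
    printf 'export SRILM=%s\n' "$1"
    # Step 6: add the platform-specific and the shared bin/ directories to PATH.
    # $PATH is printed literally and is expanded only when the output is eval'ed.
    printf 'export PATH=%s/bin/%s:%s/bin:$PATH\n' "$1" "$2" "$1"
    # Step 6: add the manual pages to MANPATH (an unset MANPATH expands to empty).
    printf 'export MANPATH=%s/man:${MANPATH:-}\n' "$1"
    # Step 10: make gawk do its floating-point arithmetic in the C locale.
    printf 'export LC_NUMERIC=C\n'
}
```

To apply the settings in the current shell, one could run

    eval "$(srilm_env /opt/srilm i686-m64)"

or paste the printed lines into a shell startup file.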