lader - Latent Derivation Reorderer

lader is a program that trains and applies discriminative parsers to improve reordering for machine translation. Unlike other parsers, it can be trained directly from aligned parallel text, with no annotated syntax trees. Using it to translate between language pairs with very different word order can greatly improve translation accuracy.

lader was developed mainly by Graham Neubig during his period as an intern at NICT. Hwidong Na has also done a large amount of work on improving training speed on a single machine, and Jeremy Gwinnup has contributed code for parallel training.

If you would like more details about the method or want to cite lader in your research, please reference:

Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Graham Neubig, Taro Watanabe, and Shinsuke Mori
Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL 2012)

You can also read about improvements in training time in:

A Discriminative Reordering Parser for IWSLT 2013
Hwidong Na and Jong-Hyeok Lee
International Workshop on Spoken Language Translation (IWSLT 2013)

Download/Install
Documentation
FAQ
Development/Contact

Download/Install

Download

Latest Version: @github

Past Versions: lader 0.1.6 lader 0.1.5 lader 0.1.4 lader 0.1.3 lader 0.1.2 lader 0.1.1 lader 0.1.0

Models: (These may work with 0.1.5, but not the latest version)

KFTT Japanese-English (trained on KFTT data)
KFTT English-Japanese (trained on KFTT data)

The code of lader is distributed according to the Eclipse Public License, v 1.0, and can be distributed freely according to this license.

Install

On Linux, Mac OS X, or Cygwin, download the source code, and install using the following commands.

tar -xzf lader-X.X.X.tar.gz
cd lader-X.X.X
./configure
make
src/bin/lader --help

If this prints a help message, lader is working properly.

Program Documentation

An example of how to run the program is included in the "example/" directory of the download. There are two well-documented scripts (train-model.sh and test-model.sh) showing how to train and use the reorderer. For more information about how to define the feature set for lader, see the features page. You can also find a full training script for a machine translation system using lader at the Kyoto Free Translation Task web site.

FAQ

Training lader is too slow!

lader can be trained more quickly, at the expense of (greatly) increased memory use, by enabling the -save_features option. With this option, the features for each parse tree in the training data are calculated only once and re-used from the second iteration onward. The total amount of memory necessary will depend on the length of your sentences and the number of features used, but with the default feature set you should expect to use about 5 megabytes per sentence.
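
To get a feel for what this means for a whole corpus, the rough figure of about 5 megabytes per sentence can be multiplied by the number of training sentences. The short Python sketch below does only that arithmetic. The function name and the corpus sizes are illustrative assumptions and are not part of lader, and the real requirement will vary with sentence length and feature set.

```python
def estimate_feature_memory_gb(num_sentences, mb_per_sentence=5.0):
    """Rough memory estimate in gigabytes (1 GB = 1024 MB) for saved features.

    mb_per_sentence defaults to the approximate figure quoted for the
    default feature set; it is an estimate, not a measured constant.
    """
    return num_sentences * mb_per_sentence / 1024.0


if __name__ == "__main__":
    # Hypothetical corpus sizes, chosen only for illustration.
    for n in (1000, 10000, 100000):
        print(f"{n} sentences: about {estimate_feature_memory_gb(n):.1f} GB")
```

If the estimate exceeds the memory of your machine, training on a subset of the data or leaving -save_features off is probably the safer choice.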

There is also a script script/trainLaderParallel.pl by Jeremy Gwinnup that will allow you to train lader in parallel on Sun Grid Engine. When you run this script, be sure to set the -learner perceptron option of train-lader, as the default learner (Pegasos) does not play well with parallelization and parameter mixing. Better documentation is forthcoming.
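
For readers unfamiliar with the term, parameter mixing generally means training a separate model on each shard of the data and then combining the resulting weight vectors, most simply by averaging them. The Python sketch below illustrates that general idea with uniform averaging over sparse weight vectors. It is not taken from trainLaderParallel.pl, whose actual mixing scheme may differ, and the function and feature names are hypothetical.

```python
from collections import defaultdict


def mix_parameters(shard_weights):
    """Uniformly average sparse weight vectors from several shards.

    shard_weights is a list of dicts mapping feature name -> weight.
    A feature missing from a shard is treated as having weight 0 there.
    """
    mixed = defaultdict(float)
    for weights in shard_weights:
        for feature, value in weights.items():
            mixed[feature] += value / len(shard_weights)
    return dict(mixed)


if __name__ == "__main__":
    # Hypothetical feature names and weights from two shards.
    shard_a = {"feat_x": 1.0, "feat_y": 2.0}
    shard_b = {"feat_x": 3.0}
    print(mix_parameters([shard_a, shard_b]))
```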

The speed of lader training could also be greatly improved through a better feature representation. If you are a developer and interested in helping with this, contact me and I will help you start out.

Running lader is too slow!

lader can be simply parallelized by using the -threads option. If this is still too slow, you can split the data and run lader on multiple machines if you have them at your disposal. Also, very long sentences (80 words or more?) can take more time, so filtering the data to remove these sentences will also increase speed.
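
Filtering out long sentences can be done with a few lines of script. The Python sketch below is one possible way to do it, assuming whitespace-tokenized input with one sentence per line. It filters several line-aligned files together (for example a source file and its alignments), so that they stay in sync. The function names, the command-line convention, and the default threshold of 80 words are assumptions made for illustration and are not part of lader.

```python
import sys


def keep_short(rows, max_words=80):
    """Yield only the rows whose first field has fewer than max_words tokens.

    Each row is a tuple of lines taken from the same position in several
    line-aligned files, so dropping a whole row keeps the files in sync.
    """
    for row in rows:
        if len(row[0].split()) < max_words:
            yield row


def filter_files(in_paths, out_paths, max_words=80):
    """Filter line-aligned files together; the first file decides the length."""
    ins = [open(p, encoding="utf-8") for p in in_paths]
    outs = [open(p, "w", encoding="utf-8") for p in out_paths]
    try:
        for row in keep_short(zip(*ins), max_words):
            for line, out in zip(row, outs):
                out.write(line)
    finally:
        for f in ins + outs:
            f.close()


if __name__ == "__main__":
    # Hypothetical usage: python filter_long.py in1 [in2 ...] out1 [out2 ...]
    paths = sys.argv[1:]
    if len(paths) >= 2 and len(paths) % 2 == 0:
        half = len(paths) // 2
        filter_files(paths[:half], paths[half:])
```

Using the first file for the length test is a design choice: when reordering, it is the length of the sentence being parsed that drives the running time.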

As with training, parsing speed would also be improved by a new feature representation, so if you are a developer who wants to help, please contact me.

What do the parse trees produced by lader look like?

You can see the output of the parser in English or Japanese. This output is for the held-out test set, so the trees may include some errors. (You can visualize trees with my simple script and NLTK or a TreeBank viewer.)

Development

Contributors

Graham Neubig (main developer)
Hwidong Na (speed improvements)
Jeremy Gwinnup (parallel training)

If you are interested in participating in the lader project, please send an email to neubig at gmail dot com.

Revision History

Future Features/Known Issues

Feature hashing to improve efficiency (see [Nakagawa 2015])
Better documentation of the parallel training.

Version 0.1.6 (2015/8/30)

Fixed a bunch of compile errors on newer compilers

Version 0.1.5 (2013/8/15)

Made more verbose error messages for bad alignment files
Made it possible to train with the simple perceptron
Added some scripts for parallelization (thanks to Jeremy Gwinnup!)

Version 0.1.4 (2013/1/17)

Fixed compile errors on Mac OS and Windows (Cygwin)

Version 0.1.3 (2012/11/20)

Changed the default of the -save_features option to off, as it uses too much memory and may surprise people.

Version 0.1.2 (2012/10/2)

Fixed an error with missing scripts in the distributed file (thanks to Isao Goto for pointing this out!).

Version 0.1.1 (2012/9/19)

Fixed an error in the compilation of 0.1.0 (thanks to Katsuhito Sudoh for pointing this out!).
Added an option to do non-loss-augmented inference.

Version 0.1.0 (2012/7/27)

Initial release!