Travatar Model Format

This page describes the basics of the Travatar model format and gives some tips on how to modify the model if necessary.

Model Format Basics

After training, the Travatar model can be found in train/model/rule-table.gz, where "train" is the output directory specified at training time. Each rule in the table has the following format:

source ||| target ||| features ||| counts ||| alignments

Below is an example of a rule:

vp ( vbd ( "expected" ) x0:sbar ) ||| x0:sbar "予想" "し" "た" @ vp ||| fgep=-6.07 w=3 egfp=-4.52 fgel=-3.11 p=1 egfl=-13.28 lfreq=2.22 ||| 1 10 47 ||| 0-0 0-1 0-2
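To make the layout concrete, the fields of a rule can be separated on the " ||| " delimiter. The following is only a minimal sketch using awk on the example rule above; the variable name "rule" and the printed labels are chosen here for illustration and are not part of Travatar.

```shell
# Example rule copied from above; the variable name "rule" is illustrative.
rule='vp ( vbd ( "expected" ) x0:sbar ) ||| x0:sbar "予想" "し" "た" @ vp ||| fgep=-6.07 w=3 egfp=-4.52 fgel=-3.11 p=1 egfl=-13.28 lfreq=2.22 ||| 1 10 47 ||| 0-0 0-1 0-2'

# Split on the literal " ||| " delimiter and print one labeled field per line.
printf '%s\n' "$rule" | awk -F ' [|][|][|] ' '{
    print "source:     " $1
    print "target:     " $2
    print "features:   " $3
    print "counts:     " $4
    print "alignments: " $5
}'
```

The same field separator can be reused in other awk one-liners, for example to inspect only the features of each rule in a decompressed table.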

The features are the standard dense features used in machine translation systems (see Philipp Koehn's "Statistical Machine Translation" for more details).

Modifying Travatar Models

If you would like to modify the Travatar model for whatever reason, it is in text format, so you can do so directly. For example, if we really don't want to translate anything about apples, we can remove all rules that contain the string "apple":

zcat train/model/rule-table.gz | grep -v apple | gzip > train/model/no-apples.gz
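Note that grep matches the string anywhere on the line, so the command above also removes rules for words such as "pineapple". If that is too broad, one possible alternative is to test only the source side for the quoted terminal. The sketch below assumes a tiny made-up rule table (the rules and the p=1 feature are invented for the demonstration) so that it can be run on its own.

```shell
# Build a tiny, made-up rule table purely for demonstration.
tmp=$(mktemp -d)
cat > "$tmp/rules.txt" <<'EOF'
nn ( "apple" ) ||| "りんご" ||| p=1
nn ( "pineapple" ) ||| "パイナップル" ||| p=1
nn ( "pear" ) ||| "なし" ||| p=1
EOF
gzip "$tmp/rules.txt"

# Drop only rules whose source side (field 1) contains the terminal "apple",
# including its quotes. "gzip -dc" decompresses to stdout, like zcat.
gzip -dc "$tmp/rules.txt.gz" \
  | awk -F ' [|][|][|] ' 'index($1, "\"apple\"") == 0' \
  | gzip > "$tmp/no-apples.gz"

# Show the rules that survived the filter.
gzip -dc "$tmp/no-apples.gz"
```

Matching the quoted token keeps the "pineapple" rule, and restricting the test to the first field avoids accidental matches in feature names.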

It is also possible to add new rules. Let's say we want to add a rule to translate the proper noun "apple" into "アップル" (the company) and the regular noun "apple" into "りんご" (the fruit). We can do so by creating a file apple.txt:

echo 'nnp ( "apple" ) ||| "アップル" ||| apple_rule=1' >> apple.txt
echo 'nn ( "apple" ) ||| "りんご" ||| apple_rule=1' >> apple.txt

We can then combine this file with our original rule table. Note that Travatar rule tables must be sorted, so we perform a sort on the newly concatenated table.

gzip apple.txt
zcat train/model/rule-table.gz apple.txt.gz | LC_ALL=C sort | gzip > train/model/with-apples.gz
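Since the table must remain sorted, it can be worth confirming the result before using it. The following sketch repeats the merge on a small made-up table (the "pear" rule and the temporary paths are assumptions for the demonstration) and then checks the order with "sort -c", which exits with a non-zero status if any line is out of order.

```shell
tmp=$(mktemp -d)

# Made-up existing table and the new apple rules, both gzipped.
printf '%s\n' 'nn ( "pear" ) ||| "なし" ||| p=1' | gzip > "$tmp/rule-table.gz"
printf '%s\n' 'nnp ( "apple" ) ||| "アップル" ||| apple_rule=1' \
              'nn ( "apple" ) ||| "りんご" ||| apple_rule=1' | gzip > "$tmp/apple.txt.gz"

# Concatenate and sort in the C locale, as in the command above.
gzip -dc "$tmp/rule-table.gz" "$tmp/apple.txt.gz" | LC_ALL=C sort | gzip > "$tmp/with-apples.gz"

# Verify the order using the same locale that was used for sorting.
if gzip -dc "$tmp/with-apples.gz" | LC_ALL=C sort -c; then
    status="sorted"
else
    status="NOT sorted"
fi
echo "$status"
```

The check should use the same LC_ALL=C setting as the sort itself, because other locales may order the lines differently.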

After creating the new rule table, we can modify the travatar.ini file to point to it.