prontron - PRONunciation percepTRON

by Graham Neubig


prontron is a tool for pronunciation estimation. It focuses mainly on the pronunciation of Japanese unknown words, but is written in a general way so that it can be used for any string-to-string conversion task. I created it as a quick challenge to see if I could apply discriminative learning (the structured perceptron) to Japanese pronunciation estimation, and I am posting it here in case anybody finds it useful.


Latest Version: prontron 0.1

Bleeding-Edge Code: @github
Past Versions: None yet!

The code of prontron is distributed according to the Common Public License v 1.0, and can be distributed freely according to this license.

Using Prontron

Estimating Pronunciations with prontron

To estimate the pronunciation of words with prontron, you can use the models included in the model directory. If you have a file input.txt with one word per line, run the program as follows:

$ model/model.dict model/model.feat < input.txt > output.txt

This will output pronunciations, one per line, into output.txt.

Training prontron

Prontron training is a two-step process: you build a dictionary of "subword/pronunciation" pairs, then you run weight training.

First, create two files train.word and train.pron that contain words and their pronunciations. Then run the alignment program to create a dictionary model/model.dict of subword/pronunciation pairs:

$ train.word train.pron model/model.dict
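As a rough illustration of preparing the two training files, here is a minimal Python sketch. It assumes (this is not stated above) that train.word and train.pron are line-aligned, with one word per line in the first file and the matching pronunciation on the same line of the second. The helper name write_parallel_files is made up for this example and is not part of prontron.

```python
def write_parallel_files(entries, word_path, pron_path):
    """Write (word, pronunciation) pairs to two line-aligned UTF-8 files.

    Assumption: line i of word_path corresponds to line i of pron_path.
    """
    with open(word_path, "w", encoding="utf-8") as word_file, \
         open(pron_path, "w", encoding="utf-8") as pron_file:
        for word, pron in entries:
            # An empty field or an embedded newline would break line alignment.
            if not word or not pron or "\n" in word or "\n" in pron:
                raise ValueError(f"bad entry: {word!r} / {pron!r}")
            word_file.write(word + "\n")
            pron_file.write(pron + "\n")
```

Calling write_parallel_files([("発音", "はつおん"), ("発表", "はっぴょう")], "train.word", "train.pron") would produce two files with two lines each.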

You can add more entries to the dictionary if you notice that anything important is missing. Another tool such as mpaligner could also be used for this step, although we haven't tried it. Next, we train the feature weights model/model.feat using the perceptron algorithm (note that this will take a while).

$ train.word train.pron model/model.dict model/model.feat

That is it! Both of these programs have a number of training options (mins and maxes should be the same for both):

    -fmin  minimum length of the input unit (1)
    -fmax  maximum length of the input unit (1)
    -emin  minimum length of the output unit (0)
    -emax  maximum length of the output unit (5)
    -iters maximum number of iterations (10)
    -word  use word units instead of characters

Alignment only:

    -cut   all pairs that have a maximum posterior probability
           less than this will be trimmed (0.01)

Weight training only:

    -inarow  skip training examples we've gotten right
             this many times
    -recheck re-check skipped examples in this many times
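To make the -cut option more concrete, here is a minimal Python sketch of one way such trimming could work. It is an interpretation of the option description, not prontron's actual code: the function name trim_pairs, the data layout (each pair mapped to the posteriors it received in different training words), and the probability values are all invented for illustration.

```python
def trim_pairs(pair_posteriors, cut=0.01):
    """Keep only pairs whose maximum posterior probability is at least `cut`.

    pair_posteriors maps (subword, pronunciation) to a list of posterior
    probabilities, one per occurrence (hypothetical layout).
    """
    return {
        pair: max(probs)
        for pair, probs in pair_posteriors.items()
        if probs and max(probs) >= cut
    }

# Made-up posteriors for three candidate pairs.
posteriors = {
    ("発", "はつ"): [0.62, 0.40],
    ("発", "はっ"): [0.35, 0.55],
    ("発", "は"): [0.004, 0.008],  # never reaches 0.01, so it is trimmed
}
kept = trim_pairs(posteriors, cut=0.01)
```

With these numbers, the pair 発/は is dropped and the other two pairs are kept.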

How Does it Work?

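Prontron uses discriminative training based on the structured perceptron. This is good, because it lets training use many arbitrary features. The basic idea of the structured perceptron algorithm is:

    For each training word, find the highest-scoring pronunciation p under the current weights w.
    If p differs from the correct pronunciation p*, update the weights: w = w + f(p*) - f(p).
    Repeat over the training data for several iterations.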
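The structured perceptron update can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not prontron's implementation: the correct derivation of each word is assumed to be given as a list of subword/pronunciation pairs, the features are only unigrams and bigrams over those pairs, and an exhaustive search over dictionary entries stands in for Viterbi decoding. All function names and the tiny dictionary are made up for this example.

```python
from collections import Counter

def features(pairs):
    """Unigram and bigram counts over a sequence of (subword, pron) pairs."""
    feats = Counter()
    prev = "<s>"
    for sub, pron in pairs:
        cur = f"{sub}/{pron}"
        feats["uni:" + cur] += 1
        feats["bi:" + prev + " " + cur] += 1
        prev = cur
    feats["bi:" + prev + " </s>"] += 1
    return feats

def score(weights, feats):
    """Dot product of the weight vector and a sparse feature vector."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def candidates(word, dictionary):
    """Enumerate every way to cover `word` with dictionary entries."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        sub = word[:end]
        for pron in dictionary.get(sub, []):
            for rest in candidates(word[end:], dictionary):
                yield [(sub, pron)] + rest

def decode(word, dictionary, weights):
    """Highest-scoring derivation p (brute force in place of Viterbi)."""
    return max(candidates(word, dictionary),
               key=lambda c: score(weights, features(c)))

def train(examples, dictionary, iters=10):
    """Perceptron training: on each mistake, w += f(p*) - f(p)."""
    weights = {}
    for _ in range(iters):
        mistakes = 0
        for word, gold in examples:
            guess = decode(word, dictionary, weights)
            if guess != gold:
                mistakes += 1
                for name, value in features(gold).items():
                    weights[name] = weights.get(name, 0.0) + value
                for name, value in features(guess).items():
                    weights[name] = weights.get(name, 0.0) - value
        if mistakes == 0:  # every training example is already correct
            break
    return weights

# Toy dictionary and training data (illustrative only).
dictionary = {"発": ["はつ", "はっ"], "音": ["おん"], "表": ["ひょう", "ぴょう"]}
examples = [
    ("発音", [("発", "はつ"), ("音", "おん")]),
    ("発表", [("発", "はっ"), ("表", "ぴょう")]),
]
weights = train(examples, dictionary)
```

After training on this toy data, decode("発表", dictionary, weights) returns the derivation 発/はっ 表/ぴょう. The bigram features are what let the model choose はっ before 表 but はつ before 音.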

In the case of pronunciation estimation, it is not too difficult to find p, f(p), and f(p*) using the Viterbi algorithm. For the current features in prontron, we use bigram and length features over four sequences:

Word:                         発音 | 発表
Pronunciation:                はつおん | はっぴょう
Seq 1 -- Char/Pron. Pairs:    発/はつ 音/おん | 発/はっ 表/ぴょう
Seq 2 -- Pron. Strings:       はつ おん | はっ ぴょう
Seq 3 -- Pron. Characters:    は つ お ん | は っ ぴ ょ う
Seq 4 -- (Almost) Phonemes:   h a t u o n | h a x p i xyo u
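The four sequences in the example above can be derived mechanically from a list of character/pronunciation pairs. The Python sketch below shows one way to do this and to read bigrams off each sequence. It is an illustration only: the kana table covers just the kana that appear in the example, the <s> and </s> boundary symbols are an assumption, and the length features are not covered.

```python
# Kana to "(almost) phoneme" mapping, read off the example table above.
KANA = {
    "は": ["h", "a"], "つ": ["t", "u"], "お": ["o"], "ん": ["n"],
    "っ": ["x"], "ぴ": ["p", "i"], "ょ": ["xyo"], "う": ["u"],
}

def four_sequences(pairs):
    """Build Seq 1 to Seq 4 from a list of (subword, pronunciation) pairs."""
    seq1 = [f"{sub}/{pron}" for sub, pron in pairs]    # char/pron. pairs
    seq2 = [pron for _, pron in pairs]                 # pron. strings
    seq3 = [ch for pron in seq2 for ch in pron]        # pron. characters
    seq4 = [ph for ch in seq3 for ph in KANA[ch]]      # (almost) phonemes
    return seq1, seq2, seq3, seq4

def bigrams(seq):
    """Adjacent pairs of items, padded with assumed boundary symbols."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    return list(zip(padded, padded[1:]))

seq1, seq2, seq3, seq4 = four_sequences([("発", "はっ"), ("表", "ぴょう")])
pron_string_bigrams = bigrams(seq2)
```

Each bigram, tagged with the sequence it came from, would then serve as one feature in the weight vector.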

Examples of some high-weighted features learned over each of these sequences are as follows:

How well does it do?

On a quick test, using 90% of the unique words in BCCWJ as training data and the remaining 10% as testing data, prontron gets 66% correct, while a noisy channel model gets 62% (a joint trigram would probably do better). More importantly, it gives the flexibility to incorporate new features easily, which could lead to much larger gains in accuracy.
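For anyone who wants to run a similar comparison, the evaluation can be sketched as follows. The details are assumptions: the text above does not say how the 90/10 split was drawn or how correctness was judged, so this sketch shuffles the unique words with a fixed seed and counts a prediction as correct only when it matches the reference pronunciation exactly. The function names are made up for this example.

```python
import random

def split_unique(words, test_fraction=0.1, seed=0):
    """Split the unique words into (train, test) lists."""
    unique = sorted(set(words))           # deduplicate, fix the order
    random.Random(seed).shuffle(unique)   # reproducible shuffle
    n_test = int(len(unique) * test_fraction)
    return unique[n_test:], unique[:n_test]

def accuracy(predicted, reference):
    """Fraction of predictions that exactly match the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference differ in length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)
```

Here, predicted would be the lines of output.txt and reference the gold pronunciations of the test words, in the same order.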



If you are interested in participating in the prontron project, particularly tackling any of the interesting challenges below, please send an email to neubig at gmail dot com.


There are a bunch of possible improvements that would be quite interesting and useful:

Revision History

Version 0.1.0 (7/10/2011)