MTH raises MT training standards

choc croc

One of the primary problems with trained MT engines is… poor output, of course. With free solutions, the problem lies partly with customization. Now Microsoft Translator Hub (MTH) claims to have overcome the two main bottlenecks on the path to better output: the inability to update a glossary on the fly and the need for a huge number of TM segments to train the engine properly. Here’s an excerpt from their blog:

Dictionary only training: You can now train a custom translation system when you just have a dictionary and no other parallel documents. There is no minimum size to that dictionary. It should have at least one entry. Simply upload the dictionary, which is an Excel file with the language ID as column header, include it in your training set, and hit train. The training completes very quickly, and you are ready to deploy.

Training with 1000 parallel sentences only: You can now train a custom system with only 1000 parallel sentences. Use 500 sentences for the tuning set and 500 sentences in the test set. The Hub will build a system based on Microsoft models, and will tune the models to your tuning set, giving you a better adjusted system than the generic translation system. You can use in-domain target language documents as part of this training as well. The 1000 sentences must be unique and pass the Hub’s data filtering. (Source)
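To get a feel for the 500/500 split described in the excerpt, here is a minimal Python sketch for preparing such a set from your own TM export. It is only an illustration: the function name `split_parallel_corpus` and its parameters are made up for this post, and the cleanup it does (dropping empty pairs and exact duplicates) is an assumption, not the Hub’s actual data filtering, which may well reject more.

```python
import random


def split_parallel_corpus(pairs, tuning_size=500, test_size=500, seed=0):
    """Deduplicate (source, target) pairs and split them into tuning and test sets.

    Hypothetical helper: it only drops empty pairs and exact duplicates.
    The Hub's own filtering is not reproduced here.
    """
    seen = set()
    unique = []
    for source, target in pairs:
        pair = (source.strip(), target.strip())
        # Skip pairs with an empty side and pairs we have already kept.
        if not pair[0] or not pair[1] or pair in seen:
            continue
        seen.add(pair)
        unique.append(pair)

    needed = tuning_size + test_size
    if len(unique) < needed:
        raise ValueError(
            f"Need {needed} unique sentence pairs, found only {len(unique)}"
        )

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)
    return unique[:tuning_size], unique[tuning_size:needed]
```

Feed it a list of `(source, target)` tuples and it returns two lists, one for tuning and one for testing. If fewer than 1000 unique pairs survive the cleanup, it raises an error instead of quietly producing a smaller set.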

If this really works as described, it is a fascinating breakthrough for individual translators who train their MT engines for recurring projects of a technical nature. Well, it’s about time you checked it out yourself.

Photo credit: Crocodile via photopin (license)