Adaptation of Machine Translation for Multilingual Information Retrieval in the Medical Domain - Pecina et al., 2014
We study machine translation of user search queries for cross-lingual information retrieval in the medical domain. Our main focus is adapting machine translation to improve translation quality, and we also explore whether this adaptation improves cross-lingual information retrieval. The machine translation techniques used in this work are in-domain training and tuning, intelligent training data selection, phrase table optimization, compound splitting, and synonyms as translation variants. On the information retrieval side, we apply morphological normalization and use multiple translation variants. Experiments are run on three language pairs: Czech-English, German-English, and French-English. Our systems outperform our strong baselines as well as Google Translate and Microsoft Bing Translator in direct comparisons on all language pairs. The baseline BLEU score rose from 26.59 to 41.45 for Czech-English, and BLEU also increased for German-English and French-English; the average improvement across the three pairs is 55%. On this test collection, however, information retrieval improves significantly over the baseline only for French-English. For Czech-English and German-English, the increased machine translation quality does not lead to better retrieval results. Most of our machine translation techniques improve the translation of medical search queries. Intelligent training data selection is especially effective for domain adaptation of machine translation. Splitting German compounds on the source-language side also yields some improvement. Higher translation quality does not necessarily mean better information retrieval performance. We discuss the contribution of the individual techniques and state-of-the-art features, and outline directions for future research.
Pecina P, Dušek O, Goeuriot L, Hajič J, Hlaváčová J, Jones GJ, Kelly L, Leveling J, Mareček D, Novák M, Popel M, Rosa R, Tamchyna A, Urešová Z. Adaptation of machine translation for multilingual information retrieval in the medical domain. Artif Intell Med. 2014 Jul;61(3):165-85. doi: 10.1016/j.artmed.2014.01.004. Epub 2014 Feb 5. PMID: 24680188.