SCATE

Smart Computer-Aided Translation Environment

Aims

In the SCATE project we investigate several aspects of translation technology and its use in the translator's workflow. This work is done in close collaboration with industry in order to facilitate valorization. LIIR is involved in the following tasks of SCATE: 1) improvements in automated terminology extraction from comparable corpora; and 2) improvements in speech recognition accuracy.

We research methods to determine which texts in different languages contain comparable information, and we improve on current methods of terminology extraction from comparable corpora through techniques such as cross-lingual topic modelling. We target improvements in speech recognition accuracy in the context of machine translation by integrating the language model (LM) of the machine translation engine with the LM of the speech recogniser, in two directions: we study the adaptation of the recogniser as an input method for the post-editor, and we study the translation of speech. Furthermore, we study automatic domain adaptation for speech recognition, so that the language models of the recogniser are automatically adapted to the target domain.
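As an illustration only: this page does not state how the two language models are combined. Assuming simple linear interpolation, a standard way of mixing language models, the combined probability of a word is P(w) = λ·P_MT(w) + (1 - λ)·P_ASR(w). The sketch uses hypothetical toy unigram tables (`p_mt`, `p_asr`) and a hypothetical weight `lam`; it is not the project's implementation.

```python
# Minimal sketch of linear language-model interpolation.
# ASSUMPTION: the project's actual integration method is not described on this
# page; linear interpolation is shown as one common option. The tables below are
# hypothetical toy unigram distributions, not models trained in SCATE.

def interpolate(p_mt: dict[str, float], p_asr: dict[str, float], lam: float) -> dict[str, float]:
    """Return P(w) = lam * P_mt(w) + (1 - lam) * P_asr(w) over the union vocabulary."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_mt) | set(p_asr)
    # A word missing from one model contributes probability 0 from that model.
    return {w: lam * p_mt.get(w, 0.0) + (1.0 - lam) * p_asr.get(w, 0.0) for w in vocab}

# Hypothetical toy distributions (each sums to 1).
p_mt = {"contract": 0.5, "clause": 0.3, "weather": 0.2}
p_asr = {"contract": 0.2, "clause": 0.1, "weather": 0.7}

mixed = interpolate(p_mt, p_asr, lam=0.7)
print(round(mixed["contract"], 2))  # 0.7*0.5 + 0.3*0.2 = 0.41
```

Raising `lam` shifts probability mass toward words that the MT-side model favours. A real recogniser would interpolate conditional n-gram probabilities P(w | h) rather than unigrams, with the weight typically tuned on held-out data.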

Partners

The project is coordinated by CCL, Centre for Computational Linguistics of KU Leuven (Prof. dr. Frank Van Eynde, Dr. Vincent Vandeghinste). Other partners are ESAT/PSI - Centre for the Processing of Speech and Images of KU Leuven, the Translation School Thomas More of KU Leuven, LT3 - Language and Translation Technology Team of the University of Ghent and EDM - Expertise centre for Digital Media of Hasselt University.

Results

We have built models for extraction of translation equivalents from comparable corpora based on probabilistic topic models and word embeddings, and have used these novel models successfully in cross-lingual information retrieval.
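To illustrate the general idea behind extracting translation equivalents with bilingual word embeddings: once words of both languages live in one shared vector space, a common approach is to rank target-language words by cosine similarity to a source word. The sketch below assumes such a shared space; the 3-dimensional Dutch and English vectors and the helper names `cosine` and `translate` are hypothetical and do not come from the project's models. Topic-model-based variants would instead compare each word's distribution over shared cross-lingual topics.

```python
import math

# Minimal sketch: translation candidates by nearest-neighbour search in a
# shared bilingual embedding space.
# ASSUMPTION: the vectors below are hypothetical toy values, not embeddings
# trained in SCATE.

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def translate(word: str, source: dict[str, list[float]],
              target: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the k target words most similar to the source word."""
    query = source[word]
    ranked = sorted(target, key=lambda t: cosine(query, target[t]), reverse=True)
    return ranked[:k]

# Hypothetical Dutch (source) and English (target) vectors in one shared space.
nl = {"hond": [0.9, 0.1, 0.0], "kat": [0.1, 0.9, 0.1]}
en = {"dog": [0.8, 0.2, 0.1], "cat": [0.2, 0.8, 0.0], "car": [0.0, 0.1, 0.9]}

print(translate("hond", nl, en))  # ['dog']
```

Returning the top `k` candidates rather than a single word reflects the usual evaluation setting for bilingual lexicon induction, where a ranked list of translation candidates is produced for each source word.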



Period: 2014-03-01 to 2018-02-28
Financed by: IWT - SBO 130041
Supervised by: Marie-Francine Moens
Staff: Geert Heyman, Ivan Vulic
Contact: Geert Heyman

More information can be found on the project website http://www.arts.kuleuven.be/ling/ccl/projects/scate

Publications

  1. VANDEGHINSTE, Vincent, VANALLEMEERSCH, Tom, HOSTE, Veronique, MOENS, Marie-Francine, WAMBACQ, Patrick, CONINX, Karin & DE WACHTER, Ken. Smart Computer Aided Translation Environment. In Proceedings of the Seventeenth Annual Conference of the European Association for Machine Translation (EAMT 2014). 2014
  2. VULIC, Ivan & MOENS, Marie-Francine. Probabilistic Models of Cross-Lingual Semantic Similarity in Context Based on Latent Cross-Lingual Concepts Induced from Comparable Data. In Proceedings of EMNLP 2014: Conference on Empirical Methods in Natural Language Processing. 2014
  3. VULIC, Ivan, DE SMET, Wim, TANG, Jie & MOENS, Marie-Francine. Probabilistic Topic Modeling in Multilingual Settings: An Overview of Its Methodology and Applications. Information Processing & Management, 51 (1), 111-147. 2015
  4. VULIC, Ivan & MOENS, Marie-Francine. Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings. In Proceedings of the 38th Annual ACM SIGIR Conference on Research and Development in Information Retrieval. 2015
  5. VULIC, Ivan & MOENS, Marie-Francine. Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015). East Stroudsburg, PA: ACL. 2015
  6. VANDEGHINSTE, Vincent et al. Smart Computer Aided Translation Environment. In Proceedings of the Annual Conference of the European Association for Machine Translation (EAMT 2015). 2015
  7. KIELA, Douwe, RIMELL, Laura, VULIC, Ivan & CLARK, Stephen. Exploiting Image Generality for Lexical Entailment Detection. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015). East Stroudsburg, PA: ACL. 2015
  8. KIELA, Douwe, VULIC, Ivan & CLARK, Stephen. Transferring Features from a Convolutional Neural Network to Perform Bilingual Lexicon Induction. In Proceedings of EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. ACL. 2015
  9. HEYMAN, Geert, VULIC, Ivan & MOENS, Marie-Francine. C-BiLDA: Extracting Cross-lingual Topics from Non-Parallel Texts by Distinguishing Shared from Unshared Content. Data Mining and Knowledge Discovery. 2016
  10. VULIC, Ivan & MOENS, Marie-Francine. Bilingual Distributed Word Representations from Document-Aligned Comparable Data. Journal of Artificial Intelligence Research. 2016
  11. VULIC, Ivan, KIELA, Douwe, CLARK, Stephen & MOENS, Marie-Francine. Multi-Modal Representations for Improved Bilingual Lexicon Learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). ACL. 2016
  12. HEYMAN, Geert, VULIC, Ivan & MOENS, Marie-Francine. Bilingual Lexicon Induction by Learning to Combine Word-Level and Character-Level Representations. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017). 2017

