Generic Technology for Information Extraction from Texts

This project develops generic technologies for extracting information from texts. The technologies are generic in that the algorithms can be used across different applications and text types, are to a considerable extent language- and domain-independent, and can be ported to other domains or languages with minimal effort. We have applied machine learning and linguistics-based techniques to a number of information extraction tasks. The resulting technologies are widely applicable in information retrieval, text search, text analysis and language understanding.

We selected a limited number of challenges based on their relevance for solving practical problems and for exploring new methods:


Hierarchical topic segmentation
Entity scoring
Case role detection
Single- and multi-document summarization

Period from 2000-10-01 to 2004-12-31.
Financed by IWT-STWW (Nr. 000135), Roularta Media Group, Language & Computing, ICMS Group, Wolters-Kluwer
Supervised by Marie-Francine Moens
Contact Marie-Francine Moens


