Generic Technology for Information Extraction from Texts

The project deals with the development of generic technologies for extracting information from texts. The technologies are generic because the algorithms can be used across different applications and text types, are to a considerable extent language- and domain-independent, and can be ported to other domains or languages with minimal effort. We have applied machine learning and linguistics-based techniques to a number of information extraction tasks. The resulting technologies are widely applicable in information retrieval, text search, text analysis and language understanding.

We have selected a limited number of challenges on the basis of their relevance for solving practical problems and for exploring new methods:


Hierarchical topic segmentation:


Entity scoring:


Case role detection:


Single- and multi-document summarization:

Period From 2000-10-01 to 2004-12-31.
Financed by IWT-STWW (Nr. 000135), Roularta Media Group, Language & Computing, ICMS Group, Wolters-Kluwer
Supervised by Marie-Francine Moens
Contact Marie-Francine Moens


  1. MOENS, M.-F. & DE BUSSER, R. Generic Topic Segmentation of Document Texts. In Proceedings of the 24th ACM SIGIR Annual International Conference on Research and Development in Information Retrieval (pp. 418-419). New York: ACM 2001
  2. ANGHELUTA, R., DE BUSSER, R. & MOENS, M.-F. The Use of Topic Segmentation for Automatic Summarization. In Proceedings of the ACL-2002 Post-Conference Workshop on Automatic Summarization. 2002
  3. DE BUSSER, R., ANGHELUTA, R. & MOENS, M.-F. Semantic Case Role Detection for Information Extraction. In COLING 2002 - Proceedings of the Main Conference. New Brunswick: ACL, pp. 1198-1202. 2002
  4. MOENS, M.-F. & DE BUSSER, R. Information Extraction: Current Technologies and Promising Research Directions. Internal report TR-IE-1, 68 p. 2001
  5. ANGHELUTA, R., MOENS, M.-F. & DE BUSSER, R. Multi-document Summarization. Technical Report, K.U.Leuven. 2002
  6. MOENS, M.-F., DE BUSSER, R., HIEMSTRA, D. & KRAAIJ, W. Proceedings of the Third Dutch-Belgian Information Retrieval Workshop. Leuven: ICRI. 2002
  7. ANGHELUTA, R. & MOENS, M.-F. A Study about Synonym Replacement in News Corpora. In Proceedings of the 3rd Dutch-Belgian Workshop in Information Retrieval. 2002
  8. MOENS, M.-F., ANGHELUTA, R. & DE BUSSER, R. Summarization of Texts Found on the World Wide Web. In W. ABRAMOWICZ (Ed.), Knowledge-Based Information Retrieval and Filtering from the Web (pp. 101-120) (The Kluwer International Series in Engineering and Computer Science). Boston: Kluwer Academic Publishers. 2003
  9. DE BUSSER, R., "Report on the 3rd Dutch-Belgian Information Retrieval Workshop (DIR-2002)." BNVKI Newsletter 20 (1), 19-21 and SIGIR Forum 37 (1), 4-6. 2003
  10. ANGHELUTA, R., MOENS, M.-F. & DE BUSSER, R. The K.U.Leuven Summarization System DUC-2003. In Proceedings of the Document Understanding Conference (DUC-2003). National Institute of Standards and Technology, USA. 2003
  11. DE BUSSER, R. & MOENS, M.-F. Learning Generic Semantic Roles. Technical Report, 15p. (submitted for publication) 2003
  12. ANGHELUTA, R., JEUNIAUX, P., MITRA, R. & MOENS, M.-F. Clustering Algorithms for Noun Phrase Coreference Resolution. In Proceedings of 7èmes Journées internationales d'Analyse statistique des Données Textuelles (pp. 60-70). March 10-12, 2004, Louvain-la-Neuve, Belgium. 2004
  13. MOENS, M.-F., ANGHELUTA, R. & DUMORTIER, J. Generic Technologies for Single- and Multi-document Summarization. Information Processing & Management, 2005 (forthcoming).
  14. MOENS, M.-F., ANGHELUTA, R., DE BUSSER, R. & JEUNIAUX, P. Summarizing Text at Various Levels of Detail. In Proceedings of RIAO 2004 Coupling Approaches, Coupling Media and Coupling Languages for Information Retrieval (pp. 597-609). Le Centre de Hautes Études Internationales d'Informatique Documentaire. 2004
  15. ANGHELUTA, R., MITRA, R., JING, X. & MOENS, M.-F. K.U.Leuven Summarization System at DUC-2004. In DUC Workshop Papers and Agenda (pp. 53-60). Boston. 2004
  16. MOENS, M.-F., ANGHELUTA, R. & DUMORTIER, J. Generic Technologies for Single- and Multi-document Summarization. Information Processing & Management, 41(3), 569-586. 2005
  17. MOENS, M.-F. Using Patterns of Thematic Progression for Building a Table of Content of a Text. Journal of Natural Language Engineering 12 (3): 1-28. 2006
  18. MOENS, M.-F. Information Synthesis: A Glance at the Future. In Proceedings of the IJCAI 2005 Workshop on Knowledge and Reasoning for Answering Questions (invited lecture). 2005
  19. MOENS, M.-F., JEUNIAUX, P., ANGHELUTA, R. & MITRA, R. Measuring Aboutness of an Entity in a Text. In Proceedings of HLT-NAACL 06 TextGraphs: Graph-based Algorithms for Natural Language Processing. East Stroudsburg: ACL. 2006
  20. MITRA, R., ANGHELUTA, R., JEUNIAUX, P. & MOENS, M.-F. Progressive Fuzzy Clustering for Noun Phrase Coreference Resolution. In
  21. MEHAY, D., DE BUSSER, R. & MOENS, M.-F. Labeling Generic Semantic Roles. In H. Bunt, J. Geertzen & E. Thyse (Eds.), Proceedings of the Sixth International Workshop on Computational Semantics (IWCS-6) (pp. 175-187). Tilburg, The Netherlands: Tilburg University. 2005
  22. BIRYUKOV, M., ANGHELUTA, R. & MOENS, M.-F. Multidocument Question Answering Text Summarization Using Topic Signatures. In Proceedings of the DIR-2005 Dutch-Belgian Information Retrieval Workshop. 2005
  23. BIRYUKOV, M., ANGHELUTA, R. & MOENS, M.-F. Multidocument Question Answering Text Summarization Using Topic Signatures. Journal on Digital Information Management. 2005
  24. MOENS, M.-F. Automatic Indexing and Abstracting of Document Texts (The Kluwer International Series on Information Retrieval 6). Boston: Kluwer Academic Publishers. 2000
  25. MOENS, M.-F. & SZPAKOWICZ, S. (Eds.) Text Summarization Branches Out. New Brunswick: Association for Computational Linguistics. 2004
  26. MOENS, M.-F. Information Extraction: Algorithms and Prospects in a Retrieval Context (The Information Retrieval Series 21). New York: Springer. 2006
