Accessible & Open Knowledge Infrastructure for Flanders


Our contribution to the AcKnowledge project is to make the documentation of the United Nations University Comparative Regional Integration Studies (UNU-CRIS) in Bruges, Belgium, accessible. The UNU-CRIS document collection is very heterogeneous: it comprises books, scientific articles, essays, and news found on the World Wide Web. Documents are retrieved by means of a classical full-text search, and the results can be filtered by the subject categories a user is interested in. The categories cover concepts in the realm of regional integration (e.g., trade, poverty, globalization).
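As an illustration of full-text retrieval combined with category filtering, the following is a minimal Python sketch. It is not the project's implementation and does not use the Lemur API: the toy documents, the field names `text` and `categories`, and the functions `build_index` and `search` are all assumptions made for the example, and simple Boolean AND matching stands in for a ranked retrieval model.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in doc["text"].lower().split():
            index[token].add(doc_id)
    return index


def search(index, docs, query, categories=None):
    """Return sorted ids of documents that contain every query term.

    If `categories` is given, keep only documents labelled with at least
    one of the requested subject categories.
    """
    terms = query.lower().split()
    if not terms:
        return []
    # Boolean AND: intersect the posting sets of all query terms.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    if categories:
        wanted = set(categories)
        hits = {d for d in hits if wanted & set(docs[d]["categories"])}
    return sorted(hits)


# Hypothetical toy collection, for illustration only.
DOCS = {
    1: {"text": "Regional trade agreements in Africa", "categories": ["trade"]},
    2: {"text": "Poverty reduction and regional trade", "categories": ["poverty", "trade"]},
    3: {"text": "Globalization and regional identity", "categories": ["globalization"]},
}
INDEX = build_index(DOCS)
```

Under these assumptions, `search(INDEX, DOCS, "regional trade")` matches documents 1 and 2, and passing `categories=["poverty"]` narrows the result to document 2.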


In this part of the AcKnowledge project, we collaborated with the company i.Know.


We have built a crawler and a cleaner for Web documents, which are used to extract content from a variety of Web news sources. Our search engine and interface build on the Lemur architecture. An API for integration with the AcKnowledge e-learning platform was constructed. In addition, we have built a text categorization system, with a focus on implementing and evaluating various feature extraction techniques (e.g., frequent item sets) and feature selection techniques (e.g., linear classifier weights).
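To make the idea of feature selection by linear classifier weights concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the method evaluated in the project: a plain perceptron stands in for whichever linear classifier was actually used, and the training data, `train_perceptron`, and `select_features` are hypothetical names introduced for the example. The principle is that features whose learned weight is close to zero contribute little to the decision and can be dropped.

```python
from collections import defaultdict


def train_perceptron(examples, epochs=10):
    """Train a binary perceptron on (tokens, label) pairs, label in {+1, -1}."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for tokens, label in examples:
            features = set(tokens)  # binary bag-of-words representation
            score = bias + sum(weights[t] for t in features)
            if label * score <= 0:  # misclassified: move weights toward the label
                for t in features:
                    weights[t] += label
                bias += label
    return weights, bias


def select_features(weights, k):
    """Keep the k features with the largest absolute weight (ties: alphabetical)."""
    ranked = sorted(weights, key=lambda t: (-abs(weights[t]), t))
    return ranked[:k]


# Hypothetical toy training set: +1 = "trade", -1 = "poverty".
TRAIN = [
    ("tariff export trade the".split(), 1),
    ("trade import tariff of".split(), 1),
    ("poverty income the aid".split(), -1),
    ("aid poverty of health".split(), -1),
]

WEIGHTS, BIAS = train_perceptron(TRAIN)
SELECTED = select_features(WEIGHTS, 6)
```

On this toy data the function words "the" and "of", which occur in both classes, end up with zero weight and are not among the selected features, while class-specific terms such as "tariff" and "poverty" are retained.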

Period: 2006-09-01 to 2008-12-31
Financed by: IBBT
Supervised by: Marie-Francine Moens
Staff: Erik Boiy, Xu Zhang, Javier Arias Moreno
Contact: Marie-Francine Moens

More information can be found on the project website


  1. ARIAS, Javier, DESCHACHT, Koen & MOENS, Marie-Francine. Content Extraction from Multilingual Web Pages. In Aly, R., Hauff, C., Hiemstra, D., Huybers, T. & De Jong, F. (Eds.), Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop. University of Twente. 2009
  2. BOIY, Erik & MOENS, Marie-Francine. Categorization of an Heterogeneous Document Collection: Implementation and Evaluation of Several Feature Extraction Techniques. Technical Report. 2008
  3. BOIY, Erik & MOENS, Marie-Francine. Feature Selection for Document Categorization. In Tadić, M., Dalbelo Bašić, B. & Moens, M.-F. (Eds.), Technologies for Processing and Retrieval of Semi-Structured Documents (pp. 159-176). Zagreb: Croatian Language Technologies Society. 2009
