Combining Computer Vision and Language Processing For Advanced Search (EU ICT COST Action)


Content found in large public and private repositories is increasingly composed of a mixture of visual, textual and speech data, creating both opportunities and a need for integrative models that bridge Natural Language Processing (NLP) and Computer Vision (CV). This situation demands solutions for multimedia and cross-media processing of inherently multimodal data, cross-media linking, and cross-media search, retrieval and mining. In addition, the processing should be scalable and adaptable to different domains and data sources, as this context typically involves "big data", which is often user-generated and unstructured. Fragments of natural language in the form of tags, captions, subtitles, surrounding text or audio can aid the interpretation of image and video data by adding context or disambiguating visual appearance. In addition, labeled images are essential for training object or activity classifiers. Conversely, visual data can help resolve challenges in language processing, such as the disambiguation of person names, places and events. Studying language and vision together can also provide new insight into cognition and into universal representations of knowledge and meaning to be used in a Semantic Web context. Multimodal and cross-modal search and retrieval, which combine visual and textual modalities, are also becoming increasingly popular. The project aims to bring together researchers from both fields. The initiative on integrating vision and text is expected to yield a better understanding of the nature and usability of the vast multimodal data available online, such as on the World Wide Web and especially in social media.


Partner LIIR is involved in the scientific coordination of the EU COST action iV&L.


Partner LIIR was/is involved in the organization of the following workshops:
  • Workshop on Vision And Language 2014 (VL'14) at the 25th International Conference on Computational Linguistics (COLING 2014), August 23, 2014, Dublin, Ireland.
  • Workshop on Vision And Language 2015 (VL'15) at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), September 17, 2015, Lisbon, Portugal.

    Period: 2013-11-15 to 2017-11-14
    Financed by: EU ICT COST Action IC1307
    Supervised by: Marie-Francine Moens
    Staff: Golnoosh Farnadi, Aparna Nurani Venkitasubramanian, Ivan Vulic
    Contact: Marie-Francine Moens

    More information can be found on the project website http://www.cost.eu/domains_actions/ict/Actions/IC1307
