DIScovery of Knowledge on Chinese Medicinal Plants in Biomedical Texts


The discovery of meaningful knowledge in free text is a current research topic in the text mining and natural language processing fields. Automated knowledge acquisition from text is often referred to as “machine reading”. Machine reading is a complex process that involves text analysis, knowledge representation, machine learning, inference, optimization and information fusion.

Traditional Chinese medicine forms a body of knowledge built up over millennia, whose oldest written sources go back 2000 years, and whose modernization has been ongoing since the 1950s. Much of this information has been digitized and is thus accessible to text mining techniques. Chinese medicinal plants and their natural products are also increasingly studied in the West. Their molecular mechanisms and active components form a solid link to the Western biomedical literature. The DISK project aims to use text mining to establish the relationship between a particular disease or physiological process and a Chinese medicinal plant, or combination of plants, that affects it. The project is carried out in collaboration with Tsinghua University, China.


LIIR coordinates the DISK project and collaborates with Prof. Walter Luyten (Department of Pharmaceutical and Pharmacological Sciences) of KU Leuven and Prof. Juanzi Li, Prof. Jie Tang and Prof. Shao Li of Tsinghua University, China.


The research focuses on entity-relation recognition in biomedical texts. We have explored semi-supervised machine learning techniques, structured learning techniques and the integration of latent factors in the machine learning models. The results have been published in journals including BMC Bioinformatics and the Journal of the American Medical Informatics Association, and at venues such as AAAI-14 and BioNLP 2013. Some of the results were presented at the international workshop “Knowledge Discovery from Big Text: Challenges and Opportunities when Mining Biomedical Text”, held on May 18, 2015 at the Faculty Club of KU Leuven.

Period: 2012-12-01 to 2016-12-31
Financed by: KU Leuven BIL 11/20T
Supervised by: Marie-Francine Moens
Staff: Parisa Kordjamshidi, Thomas Provoost, Huaiyu Wan, Yang Yang
Contact: Marie-Francine Moens


  1. PROVOOST, Thomas & MOENS, Marie-Francine Detecting Relations in the Gene Regulation Network. In Proceedings of BioNLP 2013. ACL. 2013
  2. YANG, Yang, LUYTEN, Walter, LIU, Lu, MOENS, Marie-Francine, LI, Juanzi & TANG, Jie Forecasting Potential Diabetes Complications. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI-14). AAAI. 2014
  3. PROVOOST, Thomas & MOENS, Marie-Francine Semi-supervised Learning for the BioNLP Gene Regulation Network. BMC Bioinformatics, 16 (Suppl 10): S4. 2015
  4. MASSA, Wouter, KORDJAMSHIDI, Parisa, PROVOOST, Thomas & MOENS, Marie-Francine Machine Reading of Biological Texts: Bacteria-Biotope Extraction. In Proceedings of the 8th International Joint Conference on Biomedical Engineering Systems and Technologies - 6th International Conference on Bioinformatics Models, Methods and Algorithms (nominated for best student paper award). 2015
  5. KORDJAMSHIDI, Parisa, ROTH, Dan & MOENS, Marie-Francine Structured Learning for Spatial Information Extraction from Biomedical Text: Bacteria Biotopes. BMC Bioinformatics, 16: 129. 2015
  6. KORDJAMSHIDI, Parisa, MASSA, Wouter, PROVOOST, Thomas & MOENS, Marie-Francine Machine Reading for Extraction of Bacteria and Habitat Taxonomies. In Proceedings of BIOSTEC extended papers (Lecture Notes in Computer Science). Springer. 2015
  7. WAN, Huaiyu, MOENS, Marie-Francine, LUYTEN, Walter, ZHOU, Xuezhong, MEI, Qiaozhu, LIU, Lu & TANG, Jie Extracting Relations From Traditional Chinese Medicine Literature via Heterogeneous Entity Networks. Journal of the American Medical Informatics Association (JAMIA) (in press). 2015
  8. ZHANG, Jing, TANG, Jie, MA, Cong, TONG, Hanghang, JING, Yu, LI, Juanzi, LUYTEN, Walter & MOENS, Marie-Francine Fast and Flexible Top-k Similarity Search on Large Networks. ACM Transactions on Information Systems. 2017
