Anticipatory Learning for Reliable Phishing Prevention (EU ICT FP6)

EU Trust & Security Newsletter 2009: Successful completion of the FP6 Antiphish project

"The Antiphish (Anticipatory Learning for Reliable Phishing Prevention) project had completed its activities by 30 June 2009 and has promising results in the technology to fight spam and phishing emails. Spammers for instance use salting techniques to tweak the original (spam) message in order not to be detected by spam filters. The presence of such techniques makes an email already suspicious and the Antiphish project has developed a detection technique based on rendering the message and comparing the OCR message with the original message, which has been patented. Other anticipatory learning methods are based on machine learning techniques to improve the email filters by active and semi-supervised learning in such a way that new potential email and phishing emails can be detected. The results outperform the known results in research and those of commercial tools."

The AntiPhish project developed improved anti-phishing and anti-spam technologies to help protect and secure the global email communication infrastructure. It gave special focus to phishing, one of the most harmful forms of spam. In a phishing attack, malicious persons pose as legitimate actors (e.g. the victim's financial institution) to steal private information, such as financial data and account or password information, from unsuspecting clients. Often this involves the targeted sending of (personalised) messages that closely resemble the authentic ones that clients have grown familiar with. Whereas spam is often no more than a time-wasting nuisance, phishing poses a serious, ever-growing threat to a vulnerable public and corporate community.

The task of HMDB-LIIR within this project was to design email representations better suited to spam and phishing filtering than the traditional bag-of-words, together with the programs that automatically generate these representations from raw email text. The representations capture features related to message salting, syntax, semantics, structure and layout, topic, and graphics. The advanced machine learning techniques of the Fraunhofer Institute then operate on these representations to build classifiers for spam and phishing messages.
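To make the idea of a representation richer than bag-of-words concrete, the following is a minimal sketch, not the project's actual feature extractors. It combines word counts with a few structural features read from the raw message. The function name `email_features` and the feature names (`word=...`, `struct:...`) are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from email import message_from_string

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
TOKEN_RE = re.compile(r"[a-z0-9]+")


def email_features(raw_email: str) -> dict:
    """Return bag-of-words counts plus simple structural features (illustrative)."""
    msg = message_from_string(raw_email)
    texts, n_parts, has_html = [], 0, False
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        n_parts += 1
        if part.get_content_type() == "text/html":
            has_html = True
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                texts.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name: fall back to UTF-8
                texts.append(payload.decode("utf-8", errors="replace"))
    body = "\n".join(texts)

    # Traditional bag-of-words part of the representation.
    words = Counter(TOKEN_RE.findall(body.lower()))
    features = {f"word={w}": c for w, c in words.items()}

    # Structural additions (an assumed, deliberately small selection).
    features["struct:n_parts"] = n_parts
    features["struct:has_html"] = int(has_html)
    features["struct:n_urls"] = len(URL_RE.findall(body))
    return features
```

A dictionary of this kind can be passed to any classifier that accepts sparse named features. The salting, syntactic, semantic, topical and graphical features named above would need dedicated extractors that this sketch does not attempt.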

The project involved huge amounts of sensitive, real-life data provided by Symantec, and was validated and implemented on communication networks governed by Tiscali Group and Nortel Networks S.A.


The project consortium comprises the research partners K.U.Leuven and Fraunhofer-Gesellschaft (IAIS); Symantec, a world-leading company in commercial spam filtering technologies; and the wired and wireless internet service providers Tiscali and Nortel.


K.U.Leuven invented, implemented and evaluated a method for detecting hidden salting (i.e. obfuscation of content). The advanced feature extraction methods developed by K.U.Leuven were integrated into a prototype email filtering system operating on real-life email streams.
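The general idea behind hidden salting detection, comparing what the message source contains with what a reader would actually see, can be illustrated with a small sketch. This is not the project's method, which according to the newsletter renders the message and compares the OCR output with the original. As a simplified stand-in, the sketch treats text inside elements with certain inline styles as invisible. The class and function names, and the list of style patterns, are assumptions made for illustration, and the input is assumed to be well-formed HTML.

```python
import re
from html.parser import HTMLParser

# Inline styles treated as "invisible" here (an assumed, incomplete list).
HIDDEN_STYLE_RE = re.compile(
    r"display:none|visibility:hidden|font-size:0(?:px|pt|em|%)?(?:;|$)"
)
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}


class VisibleTextSplitter(HTMLParser):
    """Collects all text in the source and the subset a reader would see."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # one flag per open element: is it hidden?
        self.all_text = []
        self.visible_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        hidden = bool(HIDDEN_STYLE_RE.search(style))
        # Children of a hidden element are hidden as well.
        parent_hidden = bool(self.hidden_stack) and self.hidden_stack[-1]
        self.hidden_stack.append(hidden or parent_hidden)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        self.all_text.append(data)
        if not (self.hidden_stack and self.hidden_stack[-1]):
            self.visible_text.append(data)


def hidden_salting_ratio(html_source: str) -> float:
    """Fraction of non-whitespace characters in the source that are invisible."""
    parser = VisibleTextSplitter()
    parser.feed(html_source)
    parser.close()
    total = len("".join("".join(parser.all_text).split()))
    visible = len("".join("".join(parser.visible_text).split()))
    return (total - visible) / total if total else 0.0
```

For example, in `<p>Vi<span style="font-size: 0">xx</span>agra</p>` the source text is "Vixxagra" while the visible text is "Viagra", so a quarter of the characters are hidden. A high ratio could be used as one feature that marks a message as suspicious. A real system would also have to handle tricks this sketch ignores, such as text coloured to match the background.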

Period From 2006-01-01 to 2009-06-30.
Financed by EU Sixth Framework Programme ICT, FP6-027600
Supervised by Marie-Francine Moens
Staff Erik Boiy
Jan De Beer
Juan Carlos Gomez
Christina Lioma
Contact Marie-Francine Moens

More information can be found on the project website


  1. DE BEER, Jan & MOENS, Marie-Francine Challenging Hidden Text Salting in Digital Media, Technical Report, 29 p. 2008
  2. LIOMA, Christina, MOENS, Marie-Francine, GOMEZ, Juan-Carlos, DE BEER, Jan, BERGHOLZ, Andre, PAASS, Gerhard & HORKAN, Patrick Anticipating Hidden Text Salting in Emails. In Proceedings of the 11th International Symposium of Recent Advances in Intrusion Detection (RAID) Massachusetts Institute of Technology. (Lecture Notes in Computer Science 5230) (pp. 396-397). Springer. 2008
  3. BERGHOLZ, André, PAASS, Gerhard, REICHARTZ, Frank, STROBEL, Siehyun, MOENS, Marie-Francine & WITTEN, Brian Detecting Known and New Salting Tricks in Unwanted Emails. In Proceedings of the International Conference on Email and Anti-Spam (CEAS) 2008. 2008
  4. BERGHOLZ, A., DE BEER, J., GLAHN, S., MOENS, M.-F., PAASS, G. & STROBEL, S. New Filtering Approaches for Phishing Email. Journal of Computer Security, 18, 7-35. 2010
  5. CHEN, H., DACIER, M., MOENS, M.-F., PAASS, G. & YANG, C.C. (Eds.) Proceedings of the ACM SIGKDD Workshop on Cybersecurity and Intelligence Informatics (CSI-KDD), held in conjunction with SIGKDD’09. 2009
  6. MOENS, Marie-Francine, BOIY, Erik, DE BEER, Jan & GOMEZ, Juan-Carlos Identifying and Resolving Hidden Text Salting. In IEEE Transactions on Information Forensics and Security (accepted). 2010
  7. GOMEZ, Juan-Carlos & MOENS, Marie-Francine Using Biased Discriminant Analysis for Email Filtering. In Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (Lecture Notes in Computer Science). Berlin: Springer. 2010
  8. GOMEZ, Juan Carlos, BOIY, Erik & MOENS, Marie-Francine Highly Discriminative Statistical Features for Email Classification. Knowledge and Information Systems, 31 (3), 23-53. 2011
  9. GOMEZ, Juan Carlos & MOENS, Marie-Francine PCA Document Reconstruction for Email Classification. Computational Statistics and Data Analysis, 56 (3), 741-751. 2012
