ZEPAD, or how we use AI to detect spam, malware and phishing in real time
Having successfully integrated deep learning algorithms into the spam classification engine, our R&D team has moved on to a new challenge: anomaly detection in email flows.
The idea is conceptually simple: by looking carefully at the global inbound flow reaching our infrastructure, that is, all the mail addressed to our customer base, we can detect when email surges happen. If we have a clue that part of such a volume spike is not legitimate, then we have an early warning system for spammer and scammer activity.
This would be simple if all spam messages from a campaign shared common information, such as an “Unpaid Invoice” subject line or the same sending email or IP address. The reality is that these days, malware campaigns morph as they go. Messages with different subject lines are sent from different addresses, and even the body of the email changes. A human can ultimately tell that messages from the same campaign share a pattern, but certainly cannot do that for thousands of messages per second. A machine certainly has the speed, but identifying the campaign poses a complex challenge.
It is a problem that we have just solved with a branch of AI called clustering. Advanced near real-time clustering techniques can be used to process large batches of messages, let's say 10,000 messages. Clustering and Natural Language Processing (NLP) algorithms try to group together messages that share common traits. These traits obviously include the subject line and sender information, but they also include message metadata such as the message size, the display name, attachment info, and the various other message features that we may find. Feeding all this data to the clustering algorithm and using the right similarity measures does the magic, and thus is born ZEPAD: the ZEROSPAM Email Pattern Anomaly Detection.
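To make the idea concrete, here is a minimal sketch of grouping a batch of messages by a weighted similarity over several traits. It is not the ZEPAD implementation: the `Message` fields, the weights, the 0.5 threshold and the simple single-pass "leader" clustering are all assumptions chosen for illustration, using only the Python standard library.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical feature set; the real engine's features are not documented here.
    subject: str
    sender_domain: str
    display_name: str
    size: int
    attachment_ext: str


def tokens(text):
    """Lower-cased whitespace tokens of a string, as a set."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(m1, m2):
    """Weighted mix of per-trait similarities; the weights are illustrative only."""
    subject_sim = jaccard(tokens(m1.subject), tokens(m2.subject))
    name_sim = jaccard(tokens(m1.display_name), tokens(m2.display_name))
    sender_sim = 1.0 if m1.sender_domain == m2.sender_domain else 0.0
    size_sim = min(m1.size, m2.size) / max(m1.size, m2.size, 1)
    attach_sim = 1.0 if m1.attachment_ext == m2.attachment_ext else 0.0
    return (0.3 * subject_sim + 0.2 * name_sim + 0.1 * sender_sim
            + 0.2 * size_sim + 0.2 * attach_sim)


def cluster_batch(messages, threshold=0.5):
    """Single-pass leader clustering: each message joins the first cluster
    whose leader (first member) is similar enough, else starts a new one."""
    clusters = []
    for msg in messages:
        for cluster in clusters:
            if similarity(msg, cluster[0]) >= threshold:
                cluster.append(msg)
                break
        else:
            clusters.append([msg])
    return clusters


batch = [
    Message("Unpaid invoice 4821", "a.example", "Billing Dept", 20480, "zip"),
    Message("Your unpaid invoice 9917", "b.example", "Billing Team", 20900, "zip"),
    Message("Weekly newsletter", "news.example", "Acme News", 80000, ""),
]
clusters = cluster_batch(batch)
```

In this toy batch the two invoice messages differ in subject, sender and display name, yet they still land in the same cluster because their size and attachment type match and their wording overlaps. The newsletter starts a cluster of its own.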
Of course, many of the tagged email clusters are legitimate, as they represent bulk emails being sent for good commercial reasons. Conversely, many clusters are blocked outright, which just confirms that we are doing our job. The real value lies in the clusters whose delivery rate falls between 10% and 90%: part of the campaign is getting through while the rest is being blocked, which typically reveals that something is wrong. We simply set an alarm on these thresholds, et voilà!
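The thresholding rule described above can be sketched in a few lines. This is an illustration under assumptions, not our production code: the function name, the labels, and the choice to treat the 10% and 90% bounds as inclusive are hypothetical.

```python
def classify_cluster(delivered_flags, low=0.10, high=0.90):
    """Label a cluster from its delivery rate.

    delivered_flags holds one boolean per message in the cluster:
    True if the message was delivered, False if it was blocked.
    """
    if not delivered_flags:
        raise ValueError("cluster is empty")
    rate = sum(delivered_flags) / len(delivered_flags)
    if rate < low:
        return "mostly_blocked"    # filters already catch the campaign
    if rate > high:
        return "mostly_delivered"  # typically legitimate bulk mail
    return "alarm"                 # mixed outcome: worth investigating


bulk_newsletter = [True] * 95 + [False] * 5
known_spam = [False] * 100
morphing_campaign = [True] * 40 + [False] * 60

labels = [
    classify_cluster(bulk_newsletter),
    classify_cluster(known_spam),
    classify_cluster(morphing_campaign),
]
```

Only the third cluster, where 40% of the messages were delivered, raises the alarm.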
Clustering is relatively expensive given the time constraint for quick feedback, so there is a practical limit to the size of the message batches that can be handled, but it allows ZEROSPAM to detect spam and malware campaigns as they rise. To our knowledge, this is a key advance in threat intelligence, and it allows us to proactively protect our customer base without them even knowing that a new threat is lurking.
Above are two messages with different features, subject lines and bodies.
These were identified by the ZEPAD clustering technique as being part of the same campaign.
This is another example of how ZEROSPAM researchers and software engineers make good use of their resources and the latest advancements in tech to provide a best-of-breed service.