Utilizing Multinomial Naive Bayes for Spam Filters

Joe Ramirez
3 min read · Oct 27, 2020


When first learning about Bayes’ Theorem, you may ask yourself: what is the relevance of this theorem to modern-day data science? One practical application of Bayes’ Theorem is the Naive Bayes classifier. While several different Naive Bayes classifiers are in use today, the most common is the Multinomial Naive Bayes classifier, widely applied in spam filtering.

Multinomial Naive Bayes

To start, we break down all of our spam and regular emails into two separate lists containing each individual word in those emails. We can then create a histogram to visualize the difference between a spam email and a regular email.

[Histograms of word frequencies in regular and spam emails (Image by Author)]
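The first step, tallying word frequencies per class, can be sketched in Python. The miniature emails below are made-up stand-ins for illustration, not the article's actual dataset:

```python
from collections import Counter

# Hypothetical miniature training corpus; a real filter would be
# trained on thousands of labeled emails.
regular_emails = ["dear friend lunch", "best friend money", "best wishes"]
spam_emails = ["money money viagra", "free money now"]

def word_counts(emails):
    """Count how often each word appears across a list of emails."""
    return Counter(word for email in emails for word in email.lower().split())

regular_counts = word_counts(regular_emails)
spam_counts = word_counts(spam_emails)

print(spam_counts["money"])  # 3
```

These two `Counter` objects hold exactly the data the histograms above visualize.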

Using this information, we can train our classifier to distinguish spam from regular email, and then classify each new email as it is received.

For example, suppose we receive an email that says “Best Friend” and we want to decide whether it is spam or a regular message. Since this classifier is based on Bayes’ Theorem, we first need to calculate the Prior Probabilities. For this example, let’s say the probability of a regular message is .75 and the probability of a spam message is .25.

Next, we need to calculate the probability that “Best” appears in a spam message and in a regular message, and then do the same for “Friend.”

The word “Best” appears once in the spam data, and there are a total of 10 words in that dataset, so the probability of “Best” given that the message is spam is .10. Similarly for the word “Friend,” its probability of appearing in a spam message is .20, since it appears two out of ten times. We then multiply these two numbers by the Prior Probability of receiving a spam message: .10 × .20 × .25 = .005.
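This spam-side calculation can be written out directly, using the counts from the example above:

```python
# Counts from the example: "best" appears 1 time and "friend" 2 times
# out of 10 total words in the spam training emails.
spam_word_counts = {"best": 1, "friend": 2}
total_spam_words = 10
p_spam = 0.25  # prior probability of a spam message

# P(Best | spam) * P(Friend | spam) * P(spam)
spam_score = (spam_word_counts["best"] / total_spam_words) \
           * (spam_word_counts["friend"] / total_spam_words) \
           * p_spam

print(round(spam_score, 3))  # 0.005
```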

Now let’s do the same calculation for the regular emails. The probability of a message containing the word “Best” given that it’s a regular email is .43 (6/14). Likewise for the word “Friend,” we can calculate its probability as .07 (1/14). When we multiply this out with the Prior Probability, we come to the following calculation: .43 × .07 × .75 ≈ .02.

Since .02 is greater than .005, we conclude that a message containing the phrase “Best Friend” is a regular message and not spam. This is how the Multinomial Naive Bayes classifier is used to build spam filters!
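Putting both classes together, the whole procedure fits in one small function. This is a sketch, with the counts and priors taken from the worked example above:

```python
def classify(words, counts, totals, priors):
    """Score each class as prior * product of per-class word probabilities,
    and return the class with the larger score."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            score *= counts[label].get(word, 0) / totals[label]
        scores[label] = score
    return max(scores, key=scores.get), scores

# Word counts, total word counts, and priors from the example above
counts = {"regular": {"best": 6, "friend": 1},
          "spam":    {"best": 1, "friend": 2}}
totals = {"regular": 14, "spam": 10}
priors = {"regular": 0.75, "spam": 0.25}

label, scores = classify(["best", "friend"], counts, totals, priors)
print(label)  # regular
```

In practice a library implementation such as scikit-learn’s `MultinomialNB` would be used instead, with smoothing so that unseen words don’t zero out the whole product.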

Why Is It Naive?

Now that we have determined that an email containing the phrase “Best Friend” is a regular email, what about an email containing the phrase “Friend Best”? To a human, this phrase seems unnatural, and we would automatically be suspicious of any email containing it. However, Naive Bayes would calculate the probabilities above exactly the same way and classify the email as regular. Why? Because Naive Bayes does not consider word order! That is why it is called naive. Despite this limitation, the Naive Bayes classifier still performs remarkably well in spam filters.
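Because the score is just a product of per-word probabilities, reordering the words cannot change it. A quick sketch, using the regular-email probabilities from the example above:

```python
from math import prod

# Per-word probabilities for regular email, from the example above
p_word_given_regular = {"best": 6 / 14, "friend": 1 / 14}
p_regular = 0.75

def score(words):
    """Naive Bayes score: prior times the product of word probabilities."""
    return p_regular * prod(p_word_given_regular[w] for w in words)

# "Best Friend" and "Friend Best" multiply exactly the same factors
print(score(["best", "friend"]) == score(["friend", "best"]))  # True
```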
