Java Spam Filter
- by JackSparrow
I'm trying to write a spam filter in Java using Bayesian filtering.
I read email messages from a text file, split them into tokens with a regex, and store the tokens in a HashMap.
My problem is that the regex also splits email addresses, so instead of:
[email protected]
the regex produces the tokens:
john
smith
example
The same holds true for IP addresses. For example, instead of:
192.55.34.322
the regex produces the tokens:
192
55
34
322
So does anybody know of a way to read the email messages and keep tokens like these intact?
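
One possible direction, as a minimal sketch rather than a definitive fix: instead of splitting on every non-word character, match whole tokens and allow a single '.', '@', '_', '+' or '-' between alphanumeric runs. The post does not show the original regex, so the pattern below is an assumption, and the class name TokenSketch, the method countTokens, and the sample address jane.doe@example.org are hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSketch {
    // Assumed token shape: alphanumeric runs, optionally joined by one of . _ @ + -
    // A joiner only counts when more alphanumerics follow it, so trailing
    // sentence punctuation (a final '.' or ',') is left out of the token.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z0-9]+(?:[._@+-][A-Za-z0-9]+)*");

    // Counts how often each token occurs in the given text.
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample message, not taken from a real data set.
        String sample = "Mail from jane.doe@example.org, relayed via 192.55.34.322.";
        System.out.println(countTokens(sample));
    }
}
```

With this pattern the address and the dotted number each stay a single HashMap key. The trade-off is that other dotted or hyphenated strings (for example "e.g" or "well-known") also stay joined, which may or may not be what a Bayesian filter wants.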