Java Spam Filter

Posted by JackSparrow on Stack Overflow
Published on 2010-04-30T13:19:52Z Indexed on 2010/04/30 13:27 UTC

I'm trying to create a spam filter in Java using the Bayesian algorithm.

I read a text file that contains email messages, split it into tokens using a regex, and store the tokens in a HashMap.

My problem is that the regex also splits email addresses, so instead of a single token such as: [email protected]

I end up with the separate tokens: john smith example

The same holds true for IP addresses, so for example, instead of: 192.55.34.322

the regex produces the tokens: 192 55 34 322

Does anybody know of a way that I could read the email messages and store their contents as is?
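
One possible direction, sketched under assumptions rather than offered as the definitive fix: instead of splitting on every non-word character, match whole tokens with a regex whose alternatives try an email-like shape first, then a dotted-quad shape, then a plain word. The class name `TokenizerSketch`, the methods `tokenize` and `countTokens`, the sample message, and the address `john.smith@example.com` are all hypothetical stand-ins and are not taken from the post. The email and IP patterns are deliberately rough (the IP pattern does not check that each octet is at most 255), and the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerSketch {
    // Alternatives are tried left to right, so the more specific shapes come first.
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+"  // rough email shape (assumption)
        + "|\\d{1,3}(?:\\.\\d{1,3}){3}"                         // dotted quad, octet range not checked
        + "|[A-Za-z0-9']+");                                    // ordinary word or number

    // Returns the tokens in the order they appear in the text.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Counts how often each token occurs, as a Bayesian filter would need.
    public static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample message, not from the original data.
        String message = "Contact john.smith@example.com from 192.55.34.322 today";
        // prints [Contact, john.smith@example.com, from, 192.55.34.322, today]
        System.out.println(tokenize(message));
    }
}
```

Matching tokens with `Matcher.find()` rather than calling `String.split` means the punctuation inside an address survives, while punctuation between tokens is still dropped. Whether these particular patterns are strict enough depends on the mail corpus, so treat them as a starting point.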

© Stack Overflow or respective owner