How do you implement a good profanity filter?

Posted by Ben Throop on Stack Overflow
Published on 2008-11-07T20:19:41Z Indexed on 2010/03/08 6:06 UTC

Filed under:

php

Many of us need to deal with user input, search queries, and situations where the input text can potentially contain profanity or undesirable language. Oftentimes this needs to be filtered out.

Where can one find a good list of swear words in various languages and dialects?

Are there APIs that give access to good lists? Or maybe an API that simply says "yes, this is clean" or "no, this is dirty", with some parameters?

What are some good methods for catching folks trying to trick the system, like a$$, azz, or a55?
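To make the kind of trick I mean concrete, here is a minimal sketch of one possible approach: fold look-alike characters back to letters before comparing against a list. It is in Python rather than PHP purely for illustration. The substitution table and the `normalize` name are my own assumptions, not taken from any existing library, and the table is nowhere near complete.

```python
# Hypothetical look-alike table (an assumption, far from exhaustive):
# each key is a character people substitute for the letter it maps to.
LOOKALIKES = str.maketrans({
    "$": "s",
    "5": "s",
    "z": "s",
    "@": "a",
    "4": "a",
    "3": "e",
    "1": "i",
    "0": "o",
})


def normalize(token: str) -> str:
    """Lowercase a token and map look-alike characters to plain letters.

    The result is only meant for comparison against a word list,
    never for display, since it mangles legitimate words too
    (for example "jazz" becomes "jass").
    """
    return token.lower().translate(LOOKALIKES)


if __name__ == "__main__":
    for variant in ("a$$", "azz", "a55"):
        print(variant, "->", normalize(variant))
```

All three variants above fold to the same string, so a single list entry would cover them. The obvious weakness is that every new substitution needs a new table entry, which is why I am asking whether better methods exist.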

Bonus points if you offer solutions for PHP. :)

Edit: Response to answers that say simply avoid the programmatic issue:

I think there is a place for this kind of filter when, for instance, a user can use public image search to find pictures that get added to a sensitive community pool. If they can search for "penis", then they will likely get many pictures of, yep. If we don't want pictures of that, then preventing the word as a search term is a good gatekeeper, though admittedly not a foolproof method. Getting the list of words in the first place is the real question.

So I'm really referring to a way to figure out whether a single token is dirty or not and then simply disallow it. I'd not bother preventing a sentiment like the totally hilarious "long necked giraffe" reference. Nothing you can do there. :)
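As a rough sketch of the token-level gatekeeper I have in mind (Python for illustration only; `BLOCKED` and `is_clean` are hypothetical names, and the one-word set stands in for the real list I am still looking for):

```python
import re

# Hypothetical blocklist (an assumption): in practice this would be
# loaded from whatever curated word list turns out to be available.
BLOCKED = frozenset({"penis"})


def is_clean(query: str, blocked: frozenset = BLOCKED) -> bool:
    """Return True if no whole token of the query is on the blocklist.

    Tokens are runs of word characters, compared case-insensitively.
    Matching whole tokens rather than substrings avoids rejecting
    innocent words that merely contain a blocked word.
    """
    tokens = re.findall(r"\w+", query.lower())
    return not any(token in blocked for token in tokens)


if __name__ == "__main__":
    for query in ("penis", "Penis pictures", "long necked giraffe"):
        print(repr(query), "allowed" if is_clean(query) else "rejected")
```

Here a search would be rejected outright when `is_clean` returns False. Because the check is per token, the giraffe phrase passes, which is the behavior I said I would accept.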

user-input

filtering

regex

php
