How can I handle validation of non-Latin script input in PHP?
- by Matt
I am trying to adapt a PHP application to handle non-Latin scripts (specifically Japanese, Simplified Chinese and Arabic). The app's data validation routines make frequent use of regular expressions to check input, but I am not sure how to make the \w character type match word characters in other languages without installing additional locales on the system (which I cannot rely on).
Previous developers who worked on the app simply added the needed characters to the regexes as the number of supported languages grew (you frequently see "[\wÀÁÂÃÄÅÆÇÈÉ... etc" in the code), but I can't realistically do that for all the scripts I need to support now.
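To make the goal concrete, here is a rough sketch of the kind of check I would like to end up with. It assumes PHP 7 or later, UTF-8 encoded input, and a PCRE build with Unicode property support, none of which I have confirmed for our servers. The function name isWordString is made up for illustration and is not part of the app.

```php
<?php
// Hypothetical helper (not from the existing app): returns true when the
// input consists only of letters, combining marks, digits or underscores
// from any script. Assumes the input is UTF-8 encoded.
function isWordString(string $input): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // \p{L} = any letter, \p{M} = combining marks, \p{N} = any number.
    // \A and \z anchor to the true start and end of the string, so a
    // trailing newline is rejected.
    // preg_match() returns false on invalid UTF-8, which the strict
    // comparison below also treats as a failed validation.
    return preg_match('/\A[\p{L}\p{M}\p{N}_]+\z/u', $input) === 1;
}

var_dump(isWordString('日本語'));
var_dump(isWordString('foo bar'));
```

Written this way, the pattern would not depend on installed system locales, which is the constraint I am stuck with.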
Does anybody out there have some advice on how to tackle this?