Regular Expression for accurate word-count using JavaScript

Posted by Haidon on Stack Overflow
Published on 2011-01-04T12:28:50Z

I'm trying to put together a regular expression for a JavaScript command that accurately counts the number of words in a textarea.

One solution I found is as follows:

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b\w+\b/).length -1;

But this doesn't count words written in non-Latin scripts (e.g., Cyrillic or Hangul); it skips over them completely.
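
The likely cause is that in JavaScript `\w` matches only ASCII letters, digits and the underscore (`[A-Za-z0-9_]`), so for purely Cyrillic or Hangul input the pattern never matches and `split` returns the whole string as one piece. A minimal sketch that reproduces this outside the page; `countWithWordBoundary` is a hypothetical helper name, not part of the original snippet:

```javascript
// Hypothetical helper mirroring the expression above, without the DOM lookups.
function countWithWordBoundary(text) {
  // Each \b\w+\b match acts as a separator, so N matches yield N + 1 pieces.
  return text.split(/\b\w+\b/).length - 1;
}

console.log(countWithWordBoundary("hello world"));  // 2
console.log(countWithWordBoundary("привет мир"));   // 0 (no ASCII word characters)
console.log(countWithWordBoundary("한국어 텍스트"));  // 0
console.log(countWithWordBoundary("hello мир"));    // 1 (only "hello" is matched)
```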

Another one I put together:

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\s+/g).length -1;

But this only counts accurately when the document ends in a space character. Appending a space to the value before counting doesn't fix it, because an empty document is then counted as 1 word. Furthermore, if the document begins with a space character, an extraneous word is counted.
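
These edge cases follow from how `split` works: N separator matches produce N + 1 pieces, including empty strings at either end, so `length - 1` counts runs of whitespace rather than words. A small sketch reproducing the behaviour described above; `countWithWhitespaceSplit` is a hypothetical name used only for illustration:

```javascript
// Hypothetical helper mirroring the second expression, without the DOM lookups.
function countWithWhitespaceSplit(text) {
  return text.split(/\s+/g).length - 1;
}

console.log(countWithWhitespaceSplit("one two"));    // 1 (two words, one separator)
console.log(countWithWhitespaceSplit("one two "));   // 2 (correct only by trailing space)
console.log(countWithWhitespaceSplit(" "));          // 1 (no words at all)
console.log(countWithWhitespaceSplit(" one two "));  // 3 (leading space adds one)
console.log(countWithWhitespaceSplit(""));           // 0
```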

Is there a regular expression I can put into this command that counts the words accurately, regardless of input method?
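
One possible direction, offered as a sketch rather than a definitive answer: count the runs of non-whitespace characters with `match` instead of counting separators with `split`. This assumes a "word" is any whitespace-delimited run, so a lone punctuation mark counts as a word and scripts written without spaces (such as Chinese or Japanese) are not segmented. `countWords` is a hypothetical helper name.

```javascript
// Sketch: count whitespace-delimited runs, independent of script.
function countWords(text) {
  // match() with the g flag returns every run of non-whitespace, or null if none.
  const matches = text.match(/\S+/g);
  return matches === null ? 0 : matches.length;
}

console.log(countWords(""));               // 0
console.log(countWords("   "));            // 0
console.log(countWords(" one two "));      // 2
console.log(countWords("привет мир"));     // 2
console.log(countWords("한국어 텍스트"));    // 2

// In the page from the question this could be wired up as:
// document.querySelector("#wordcount").innerHTML =
//   countWords(document.querySelector("#editor").value);
```

Because nothing is subtracted from a piece count, leading and trailing whitespace no longer affect the result.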

© Stack Overflow or respective owner
