Regular Expression for accurate word-count using JavaScript
Posted by Haidon on Stack Overflow
Published on 2011-01-04T12:28:50Z
I'm trying to put together a regular expression for a JavaScript command that accurately counts the number of words in a textarea.
One solution I had found is as follows:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b\w+\b/).length -1;
But this doesn't count words written in non-Latin scripts (e.g., Cyrillic or Hangul); it skips over them completely, because \w in JavaScript matches only ASCII letters, digits, and the underscore.
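To make the limitation concrete, here is a minimal sketch comparing the ASCII-only \w class with a Unicode-aware character class. It assumes an engine that supports ES2018 Unicode property escapes (\p{...} with the u flag); the helper name countLetterRuns and the sample string are made up for illustration and are not part of the original code.

```javascript
// Hypothetical helper (not from the original post): counts runs of Unicode
// letters, combining marks, digits, and underscores.
// Assumption: the engine supports ES2018 Unicode property escapes (u flag).
function countLetterRuns(text) {
  const matches = text.match(/[\p{L}\p{M}\p{N}_]+/gu);
  return matches === null ? 0 : matches.length; // match() returns null when nothing matches
}

// Illustrative sample: two Cyrillic words and one Hangul word.
const sample = "Привет мир 안녕하세요";

// \w covers only [A-Za-z0-9_], so the ASCII-only pattern finds no words here.
console.log((sample.match(/\w+/g) || []).length); // 0

// The Unicode-aware class finds one run per word.
console.log(countLetterRuns(sample)); // 3
```

Note that this sketch treats apostrophes and hyphens as separators, so "don't" would count as two runs; whether that is acceptable depends on what should count as a word.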
Another one I put together:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\s+/g).length -1;
But this counts accurately only when the document ends in a whitespace character; otherwise the result is one word short. If a space is appended to the value before counting, an empty document counts as 1 word. Furthermore, if the document begins with a space character, an extraneous word is counted.
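The edge cases described above can be reproduced with a few inputs. The sketch below also shows one possible alternative: matching runs of non-whitespace characters instead of splitting on the gaps between them, so leading and trailing whitespace no longer produce empty pieces. The function names splitCount and countWhitespaceDelimited are hypothetical, and the approach assumes words are separated by whitespace, which does not hold for text written without spaces.

```javascript
// The split-based count from the question, wrapped in a function for comparison.
function splitCount(text) {
  return text.split(/\s+/g).length - 1;
}

// Hypothetical alternative (an assumption, not the original author's method):
// count the non-whitespace runs directly.
function countWhitespaceDelimited(text) {
  const tokens = text.match(/\S+/g); // null when there are no non-space characters
  return tokens === null ? 0 : tokens.length;
}

// Print both counts for inputs that differ only in surrounding whitespace.
for (const text of ["", " ", "one two", "one two ", " one two "]) {
  console.log(JSON.stringify(text), splitCount(text), countWhitespaceDelimited(text));
}
```

For the input " one two ", the split-based count gives 3 while the matching version gives 2. Because \S matches any non-whitespace character, this version is also indifferent to the script the words are written in.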
Is there a regular expression I can put into this command that counts the words accurately, regardless of input method?
© Stack Overflow or respective owner