JavaScript regex for hashtags: match #foo and #foo-fåäö but not http://this.is/no#hashtag
- by Simon B.
Currently we're using the JavaScript expression new RegExp('#[^,#=!\s][^,#=!\s]*') (see [1]).
It mostly works, except that it also matches URLs with anchors, such as http://this.is/no#hashtag, and we'd rather not match foo#bar either.
We've made some attempts with look-ahead, but either it doesn't work or I just don't get it.
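A side note on the expression as quoted above, in case it really is built from a string literal: inside a JavaScript string, '\s' is just the letter s, so the character class would exclude "s" rather than whitespace. The code in [1] may well escape this differently; the variable names below are made up for illustration only.

```javascript
// Assumption: the pattern is passed to RegExp as a plain string literal, as quoted in the post.
// In a string literal, '\s' collapses to 's', so whitespace is no longer excluded.
const fromString = new RegExp('#[^,#=!\s][^,#=!\s]*');
// Doubling the backslash, or using a regex literal, keeps the \s class intact.
const fromEscapedString = new RegExp('#[^,#=!\\s][^,#=!\\s]*');
const fromLiteral = /#[^,#=!\s][^,#=!\s]*/;

const probe = '#bra jobbat!';

console.log(fromString.source);          // #[^,#=!s][^,#=!s]*
console.log(fromEscapedString.source);   // #[^,#=!\s][^,#=!\s]*
console.log(fromString.exec(probe)[0]);  // #bra jobbat  (runs across the space)
console.log(fromLiteral.exec(probe)[0]); // #bra
```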
Given the source text below:
#public #writable #kommentarer-till-beta -- all these should be matched
Verkligen #bra jobbat! T ex #kommentarer till #artiklar och #blogginlägg, kool. -- mixed within text
http://this.is/no#hashtag -- problem
xxy#bar -- We'd prefer not matching this one, and...
#foo=bar =foo#bar -- we probably shouldn't match any of those either.
#foo,bar #foo;bar #foo-bar #foo:bar -- We're flexible on whether these get matched in part or in full
we currently get the output below:
(showing $ in place of each <a class=tag href=.....>...</a> link, for readability)
$ $ $ -- all these should be matched
Verkligen $ jobbat! T ex $ till $ och $, kool. -- mixed within text
http://this.is/no$ -- problem
xxy$ -- We'd prefer not matching this one, and...
$=bar =foo$ -- we probably shouldn't match any of those either.
$,bar $ $ $ -- We're flexible on whether these get matched in part or in full
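One possible direction, offered as a sketch rather than a tested fix for the plugin in [1]: instead of look-ahead before the "#", require the "#" to sit at the start of the input or right after whitespace, capture that leading character, and put it back in the replacement. A look-ahead after the tag then rejects tags that run straight into "=" or another "#". The names HASHTAG_RE and replaceHashtags and the render callback are hypothetical, and the sample lines are the ones from the question with the trailing comments stripped.

```javascript
// Hypothetical names; not taken from the plugin in [1].
// (^|\s)         start of input or one whitespace character, kept in the output
// (#[^\s,#=!]+)  '#' plus one or more characters that are not whitespace or , # = !
// (?=$|[\s,!])   the tag must be followed by end of input, whitespace, ',' or '!',
//                so a tag that runs into '=' or '#' is rejected as a whole
const HASHTAG_RE = /(^|\s)(#[^\s,#=!]+)(?=$|[\s,!])/g;

// Replace every hashtag with whatever render(tag) returns; tag includes the '#'.
function replaceHashtags(text, render) {
  return text.replace(HASHTAG_RE, (match, lead, tag) => lead + render(tag));
}

const sampleText = [
  '#public #writable #kommentarer-till-beta',
  'Verkligen #bra jobbat! T ex #kommentarer till #artiklar och #blogginlägg, kool.',
  'http://this.is/no#hashtag',
  'xxy#bar',
  '#foo=bar =foo#bar',
  '#foo,bar #foo;bar #foo-bar #foo:bar',
].join('\n');

console.log(replaceHashtags(sampleText, () => '$'));
// $ $ $
// Verkligen $ jobbat! T ex $ till $ och $, kool.
// http://this.is/no#hashtag
// xxy#bar
// #foo=bar =foo#bar
// $,bar $ $ $
```

In real use, render would return the link markup instead of "$". Where look-behind is available (ES2018 and later), (?<!\S) in front of the "#" could stand in for the captured (^|\s) group.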
[1] http://github.com/ether/pad/blob/master/etherpad/src/plugins/twitterStyleTags/hooks.js