SED and Unicode Quotation Marks

Posted by Jonathan Patt on Super User
Published on 2010-01-02T17:35:10Z Indexed on 2012/06/28 9:18 UTC

When testing against this string:

“… so that’s that… ”

The following should, but does not, match the opening quotation mark and following ellipsis and space:

sed "s/\([“‘\"']…\) /\1/g"

However, this correctly matches the second ellipsis and following space and closing quotation mark:

sed "s/… \([”’\"'.!?]\)/…\1/g"

If I split the first expression into one command per quotation mark, it works fine:

sed -e "s/\(“…\) /\1/g" \
-e "s/\(‘…\) /\1/g" \
-e "s/\(\"…\) /\1/g" \
-e "s/\('…\) /\1/g"
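
For reference, the split form can be wrapped up and run against the test string. This is only a sketch: the function name strip_space_after_opening is hypothetical, and it assumes the script is saved as UTF-8. Each expression spells out a literal quotation mark followed by the ellipsis, so it should not depend on how sed treats multibyte characters inside a bracket expression.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the original post).
# Removes the single space that follows an opening quotation mark + ellipsis,
# using one literal expression per quotation mark instead of a bracket expression.
strip_space_after_opening() {
    sed -e "s/\(“…\) /\1/g" \
        -e "s/\(‘…\) /\1/g" \
        -e "s/\(\"…\) /\1/g" \
        -e "s/\('…\) /\1/g"
}

# Usage: feed the test string from the question through the helper.
printf '%s\n' '“… so that’s that… ”' | strip_space_after_opening
```

Only the space after the opening ellipsis is removed; the space before the closing quotation mark is left alone, since that case is handled by the second command in the question.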

So why doesn't the first expression work when the quotation marks are grouped in a single bracket expression, especially when the same kind of grouping works fine for the closing quotation marks?
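
When narrowing down a problem like this, it can help to see how the characters involved are actually encoded and which character set the current locale reports. The sketch below does only that and does not claim to identify the cause. The helper name hex_bytes is hypothetical, and the script is assumed to be saved as UTF-8.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes of its argument as lowercase hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'
    echo
}

# Character set of the current locale (for example UTF-8).
locale charmap 2>/dev/null || echo "locale command not available"

# In UTF-8, each of these characters is a three-byte sequence.
hex_bytes '“'   # left double quotation mark
hex_bytes '‘'   # left single quotation mark
hex_bytes '…'   # horizontal ellipsis
hex_bytes '"'   # ASCII double quote, a single byte
```

If sed runs under a locale that is not UTF-8, a bracket expression containing these characters is a set of individual bytes rather than a set of characters, which is one thing worth ruling out.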

© Super User or respective owner
