regex to match postgresql bytea

Posted by filiprem on Stack Overflow
Published on 2010-03-01T11:19:30Z Indexed on 2010/06/13 9:42 UTC

In PostgreSQL, there is a BLOB datatype called bytea. It's just an array of bytes.

bytea literals are output in the following way:

'\\037\\213\\010\\010\\005`Us\\000\\0001.fp3\'\\223\\222%'

See the PostgreSQL docs for the full definition of the format.

I'm trying to construct a Perl regular expression which will match any such string.
It should also match standard ANSI SQL string literals, like 'Joe', 'Joe''s Mom', 'Fish Called ''Wendy'''.
It should also match the backslash-escaped variant, such as 'Joe\'s Mom'.

My first approach (shown below) works only for some bytea representations.

s{ '               # Opening apostrophe
    (?:            # Start group
        [^\\\']    #   Anything but a backslash or an apostrophe
    |              #  or
        \\ .       #   Backslash and anything
    |              #  or
        \'\'       #   Double apostrophe
    )*             # End of group
  '                # Closing apostrophe
}{LITERAL_REPLACED}xgo;

For others (longer ones, with many escaped apostrophes), Perl gives this warning:

Complex regular subexpression recursion limit (32766) exceeded at ./sqa.pl line 33, <> line 1.

So I am looking for a better (but still regex-based) solution. It probably requires some regex alchemy (avoiding backreferences and the like).
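
One possible direction, offered only as a sketch: the perldiag entry for this warning suggests looping in Perl code (for example with while) rather than inside the regex engine. The code below does that with \G and the /gc flags, so each alternative of the group is matched by its own small regex and the repetition happens in a Perl loop. The name replace_literals is hypothetical and not part of the original script, and this has not been checked against large real-world dumps.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original script: replaces each quoted
# literal in $sql with LITERAL_REPLACED by scanning piece by piece.
sub replace_literals {
    my ($sql) = @_;
    my $out = '';
    pos($sql) = 0;

    while (pos($sql) < length $sql) {
        # Copy text outside literals unchanged, up to the next apostrophe.
        if ($sql =~ /\G([^']+)/gc) {
            $out .= $1;
            next;
        }

        my $start = pos($sql);      # position of the opening apostrophe
        $sql =~ /\G'/gc;            # consume it
        my $closed = 0;

        # Each alternative of the original group becomes its own small
        # match; the repetition is this Perl loop, not a quantified group.
        while (1) {
            next if $sql =~ /\G[^\\']+/gc;   # run of ordinary characters
            next if $sql =~ /\G\\./gc;       # backslash and anything
            next if $sql =~ /\G''/gc;        # double apostrophe
            $closed = 1 if $sql =~ /\G'/gc;  # closing apostrophe
            last;
        }

        if ($closed) {
            $out .= 'LITERAL_REPLACED';
        } else {
            # Unterminated literal: keep the rest of the input as it is.
            $out .= substr($sql, $start);
            last;
        }
    }
    return $out;
}

print replace_literals(q{SELECT 'Joe''s Mom', 'Joe\'s Mom', x FROM t}), "\n";
# prints: SELECT LITERAL_REPLACED, LITERAL_REPLACED, x FROM t
```

Because the scanner never backtracks, it can behave differently from the single regex on malformed input: an unterminated literal is left untouched here, whereas the original pattern might match a shorter piece of it.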

© Stack Overflow or respective owner
