What’s New in Delphi XE6 Regular Expressions
- by Jan Goyvaerts
There’s not much new in the regular expression support in Delphi XE6. The big change that should be made, upgrading to PCRE 8.30 or later and switching to the pcre16 functions that use UTF-16, still hasn’t been made. XE6 still uses PCRE 7.9 and thus continues to require conversion from the UTF-16 strings that Delphi uses natively to the UTF-8 strings that older versions of PCRE require.
Delphi XE6 does fix one important issue that has plagued TRegEx since it was introduced in Delphi XE. Previously, TRegEx could not find zero-length matches. So a regex like (?m)^ that should find a zero-length match at the start of each line would not find any matches at all with TRegEx.
The reason for this is that TRegEx uses TPerlRegEx to do the heavy lifting. TPerlRegEx sets its State property to [preNotEmpty] in its constructor, which tells it to skip zero-length matches. This is not a problem with TPerlRegEx because users of this class can change the State property. But TRegEx does not provide a way to change this property. So in Delphi XE5 and prior, TRegEx cannot find zero-length matches.
In Delphi XE6 TPerlRegEx’s constructor was changed to initialize State to the empty set. This means TRegEx is now able to find zero-length matches. TRegex.Replace() using the regex (?m)^ now inserts the replacement at the start of each line, as you would expect. If you use TPerlRegEx directly, you’ll need to set State to [preNotEmpty] in your own code if you relied on its behavior to skip zero-length matches.
You will need to check existing applications that use TRegEx for regular expressions that incorrectly allow zero-length matches. In XE5 and prior, TRegEx using \d* would match all numbers in a string. In XE6, the same regex still matches all numbers, but also finds a zero-length match at each position in the string.
RegexBuddy 4 warns about zero-length matches on the Create panel if you set it to Detailed mode. At the bottom of the regex tree there will be a node saying either “your regular expression may find zero-length matches” or “zero-length matches will be skipped” depending on whether your application allows zero-length matches (XE6 TRegEx) or not (XE–XE5 TRegEx).