lookaround - Developer IT

Does lookaround affect which languages can be matched by regular expressions?

- by sepp2k

There are some features in modern regex engines which allow you to match languages that couldn't be matched without that feature. For example the following regex using back references matches the language of all strings that consist of a word that repeats itself: (.+)\1. This language is not regular and can't be matched by a regex, which does not use back references. My question: Does lookaround also affect which languages can be matched by a regular expression? I.e. are there any languages that can be matched using lookaround, which couldn't be matched otherwise? If so, is this true for all flavors of lookaround (negative or positive lookahead or lookbehind) or just for some of them?

Read the article

[Regexp] Stop matching when meeting a sequence of chars: fixing a lookbehind

- by CFP

Hello everyone! I have the following regexp: (?P<question>.+(?<!\[\[)) It is designed to match hello world! in the string hello world! [[A string typically used in programming examples]] Yet I just matches the whole string, and I can't figure out why. I've tried all flavors of lookaround, but it just won't work... Anyone knows how to fix this problem? Thanks, CFP.

Read the article

Python Regular Expressions: Capture lookahead value (capturing text without consuming it)

- by Lattyware

I wish to use regular expressions to split words into groups of (vowels, not_vowels, more_vowels), using a marker to ensure every word begins and ends with a vowel. import re MARKER = "~" VOWELS = {"a", "e", "i", "o", "u", MARKER} word = "dog" if word[0] not in VOWELS: word = MARKER+word if word[-1] not in VOWELS: word += MARKER re.findall("([%]+)([^%]+)([%]+)".replace("%", "".join(VOWELS)), word) In this example we get: [('~', 'd', 'o')] The issue is that I wish the matches to overlap - the last set of vowels should become the first set of the next match. This appears possible with lookaheads, if we replace the regex as follows: re.findall("([%]+)([^%]+)(?=[%]+)".replace("%", "".join(VOWELS)), word) We get: [('~', 'd'), ('o', 'g')] Which means we are matching what I want. However, it now doesn't return the last set of vowels. The output I want is: [('~', 'd', 'o'), ('o', 'g', '~')] I feel this should be possible (if the regex can check for the second set of vowels, I see no reason it can't return them), but I can't find any way of doing it beyond the brute force method, looping through the results after I have them and appending the first character of the next match to the last match, and the last character of the string to the last match. Is there a better way in which I can do this? The two things that would work would be capturing the lookahead value, or not consuming the text on a match, while capturing the value - I can't find any way of doing either.

Read the article

Building a regexp to split a string

- by Kivin

I'm seeking a solution to splitting a string which contains text in the following format: "abcd efgh 'ijklm no pqrs' tuv" which will produce the following results: ['abcd', 'efgh', 'ijklm no pqrs', 'tuv'] In otherwords, it splits by whitespace unless inside of a single quoted string. I think it could be done with .NET regexps using "Lookaround" operators, particularly balancing operators. I'm not so sure about perl.

Read the article

How can I split a string by whitespace unless inside of a single quoted string?

- by Kivin

I'm seeking a solution to splitting a string which contains text in the following format: "abcd efgh 'ijklm no pqrs' tuv" which will produce the following results: ['abcd', 'efgh', 'ijklm no pqrs', 'tuv'] In other words, it splits by whitespace unless inside of a single quoted string. I think it could be done with .NET regexps using "Lookaround" operators, particularly balancing operators. I'm not so sure about Perl.

Developer IT