How do you capture a group with regex?
Posted
by Sylvain
on Stack Overflow
Published on 2010-04-05T06:28:12Z
Indexed on
2010/04/05
7:03 UTC
Hi,
I'm trying to extract a substring from a string using regex.
I'm using the POSIX regex functions (regcomp, regexec, ...), and I can't manage to capture a group.
For instance, let the pattern be something as simple as "MAIL FROM:<(.*)>" (compiled with the REG_EXTENDED cflag).
I want to capture everything between '<' and '>'.
My problem is that regmatch_t gives me the boundaries of the whole match (MAIL FROM:<...>) instead of just what's between the parentheses.
What am I missing?
Thanks in advance,
edit: some code
#include <regex.h>
#include <stdio.h>

#define SENDER_REGEX "MAIL FROM:<(.*)>"

int main(int ac, char **av)
{
    regex_t regex;
    int status;
    regmatch_t pmatch[1]; /* room for the whole match only */

    if (ac < 2)
        return (1);
    if (regcomp(&regex, SENDER_REGEX, REG_ICASE|REG_EXTENDED) != 0)
    {
        printf("regcomp error\n");
        return (1);
    }
    status = regexec(&regex, av[1], 1, pmatch, 0);
    regfree(&regex);
    if (!status)
        printf("matched from %d (%c) to %d (%c)\n"
            , (int)pmatch[0].rm_so
            , av[1][pmatch[0].rm_so]
            , (int)pmatch[0].rm_eo
            , av[1][pmatch[0].rm_eo]
        );
    return (0);
}
outputs:
$ ./a.out "012345MAIL FROM:<abcd>$"
matched from 6 (M) to 22 ($)
solution:
As RarrRarrRarr said, the indices of the captured group are indeed in pmatch[1].rm_so and pmatch[1].rm_eo (pmatch[0] always holds the boundaries of the whole match),
hence regmatch_t pmatch[1]; becomes regmatch_t pmatch[2];
and regexec(&regex, av[1], 1, pmatch, 0); becomes regexec(&regex, av[1], 2, pmatch, 0);
Thanks :)
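For anyone landing here later, this is a minimal sketch of the fix wrapped in a function. The helper name capture_sender, its out/out_size parameters and its return convention are my own assumptions for illustration, not part of the original code; only the pattern, the flags and the pmatch[2] / nmatch = 2 change come from the post above.

```c
#include <regex.h>
#include <string.h>

#define SENDER_REGEX "MAIL FROM:<(.*)>"

/*
 * Hypothetical helper (not from the original post): copies the text
 * captured by the first parenthesized group into out.
 * Returns 0 on success, -1 if the pattern does not compile, the input
 * does not match, or out is too small.
 */
int capture_sender(const char *input, char *out, size_t out_size)
{
    regex_t regex;
    regmatch_t pmatch[2]; /* [0] = whole match, [1] = first group */
    int status;
    size_t len;

    if (out_size == 0)
        return -1;
    if (regcomp(&regex, SENDER_REGEX, REG_ICASE | REG_EXTENDED) != 0)
        return -1;

    /* nmatch must be 2 so that regexec fills in pmatch[1] as well */
    status = regexec(&regex, input, 2, pmatch, 0);
    regfree(&regex);
    if (status != 0 || pmatch[1].rm_so == -1)
        return -1;

    /* rm_so is the first byte of the group, rm_eo is one past its end */
    len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
    if (len >= out_size)
        return -1;
    memcpy(out, input + pmatch[1].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

With the input from the question, "012345MAIL FROM:<abcd>$", calling capture_sender with a large enough buffer should leave "abcd" in it. Note that .* is greedy, so if the input contains several '>' characters the group extends to the last one.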
© Stack Overflow or respective owner