Regular expression to match empty HTML tags that may contain embedded JSTL?
- by Keith Bentrup
I'm trying to construct a regular expression to look for empty HTML tags that may have embedded JSTL. I'm using Perl for my matching.
So far, I can match any empty HTML tag that does not contain JSTL with the following:
/<\w+\b(?!:)[^<]*?>\s*<\/\w+/si
The \b(?!:) keeps the pattern from matching an opening JSTL tag (a prefixed name such as c:out), but that doesn't address whether JSTL may appear within the HTML tag itself (which is allowed). I only want to know whether this HTML tag has no children (its content is empty or only whitespace). So I'm looking for a pattern that would match both of the following:
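A minimal Perl sketch of what \b(?!:) contributes to the pattern above. After the greedy \w+, the \b cannot be satisfied by a shorter tag name, and the negative lookahead then rejects any name immediately followed by a colon, so a prefixed tag such as <c:if> can never start a match. The two input strings are hypothetical examples, not taken from a real page.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $empty_tag = qr/<\w+\b(?!:)[^<]*?>\s*<\/\w+/si;

# Hypothetical inputs: one plain HTML element, one JSTL element.
my %samples = (
    html => '<span class="x"></span>',
    jstl => '<c:if test="${flag}"></c:if>',   # single quotes: no interpolation
);

for my $name (sort keys %samples) {
    # At '<c:if', \w+ grabs 'c', and (?!:) fails because ':' follows.
    my $verdict = $samples{$name} =~ $empty_tag ? 'match' : 'no match';
    print "$name: $verdict\n";
}
```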
<div id="my-id">
</div>
<div class="<c:out var="${my.property}" />"></div>
Currently the first div matches. The second does not. Is it doable? I tried several variations using lookahead assertions, and I'm starting to think it's not. However, I can't say for certain or articulate why it's not.
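A small Perl sketch reproducing the behaviour described above, using the two snippets from the question and the unchanged pattern. One way to see why the second snippet fails: [^<]*? can never step over the < that opens <c:out, so the attempt starting at <div never reaches a >. The only other place a match could start is <c:out, and that attempt is rejected by (?!:).

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $empty_tag = qr/<\w+\b(?!:)[^<]*?>\s*<\/\w+/si;

# The two snippets from the question.
my $plain = qq{<div id="my-id">\n</div>};
my $jstl  = q{<div class="<c:out var="${my.property}" />"></div>};

# The first attempt stalls at the '<' of '<c:out'; the attempt at '<c:out'
# itself is rejected by (?!:); '</div>' cannot start a match at all.
print 'first div:  ', ($plain =~ $empty_tag ? 'match' : 'no match'), "\n";
print 'second div: ', ($jstl  =~ $empty_tag ? 'match' : 'no match'), "\n";
```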
Edit: I'm not writing something to interpret the code, and I'm not interested in using a parser. I'm writing a script to point out potential issues/oversights. And at this point, I'm curious, too, to see if there is something clever with lookaheads or lookbehinds that I may be missing. If it bothers you that I'm trying to "solve" a problem this way, don't think of it as looking for a solution. To me it's more of a challenge now, and an opportunity to learn more about regular expressions.
Also, if it helps, you can assume that the HTML is XHTML Strict.
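If the XHTML Strict assumption is taken to mean that every attribute value is double-quoted, one possible approach, sketched here rather than offered as a definitive answer, uses an alternation instead of lookarounds. Inside a quoted attribute value, the pattern consumes either an ordinary character or an entire prefixed tag <prefix:name ...> as a single unit. This rests on several further assumptions, which are also stated in the code comments: JSTL appears only inside double-quoted attribute values, JSTL tags are not nested, and no > occurs inside a JSTL tag before its own closing > (an EL expression such as ${a > b} would break it). The \1 backreference, which requires the closing tag name to match the opening one, is an extra check that the original pattern does not have.

```perl
use strict;
use warnings;

# Sketch only. Assumes (1) attribute values are double-quoted,
# (2) JSTL tags appear only inside those values and are not nested,
# (3) a JSTL tag contains no '>' before its own closing '>'.
my $empty_tag = qr{
    < (\w+) \b (?!:)          # opening tag name, not a prefixed tag like c:out
    (?:
        [^<>"]                # any character outside a quoted value
      | " (?: [^"<]           # inside a value: an ordinary character
            | < \w+ : [^>]* > #   or one whole JSTL tag, quotes and all
          )* "
    )*
    >                         # end of the opening tag
    \s*                       # only whitespace as content
    </ \1 \s* >               # closing tag with the same name (added check)
}x;

my @snippets = (
    qq{<div id="my-id">\n</div>},                           # first example
    q{<div class="<c:out var="${my.property}" />"></div>},  # second example
    q{<div title="a > b"></div>},                           # '>' in a quoted value
    q{<div class="x">text</div>},                           # has content
);

for my $s (@snippets) {
    my $verdict = $s =~ $empty_tag ? 'match' : 'no match';
    print "$verdict\n";
}
```

With these four snippets, the first three should be reported as matches and the last one should not. The two alternatives inside each group start with different characters, which keeps backtracking bounded. Single-quoted attribute values, JSTL placed outside attribute values, or nested JSTL tags would all require extending the pattern.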