Little Regular Expression (against HTML) help
- by Marcos Placona
Hi, I have the following HTML:
<p>Some text <a title="link" href="http://link.com/" target="_blank">my link</a> more
text <a title="link" href="http://link.com/" target="_blank">more link</a>.</p>
<p>Another paragraph.</p>
<p>[code:cf]</p>
<p><cfset ArrFruits = ["Orange", "Apple", "Peach", "Blueberry", </p>
<p>"Blackberry", "Strawberry", "Grape", "Mango", </p>
<p>"Clementine", "Cherry", "Plum", "Guava", </p>
<p>"Cranberry"]></p>
<p>[/code]</p>
<p>Another line</p>
<p><img src="http://image.jpg" alt="Array" />
</p>
<p>More text</p>
<p>[code:cf]</p>
<p><table border="1"></p>
<p> <cfoutput></p>
<p> <cfloop array="#GroupsOf(ArrFruits, 5)#" index="arrFruitsIX"></p>
<p> <tr></p>
<p> <cfloop array="#arrFruitsIX#" index="arrFruit"></p>
<p> <td>#arrFruit#</td></p>
<p> </cfloop></p>
<p> </tr></p>
<p> </cfloop></p>
<p> </cfoutput></p>
<p></table></p>
<p>[/code]</p>
<p>With an output that looks like:</p>
<p><img src="another_image.jpg" alt="" width="342" height="85" /></p>
What I'm trying to do is write a regular expression that removes all the <p> and </p> tags, and whenever it finds a closing </p>, replaces it with a line break.
So far, my pattern looks like this:
/\<p\>(.*?)(<\/p>)/g
And I'm replacing the matches with:
$1\n
It all looks good, but it's also replacing the contents inside the [code][/code] tags, where the <p> tags should not be replaced at all. So what I would like is to get rid of the <p> tags only when the content isn't inside the [code] tags.
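To make the problem concrete, here is a minimal sketch that reproduces it. It is written in Python purely for illustration (the host language is an assumption on my part), and the `html` string is a shortened, hypothetical stand-in for the sample above. It uses the same pattern and the same "group 1 plus line break" replacement.

```python
import re

# Shortened, hypothetical stand-in for the HTML sample above.
html = (
    "<p>Some text</p>\n"
    "<p>[code:cf]</p>\n"
    "<p><cfset x = 1></p>\n"
    "<p>[/code]</p>\n"
    "<p>More text</p>"
)

# Same idea as the pattern above: capture the paragraph body lazily.
pattern = re.compile(r"<p>(.*?)(</p>)")

# Equivalent of replacing with "$1\n": keep group 1, append a line break.
result = pattern.sub(lambda m: m.group(1) + "\n", html)
print(result)
```

Running this strips every <p> wrapper, including the ones around the [code:cf] marker and the <cfset> line, which is exactly the part I want left alone.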
I can never get negation right. I know it will be something along the lines of:
\<p\>^\[code*\](.*?)(<\/p>)
But obviously this doesn't work :-)
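One direction that might sidestep the negation entirely is sketched below, in Python and only as an assumption about what could work: split the text on the [code]...[/code] blocks first, then run the simple <p> replacement only on the pieces in between. The function name `strip_paragraphs_outside_code` and the shortened `html` sample are hypothetical, and the sketch assumes [code] blocks are never nested.

```python
import re

# A whole [code]...[/code] block, including an optional ":lang" suffix and
# the <p>/</p> that wrap the opening and closing markers in the sample.
CODE_BLOCK = re.compile(
    r"((?:<p>)?\[code(?::\w+)?\].*?\[/code\](?:</p>)?)",
    re.DOTALL,
)

# A single paragraph; DOTALL lets the body span several lines.
PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def strip_paragraphs_outside_code(html: str) -> str:
    """Replace <p>body</p> with body + newline, except inside [code] blocks."""
    # Because CODE_BLOCK has one capturing group, re.split keeps the code
    # blocks in the result: even indexes are ordinary text, odd indexes
    # are the code blocks themselves.
    parts = CODE_BLOCK.split(html)
    for i in range(0, len(parts), 2):
        parts[i] = PARAGRAPH.sub(lambda m: m.group(1) + "\n", parts[i])
    return "".join(parts)


# Shortened, hypothetical stand-in for the HTML sample above.
html = (
    "<p>Some text</p>\n"
    "<p>[code:cf]</p>\n"
    "<p><cfset x = 1></p>\n"
    "<p>[/code]</p>\n"
    "<p>More text</p>"
)
result = strip_paragraphs_outside_code(html)
print(result)
```

With this sample, the paragraphs around the code block lose their <p> tags while everything from <p>[code:cf]</p> to <p>[/code]</p> is passed through untouched. A single-pattern solution with a lookahead may also be possible, but the two-step split seemed easier to reason about.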
Could anyone please lend me a hand with this regex?
BTW, I know I shouldn't be using regular expressions to parse HTML at all. I'm fully aware of that, but still, for this specific case, I'd like to use regex.
Thanks in advance