Little Regular Expression (against HTML) help

Posted by Marcos Placona on Stack Overflow
Published on 2010-04-17T20:09:41Z

Hi, I have the following HTML:

<p>Some text <a title="link" href="http://link.com/" target="_blank">my link</a> more 
text <a title="link" href="http://link.com/" target="_blank">more link</a>.</p>
<p>Another paragraph.</p>
<p>[code:cf]</p>
<p>&lt;cfset ArrFruits = ["Orange", "Apple", "Peach", "Blueberry", </p>
<p>"Blackberry", "Strawberry", "Grape", "Mango", </p>
<p>"Clementine", "Cherry", "Plum", "Guava", </p>
<p>"Cranberry"]&gt;</p>
<p>[/code]</p>
<p>Another line</p>
<p><img src="http://image.jpg" alt="Array" />
</p>
<p>More text</p>
<p>[code:cf]</p>
<p>&lt;table border="1"&gt;</p>
<p> &lt;cfoutput&gt;</p>
<p> &lt;cfloop array="#GroupsOf(ArrFruits, 5)#" index="arrFruitsIX"&gt;</p>
<p>  &lt;tr&gt;</p>
<p> &lt;cfloop array="#arrFruitsIX#" index="arrFruit"&gt;</p>
<p>     &lt;td&gt;#arrFruit#&lt;/td&gt;</p>
<p> &lt;/cfloop&gt;</p>
<p>  &lt;/tr&gt;</p>
<p> &lt;/cfloop&gt;</p>
<p> &lt;/cfoutput&gt;</p>
<p>&lt;/table&gt;</p>
<p>[/code]</p>
<p>With an output that looks like:</p>
<p><img src="another_image.jpg" alt="" width="342" height="85" /></p>

What I'm trying to do is write a regular expression that removes all the <p> and </p> tags and, whenever it finds a </p>, replaces it with a line break.

So far, my pattern looks like this:

/\<p\>(.*?)(<\/p>)/g

And I'm replacing the matches with:

$1\n
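
To make the current behaviour concrete, here is a minimal sketch of the same pattern and replacement in Python. Python is an assumption, since the question does not say which language runs the regex, and Python writes the backreference as \1 rather than $1. The function name strip_paragraphs and the sample string are made up for illustration.

```python
import re

# The pattern from the question, rewritten for Python's re module.
# Note: without re.DOTALL, "." does not match a newline, so a paragraph
# that spans two lines would not be matched by this version.
P_TAG = re.compile(r"<p>(.*?)</p>")


def strip_paragraphs(html: str) -> str:
    """Replace every <p>...</p> pair with its contents plus a newline."""
    return P_TAG.sub(r"\1\n", html)


# Hypothetical sample: one ordinary paragraph followed by a [code] region.
sample = "<p>Another paragraph.</p><p>[code:cf]</p><p>&lt;/table&gt;</p><p>[/code]</p>"
print(strip_paragraphs(sample))
```

Every paragraph is unwrapped, including the ones between [code:cf] and [/code], which is exactly the unwanted part.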

It all looks good, but it's also replacing the contents inside the [code][/code] tags, where the <p> tags should not be replaced at all. As a result, I would like to get rid of the <p> tags only when the content isn't inside the [code] tags.

I can never get negation right. I know it will be something along the lines of

\<p\>^\[code*\](.*?)(<\/p>)

But obviously this doesn't work :-)
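
For comparison, the intended result can be sketched without any negation in the pattern itself: split the text on whole [code]...[/code] regions first, then apply the <p> replacement only to the pieces in between. This is a rough Python sketch under stated assumptions, not a definitive solution. Python itself, the names CODE_BLOCK and strip_paragraphs_outside_code, and the choice to keep the <p> wrapper around the [code] markers untouched are all assumptions.

```python
import re

# A whole [code]...[/code] region, with an optional ":lang" suffix such as
# [code:cf] and an optional <p>...</p> wrapper around each marker.
# The outer capturing group makes re.split keep the region in its result.
CODE_BLOCK = re.compile(
    r"((?:<p>)?\[code(?::\w+)?\].*?\[/code\](?:</p>)?)",
    re.DOTALL,
)

# Same idea as the pattern in the question; DOTALL lets a paragraph span lines.
P_TAG = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def strip_paragraphs_outside_code(html: str) -> str:
    """Unwrap <p>...</p> everywhere except inside [code] regions."""
    parts = CODE_BLOCK.split(html)
    # Even indexes hold text between code regions; odd indexes hold the
    # code regions themselves, which are left exactly as they were.
    for i in range(0, len(parts), 2):
        parts[i] = P_TAG.sub(r"\1\n", parts[i])
    return "".join(parts)


html = "<p>a</p>\n<p>[code:cf]</p>\n<p>x</p>\n<p>[/code]</p>\n<p>b</p>"
print(strip_paragraphs_outside_code(html))
```

Splitting first keeps each regular expression simple. A single pattern with a negative lookahead would also have to track whether a [code] region is currently open, which a regex cannot easily express.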

Could anyone please lend me a hand with this regex?

BTW, I know I shouldn't be using regular expressions to parse HTML at all. I'm fully aware of that, but still, for this specific case, I'd like to use regex.

Thanks in advance

© Stack Overflow or respective owner
