Is it possible to split HTML using DOMDocument?

Posted by Lynn Adrianna on Stack Overflow
Published on 2014-08-22T16:15:49Z

Using DOMDocument, is it possible to split a block of HTML into the pieces that are wrapped in tags and the pieces that are not, while preserving their order? Sorry if this doesn't make sense; my example should make it clear.

Let's say I have the following block of HTML:

text1<b style="color:pink">text2</b>text3<b>text4</b> <b style="font-weight:bold">text5</b>

Is it possible to create an array like this:

array(
  [0] => text1 
  [1] => <b style="color:pink">text2</b>
  [2] => text3
  [3] => <b>text4</b>
  [4] => 
  [5] => <b style="font-weight:bold">text5</b>
)

Below is my current working solution, which uses a regular expression to split the HTML:

$tokens = preg_split('/(<b\b[^>]*>.*?<\/b>)/i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
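
For reference, here is a self-contained version of the regular-expression approach, run against the sample HTML from the question. It is only a sketch: the variable names `$pattern` and `$withoutEmpty` are introduced here for illustration, and `-1` is passed as the limit because it means "no limit". One detail it makes visible is that, because the sample ends with a `</b>`, the plain split leaves an empty string as the last element; adding `PREG_SPLIT_NO_EMPTY` drops it while keeping the single-space entry.

```php
<?php
// Sample input from the question.
$html = 'text1<b style="color:pink">text2</b>text3<b>text4</b> '
      . '<b style="font-weight:bold">text5</b>';

// Same pattern as in the question: capture each <b ...>...</b> run.
$pattern = '/(<b\b[^>]*>.*?<\/b>)/i';

// Plain split: the captured <b> elements are kept as their own entries.
// A limit of -1 means "no limit".
$tokens = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);

// Variant that drops empty strings (such as the one after the final </b>).
// The whitespace-only entry between text4 and text5 is not empty, so it stays.
$withoutEmpty = preg_split(
    $pattern,
    $html,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

var_export($withoutEmpty);
```

Note that this pattern only handles `<b>` elements that are not nested and do not span multiple lines, since `.` does not match a newline without the `s` modifier.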

However, I keep reading that parsing HTML with regular expressions is a bad idea, so I was wondering whether there is a better way.
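
One possible DOMDocument-based approach is sketched below. It is an illustration under stated assumptions, not a definitive answer: the helper name `splitTopLevelNodes` is hypothetical, the fragment is wrapped in a `<div>` so that its top-level nodes share a single parent, and character-encoding handling for non-ASCII input is left out. The idea is to walk the wrapper's child nodes in document order and serialize each one with `DOMDocument::saveHTML($node)`.

```php
<?php
// Hypothetical helper (not from the original post): returns the top-level
// nodes of an HTML fragment as strings, in document order.
function splitTopLevelNodes(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    // Assumption: wrapping the fragment in a <div> gives all of its
    // top-level nodes one common parent that is easy to find again.
    $doc->loadHTML('<div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    $tokens = [];
    foreach ($wrapper->childNodes as $node) {
        // saveHTML($node) serializes only that node: a text node comes back
        // as its (escaped) text, an element as its full markup.
        $tokens[] = $doc->saveHTML($node);
    }

    return $tokens;
}

$html = 'text1<b style="color:pink">text2</b>text3<b>text4</b> '
      . '<b style="font-weight:bold">text5</b>';

print_r(splitTopLevelNodes($html));
```

Unlike the regular expression, this splits on every top-level node rather than only on `<b>` elements, and the markup is re-serialized by libxml, so it may not be byte-for-byte identical to the input. Whether the whitespace-only text between `text4` and `text5` survives as its own entry depends on how the underlying libxml parser treats that whitespace.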

© Stack Overflow or respective owner
