JavaScript: Given an offset and substring length in an HTML string, what is the parent node?
- by Bungle
My current project requires locating an array of strings within an element's text content, then wrapping those matching strings in <a> elements using JavaScript (requirements simplified here for clarity). I need to avoid jQuery if at all possible, or at least avoid including the full library.
For example, given this block of HTML:
<div>
<p>This is a paragraph of text used as an example in this Stack Overflow
question.</p>
</div>
and this array of strings to match:
['paragraph', 'example']
I would need to arrive at this:
<div>
<p>This is a <a href="http://www.example.com/">paragraph</a> of text used
as an <a href="http://www.example.com/">example</a> in this Stack
Overflow question.</p>
</div>
I've arrived at a solution to this by reading the element's innerHTML property and doing some string manipulation: I use the offsets (via indexOf()) and lengths of the strings in the array to break the HTML string apart at the appropriate character offsets, and insert <a href="http://www.example.com/"> and </a> tags where needed.
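A minimal sketch of that kind of offset-based string approach follows (the helper name wrapTermsInHtmlString() is hypothetical). It records every match against the original string first and then inserts the tags from the end backwards, so earlier offsets stay valid and the inserted href (which itself contains "example") is never re-scanned. Matching is done case-insensitively via toLowerCase(), assuming lowercasing does not change the string length (true for ASCII text):

```javascript
// Wrap every case-insensitive occurrence of each term in `html` with
// <a href="...">...</a>, using character offsets into the HTML string.
// Assumes toLowerCase() does not change the string's length (true for ASCII).
function wrapTermsInHtmlString(html, terms, href) {
  const lower = html.toLowerCase();
  const matches = [];

  // 1. Record the offsets of every match in the ORIGINAL string.
  for (const term of terms) {
    const needle = term.toLowerCase();
    if (needle.length === 0) continue;
    let offset = lower.indexOf(needle);
    while (offset !== -1) {
      matches.push({ start: offset, end: offset + needle.length });
      offset = lower.indexOf(needle, offset + needle.length);
    }
  }

  // 2. Sort by position (longer match first on ties) and drop overlaps.
  matches.sort((a, b) => a.start - b.start || b.end - a.end);
  const kept = [];
  for (const m of matches) {
    if (kept.length === 0 || m.start >= kept[kept.length - 1].end) {
      kept.push(m);
    }
  }

  // 3. Insert tags from the last match backwards so earlier offsets stay valid.
  let result = html;
  for (let i = kept.length - 1; i >= 0; i--) {
    const { start, end } = kept[i];
    result = result.slice(0, start) + '<a href="' + href + '">' +
      result.slice(start, end) + '</a>' + result.slice(end);
  }
  return result;
}
```

Because it works on raw markup, this sketch would also match text inside tags and attributes (for instance "example" inside an existing href), which is exactly why the extra requirement below is awkward to meet with string manipulation alone.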
However, an additional requirement has me stumped: I must not wrap a matched string in an <a> element if it is already inside one, or if it is a descendant of a heading element (<h1> to <h6>).
So, given the same array of strings above and this block of HTML (the term matching has to be case-insensitive, by the way):
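Since indexOf() is case-sensitive, one option is a single case-insensitive regular expression built from the array. A minimal sketch, with hypothetical helper names escapeRegExp() and buildTermRegex(); each term is escaped so characters such as . or ? are matched literally, and longer terms are tried first:

```javascript
// Escape characters that have a special meaning in RegExp patterns,
// so each term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one global, case-insensitive RegExp matching any of the terms.
// Longer terms are listed first so "paragraph" wins over a shorter "para".
// Returns null when there is nothing to search for.
function buildTermRegex(terms) {
  const alternatives = terms
    .filter((t) => t.length > 0)
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  if (alternatives.length === 0) return null;
  return new RegExp(alternatives.join('|'), 'gi');
}

// Example: offsets of each match in a plain-text string.
const re = buildTermRegex(['paragraph', 'example']);
const text = 'This is a paragraph of text used as an Example.';
console.log(Array.from(text.matchAll(re), (m) => m.index)); // [ 10, 39 ]
```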
<div>
<h1>Example</h1>
<p>This is a <a href="http://www.example.com/">paragraph of text</a> used
as an example in this Stack Overflow question.</p>
</div>
I would need to disregard both the occurrence of "Example" in the <h1> element, and the "paragraph" in <a href="http://www.example.com/">paragraph of text</a>.
This suggests to me that I have to determine which node each matched string is in, and then traverse its ancestors up to <body>, checking whether I encounter an <a> or heading (<h1> to <h6>) element along the way.
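A minimal sketch of that ancestor check, assuming the match has already been located inside a text node. isInsideExcludedElement() is a hypothetical name; it relies only on the standard nodeType, nodeName and parentNode properties, and stops at a caller-supplied root (for example document.body or the container <div>), which is itself not checked:

```javascript
// Element names whose descendants must never be wrapped.
const EXCLUDED_TAGS = new Set(['A', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6']);
const ELEMENT_NODE = 1;

// Walk up from `node` (typically the text node containing a match) through
// its ancestors, stopping before `root` or at the top of the tree, and
// report whether an <a> or <h1>-<h6> element was encountered.
function isInsideExcludedElement(node, root) {
  let current = node.parentNode;
  while (current && current !== root) {
    if (current.nodeType === ELEMENT_NODE &&
        EXCLUDED_TAGS.has(current.nodeName.toUpperCase())) {
      return true;
    }
    current = current.parentNode;
  }
  return false;
}
```

In browsers that support Element.closest(), a similar check for a text node inside the container is textNode.parentElement.closest('a, h1, h2, h3, h4, h5, h6') !== null, although that walks all the way to the document root instead of stopping at a chosen container.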
Firstly, does this sound reasonable? Is there a simpler or more obvious approach that I've failed to consider? Regular expressions or other string-based comparisons for finding the bounding tags don't seem robust; I'm thinking of issues like self-closing elements, irregularly nested tags, etc. There's also this...
Secondly, is this possible, and if so, how would I approach it?
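One way to sidestep mapping string offsets back to nodes is to search the text nodes directly instead of the innerHTML string, and simply never descend into <a> or heading elements. A minimal sketch, under the assumption that a term never spans two text nodes (e.g. "para<b>graph</b>" would not be found); collectCandidateTextNodes() and findMatches() are hypothetical names, and only the standard childNodes, nodeType, nodeName and data properties are used:

```javascript
// Element names whose whole subtree is skipped.
const SKIPPED_TAGS = new Set(['A', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6']);
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;

// Collect text nodes under `root` in document order, never descending
// into <a> or <h1>-<h6> elements.
function collectCandidateTextNodes(root, out = []) {
  for (const child of Array.from(root.childNodes)) {
    if (child.nodeType === TEXT_NODE) {
      out.push(child);
    } else if (child.nodeType === ELEMENT_NODE &&
               !SKIPPED_TAGS.has(child.nodeName.toUpperCase())) {
      collectCandidateTextNodes(child, out);
    }
  }
  return out;
}

// Return { node, start, end } for every case-insensitive term match inside
// the candidate text nodes. Matches spanning two text nodes are not found.
function findMatches(root, terms) {
  const escaped = terms
    .filter((t) => t.length > 0)
    .sort((a, b) => b.length - a.length) // prefer longer terms
    .map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  if (escaped.length === 0) return [];
  const pattern = new RegExp(escaped.join('|'), 'gi');
  const matches = [];
  for (const node of collectCandidateTextNodes(root)) {
    for (const m of node.data.matchAll(pattern)) {
      matches.push({ node, start: m.index, end: m.index + m[0].length });
    }
  }
  return matches;
}
```

In a browser, each match could then be wrapped with a Range (setStart(node, start), setEnd(node, end), then surroundContents() with a new <a> element). Matches in the same text node should be processed from last to first, because wrapping splits the node and only the offsets before the split point remain valid in the original node. Collecting all matches before modifying anything also means the newly created links are never rescanned.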