I have a set of strings and I need to find all all of the occurrences in an HTML document. Where the string occurs is important because I need to handle each case differently:
String is all or part of an attribute. e.g., the string is foo: <input value="foo"> - Add class ATTR to the element.
String is the full text of an element. e.g., <button>foo</button> - Add class TEXT to the element.
String is inline in the text of an element. e.g., <p>I love foo</p> - Wrap the text in a span tag with class TEXT.
Also, I need to match the longest string first. e.g., if I have foo and foobar, then <p>I love foobar</p> should become <p>I love <span class="TEXT">foobar</span></p>, not <p>I love <span class="TEXT">foo</span>bar</p>.
The inline text is easy enough: Sort the strings descending by length and find and replace each in document.body.innerHTML with <span class="TEXT">$1</span>, although I'm not sure if that is the most efficient way to go.
For the attributes, I can do something like this:
sortedStrings.each(function(it) {
document.body.innerHTML.replace(new RegExp('(\S+?)="[^"]*'+escapeRegExChars(it)+'[^"]*"','g'),function(s,attr) {
$('[+attr+'*='+it+']').addClass('ATTR');
});
});
Again, that seems inefficient.
Lastly, for the full text elements, a depth first search of the document that compares the innerHTML to each string will work, but for a large number of strings, it seems very inefficient.
Any answer that offers performance improvements gets an upvote :)