Using a large list of terms, search through page text and replace words with links
- by dunc
A while ago I posted this question asking if it's possible to convert words in page text to HTML links when they match a list of terms from my database.
I have a fairly huge list of terms - around 6000.
The accepted answer on that question was superb, but having never used XPath, I was at a loss when problems started occurring. At one point, after fiddling with the code, I somehow managed to add over 40,000 random characters to our database, the majority of which had to be removed manually. Since then I've lost faith in that idea, and the simpler PHP solutions weren't efficient enough to deal with the amount of data and the number of terms.
My next attempt at a solution is to write a JS script which, once the page has loaded, retrieves the terms and matches them against the text on a page.
This answer has an idea which I'd like to attempt.
I would use AJAX to retrieve the terms from the database and build an array of objects such as this:
    var words = [
        {
            word: 'Something',
            link: 'http://www.something.com'
        },
        {
            word: 'Something Else',
            link: 'http://www.something.com/else'
        }
    ];
Once the array has been built, I'd use this kind of code:
    // for each term in the array
    $.each(words, function() {
        // store the term ("this" becomes the DOM element in the next function)
        var search = this;
        $('.message').each(function() {
            // only matches when the element's whole text is exactly the term
            if ($(this).text() === search.word) {
                // replace the text with a link, keeping the term as the link text
                $(this).html('<a href="' + search.link + '">' + search.word + '</a>');
            }
        });
    });
Now, at first sight, there is a major issue here: with 6,000 terms, will this code be anywhere near efficient enough for what I'm trying to do?
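To make the concern concrete: the loop above compares every term against every `.message` element, and it only matches when an element's entire text equals a term. One possible alternative, sketched here under assumptions, is to combine all terms into a single regular expression and scan each piece of text once. The helper names (`escapeRegExp`, `escapeHtml`, `buildMatcher`, `linkify`) are hypothetical, matching is assumed to be case-sensitive, and terms are assumed to start and end with a letter or digit so that `\b` word boundaries work.

```javascript
// Sample data in the same shape as the `words` array above.
var words = [
    { word: 'Something', link: 'http://www.something.com' },
    { word: 'Something Else', link: 'http://www.something.com/else' }
];

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escape characters that are unsafe in HTML text or attribute values.
function escapeHtml(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
            .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Build the term lookup and the combined pattern once per page load.
function buildMatcher(words) {
    var lookup = new Map();
    words.forEach(function (w) {
        if (w.word) { lookup.set(w.word, w.link); } // skip empty terms
    });
    // Longest terms first, so "Something Else" wins over "Something".
    var terms = Array.from(lookup.keys()).sort(function (a, b) {
        return b.length - a.length;
    });
    var pattern = terms.length
        ? new RegExp('\\b(' + terms.map(escapeRegExp).join('|') + ')\\b', 'g')
        : null; // no terms: nothing to match
    return { lookup: lookup, pattern: pattern };
}

// Turn plain text into an HTML string with matching terms wrapped in links.
function linkify(text, matcher) {
    if (!matcher.pattern) { return escapeHtml(text); }
    var out = '', last = 0, m;
    matcher.pattern.lastIndex = 0; // reset state of the global regex
    while ((m = matcher.pattern.exec(text)) !== null) {
        out += escapeHtml(text.slice(last, m.index));
        out += '<a href="' + escapeHtml(matcher.lookup.get(m[0])) + '">' +
               escapeHtml(m[0]) + '</a>';
        last = m.index + m[0].length;
    }
    return out + escapeHtml(text.slice(last));
}
```

In the browser this would presumably be applied to the text nodes inside each `.message` element rather than to the element's full HTML, so that existing markup and links are left alone. Whether one alternation of 6,000 terms is fast enough is something I would still have to measure.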
One option would be to move some of the work into the PHP script that the AJAX call talks to. For instance, I could send the ID of the post, and the PHP script could use SQL to retrieve the post's content and match it against all 6,000 terms. The response to the JavaScript could then contain only the matching terms (around 50 at most), which would significantly reduce the number of comparisons the jQuery above has to make.
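The pre-filtering step itself is simple. Here is a minimal sketch of the logic, written in JavaScript only for illustration, since on the server it would live in the PHP script. The function name `findMatchingTerms` and the sample data are hypothetical, and the match is deliberately loose (a case-insensitive substring test), on the assumption that the browser still does the exact matching afterwards.

```javascript
// Return only the terms that occur somewhere in the post text.
// Loose on purpose: false positives are fine, since the client re-checks.
function findMatchingTerms(postText, allTerms) {
    var haystack = postText.toLowerCase();
    return allTerms.filter(function (term) {
        return term.word !== '' &&
               haystack.indexOf(term.word.toLowerCase()) !== -1;
    });
}

// Hypothetical sample data in the same shape as the `words` array.
var allTerms = [
    { word: 'Something', link: 'http://www.something.com' },
    { word: 'Something Else', link: 'http://www.something.com/else' },
    { word: 'Unrelated', link: 'http://www.example.com/unrelated' }
];

var matched = findMatchingTerms('I found something else today.', allTerms);
```

With this sample input, `matched` holds the first two terms and drops 'Unrelated', so the browser would only have to deal with the short list.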
I have no problem with the script taking a few seconds to "load" on the user's browser, as long as it isn't impacting their CPU usage or anything like that.
So, two questions in one:
Can I make this work?
What steps can I take to make it as efficient as possible?
Thanks in advance,