I have developed a web application that is used mainly for archiving all sorts of textual material (documents, references to articles, books, magazines, etc.). There can be any number of archive tables in my system, each with its own schema, and the schema can be changed by a moderator through the application (imagine something similar to a really dumbed-down version of phpMyAdmin).
Users can search for anything across all of the tables. The script combines FULLTEXT index searches with substring (LIKE) searches for the fields that do not support FULLTEXT indexing, inserts the results into a single table, and by ordering those results by the similarity measure I can fairly easily return paginated results.
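For illustration, the per-table step currently amounts to something like the following (the table and column names here are made up, not my actual schema):

```sql
-- Hypothetical archive table with a FULLTEXT index on (title, body)
-- and a notes column that only supports LIKE searching.
INSERT INTO search_results (table_id, row_id, score)
SELECT 1 AS table_id,
       a.id AS row_id,
       MATCH(a.title, a.body) AGAINST ('search terms') AS score
FROM   archive_documents a
WHERE  MATCH(a.title, a.body) AGAINST ('search terms')
   OR  a.notes LIKE '%search terms%';

-- After running one such query per archive table, paginate by relevance.
SELECT * FROM search_results ORDER BY score DESC LIMIT 0, 20;
```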
However, this approach has a few problems:
substring searching can only match exact strings, so it contributes nothing useful to the similarity measure
the 50% rule applies to each table separately, so MySQL may fail to return important matches or discards common words too naively
the approach is quite expensive in terms of query count and execution time (not an issue right now, as there is not much data in the tables yet)
normalized data is not searched at all (I have separate tables for categories, languages and file attachments)
My planned solution: create a single table with columns similar to
id, table_id, row_id, data
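A rough sketch of what I have in mind (the column types are just a guess for now):

```sql
-- Hypothetical central search table; every searchable row in the system
-- gets exactly one entry here.
CREATE TABLE search_index (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    table_id INT UNSIGNED NOT NULL,  -- which archive table the row lives in
    row_id   INT UNSIGNED NOT NULL,  -- primary key of the row in that table
    data     TEXT         NOT NULL,  -- concatenation of all searchable fields
    PRIMARY KEY (id),
    UNIQUE KEY uq_source (table_id, row_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```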
Every time a row is created, modified or deleted in any of the data tables, this central table is updated as well, with the data column holding a concatenation of all the fields in that row. I could then build a single Sphinx index over this table and use it for searching instead.
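The update could be a single upsert per changed row, along these lines (again, the source table and field names are only illustrative):

```sql
-- Hypothetical refresh of the central entry for row 42 of archive table 1;
-- CONCAT_WS glues all of the row's searchable fields into one string.
INSERT INTO search_index (table_id, row_id, data)
SELECT 1, a.id, CONCAT_WS(' ', a.title, a.body, a.notes)
FROM   archive_documents a
WHERE  a.id = 42
ON DUPLICATE KEY UPDATE data = VALUES(data);

-- On deletion of the source row, the central entry goes too.
DELETE FROM search_index WHERE table_id = 1 AND row_id = 42;
```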
Are there any more efficient solutions or best practices for approaching this? Thanks.