As your WordPress or bbPress install grows over the years, if you have a large site you will notice the search slows down quite a bit on large tables. Another problem is the quality of search results becomes poor as it has to muck through more and more posts.
It’s difficult to jump to a solution like sphinx right away, and things like the google api are not aware enough about the context of your content, so some people will code their own searches within MySQL in the meanwhile.
For example, a minimal search is done something like the following, taking advantage of MySQL’s natural search mode which takes into account the number of times the words appears and their proximity to return a score.
SELECT post_id, MATCH (post_text) AGAINST ('some words') AS score FROM posts WHERE MATCH (post_text) AGAINST ('some words') ORDER BY score DESC
In my situation, using a much more complex version of that query that takes into account all posts in a topic was taking up to 12 seconds which was completely unacceptable.
So it occurred to me that MySQL also has a binary search mode that is far faster. It’s not what I wanted for the overall results because it’s relevancy can be poor, but it instead could just be used to eliminate posts that would have to be scanned. Anytime you can eliminate rows in mysql you will get a speed boost.
So simply do this instead:
SELECT post_id, MATCH (post_text) AGAINST ('some words') AS score FROM posts WHERE MATCH (post_text) AGAINST ('some words' IN BOOLEAN MODE) ORDER BY score DESC
Basically that IN BOOLEAN MODE will eliminate rows very quickly that do not have either of those words in them, but the SELECT still calculates a valuable natural search score for the remaining results.
When using boolean mode you have to take some extra steps to sanitize the query because some characters have special meaning so keep that in mind but it’s trivial to do.
Eventually I took it a step further and preprocessed the search terms even further. (more…)