making bbPress (and WordPress) work better!

MySQL

WordPress.com runs on about 2,200 servers.

Roughly 500 of them are dedicated just to MySQL, holding some 500 million tables.


MySQL and MariaDB will start to diverge after version 5.5

MariaDB has been a true open-source, drop-in binary replacement for MySQL, but only up through version 5.5. Rather than tracking the next MySQL release, 5.6, MariaDB is going its own way, jumping straight to version 10 after 5.5.

So if you were hoping for InnoDB full-text search in MariaDB 10, it’s not going to happen, but maybe in 10.1.



MySQL 5.6 gold, any week now?

Oracle has promised a “production-ready product release in early 2013” for MySQL 5.6, which would have to mean some week before the end of April.

MySQL 5.6.9 release candidate is currently available for testing.

5.6 is exciting to me for one interesting reason: it adds full-text indexing support to InnoDB. The lack of it was preventing widespread InnoDB adoption, imho, since you are currently tied to MyISAM for that feature. Hopefully keeping full-text support requires little to no code modification beyond altering the storage engine. From what I can tell, it just requires a couple more my.cnf settings for InnoDB stopwords (for some reason it doesn’t reuse the old MyISAM settings).
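For reference, the extra settings I mean are the new innodb_ft_* variables in 5.6. A sketch of what they might look like in my.cnf (the values and the stopword table name here are illustrative, not recommendations):

```ini
[mysqld]
# InnoDB full-text has its own knobs; the old MyISAM ft_* settings don't apply
innodb_ft_min_token_size = 3
innodb_ft_enable_stopword = ON
# optional: use your own stopword table, given as db_name/table_name
innodb_ft_server_stopword_table = mydb/my_stopwords
```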

These tests on 5.6 from late 2012 seem to show InnoDB full-text search can be three times faster than MyISAM under heavy load conditions.

Read more at: https://blogs.oracle.com/MySQL/entry/mysql_5_6_9_release


An Easy Way to Speed Up MySQL FullText Natural Search on Large Tables

As your WordPress or bbPress install grows over the years, you will notice search slowing down quite a bit on large tables. Another problem is that the quality of search results becomes poor as it has to muck through more and more posts.

It’s difficult to jump straight to a solution like Sphinx, and things like the Google API are not aware enough of the context of your content, so some people code their own searches within MySQL in the meanwhile.

For example, a minimal search looks something like the following, taking advantage of MySQL’s natural language mode, which takes into account the number of times the words appear and their proximity to produce a relevancy score:

SELECT post_id,  MATCH (post_text) AGAINST ('some words') AS score
FROM posts WHERE MATCH (post_text) AGAINST ('some words')
ORDER BY score DESC

In my situation, using a much more complex version of that query that takes into account all posts in a topic was taking up to 12 seconds, which was completely unacceptable.

So it occurred to me that MySQL also has a boolean search mode that is far faster. It’s not what I wanted for the overall results, because its relevancy can be poor, but it could instead be used just to eliminate posts that would otherwise have to be scanned. Any time you can eliminate rows in MySQL, you get a speed boost.

So simply do this instead:

SELECT post_id,  MATCH (post_text) AGAINST ('some words') AS score
FROM posts WHERE MATCH (post_text) AGAINST ('some words' IN BOOLEAN MODE)
ORDER BY score DESC

Basically, that IN BOOLEAN MODE clause very quickly eliminates rows that contain neither of those words, while the SELECT still calculates a valuable natural language score for the remaining results.

When using boolean mode you have to take some extra steps to sanitize the query, because certain characters (such as +, -, <, >, (, ), ~, * and ") have special meaning. Keep that in mind, but it’s trivial to do.
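As a sketch of that sanitizing step (shown in Python for brevity; in a WordPress or bbPress plugin you would do the equivalent in PHP, and the function name here is made up):

```python
import re

def sanitize_boolean_query(query: str) -> str:
    """Strip MySQL boolean-mode operator characters (+ - > < ( ) ~ * " @)
    and collapse whitespace so user input can't change the query logic."""
    cleaned = re.sub(r'[+\-><()~*"@]', ' ', query)
    return re.sub(r'\s+', ' ', cleaned).strip()

print(sanitize_boolean_query('+apple -(banana) "exact phrase"'))
# prints: apple banana exact phrase
```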

Eventually I took it a step further and preprocessed the search terms even further. (more…)


Examine the size of your MySQL data & indexes

Do you have an active server with several different programs installed that use MySQL?
Maybe even multiple WordPress and bbPress installations?

Here’s a really handy query you can throw into phpMyAdmin or otherwise to see your largest tables: their number of rows, data vs. index size, and the index-to-data ratio:

SELECT CONCAT(table_schema, '.', table_name) db_table,
       CONCAT(ROUND(table_rows / 1000000, 2), 'M') rows,
       CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024  ), 1), 'M') size,
       CONCAT(ROUND(data_length / ( 1024*1024 ), 1), 'M') data,
       CONCAT(ROUND(index_length / ( 1024*1024 ), 1), 'M') indx,
       ROUND(index_length / data_length, 2) ratio 
FROM   information_schema.TABLES 
ORDER  BY data_length + index_length DESC LIMIT  30;

If you want to see a summary by database and not tables, try this:

SELECT COUNT(*) tables,  table_schema dbase,
       CONCAT(ROUND(SUM(table_rows) / 1000000, 2), 'M') rows,
       CONCAT(ROUND(SUM(data_length + index_length) / ( 1024 * 1024 ), 1), 'M') size,
       CONCAT(ROUND(SUM(data_length) / ( 1024*1024 ), 1), 'M') data,
       CONCAT(ROUND(SUM(index_length) / ( 1024*1024 ), 1), 'M') indx,
       ROUND(SUM(index_length) / SUM(data_length), 2) ratio 
FROM   information_schema.TABLES 
GROUP  BY table_schema ORDER BY SUM(data_length + index_length) DESC;

Thanks to: http://www.mysqlperformanceblog.com/2008/02/04/finding-out-largest-tables-on-mysql-server/ and http://www.mysqlperformanceblog.com/2008/03/17/researching-your-mysql-table-sizes/ (with a few tweaks by me)


(whatever happened to) MySQLTuner 2.0

MySQL DBAs are amazingly smart; they really impress me.

It’s really hard to figure out the right way to make MySQL perform better. Web servers and PHP have been improved to the point where MySQL is often the bottleneck.

I just finished porting MySQL-Report into bbPress as a plugin. It was originally written in Perl by Daniel Nichter, and then Munroe Richard converted it to PHP for Drupal, so I “borrowed” a great deal of his code to save some time. I am not thrilled with the original code quality, but I am feeling too lazy to rewrite it right now, and it works well enough.

Other analysis tools I use are tuning-primer and MySQLTuner (v1.2). The mysql-summary tool from Aspersa (now in the Percona Toolkit) is also somewhat helpful, but not as useful.

For real-time tracking, the updated version of MyTop from Mark Grennan (originally by Jeremy Zawodny) is somewhat useful on busy servers.

Now, does anyone know whatever happened to MySQLTuner 2.0?
It seems like they stopped working on it in 2010.

There’s a nice slideshow on it from Sheeri Cabral:
http://technocation.org/files/doc/2010_10_mysqltuner.pdf

Video presentation with that slideshow
(more…)


WordPress still uses the nasty SQL_CALC_FOUND_ROWS

(update: please don’t try this patch as is on WP versions newer than 2.8 – I am only leaving it here as a suggestion if some coder in the future decides to tackle the problem)

We’ve known for over two whole years now that SQL_CALC_FOUND_ROWS did nasty things in bbPress 0.8.

It was fixed by mdawaffe (Michael) after he discovered that SQL_CALC_FOUND_ROWS caused an overload on the Automattic wordpress.org forum servers due to a MySQL bug.

But to this very day, it’s still used in all WordPress versions, up to and including 2.8.2.

(Even more ironic: now that bbPress 1.0 has switched to the BackPress core, which is based on WordPress, SQL_CALC_FOUND_ROWS is back inside bbPress, though it works around the bug.)

SQL_CALC_FOUND_ROWS is typically three times slower than using COUNT() on the same query without LIMIT and ORDER restrictions.
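The workaround is just the classic two-query pattern: run the LIMITed query as usual, then issue a separate COUNT() with the same WHERE clause but no ORDER BY or LIMIT. A minimal sketch of the pattern (using SQLite in Python purely to keep the example self-contained and runnable; the table and WHERE clause are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)",
                 [("post %d" % i,) for i in range(25)])

# Instead of SELECT SQL_CALC_FOUND_ROWS ... LIMIT 10 plus SELECT FOUND_ROWS(),
# run the paged query and a plain COUNT(*) with the same WHERE clause:
page = conn.execute(
    "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT 10", (5,)
).fetchall()
total = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE id > ?", (5,)
).fetchone()[0]

print(len(page), total)  # prints: 10 20
```

The COUNT() query can skip the ORDER BY and LIMIT work entirely, which is exactly why it tends to beat SQL_CALC_FOUND_ROWS.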

I’ve seen at least one MySQL slow log that was FULL of SQL_CALC_FOUND_ROWS queries from a large WordPress installation on a dedicated server, each taking 11-15 seconds (and crashing MySQL, repeatedly).

Here is my quick & dirty patch to the WordPress (and BackPress) core that attempts to change SQL_CALC_FOUND_ROWS to the COUNT() workaround. It lives inside wp-includes/query.php. It’s tested working, but not heavily tested, so use with caution, and let me know if you have improvements.

My changes are against the file from WP 2.5.1, but the file has barely changed since 2.5; even 2.8 is virtually the same, so it should be easy to adapt to other versions.

There is one other use of SQL_CALC_FOUND_ROWS left in WordPress but it’s in the admin section so I am not going to worry about it for now.

More about the MySQL bug due to its poor optimization:
http://bugs.mysql.com/bug.php?id=18454 (cache)
http://bugs.mysql.com/bug.php?id=19553 (cache)
http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/


Better Geo-IP Via An ip2nation Patch

update: this post is outdated, as I have since switched to the free MaxMind GeoLite database, which is much more accurate, especially after some custom patches I have created (and will share eventually when I have time)

I finally have a website that gets a lot of international visitors, so I tried out the nifty, free ip2nation geo-IP database with some WordPress plugins to display country flags next to visitors’ comments. It makes the site very pretty, and it’s fascinating to see where everyone is coming from.

However, I noticed a bunch of visitors being lumped under “EU” with the EU flag. The problem is that some of these visitors were clearly from the UK and other specific locations; “EU” was far too general, and perhaps even a little insulting that they didn’t see their own proper flag.

Upon checking the December 2006 ip2nation database, I found 1946 entries for EU. That’s far too many. After a lot of googling, I found a script that does WHOIS queries to get more specific country information.

Since I had to space the WHOIS calls 10 seconds apart to be nice to their servers, roughly 2000 queries took hours to grab. But there’s no reason why I can’t share the resulting patch data to save others the same trouble!

Here is the data output from the WHOIS queries; it can be imported directly into the ip2nation MySQL table via phpMyAdmin or similar:
http://www.mediafire.com/?fznyjqmjinc
I did not remove the entries that could not be converted from EU, so there is some waste.

There were 1946 EU entries to start, and 820 left afterwards.
That’s a healthy conversion of 1126 entries! Enjoy…