making bbPress (and WordPress) work better!

filemtime – the performance killer ?!

(This is NOT an issue if you just use filemtime once or twice on a page on the web. It’s just something I discovered after trying for an hour to debug a very large PHP script I wrote to process a great deal of data.)

filemtime is a “simple” function in PHP which just returns the time a file was last modified. Sounds straightforward enough, eh? Well, if you are checking 1000 files, filemtime will actually DOUBLE the time it takes compared with just reading each file with file_get_contents! Crazy, right? Well, the logic makes sense when you think about it. file_get_contents can be served from the OS cache. filemtime apparently cannot, because it assumes you need the newest, uncached date fresh off the file to see if it was recently modified, even if you run the script twice, one run immediately after the other.

It took a while to figure this out, with lots of messy microtime debug code all over. Drove me crazy. The script can run in just over 60 seconds. With the filemtime check on each file, it was taking around 110 seconds! Even when I moved the files to a ramdisk just to benchmark and debug some more (I thought file_get_contents was the problem), I still had a 30% slowdown with filemtime. Apparently the OS (in this case Windows) just wanted to go back to the raw file table each and every time.
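For anyone who wants to repeat the comparison, here is a minimal sketch of that kind of microtime timing. It is not the original script: the helper name `time_over_files`, the scratch directory, and the 1000 small files are all assumptions made for illustration, and the numbers will vary a lot by OS and filesystem.

```php
<?php
// Hypothetical helper: time a callback over a list of paths, in seconds.
function time_over_files(array $paths, callable $fn): float
{
    $start = microtime(true);
    foreach ($paths as $path) {
        $fn($path);
    }
    return microtime(true) - $start;
}

// Assumption: a scratch directory of small files stands in for the real data set.
$dir = sys_get_temp_dir() . '/mtime_bench_' . getmypid();
mkdir($dir);
$paths = [];
for ($i = 0; $i < 1000; $i++) {
    $paths[] = "$dir/file_$i.txt";
    file_put_contents($paths[$i], str_repeat('x', 1024));
}

// Time reading every file, then time asking for every file's mtime.
$readTime  = time_over_files($paths, 'file_get_contents');
$mtimeTime = time_over_files($paths, 'filemtime');

printf("file_get_contents: %.4f s\n", $readTime);
printf("filemtime:         %.4f s\n", $mtimeTime);

// Clean up the scratch files.
array_map('unlink', $paths);
rmdir($dir);
```

Run it a few times in a row, since the first pass also pays for creating the files and warming any OS caches.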

I still don’t know how to work around this, though in theory I could just grab a raw snapshot of the directory via the shell (dir or ls) when the script starts and parse the results that way, bypassing PHP.
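A rough sketch of that snapshot idea, for what it's worth. It assumes GNU find on Linux rather than dir or ls (its `-printf` output is easier to parse), and the function names `parse_mtime_snapshot` and `snapshot_directory` are made up for this example. Whether it actually beats calling filemtime per file would need measuring, since the shell tool still has to stat every file.

```php
<?php
// Hypothetical helper: turn "<epoch seconds> <file name>" lines into a
// [file name => mtime] map. The line format is an assumption of this sketch.
function parse_mtime_snapshot(string $output): array
{
    $mtimes = [];
    foreach (preg_split('/\R/', trim($output)) as $line) {
        if ($line === '') {
            continue;
        }
        // Split on the first space only, so names containing spaces survive.
        $parts = explode(' ', $line, 2);
        if (count($parts) !== 2) {
            continue;
        }
        [$stamp, $name] = $parts;
        // GNU find prints fractional seconds; the cast truncates them.
        $mtimes[$name] = (int) $stamp;
    }
    return $mtimes;
}

// Assumption: GNU find is on the PATH. The output of Windows `dir` would
// need its own parser.
function snapshot_directory(string $dir): array
{
    $cmd = 'find ' . escapeshellarg($dir) . " -maxdepth 1 -type f -printf '%T@ %f\\n'";
    $output = shell_exec($cmd);
    return is_string($output) ? parse_mtime_snapshot($output) : [];
}
```

The script would call `snapshot_directory()` once at startup and then look timestamps up in the returned array instead of calling filemtime in the loop.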

Again, this is not a problem found by simple use on the web, though if you are on a host (like Dreamhost) with an NFS filesystem (external network-based storage), keep in mind that filemtime will have to take the long, hard way, bypassing all caches to check the current file timestamp, and that will definitely slow you down.

4 responses

  1. That’s interesting, especially that it’s slower than file_get_contents(). That could be the real reason that the garbage collection in WP Super Cache hurts hosts that run on NFS servers. I thought it was the recursing through the directories!
    Perhaps I should limit the GC to X number of files at a time rather than the whole lot.

    September 28, 2008 at 3:41 am

  2. Yup that might be the culprit. Deleting a file can be easily write-cached by NFS, but getting the filemtime cannot and will be a cache-miss – it must wait for it to be read off the disk, over and over.

    October 9, 2008 at 1:54 am

  3. Leo

    From the PHP manual on filemtime():

    Note: The results of this function are cached. See clearstatcache() for more details.

    So I think that there IS a cache on this function. Maybe, if your process is changing the files, the cache will not work, because it will be cleared on each file write, but on unmodified files, maybe the info is cached.

    I don’t know for sure. I’m using it in a thumbnail generator and was looking for references on this function’s performance. I’ll continue researching.

    Regards.

    July 15, 2010 at 11:59 am

  4. Leo

    You should check the safe_mode configuration as well:

    http://bugs.php.net/40970

    July 15, 2010 at 12:06 pm
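
On the clearstatcache() note quoted in the comments above: PHP remembers stat results within a single script run, so a long-running script that needs an up-to-date timestamp can clear that cache for the path before asking again. A minimal sketch, where `fresh_mtime` is a made-up helper name and the temp file is only there to make the example self-contained:

```php
<?php
// Hypothetical helper: drop PHP's cached stat data for one path,
// then re-read its modification time.
function fresh_mtime(string $path)
{
    clearstatcache(true, $path);
    return filemtime($path); // int timestamp, or false on failure
}

$path = tempnam(sys_get_temp_dir(), 'mt');
touch($path, 1000000000);   // set a known modification time
$mtime = fresh_mtime($path);
unlink($path);
```

This only controls PHP's own cache. It says nothing about what the OS or an NFS mount does underneath.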
