filemtime – the performance killer?!
(This is NOT an issue if you just use filemtime once or twice on a page on the web. It’s just something I discovered after trying for an hour to debug a very large PHP script I wrote to process a great deal of data.)
filemtime is a “simple” function in PHP which just returns the time that a file was last modified. Sounds straightforward enough, eh? Well, if you are checking 1000 files, calling filemtime on each one can nearly DOUBLE the running time compared with just reading the files with file_get_contents! Crazy, right? The logic makes sense when you think about it, at least as far as I can tell. The contents that file_get_contents reads can be served from the OS cache. The timestamp apparently cannot, because the whole point of asking is to get the newest, uncached value fresh off the file to see if it was recently modified, even if you run the script twice, one run immediately after the other. (PHP does keep its own small stat cache, which clearstatcache() resets, but it only lasts for a single run and does not help when every file is checked just once.)
It took a while to figure this out, with lots of messy microtime debug code all over. Drove me crazy. Without the check, the script runs in just over 60 seconds. With the filemtime check on each file, it was taking about 110 seconds! Even when I moved the files to a ramdisk just to benchmark and debug some more (I thought file_get_contents was the problem), I still saw a 30% slowdown with filemtime. Apparently the OS (in this case Windows) just wanted to go back to the raw file table each and every time.
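For anyone who wants to check this on their own machine, here is a minimal sketch of the kind of microtime comparison described above. It is not my original debug code: the helper names (time_over_files, make_sample_files), the 1000 throwaway 1 KB files in the temp directory, and the file sizes are all assumptions made up for illustration, so the numbers will differ from the ones I saw and depend heavily on the OS and filesystem.

```php
<?php
// Hypothetical helper: run $fn once per path and return elapsed seconds.
function time_over_files(array $paths, callable $fn): float
{
    $start = microtime(true);
    foreach ($paths as $path) {
        $fn($path);
    }
    return microtime(true) - $start;
}

// Hypothetical helper: create $count small throwaway files in the temp dir.
function make_sample_files(int $count): array
{
    $dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'mtime_bench_' . uniqid();
    mkdir($dir);
    $paths = [];
    for ($i = 0; $i < $count; $i++) {
        $path = $dir . DIRECTORY_SEPARATOR . "file_$i.txt";
        file_put_contents($path, str_repeat('x', 1024)); // 1 KB each (assumption)
        $paths[] = $path;
    }
    return $paths;
}

$paths = make_sample_files(1000);

// Time reading every file, then time asking for every modification time.
$readSeconds = time_over_files($paths, 'file_get_contents');
clearstatcache(); // start the filemtime pass with PHP's own stat cache empty
$mtimeSeconds = time_over_files($paths, 'filemtime');

printf("file_get_contents: %.4f s\n", $readSeconds);
printf("filemtime:         %.4f s\n", $mtimeSeconds);

// Clean up the throwaway files and their directory.
foreach ($paths as $path) {
    unlink($path);
}
rmdir(dirname($paths[0]));
```

Point make_sample_files at nothing and pass your real file list instead if you want to measure the files you actually care about.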
I still don’t know how to work around this, though in theory I could grab a raw snapshot of the directory via the shell (dir or ls) once when the script starts and parse the timestamps out of the results, bypassing PHP's per-file calls.
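The snapshot idea could look roughly like the sketch below. I have not benchmarked it, so treat it as an untested idea rather than a fix. It assumes GNU find is on the PATH (so Linux, not the Windows box I was on; parsing dir output would need a different parser), and the function names parse_mtime_listing and snapshot_mtimes, as well as the "epoch seconds, space, file name" line format, are invented for this example. File names containing newlines would break it.

```php
<?php
// Hypothetical helper: turn lines of "<epoch seconds> <file name>" into a
// [file name => mtime] map. The line format is an assumption of this sketch.
function parse_mtime_listing(string $listing): array
{
    $mtimes = [];
    foreach (preg_split('/\R/', trim($listing)) as $line) {
        if ($line === '') {
            continue;
        }
        // Split on the first space only, so names containing spaces survive.
        [$epoch, $name] = explode(' ', $line, 2) + [null, null];
        if ($name === null) {
            continue; // malformed line, skip it
        }
        // GNU find prints fractional seconds; the cast keeps whole seconds.
        $mtimes[$name] = (int) $epoch;
    }
    return $mtimes;
}

// Hypothetical helper: one shell call for the whole directory instead of one
// filemtime() call per file. Assumes GNU find (for -printf) is available.
function snapshot_mtimes(string $dir): array
{
    $cmd = 'find ' . escapeshellarg($dir) . " -maxdepth 1 -type f -printf '%T@ %f\\n'";
    $out = shell_exec($cmd);
    return is_string($out) ? parse_mtime_listing($out) : [];
}
```

The script would call snapshot_mtimes once at startup and then look timestamps up by basename, for example $mtimes[basename($path)] ?? filemtime($path), falling back to filemtime for anything missing from the snapshot. The obvious trade-off is that the snapshot goes stale if files change while the script is running.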
Again, this is not a problem you will hit with simple use on the web. But if you are on a host (like Dreamhost) with an NFS filesystem (external, network-based storage), keep in mind that filemtime will have to go the long, hard way and bypass all caches to check the current file timestamp, and that will definitely slow you down.