When thinking about 'large' sites that need to scale to huge numbers of users, it would be great to remove two of the major bottlenecks: database calls, and redundant calculations and code that produce the same static output on every request.
You can obviously start by using MySQL's built-in query cache (turned on by default these days, I think? Correct me if I'm wrong). In my experiments this helps a -lot-, but on a shared server, or on one running many distinct queries (e.g. unique searches), the cache can quickly become ineffective, because entries are pushed out or invalidated before they are ever reused.
Similarly, PHP has a number of 'accelerators' that cache compiled scripts, and these are also well worth using: Zend Accelerator, eAccelerator, APC, and the relatively new XCache, from the same people who develop lighttpd, the super-fast web server.
Anyhow, I wanted to approach this at the script level: first, because it's nice to handle these things without relying on outside applications, and second, because it's great to understand how they work.
The method I'm going to use is what I'd call a push-cache method: because the content doesn't depend on outside sources, the cache only needs to be updated when a user makes a change.
Essentially, I have a cache folder containing a set of files. To keep things tidy, the filenames are md5 hashes, which is handy for URLs or other names with awkward characters. When the template engine goes to generate a page, it first checks the cache folder for a file named after the md5 hash of the requested URL. If it's found, *all* other work can be skipped and the cache file is simply read in and sent to the browser.
This has huge benefits: no database calls are needed at all, and serving the page is as simple as reading a single file, which is incredibly fast. If there is no cache file, the page is generated normally and a cache file is written at the same time.
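Here is a minimal sketch of that flow, written in Python rather than PHP purely for illustration. The names `serve`, `invalidate`, `cache_path`, the `render` callback and the temporary `CACHE_DIR` are hypothetical, not part of any particular template engine. It hashes the URL with md5, returns the cached file on a hit, renders and stores the page on a miss, and deletes the file when a user changes the underlying content (the "push" part).

```python
import hashlib
import os
import tempfile

# Hypothetical cache folder; a real site would use a fixed directory.
CACHE_DIR = tempfile.mkdtemp()

def cache_path(url: str) -> str:
    # The md5 of the URL gives a fixed-length, filesystem-safe filename.
    return os.path.join(CACHE_DIR, hashlib.md5(url.encode("utf-8")).hexdigest())

def serve(url: str, render) -> str:
    path = cache_path(url)
    try:
        # Cache hit: skip rendering and all database work.
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        pass
    # Cache miss: build the page normally, then store it for next time.
    html = render(url)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp, path)  # atomic rename: readers never see a half-written file
    return html

def invalidate(url: str) -> None:
    # "Push" step: call this when a user changes the content behind the URL.
    try:
        os.remove(cache_path(url))
    except FileNotFoundError:
        pass

if __name__ == "__main__":
    calls = []

    def render_page(url):
        calls.append(url)
        return f"<html>{url} v{len(calls)}</html>"

    print(serve("/post/1", render_page))  # <html>/post/1 v1</html> (rendered)
    print(serve("/post/1", render_page))  # <html>/post/1 v1</html> (from cache)
    invalidate("/post/1")
    print(serve("/post/1", render_page))  # <html>/post/1 v2</html> (rendered again)
```

Writing to a temporary file and renaming it with `os.replace` keeps a concurrent request from reading a partially written page; with several processes writing at once, you would want unique temporary names rather than a fixed `.tmp` suffix.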
I'd be interested in hearing other people's approaches to caching at the PHP level. I'm considering using template caching for specific items while leaving -some- pages uncached (search pages, for example, are hard to cache effectively).
#1 Jeroen Mulder says:
I use a similar caching system at www.crumbl.com. There's no user interaction or user-specific information just yet, so I can cache complete pages as .html files and serve them directly. Then all I need is a cron job running in the background that updates the cache every five minutes (synchronised with the feed updating process).
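The cron-driven variant can be sketched the same way, again in Python and with hypothetical names (`rebuild_pages`, `write_atomic`, the page mapping and the render callback are assumptions, not how crumbl.com actually works). The script regenerates every static .html file and swaps each one into place atomically.

```python
import os
import tempfile

def write_atomic(path: str, data: str) -> None:
    # Write to a temporary file first, then rename it over the old page so
    # visitors never see a partially written file.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(data)
    os.replace(tmp, path)

def rebuild_pages(out_dir: str, pages: dict, render) -> list:
    # pages maps an output filename (e.g. "index.html") to whatever the
    # hypothetical render callback needs to build that page.
    written = []
    for filename, page in pages.items():
        path = os.path.join(out_dir, filename)
        write_atomic(path, render(page))
        written.append(path)
    return written

if __name__ == "__main__":
    # Demo run into a throwaway directory; a real script would target the
    # web server's document root and call the real page renderer.
    out = tempfile.mkdtemp()
    pages = {"index.html": "home", "feeds.html": "feeds"}
    for p in rebuild_pages(out, pages, lambda page: f"<html><body>{page}</body></html>"):
        print("rebuilt", p)
```

A crontab entry such as `*/5 * * * * /usr/bin/python3 /path/to/rebuild_cache.py` (the script path is hypothetical) would run it every five minutes.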
For anything more complex, I would probably opt to cache at the database level by introducing a Data Presentation Layer. This layer would hold denormalized, redundant information to reduce the number of JOINs and queries.
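A presentation layer like that might look roughly like the following, sketched with Python's built-in sqlite3 module instead of MySQL. The `feeds`, `items` and `item_view` tables and the helper functions are hypothetical. The JOIN is paid once when an item is written, and pages read from the flat `item_view` table with no JOINs at all. The trade-off is that any write touching the copied columns (here, renaming a feed) must also update the redundant copies.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE feeds (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE items (id INTEGER PRIMARY KEY,
                    feed_id INTEGER NOT NULL REFERENCES feeds(id),
                    body TEXT NOT NULL);
-- Denormalized read table: everything a page needs, no JOINs at read time.
CREATE TABLE item_view (item_id INTEGER PRIMARY KEY, feed_id INTEGER NOT NULL,
                        feed_title TEXT NOT NULL, body TEXT NOT NULL);
""")

def add_item(feed_id: int, body: str) -> int:
    cur = db.execute("INSERT INTO items (feed_id, body) VALUES (?, ?)", (feed_id, body))
    # Do the JOIN once, on write, and copy the redundant columns across.
    db.execute("""
        INSERT INTO item_view (item_id, feed_id, feed_title, body)
        SELECT i.id, i.feed_id, f.title, i.body
        FROM items AS i JOIN feeds AS f ON f.id = i.feed_id
        WHERE i.id = ?""", (cur.lastrowid,))
    db.commit()
    return cur.lastrowid

def rename_feed(feed_id: int, title: str) -> None:
    # The cost of denormalizing: redundant copies must be kept in sync.
    db.execute("UPDATE feeds SET title = ? WHERE id = ?", (title, feed_id))
    db.execute("UPDATE item_view SET feed_title = ? WHERE feed_id = ?", (title, feed_id))
    db.commit()

def page_rows(feed_id: int) -> list:
    # Read path: a single-table query, no JOINs.
    return db.execute(
        "SELECT feed_title, body FROM item_view WHERE feed_id = ? ORDER BY item_id",
        (feed_id,)).fetchall()

if __name__ == "__main__":
    db.execute("INSERT INTO feeds (id, title) VALUES (1, 'Example feed')")
    add_item(1, "first post")
    print(page_rows(1))  # [('Example feed', 'first post')]
```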
PS: It's good to see you blogging again ;-)