For those unfamiliar with the concept, a “cache” is a system that speeds up a website’s responses and reduces server load by avoiding running the same code over and over unnecessarily.
In other words: when a site loads (especially a dynamic one), the user has to wait while the page is processed and the requested content is returned. The main goal is to cut this waiting time as much as possible, because users who are kept waiting too long will surely abandon ship. If the site’s goal is to sell or do business, we have probably lost a potential client.
Caching means keeping copies of content, or parts of content, that have already been generated and must be returned to the client. That way, if a second request arrives for the same content, the cache returns it, reducing both waiting time and server load.
We can split caches into two main groups:
- Content cache
- Code cache
The first group, content caches, covers browser caches and proxy caches. Browser caches are controlled and stored by the client’s own browser; proxy caches are held by an intermediate node between the client and the server, and are generally used by ISPs. We can also include gateway caches, typically located in a load balancer.
These caches store a copy of the page, along with the images and other files it uses, when the user requests it. The copy has a validity period, as we’ll see further on. That way, when the user asks for the same page more than once, or returns to previous pages with the “back” button, the copy is shown instead of being requested again. Depending on the cache level, this affects a single user (browser cache) or a group of users (proxy or gateway cache).
These stored versions are controlled with HTML meta tags and HTTP headers. Meta tags are easy to include, because we simply add certain tags inside the page’s head section, although proxy caches normally ignore them. HTTP headers give us finer control, because the server sends them before the HTML code and both browsers and proxies check them. The most important headers for cache control are:
Expires
Sets the date on which the content (a whole page, a photo, etc.) expires. Until then the cached copy is used; once that date has passed, the client checks whether the content has changed, that is, whether the stored version still matches. We’ll explain below how to tell whether content has changed. The value of this header is a date in HTTP format, like:
Expires: Wed, 24 Apr 2013 14:19:41 GMT
It’s recommended to calculate the expiry date dynamically: if a fixed date is hard-coded, then once that date has passed the content will always be considered outdated and the client will check for changes on every request. In PHP we can set that header with the header function. For example, to make content expire in one day:
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 24*60*60) . ' GMT');
Cache-Control
Gives further information about how caches must treat this content. When both are present, its max-age directive takes precedence over the “Expires” header. We can combine different directives in this header to suit our needs. For example, this header makes the content expire in one hour:
Cache-Control: max-age=3600
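As a minimal sketch, a Cache-Control value could be built and sent from PHP like this. The helper name `buildCacheControl` and its parameters are made up for illustration; `public`, `private` and `max-age` are standard Cache-Control directives.

```php
<?php
// Hypothetical helper: builds the value of a Cache-Control header.
// $maxAgeSeconds: how long a stored copy may be reused without checking back.
// $shared: true lets proxies store the copy, false limits it to the browser.
function buildCacheControl(int $maxAgeSeconds, bool $shared = true): string
{
    $visibility = $shared ? 'public' : 'private';
    return $visibility . ', max-age=' . $maxAgeSeconds;
}

// Expire in one hour. The guard avoids a warning if output has already started.
if (!headers_sent()) {
    header('Cache-Control: ' . buildCacheControl(60 * 60));
}
```

Passing `false` as the second argument would mark the content as private, which suits pages that differ per user.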
Last-Modified & ETag
These headers allow the browser or proxy to decide whether the content it has stored has changed. Last-Modified gives the date of the last content change, and ETag is an identifier for the content that changes whenever the content changes. That way, once the expiry set by “Expires” or “Cache-Control” has passed, these two headers help decide whether the content must be loaded again.
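The check described above could be sketched in PHP as follows. A client that holds an expired copy sends back the validators it received (in the standard `If-None-Match` and `If-Modified-Since` request headers), and the server answers `304 Not Modified` if the copy is still good. The function name, the sample body and the timestamp are assumptions for illustration, and the ETag comparison is deliberately simplified (no weak validators, no lists of tags).

```php
<?php
// Hypothetical helper: decides whether the copy the client already holds is
// still valid, given the validators it sent back with its request.
function clientCopyIsFresh(
    string $etag,
    int $lastModified,
    ?string $ifNoneMatch,
    ?string $ifModifiedSince
): bool {
    // When both validators are sent, the ETag comparison takes priority.
    if ($ifNoneMatch !== null) {
        return trim($ifNoneMatch) === $etag;
    }
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        return $since !== false && $lastModified <= $since;
    }
    return false;
}

$body = '<h1>Product 42</h1>';      // stand-in for the generated page
$etag = '"' . md5($body) . '"';     // ETag values are quoted strings
$lastModified = 1366813181;         // assumed time of the last content change

$ifNoneMatch = $_SERVER['HTTP_IF_NONE_MATCH'] ?? null;
$ifModifiedSince = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null;

if (!headers_sent()) {
    header('ETag: ' . $etag);
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
}

if (clientCopyIsFresh($etag, $lastModified, $ifNoneMatch, $ifModifiedSince)) {
    http_response_code(304);        // Not Modified: the client reuses its copy
} else {
    echo $body;                     // send the full content again
}
```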
The second group, code caches, covers what we can do in our site’s code to avoid running the same code again for similar or identical requests. It is based on reusing previously generated output, stored under keys that each identify a unique copy.
In other words: if we have a PHP file that returns a product detail page, with the product received as a parameter (or taken from a friendly URL), we need a different copy of the generated HTML for each product, so we use the product’s ID as the key. It’s important to identify which elements make the output differ from one version to another.
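A key for the product page could be built along these lines. This is only a sketch: the function name and the `$language` parameter are assumptions, included to show that anything that changes the generated HTML belongs in the key.

```php
<?php
// Hypothetical helper: builds the cache key for a product detail page.
// Everything that changes the generated HTML must be part of the key.
function productPageCacheKey(int $productId, string $language = 'en'): string
{
    return 'product_page:' . $language . ':' . $productId;
}

// Two products, or the same product in two languages, get separate copies.
$keyA = productPageCacheKey(42);
$keyB = productPageCacheKey(42, 'es');
```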
We can also define code caches at different levels, according to the page’s needs. We can save a copy of the generated HTML for the whole page under its key; we can store a copy of a fragment shared by several pages, like a header or a banner zone; or we can even store the results of database queries, so that we don’t need to run the same query several times if it’s going to return the same result. In that last case the cache key is easy to define: the parameters used in the query, or even the query’s own text.
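The query-result case could look roughly like this with a simple file-based store. The helper `cacheRemember`, the file naming, the table and the five-minute lifetime are all assumptions for the sake of the example, and the database call is replaced by a stand-in.

```php
<?php
// Hypothetical helper: returns the cached value for $key if it is younger
// than $ttlSeconds; otherwise runs $producer, stores its result and returns it.
function cacheRemember(string $key, int $ttlSeconds, callable $producer)
{
    // Hash the key so any text (even a whole SQL query) gives a safe file name.
    $file = sys_get_temp_dir() . '/cache_' . md5($key) . '.txt';

    if (is_file($file) && (time() - filemtime($file)) < $ttlSeconds) {
        return unserialize(file_get_contents($file));
    }

    $value = $producer();
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}

// The key is the query text plus its parameters.
$sql = 'SELECT name, price FROM products WHERE id = ?';   // assumed schema
$params = [42];
$key = $sql . '|' . json_encode($params);

$row = cacheRemember($key, 300, function () {
    // Stand-in for the real database call.
    return ['name' => 'Example product', 'price' => 9.99];
});
```

The same helper would work for a whole page or a shared fragment: only the key and the producer change.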
With these caches it’s important to know both the key that identifies each copy and when the caches must be renewed. If a product changes in the database, the caches that store this product have to be renewed so the user sees the new info.
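Renewal on change could be sketched like this. The `ProductCache` class is a hypothetical in-memory stand-in for whatever store is really used, and `updateProductName` stands in for the real database update; the point is only that the write and the invalidation happen together.

```php
<?php
// Hypothetical in-memory store standing in for a real cache backend.
final class ProductCache
{
    /** @var array<string, string> cached HTML indexed by key */
    private $entries = [];

    public function get(string $key): ?string
    {
        return $this->entries[$key] ?? null;
    }

    public function set(string $key, string $html): void
    {
        $this->entries[$key] = $html;
    }

    // Drop the entry built from the given product.
    public function forgetProduct(int $productId): void
    {
        unset($this->entries['product:' . $productId]);
    }
}

// Hypothetical update routine: save the change, then invalidate the cache.
function updateProductName(ProductCache $cache, array &$products, int $id, string $name): void
{
    $products[$id] = $name;        // stand-in for the database UPDATE
    $cache->forgetProduct($id);    // the next request rebuilds the copy
}

$products = [42 => 'Old name'];
$cache = new ProductCache();
$cache->set('product:42', '<h1>Old name</h1>');

updateProductName($cache, $products, 42, 'New name');
// The stale copy is gone, so the page will be regenerated on the next request.
```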
There are different systems for working with caches from our PHP code. It’s important to analyze the options carefully before deciding on a particular one because, depending on the needs of our app, a given system can help or hurt us. Some systems store the copies in databases or files. Others, like memcached, store the copies in memory to speed up access.
In the next posts we’ll go further into code caches using PHP, comparing systems and looking at the use of memcached.
Caches aren’t only used by browsers so that users wait less. Even search engines like Google store copies of sites to improve their search systems. This can be useful in some unexpected cases.