System: Apache mod_pagespeed settings
Google's recently released mod_pagespeed module for Apache 2 is causing a stir in the developer community. While there are claims that it can reduce download times by up to 50% for some websites, many developers are seeing little or no improvement on already-optimised websites. Our testing seems to bear this out.
The Pagespeed module has been designed to drastically clean up those websites with the worst HTML coding practices, making them faster and more likely to validate according to W3C standards. One can imagine that Google has been using something like this internally to make sense of the billions of pages of badly formatted HTML and CSS in its index.
If your website includes dozens of CSS or JavaScript files, scales images for display using only HTML and is not cache-friendly, then even the default installation will work wonders. If you would sooner run naked through the street than make those kinds of errors, then you will need to make a more detailed examination before deciding whether to install the module yourself.
Installing ModPagespeed
The mod_pagespeed package is now available for installation using yum (RedHat, CentOS) and apt-get (Debian, Ubuntu). For Debian servers the update command is:
# apt-get upgrade mod-pagespeed-beta
# /etc/init.d/apache2 restart
For reference, our local configuration contains the following settings which enable/disable some filters:
<IfModule pagespeed_module>
ModPagespeed on
...
ModPagespeedDisableFilters inline_css,inline_javascript
ModPagespeedEnableFilters move_css_to_head
ModPagespeedEnableFilters rewrite_css,rewrite_javascript
ModPagespeedEnableFilters outline_css,outline_javascript
...
</IfModule>
Unless otherwise noted, the following comments relate to ModPagespeed versions 0.9.8.1 and 0.9.10.1 (Nov 2010).
Filters enabled by default
The default settings for mod_pagespeed enable all the filters listed in this section (known collectively as CoreFilters).
Filters can be individually disabled using the ModPagespeedDisableFilters directive and others enabled using ModPagespeedEnableFilters as shown above. We have used colours in the list below to indicate which filters are currently active on this server.
You can also set ModPagespeedRewriteLevel to PassThrough, which disables all filters so that you can add back just the ones you want.
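As a minimal sketch of that approach, the fragment below starts from PassThrough and then enables only two of the core filters described in this section. The choice of combine_css and extend_cache is purely illustrative, not a recommendation:

```apache
<IfModule pagespeed_module>
ModPagespeed on
# Start with no filters at all...
ModPagespeedRewriteLevel PassThrough
# ...then opt in to just the ones you want (illustrative choice)
ModPagespeedEnableFilters combine_css,extend_cache
</IfModule>
```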
• Add Head (add_head)
When a page encountered has no <head>, the module inserts an XHTML-style <head/> tag at the top of the page. See also Combine Heads.
• Combine CSS (combine_css)
Multiple CSS files included anywhere on the page using the <link> tag, and from the same domain, are combined into a single file, which is made cacheable and can be minified (see rewrite_css for limitations).
• Extend Cache (extend_cache)
This affects all rewritten images, CSS and JavaScript files. They are assigned a longer filename containing a hash code and set to be cacheable for up to one year.
Important: the internal mod_pagespeed spider (Serf) relies on your server settings to determine how often it re-examines content to see if it has changed. If you have a very long TTL (set in mod_expires for example) it will take a long time for the re-written image to update and you may want to reduce the TTL to around 300 or 600 seconds.
This filter can have the effect of destroying If-Modified-Since header handling and can clash with other headers (more details below).
Cache extension is not done on resources that have Cache-Control: private or Cache-Control: no-cache unless you set ModPagespeedForceCaching to on (for testing purposes only).
• Inline CSS (inline_css)
Includes the contents of small external CSS files (up to ModPagespeedCssInlineMaxBytes) in the page itself. The default cutoff is 2kb (2048 bytes).
This reduces the number of requests, but at the expense of cacheability. It may be useful for extremely lazy programmers. Not recommended.
• Inline JavaScript (inline_javascript)
As for the CSS inlining filter above, this will include the contents of small external JavaScript files (up to ModPagespeedJsInlineMaxBytes) in the page itself. The default cutoff is again 2kb.
Again, not recommended.
• Optimize Images (insert_img_dimensions)
To use this filter, you first need to have enabled rewrite_images (below).
Inserts any missing width and height attributes in <img> tags. Really something you should be taking care of yourself. Might be handy for some forums and blogs. Harmless.
What is more helpful is if you ever do have a good reason to set an image to a different width/height using HTML, mod_pagespeed will create a version of the image at the new size (see below).
• Optimize Images (rewrite_images)
This is the most powerful filter, and also has the largest footprint. It will attempt to compress and strip metadata from images. For us, this has only a limited effect as we already do something similar during the upload process.
This filter will generate new versions of images where the width and height attributes are smaller than the actual image dimensions. So if you embed a 1MB image as a 100 pixel thumbnail, it will create and serve the thumbnail and save the world from the original.
One issue with this filter is that images are assigned a much longer filename (for example images/BikeCrashIcn.png.pagespeed.ic.HASH.png instead of images/BikeCrashIcn.png) which lengthens your HTML and may affect search engine optimisation if you're used to getting traffic via image search engines.
When the image file is modified, the HASH portion of the URL is updated, but references to the previous URL will continue to work. In fact requests with any (or no) characters in place of the hash will resolve to the same image.
A nice feature: embedded images with a size less than ModPagespeedImgInlineMaxBytes (set to 2kb by default) will be converted to inline data strings. The module knows (we trust) to serve these only to supporting browsers. This may also impact image search if they are served to search engines.
Update: In 0.9.16.9 images referenced from external CSS files will also be optimised, though never 'data-ified'. Also CSS backgrounds defined in inline CSS styles are not being optimised at all.
Another new parameter ModPagespeedImgMaxRewritesAtOnce lets you specify how many images will be re-written at once. This applies system-wide so the server will not be busy trying to optimise more than the specified number of images at the same time. The default value is 8.
• Trim URLs (trim_urls)
This filter removes unnecessary components from href and src URLs in the HTML and, if rewrite_css has been enabled, in CSS files.
This includes converting absolute links into relative ones, and even removing the protocol (e.g. http:) for links where the protocol of the target URL matches the current page. This is harmless for modern browsers, but it is confusing the heck out of some spiders.
Note: This is a new filter added to CoreFilters in 0.9.16.9 so enabled by default. After some serious teething problems (see Issue 234 and Issue 238) it is now playing nicely with mod_rewrite.
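The size thresholds and limits mentioned in the filter descriptions above are all set with ordinary directives. As a sketch, the fragment below simply restates the default values quoted above to show the syntax; it is not tuning advice. The mod_expires part illustrates the shorter TTL suggested under Extend Cache, and applying it to PNG and JPEG images only is an assumption made for the example:

```apache
<IfModule pagespeed_module>
# Inline external CSS/JavaScript files only up to this size (bytes)
ModPagespeedCssInlineMaxBytes 2048
ModPagespeedJsInlineMaxBytes 2048
# Convert images smaller than this size (bytes) to inline data
ModPagespeedImgInlineMaxBytes 2048
# Server-wide limit on the number of simultaneous image rewrites
ModPagespeedImgMaxRewritesAtOnce 8
</IfModule>

<IfModule mod_expires.c>
ExpiresActive On
# Short origin TTL (600 seconds) so changed images are re-examined sooner
ExpiresByType image/png "access plus 600 seconds"
ExpiresByType image/jpeg "access plus 600 seconds"
</IfModule>
```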
Filters NOT enabled by default
The following filters are not part of CoreFilters so are not active by default and need to be explicitly enabled in the configuration file. The reason they are not activated by default is that there are known issues which could break some websites. Be warned.
• Combine Heads (combine_heads)
One <head> is better than two. Might be handy if you're ripping other websites and inserting them into your template without any processing, but really, you should never need this.
• Strip Scripts (strip_scripts)
Completely removes scripts from a page. Useful for testing and timing purposes.
• Outline CSS (outline_css)
Replaces CSS style blocks of a size greater than ModPagespeedCssOutlineMinBytes with an external, cacheable, CSS file. By default only blocks equivalent to 3,000 bytes or more are affected.
• Outline JavaScript (outline_javascript)
Replaces JavaScript code blocks of a size greater than ModPagespeedJsOutlineMinBytes with an external, cacheable, JavaScript file which can then be minified. By default only blocks equivalent to 3,000 bytes or more are affected. Handy for complex page-specific scripts such as form validation.
If you have the same block of inline JavaScript on a number of pages, and it can't easily be placed in its own file, this will allow all those pages to link to a common, generated, script URL. There are some issues, however, with trim_urls when mod_rewrite is in play.
• Move CSS to HEAD (move_css_to_head)
CSS style blocks (not inline styles) are moved into the <head>. This is handy if you're working inside a fixed template and can't directly edit the <head> section for individual pages.
These blocks can also be re-written (minified), but are not currently combined into a single CSS style block. Instead the <style> tags back up against one another.
• Rewrite CSS (rewrite_css)
The biggest problem here is that the parser does not yet recognise a range of CSS3 selectors and styles, and even a single unrecognised line causes the parser to 'bail' and not minify any of the CSS in the same CSS file or code block. When the parser does recognise all your CSS syntax it works perfectly.
Update: In 0.9.16.9 many more CSS3 styles are being recognised, including vendor-prefixes. Most, but not all, of our external style sheets are now being minified.
• Make Google Analytics Asynchronous (make_google_analytics_async)
Sorry, but what were they thinking having this as a filter? If anything it should just spit out a warning that you're using old code and link to the instructions for migrating to Async.
• Minify JavaScript (rewrite_javascript)
Works extremely well on locally hosted scripts. One potential problem is that comments are removed, which may be an issue if you're using scripts that require some form of attribution in the code.
This filter is considered 'high risk' by the Google team as it will break some popular JavaScript libraries. Needs to be tested on a case-by-case basis.
• Remove Comments (remove_comments)
Removes all HTML comments except for IE conditional comments. You will want to check first that none of your comments are required. For example, when using ht://Dig, HTML comments can be used to exclude sections of the page from the site search.
Update: There may be a new feature coming to allow certain comments (specified using wildcard syntax) to be left in the page. While this would solve some problems, being able to turn this filter on/off based on user agent would really be better.
• Collapse Whitespace (collapse_whitespace)
Removes unnecessary line breaks, spaces and indenting from the page. This could have a big impact on some generated or WYSIWYG-edited HTML pages with huge indents.
Unfortunately, this breaks the CSS style white-space: pre; which is used on this website, for example, to style the <code> blocks.
• Elide Attributes (elide_attributes)
Removes attributes from tags "when the specified value is equal to the default value". This can have unexpected consequences such as when removing type="text" which then breaks the CSS selector input[type="text"]. There is a workaround for this, however, so whether you enable this filter really depends on how you want your code to look.
• Remove Quotes (remove_quotes)
Whether to use this filter, like Elide Attributes above, is a matter of preference as to how you want your code to appear. Any savings will be marginal as we're only talking about a few quote characters.
• Add Instrumentation (add_instrumentation)
This final filter inserts JavaScript code at the start and end of HTML pages to track page load times and record other statistics on mod_pagespeed operations.
To enable the collecting of statistics your configuration file should contain the following (uncommented) commands:
...
ModPagespeedEnableFilters add_instrumentation
<Location /mod_pagespeed_beacon>
SetHandler mod_pagespeed_beacon
</Location>
<Location /mod_pagespeed_statistics>
Order allow,deny
Allow from localhost
SetHandler mod_pagespeed_statistics
</Location>
...
After you reload Apache each page will include a request for a 'beacon' image passing the page load time in milliseconds. Other statistics are collected in the background.
To view the accumulated statistics just go to the address /mod_pagespeed_statistics and you will see something like the following:
resource_fetches_cached: 716
resource_fetch_construct_successes: 1
resource_fetch_construct_failures: 0
total_page_load_ms: 365365
page_load_count: 81
cache_extensions: 2251
not_cacheable: 0
css_file_count_reduction: 303
css_filter_files_minified: 112
css_filter_minified_bytes_saved: 20405
css_filter_parse_failures: 360
css_elements: 74
image_inline: 331
image_rewrite_saved_bytes: 21741
image_rewrites: 232
image_ongoing_rewrites: 0
javascript_blocks_minified: 543
javascript_bytes_saved: 56983
javascript_minification_failures: 0
javascript_total_blocks: 601
resource_url_domain_rejections: 571
url_trims: 0
url_trim_saved_bytes: 0
resource_404_count: 0
slurp_404_count: 0
serf_fetch_request_count: 561
serf_fetch_bytes_count: 4709324
serf_fetch_time_duration_ms: 99023
serf_fetch_cancel_count: 0
serf_fetch_outstanding_count: 0
If you get a 403 Forbidden error, try replacing localhost in the configuration with either the domain name or the IP address you are using to access the internet (e.g. Allow from 3.138.32.53). Only one set of statistics is collected, which includes data from all websites hosted on the server.
What all the different statistics mean is not yet clear.
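To illustrate the 403 workaround described above, the statistics block can list an extra address alongside localhost. This is only a sketch: 192.0.2.10 is a placeholder address and should be replaced with the one you actually connect from:

```apache
<Location /mod_pagespeed_statistics>
Order allow,deny
Allow from localhost
# Placeholder address: replace with your own IP address
Allow from 192.0.2.10
SetHandler mod_pagespeed_statistics
</Location>
```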
Global Variables
There are a few global variables that are not very well explained in the documentation, but there are some clues in the code:
- ModPagespeedFileCacheSizeKb
- Set the target size (in kilobytes) for the file cache. (default: 100MB)
- ModPagespeedFileCacheCleanIntervalMs
- Set the interval (in ms) for cleaning the file cache. (default: 1hr)
- ModPagespeedLRUCacheKbPerProcess
- Set the total size, in KB, of the per-process in-memory LRU cache. (default: 1MB)
- ModPagespeedLRUCacheByteLimit
- Set the maximum byte size entry to store in the per-process in-memory LRU cache. (default: 16KB)
- ModPagespeedFetcherTimeoutMs
- The timeout period (in ms) for requests by the internal spider (Serf). Defaults to 5000ms (5 seconds).
Basically what happens is a cache of files builds up at the specified location (ModPagespeedFileCachePath). Then every ModPagespeedFileCacheCleanIntervalMs milliseconds, if the cache has grown larger than ModPagespeedFileCacheSizeKb, the LRU (Least Recently Used) files are removed.
The other 'LRU Cache' variables let you control how much memory can be used for managing the cache. Note that these values apply per Apache process.
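Put together, the cache settings might look like the sketch below. The size and interval values just restate the defaults listed above, converted to the units each directive expects. The cache path is an assumption and should point to wherever your installation keeps its cache:

```apache
<IfModule pagespeed_module>
# Assumed location: adjust to your own installation
ModPagespeedFileCachePath "/var/mod_pagespeed/cache/"
# 100MB target size, expressed in kilobytes
ModPagespeedFileCacheSizeKb 102400
# Clean the file cache once an hour (3,600,000 ms)
ModPagespeedFileCacheCleanIntervalMs 3600000
# In-memory LRU cache: 1MB per Apache process...
ModPagespeedLRUCacheKbPerProcess 1024
# ...storing only entries of up to 16KB (16384 bytes)
ModPagespeedLRUCacheByteLimit 16384
</IfModule>
```

Since the LRU values apply per process, the total memory used scales with the number of Apache processes you allow.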
New features as of 0.9.11.3
.htaccess files and Directory scopes
The simplest (not the most efficient) way to disable mod_pagespeed for a specific website or directory, or to apply other site-specific settings, is to use the .htaccess file:
<IfModule pagespeed_module>
ModPagespeed off
</IfModule>
Just place this at the top of an .htaccess file in the directory for which you want to disable ModPagespeed. Commands can also be targeted using the <Directory> grouping option.
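The <Directory> equivalent, placed in the main server configuration rather than in .htaccess, might look like the following sketch. The path is hypothetical:

```apache
<IfModule pagespeed_module>
# Hypothetical path: disable mod_pagespeed for one directory tree
<Directory "/var/www/example/legacy">
ModPagespeed off
</Directory>
</IfModule>
```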
See also Bug Reports for some background and commentary.
Restricting Resource Rewriting Via Wildcards
We can now tell mod_pagespeed to avoid processing certain requests. For example, to keep the Serf spider from ever requesting URLs ending in captcha.png - in any website - we add the following to the main configuration file:
ModPagespeedDisallow *captcha.png
There is also a ModPagespeedAllow directive. The wildcard patterns match the 'fully expanded URL', so should start with 'http://' or the wildcard *. Each Allow/Disallow directive will take priority over those preceding.
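Because later directives take priority over earlier ones, a broad rule can be followed by a narrower exception. In this sketch the hostname and paths are hypothetical:

```apache
# Leave everything under /widgets/ alone...
ModPagespeedDisallow http://www.example.com/widgets/*
# ...except this one stylesheet, which may still be rewritten
ModPagespeedAllow http://www.example.com/widgets/site.css
```

Reversing the order of the two lines would leave the stylesheet disallowed, since the broader Disallow would then come last.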
You can disable parsing for individual files, such as problematic CSS or JavaScript includes, using .htaccess as shown in this example:
<IfModule pagespeed_module>
ModPagespeedDisallow *some-css-file.css
</IfModule>
New features as of 0.9.16.9
Domain Sharding
Domain sharding is the practice of splitting your website content, even when it comes from the same location, over a number of different domains or subdomains.
Your browser limits the number of files that can be downloaded from a single domain at the same time, and additional items have to wait for one to finish. By sourcing page elements from multiple domains, you allow for more downloads to occur at once thus reducing wait times.
For details, read the official documentation on Sharding Domains.
You can also find information there on "Authorizing Domains", "Mapping Origin Domains" and "Mapping Rewrite Domains". These are all advanced features requiring Apache server configuration.
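As a rough sketch of what sharding looks like in practice, the ModPagespeedShardDomain directive takes the domain to shard, followed by a comma-separated list of shard domains. The hostnames here are hypothetical, and each shard name is assumed to resolve to a server that can deliver the same content:

```apache
<IfModule pagespeed_module>
# Spread rewritten resources from www.example.com over two shard hostnames
ModPagespeedShardDomain www.example.com static1.example.com,static2.example.com
</IfModule>
```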
Dan Potter 2 August, 2013
I ended up on this page while trying to find a way to report stats of Pagespeed.
Pagespeed appears to be running, I see CSS being combined, images being served differently, and even from the CDN it's working.
However, when I look in the pagespeed cache there's just one file.
When I go to /mod_pagespeed_statistics I was getting a 403, I've allowed my IP address now I get error 500.
Has anyone else seen the 500 error while getting the stats.
Everything else appears to be working fine.
Regards
Dan.
mod-pagespeed-discuss is the best place to report this kind of problem, but for a 500 error you should check your logs for errors as it's a server error. Maybe a typo in your htaccess file?
Kumar Deepam 26 November, 2012
Great post! Our website www.meraevents.com has mod_pagespeed installed and working properly, however we have some of our images/css/js stored in this location content.meraevents.com/images/ and unfortunately mod_pagespeed is not taking care of files stored in this location.
Appreciate your help to get it configured for this.
You just need to add to your configuration and reload:
ModPagespeedDomain content.meraevents.com
This will work as long as mod_pagespeed is running on both domains, even if they're on different servers.
Michael Pehl 10 July, 2012
I have been following the mod_pagespeed discussions for nearly 1.5 years, and this module still has bugs.
I do frontend performance optimizing on my own. Then I know that there is NO bug and it works.
It's still in BETA so there are meant to be bugs
Most of the 'core' modules are ok, but you need to check they are appropriate for your server and be careful when enabling non-core features. Also keep an eye on the cache size, inode and memory usage.
mynet okey 1 June, 2012
I have high load on my server
load average: 6.78
mod_pagespeed
You should try to identify which filter is causing the problem by turning them off and then reenabling them one at a time while monitoring the server load. Depending on the filter the problem could be that the cache is not large enough or a problem with domain mapping.
A good place to look for answers is mod-pagespeed-discuss
Shawn 22 March, 2011
Nice article.
You mention that background images in inline CSS are not optimized, could you send a link to a page where this happens? Both inline and external CSS should be minified and have all images rewritten (if you have the appropriate filters enabled).
Also, ModPagespeedImgMaxRewritesAtOnce is per-server because the intention was not to overload your server by rewriting all images at once.
Cheers,
-Shawn
mod_pagespeed team
Thanks. sent you the details