Using mod_rewrite to canonicalize and secure your domain
So you've set up your website at www.example.net, purchased an SSL certificate, and updated your links and bookmarks to use HTTPS. That should be enough, right? But you continue to see search engines and other traffic making requests over HTTP or without the 'www.'. What can you do?
Canonicalising using mod_rewrite
This is a basic job for the Apache mod_rewrite module which can intercept poorly formatted requests and trigger a 301 redirect to the correct location:
RewriteEngine On
# force to HTTPS and WWW
RewriteCond %{SERVER_PORT} 80 [OR]
RewriteCond %{HTTP_HOST} !www\.the-art-of-web\.com [NC]
RewriteRule (.*) https://www.the-art-of-web.com/$1 [R=301,L]
The RewriteCond conditions match any request that is using HTTP (port 80) instead of HTTPS (port 443), as well as any that are not using the 'www' prefix on the domain or that have arrived through an IP address or alternative domain provided by your host.
The RewriteRule then triggers a 301 (Permanent Redirect) to the canonical URL for the requested resource using the request details.
This should resolve most search engine optimisation (SEO) issues caused by duplicate URLs, though you can go further using rel=canonical LINK tags and HTTP Link headers.
If you forget to change the domain name in these rewrite rules to your own website you will be redirecting your traffic to The Art of Web.
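A variation on the rules above, offered as a sketch rather than a drop-in replacement: it tests the HTTPS variable instead of the port number, anchors the host pattern, and builds the target from REQUEST_URI. Here www.example.net is a placeholder for your own canonical host, and the sketch assumes TLS is terminated by Apache itself; behind a proxy or load balancer %{HTTPS} may report off for every request, which would cause a redirect loop.

```apache
RewriteEngine On

# force to HTTPS and WWW (variant of the rules above)
# www.example.net is a placeholder: substitute your own canonical host
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^www\.example\.net$ [NC]
# REQUEST_URI always begins with a slash, so none is added after the host
RewriteRule ^ https://www.example.net%{REQUEST_URI} [R=301,L]
```

Anchoring the pattern with ^ and $ means a longer host name that merely contains the canonical one is still redirected, and testing %{HTTPS} keeps the rule working if the site is ever served on a non-standard port.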
Where to put rewrite rules?
The simplest option, which many hosts support, is to create a file .htaccess at the root of your website. But if you do this you should make sure that the file is protected from outside access.
By default Apache will include settings for this as follows:
# AccessFileName: The name of the file to look for in each directory
# for additional configuration directives.
#
AccessFileName .htaccess
#
# The following lines prevent .htaccess and .htpasswd files from being
# viewed by Web clients.
#
<FilesMatch "^\.ht">
Require all denied
</FilesMatch>
You can test this by requesting /.htaccess on your domain in a browser and confirming that you get a 403 (Forbidden) response.
Rewrite rules can also be added to the global Apache configuration or to individual virtual host (vhost) files.
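In server or virtual host context the pattern is matched against the full URL path, including its leading slash, whereas in .htaccess that prefix is stripped before matching. The sketch below shows one way the HTTP redirect might look in a vhost file. The host names are placeholders, and the other directives a real vhost needs (DocumentRoot, logging and so on) are omitted.

```apache
# Plain HTTP: everything arriving here is redirected to the canonical HTTPS host
<VirtualHost *:80>
    ServerName www.example.net
    ServerAlias example.net

    RewriteEngine On
    # ^/?(.*) absorbs the leading slash so the target does not contain "//"
    RewriteRule ^/?(.*) https://www.example.net/$1 [R=301,L]
</VirtualHost>
```

Requests that arrive over HTTPS on the wrong host name would still need a host check like the RewriteCond shown earlier, placed in the port 443 vhost.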
Blocking spambots and hackers
Even with the above rewrite rules you will still see continuing requests using the incorrect scheme or domain from automated user agents. And while they will receive a 301 response, they will either just follow it or ignore it.
Some spambots even seem to think that a 301 response is an indication that their request has been effective.
While we can't prevent requests being made or do anything about random GET requests, we can do something about POST and other request methods which are most likely to be used in hacking attempts:
# block misdirected POST requests
RewriteCond %{REQUEST_METHOD} !GET
RewriteCond %{HTTP_HOST} !www\.the-art-of-web\.com [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule .* - [F]
Inserting this before the previous rewrite rules has the effect of short-circuiting any POST, HEAD or similar requests with a 403 (Forbidden) response before they get to the 301 (Permanent Redirect).
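Put together, the ordering described above might look like this in a single .htaccess file. This is a sketch that reuses the conditions from this article, with www.example.net standing in as a placeholder for your own domain.

```apache
RewriteEngine On

# 1. block misdirected non-GET requests with a 403 first
#    (www.example.net is a placeholder: substitute your own domain)
RewriteCond %{REQUEST_METHOD} !GET
RewriteCond %{HTTP_HOST} !www\.example\.net [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule .* - [F]

# 2. everything else on the wrong scheme or host gets a 301
RewriteCond %{SERVER_PORT} 80 [OR]
RewriteCond %{HTTP_HOST} !www\.example\.net [NC]
RewriteRule (.*) https://www.example.net/$1 [R=301,L]
```

mod_rewrite reads the three conditions of the first rule as "not GET, and (wrong host or port 80)", because [OR] only joins a condition to the one that follows it. The [F] flag implies [L], so a blocked request never reaches the redirect rule.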
Spambots are also more likely to recognise this as a failure and, if you're keen, you can use Fail2Ban or a similar log monitoring tool to detect multiple failures and block them by IP address as well.
Testing your rewrite rules
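The best way to test is not in your web browser, which sometimes caches 301 and similar responses, but from the command line. The extra options discard the response body and print only the HTTP status code:
$ curl -s -o /dev/null -w '%{http_code}\n' -X POST https://the-art-of-web.com
$ curl -s -o /dev/null -w '%{http_code}\n' -X POST https://www.the-art-of-web.com
$ curl -s -o /dev/null -w '%{http_code}\n' -X POST http://www.the-art-of-web.com
$ curl -s -o /dev/null -w '%{http_code}\n' -X POST http://the-art-of-web.com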
The HTTP response codes for the above POST requests in order should be: 403, 200, 403, 403. And if you change to GET: 301, 200, 301, 301.