Tutorial on how to redirect and rewrite URLs using Apache modules
Reasons for URL tidying
Rewriting URLs is useful for the following reasons
Easy to remember
It will make URLs easier to remember. For example, you may have a URL
http://exampleserver.com/view/index.php?user=bob
It is possible to rewrite it as
http://exampleserver.com/~bob
This will enable the user Bob to give out his URL to people and place it on his business cards.
Search Engine Optimization
Some search engines dislike URLs with GET arguments placed in the URL. e.g. you may have the URL
http://exampleserver.com/article.php?type=apache&article_id=231
Some search engines may skip over this, but if we rewrite the URL to
http://exampleserver.com/articles/apache/231
then it is more likely to be indexed by the search engines.
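As a rough sketch, a rule along the following lines could map the tidy URL back onto the script. It assumes the .htaccess file sits in the document root, that the type is made of lowercase letters and that the article id is numeric; none of this is stated above, so adjust the patterns to suit your own URLs.

```apache
RewriteEngine On
# articles/<type>/<id> is served internally by article.php (assumed layout)
# $1 is the type (e.g. apache), $2 is the numeric article id (e.g. 231)
RewriteRule ^articles/([a-z]+)/([0-9]+)$ article.php?type=$1&article_id=$2 [L]
```

Because there is no [R] flag, the browser keeps showing the tidy URL while the script receives the GET arguments.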
Document redirection
Inevitably, when you haven't organised your URLs correctly and you need to restructure, redirecting users from the old URL to the new page is desirable. Apache can do this for you, so you don't even need to use the refresh meta tag for this.
It is better to send the browser an HTTP redirect status (such as 301) than to have meta tags do the redirection.
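A minimal sketch of such a redirect for a single page is shown here. The file names old-page.html and new-page.html are hypothetical, and the rule assumes the .htaccess file is in the document root.

```apache
RewriteEngine On
# old-page.html and new-page.html are hypothetical names used for illustration
# R=301 tells the browser the page has moved permanently; L stops further rules
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```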
Using mod_rewrite
Take 5 minutes to try and understand the official documentation. This will save lots of time in the long run.
The easiest way to explain this is by example. If your web host has mod_rewrite enabled, it will usually allow you to upload a .htaccess file to configure the redirection.
Example hiding a parameter to make it look like user home dirs
The example below makes the original URL
http://exampleserver.com/user/view.php?name=peter
available as
http://exampleserver.com/~peter
A request for the short URL runs the PHP script with the argument name equal to the given username.
The .htaccess contains the following:
RewriteEngine on
RewriteRule ^~(.*) user/view.php?name=$1
You need to have some experience in using regular expressions to make sense of this, but hopefully the example was enough to get you started.
The ^ character is an anchor meaning the match must begin at the start of the path being tested.
~ is literally the ~ we are trying to match in the URL.
(.*) is broken up into three pieces.
- . means any character.
- * follows the . character, and means any number of them.
- () means count as part of a match, so we can refer to it using $1 for the first match.
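Since (.*) accepts anything at all after the ~, a stricter variant may be worth considering. The sketch below assumes usernames consist only of letters, digits, underscores and hyphens, which is an assumption and not something the example above requires.

```apache
RewriteEngine on
# [A-Za-z0-9_-]+ matches one or more "username" characters (assumed character set)
# $ anchors the end, so ~peter matches but ~peter/other/path does not
RewriteRule ^~([A-Za-z0-9_-]+)$ user/view.php?name=$1 [L]
```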
Moving directories
I recently moved all our struts pages from the tomcat section to the JSP section. To preserve existing links to our site, I added the following rule
RewriteRule kb/app/tomcat/struts/(.*) /kb/prg/java/jsp/struts/$1 [R]
The second argument contains a leading /, because otherwise the file's local path on the server was being added to the URL.
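If the move is permanent, a slightly tighter form of the same rule might look like the sketch below. It assumes the .htaccess file lives in the document root, so that the path being matched starts with kb/.

```apache
RewriteEngine On
# ^ and $ anchor the pattern so only paths that start with the old prefix match
# R=301 marks the move as permanent; L stops processing further rules
RewriteRule ^kb/app/tomcat/struts/(.*)$ /kb/prg/java/jsp/struts/$1 [R=301,L]
```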
Domain Name prefix
You can redirect traffic for example.com to www.example.com
RewriteEngine On
RewriteCond %{HTTP_HOST} !^(.*)\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
The RewriteCond above ignores requests that already contain a subdomain.
Alternatively, you can redirect all www.* traffic to your normal domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
To redirect any host that doesn't match, you could use something like
RewriteEngine On
RewriteCond %{HTTP_HOST} !^example\.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
This helps to get rid of the numeric IP being used as the host name.
Moving Domain
You can also redirect traffic between domains
RewriteEngine On
RewriteRule old/pages/(.*) http://example.com/archive/$1 [R=301]
You can specify a new domain as part of the redirection.
301 means that it has been moved permanently.
If you use [R] instead of [R=301], then it defaults to sending a 302, which is moved temporarily.
Redirecting scheme from https to http
Add the following to your .htaccess
# turn on mod_rewrite
RewriteEngine On
# check that it is https
RewriteCond %{HTTPS} =on
# redirect to the plain http site
RewriteRule ^(.*)$ http://magicmonster.com/$1 [R=301,L]
Instead of using .htaccess, you can also add this to the module conf file.
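When the rule is placed in the server or virtual host configuration rather than in .htaccess, the pattern is matched against the full URL path, which begins with a /. A sketch of the adjusted rule, under that assumption, could look like this:

```apache
# Server or virtual host configuration (not .htaccess)
RewriteEngine On
RewriteCond %{HTTPS} =on
# The leading / is matched outside the group so it is not duplicated in the target
RewriteRule ^/(.*)$ http://magicmonster.com/$1 [R=301,L]
```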
mod_rewrit
If you have mod_rewrit instead, it isn't a typo, but an incomplete and undocumented version of mod_rewrite that was meant to be more secure. Don't even bother trying to get support for this.
Troubleshooting
Logging
If you have access to the Apache configuration, then you can edit the httpd.conf file and add logging to mod_rewrite.
RewriteLog /usr/local/apache/logs/mod_rewrite.log
RewriteLogLevel 9
9 is the highest level of logging, while 0 will turn it off. Make sure you turn it off once you've figured out the problem.
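The RewriteLog and RewriteLogLevel directives exist only in Apache 2.2 and earlier. On Apache 2.4 and later, rewrite logging is controlled through the LogLevel directive in the server configuration and the output goes to the error log. A minimal sketch (the choice of trace3 is only an example):

```apache
# Apache 2.4 and later: mod_rewrite messages are written to the error log
# trace1 is the least verbose and trace8 the most verbose
LogLevel alert rewrite:trace3
```

As with the older directives, lower the level again once the problem is found, since tracing slows the server down.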
Directories don't match
If you are using mod_userdir or another module that does not map the URL directly to a directory path, you may end up with an invalid path. For example, we have placed our .htaccess in the directory /home/aelst/public_html on the dev server, so that the URL below
http://dev/~aelst/fish.html
actually maps to the file
http://dev/~aelst/whale.html
Normally you would use the following .htaccess file:
RewriteEngine on
RewriteRule fish\.html whale.html
However, because of the mod_userdir, we get the following error:
Not Found
The requested URL /home/aelst/public_html/whale.html was not found on this server.
We should be translating to the following url instead
http://dev/~aelst/whale.html
To fix this you can either add the directory path to the rule
RewriteEngine on
RewriteRule fish\.html /~aelst/whale.html
or add another line into the .htaccess called RewriteBase
RewriteEngine on
RewriteBase /~aelst/
RewriteRule fish\.html whale.html

