robots.txt exclusion file

Published: Sunday, 8 January 2006

What it does

robots.txt is a file that asks search engine robots and other web crawlers (automated programs that surf and index the web) not to read certain URLs or parts of your website. It works by placing robots.txt at the top level of the site, next to favicon.ico. A well-behaved robot will typically request robots.txt first and honour it by not loading the URLs listed in that file.
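
Because the file always sits at the top level, its address can be worked out from any page on the site. A minimal Python sketch of that lookup follows; the function name robots_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url.

    robots_url is a hypothetical helper name, not part of any library.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt,
    # and any query string or fragment on the page URL is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.example.com/users/contact.php?from=home"))
```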

It is only used for disallowing and excluding pages. It will not help search engines include or ‘find’ pages from your website.

Reasons for excluding pages from search engines

Sometimes there is no need for pages such as the forgot-password or feedback pages to appear in Google's search results.

Another reason for excluding pages is that having unrelated or non-optimised pages indexed can dilute your ranking.

Samples

Exclude specific pages:

User-agent: *
Disallow: /users/forgot_pass.php
Disallow: /users/contact.php
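
To see how a crawler might read the rules above, here is a rough sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the "ExampleBot" user agent are assumptions for illustration; real crawlers have their own parsers and may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# The sample rules from this article, held in a string so no network
# request is needed.
SAMPLE_RULES = """\
User-agent: *
Disallow: /users/forgot_pass.php
Disallow: /users/contact.php
"""


def is_allowed(rules: str, path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if user_agent may fetch path under the given rules.

    is_allowed and "ExampleBot" are hypothetical names for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/users/forgot_pass.php", "/users/contact.php", "/index.php"):
        print(path, is_allowed(SAMPLE_RULES, path))
```

The two listed pages come back as not allowed, while any other path on the site stays open to the robot.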

Allow all:

User-agent: *
Allow: /
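
An empty Disallow line is the older way of writing the same thing, and parsers that predate the Allow directive still understand it. The sketch below checks both spellings with Python's standard urllib.robotparser; the helper name allows_everything, the test paths and the "ExampleBot" user agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nAllow: /\n"
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"


def allows_everything(rules: str, paths: list[str]) -> bool:
    """Return True if every path in paths may be fetched under rules.

    allows_everything is a hypothetical helper; it only checks the
    paths it is given, not every possible URL.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return all(parser.can_fetch("ExampleBot", path) for path in paths)


if __name__ == "__main__":
    paths = ["/", "/users/contact.php", "/any/other/page.html"]
    print(allows_everything(ALLOW_ALL, paths))
    print(allows_everything(EMPTY_DISALLOW, paths))
```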

Limitations

With robots.txt, a page is either excluded or it is not. For finer control over an individual page, you can use the robots meta tag.
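
The robots meta tag sits in the head of a page and carries a comma-separated list of directives, for example noindex or nofollow. As a rough illustration, the Python sketch below pulls those directives out of a page using the standard html.parser module. The class name RobotsMetaParser, the function robots_directives and the sample HTML are all assumptions made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without a value.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for item in attr_map.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.append(item)


def robots_directives(html: str) -> list[str]:
    """Return the robots meta directives found in html (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(robots_directives(page))
```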