What it does
robots.txt is a file that excludes certain URLs or parts of your website from being read by search engine robots and other web crawlers (automated programs that surf and index the web). It works by placing robots.txt at the top level of the site, next to favicon.ico (for example, https://example.com/robots.txt). A robot will typically look at robots.txt first and honour it by not loading the URLs that the file excludes.
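As a rough illustration of this, the following Python sketch derives the robots.txt location from any page URL and checks a page against an already-downloaded robots.txt, using the standard library's urllib.robotparser. The helper names robots_url and may_crawl, the crawler name ExampleBot and the example.com URLs are hypothetical; a real crawler would also download the file itself (RobotFileParser provides set_url() and read() for that) and handle network errors.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return where robots.txt lives for the site hosting page_url.

    robots.txt sits at the top level of the host, so the page's path,
    query string and fragment are replaced by "/robots.txt".
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_crawl(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Check page_url against the text of an already-downloaded robots.txt.

    Hypothetical helper: a polite crawler would call this before loading page_url.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


page = "https://example.com/users/contact.php?ref=home"
print(robots_url(page))  # https://example.com/robots.txt
print(may_crawl("User-agent: *\nDisallow: /users/\n", "ExampleBot", page))  # False
```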
It is only used to disallow and exclude pages. Listing a URL in robots.txt will not help search engines include or ‘find’ pages on your website.
Reasons for excluding pages from search engines
Some pages, such as the forgot-password or feedback pages, do not need to appear in Google's search results.
Another reason is that having unrelated or non-optimised pages indexed can dilute your site's ranking.
Samples
Exclude specific pages:
User-agent: *
Disallow: /users/forgot_pass.php
Disallow: /users/contact.php
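To check what the "Exclude specific pages" sample actually blocks, it can be fed to Python's urllib.robotparser. This is a sketch: ExampleBot is a hypothetical crawler name, and the extra paths being tested are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The "Exclude specific pages" sample, verbatim.
EXCLUDE_SAMPLE = """\
User-agent: *
Disallow: /users/forgot_pass.php
Disallow: /users/contact.php
"""

parser = RobotFileParser()
parser.parse(EXCLUDE_SAMPLE.splitlines())

for path in ["/users/forgot_pass.php", "/users/contact.php",
             "/users/forgot_pass.php.bak", "/users/login.php", "/"]:
    # can_fetch accepts either a full URL or just a path.
    print(path, parser.can_fetch("ExampleBot", path))

# /users/forgot_pass.php False
# /users/contact.php False
# /users/forgot_pass.php.bak False
# /users/login.php True
# / True
```

Rules match by path prefix, which is why /users/forgot_pass.php.bak is blocked too. urllib.robotparser applies the first matching rule in file order, whereas RFC 9309 uses the most specific (longest) match; for a file that contains only Disallow lines, as here, the two agree.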
Allow all:
User-agent: *
Allow: /
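The allow-all sample can be checked the same way. The sketch below, again using urllib.robotparser with the hypothetical crawler name ExampleBot and a hypothetical helper allows_everything, also tries an empty "Disallow:" line, which is the older way of saying the same thing.

```python
from urllib.robotparser import RobotFileParser

# The "Allow all" sample.
ALLOW_ALL = "User-agent: *\nAllow: /\n"
# Older equivalent: an empty Disallow value excludes nothing.
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"


def allows_everything(robots_txt: str, paths: list[str]) -> bool:
    """Return True if every path in paths may be fetched under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch("ExampleBot", path) for path in paths)


sample_paths = ["/", "/users/forgot_pass.php", "/users/contact.php"]
print(allows_everything(ALLOW_ALL, sample_paths))       # True
print(allows_everything(EMPTY_DISALLOW, sample_paths))  # True
```

Allow was not part of the original 1994 robots.txt convention (it is defined in RFC 9309), so very old crawlers may only understand the empty Disallow form.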
Limitations
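robots.txt works on whole URLs: a page is either excluded or it is not. For finer control of an individual page (for example, letting it be crawled but keeping it out of search results), use the robots meta tag, such as <meta name="robots" content="noindex"> in the page's head. The crawler must be allowed to fetch the page in order to read that tag.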
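For comparison with robots.txt, here is a minimal Python sketch of how a crawler might read the robots meta tag from a page it has already fetched, using only the standard library's html.parser. The class and function names are hypothetical, and the sketch only looks at tags with name="robots".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex, nofollow".
            for directive in attributes.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = ('<html><head><meta name="robots" content="noindex, follow"></head>'
        '<body><p>Forgot your password?</p></body></html>')
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```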