Q. How can I control which pages are indexed by
search engines?
A. By adding a robots.txt file to the root directory
of your website, you can help control the indexing of your site
by robots that ignore the <META NAME="ROBOTS" CONTENT="NOINDEX,
NOFOLLOW"> convention.
The mystery of the robots.txt file revealed
Author: turtle
Control which of your pages are NOT indexed with a robots.txt
file
You should add a robots.txt file to the root directory of all your
websites to help control the indexing of your site by robots that
ignore the <META NAME="ROBOTS" CONTENT="NOINDEX,
NOFOLLOW"> convention. In this file you specifically list
any pages that you DO NOT want crawled
and indexed (such as password protected folders and folders which
contain only images). The robots.txt file is very simple yet
very powerful, and every website should have one in
its root directory.
The Terminology
Create a new file with Notepad and call it robots.txt.
The two statements used in a robots.txt file are User-agent:
and Disallow: /
User-agent: * By using the * wild card
you are addressing ALL robots. If you wish to address
individual robots you need to list each robot separately with an
individual User-agent: statement. Robots must be listed by the specific
name they identify themselves with (robots.txt cannot address a robot
by IP address), followed by separate Disallow: / statements
listing the folders and files you DO NOT want the specified robot
to index.
Tip: Use the * wild card to address all
robots; it is the safest way.
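To illustrate how separate User-agent: groups behave, here is a minimal
Python sketch using the standard library's urllib.robotparser module. The
robot names ExampleBot and OtherBot, and the rules themselves, are made up
for illustration and are not taken from this website's robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for a robot named "ExampleBot"
# (an invented name) and one wild card group for everyone else.
RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /tutorials/images/
"""


def build_parser(text):
    """Parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# ExampleBot has its own group, so the wild card group does not apply to it.
print(parser.can_fetch("ExampleBot", "/tutorials/meta_tags.html"))  # False
# Any other robot falls back to the wild card group.
print(parser.can_fetch("OtherBot", "/tutorials/meta_tags.html"))    # True
print(parser.can_fetch("OtherBot", "/tutorials/images/logo.gif"))   # False
```

Note that a robot with its own group follows only that group, so any
wild card rules you also want it to obey must be repeated there.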
Disallow: / List any folders that
you do not want to have indexed by robots, one per Disallow: line.
Warning: Disallow: / used without any folder
name tells the robot not to index ANY page of the website.
ALL files and folders in the directory named in
the Disallow: / statement, as well as all of those under it, will
NOT be indexed by robots.
Sample folders that could be in this website and that we would not
like the spiders to index for the search engines:
Disallow: /tutorials/meta/
Disallow: /tutorials/images/
Disallow: /tutorials/assets/
Disallow: /tutorials/404redirect/
Example: Disallow: /tutorials/
Results: If you use this statement, all files and
sub folders located within the folder tutorials
will not be indexed by the robots. This includes all the folders
listed in the above sample as well as any other sub folders of
the tutorials directory.
This means that /meta, /images, /assets, /404redirect,
AND any other folders, as well as all of
the files in those folders, will not be seen by indexing robots.
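The rule described above works as a simple prefix match on the URL path.
The following Python sketch shows that idea only; the function name
is_disallowed and the test paths are invented for illustration, and real
robots may apply extra rules (such as Allow: lines or wild cards) that
this sketch leaves out.

```python
def is_disallowed(path, disallow_rules):
    """Return True if the path starts with any non-empty Disallow value.

    Simplified sketch: an empty Disallow value blocks nothing, and
    every other value blocks all paths that begin with it.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


rules = ["/tutorials/"]

# Everything under /tutorials/ is covered by the single rule.
print(is_disallowed("/tutorials/meta/index.html", rules))  # True
print(is_disallowed("/tutorials/images/logo.gif", rules))  # True
# Paths outside the folder are not affected.
print(is_disallowed("/about.html", rules))                 # False
# "Disallow: /" is a prefix of every path, so it blocks the whole site.
print(is_disallowed("/about.html", ["/"]))                 # True
```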
You may also list specific files that you do not want indexed in
a robots.txt file.
Sample specific files that could be in this website and that we
would not like the spiders to index for the search engines:
Disallow: /tutorials/meta_tags.html
Disallow: /tutorials/custom_error_page.html
# Comments can be placed in a robots.txt
file by starting the line with #
The Examples
Download a sample robots.txt
or see below for an example.
###############################
#
# sample robots.txt file for this website
#
# addresses all robots by using wild card *
#
User-agent: *
# list folders robots are not allowed to index
Disallow: /tutorials/meta/
Disallow: /tutorials/images/
Disallow: /tutorials/assets/
Disallow: /tutorials/404redirect/
#
# list specific files robots are not allowed to index
#
Disallow: /tutorials/meta_tags.html
Disallow: /tutorials/custom_error_page.html
#
# End of robots.txt file
#
###############################
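One way to check a file like the sample above before uploading it is to
parse it with Python's standard urllib.robotparser module. This is only a
sketch: the robot name AnyBot and the page /tutorials/index.html are
assumptions made for the test, and individual robots may interpret a
file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the sample file, with most comment lines left out.
SAMPLE = """\
# addresses all robots by using wild card *
User-agent: *
# list folders robots are not allowed to index
Disallow: /tutorials/meta/
Disallow: /tutorials/images/
Disallow: /tutorials/assets/
Disallow: /tutorials/404redirect/
# list specific files robots are not allowed to index
Disallow: /tutorials/meta_tags.html
Disallow: /tutorials/custom_error_page.html
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def allowed(path, robot="AnyBot"):
    """Return True if the (hypothetical) robot may fetch the path."""
    return parser.can_fetch(robot, path)


# A page inside a disallowed folder is blocked.
print(allowed("/tutorials/meta/index.html"))  # False
# A file listed by name is blocked.
print(allowed("/tutorials/meta_tags.html"))   # False
# A page that matches no Disallow line stays open to robots.
print(allowed("/tutorials/index.html"))       # True
```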
Related Tutorials
Introduction to Meta Tags
by turtle
URL: http://www.dwfaq.com/Miscellaneous/intro_to_metas.asp
Related Reference and Resources
You can read more about spiders (a.k.a. robots), META tags and
what they do, as well as search engine optimization at the following
URLs:
J.K. Bowman's Spider Food
URL: http://spider-food.net/handling-robots-b.html
Search Engine World
URL: http://www.searchengineworld.com/robots/robots_tutorial.htm
Search Engine Guide
URL: http://www.searchengineguide.com/1stsearchranking/2001/robots.html
Search Tools
URL: http://www.searchtools.com/robots/robots-txt.html
ZDNet
URL: http://www.zdnet.com/devhead/stories/articles/0,4413,1600632,00.html