How to use Robots.txt

A robots.txt file is a plain-text file placed in your site’s root directory that gives instructions to the crawlers visiting your site. Its rules determine which pages or sections may be “crawled” and which may not, depending on the directives given.

Using a Robots File efficiently

In general we want as much exposure as possible for our sites, but there is some content that you don’t want indexed and listed on search engines. This is where a robots.txt file can be used effectively.


User-agent: defines which bots the rules that follow apply to. * is a wildcard meaning all bots; a specific name such as Googlebot targets only Google’s crawler.
Disallow: defines which folders or files are excluded from crawling. An empty value means nothing is excluded, / means everything is excluded, and /foldername/ or /filename excludes that specific folder or file.
Allow: works as the opposite of Disallow. It lists content that may be crawled, typically to open up part of a disallowed folder. * can be used as a wildcard by crawlers that support it.
Request-rate: defines the ratio of pages to seconds for crawling. For example, 1/20 means 1 page every 20 seconds. This is a non-standard directive that many crawlers ignore.
Crawl-delay: defines how many seconds to wait between successive requests. This is also non-standard; some crawlers honor it, and Googlebot ignores it.
Visit-time: defines the hours during which you want your pages to be crawled. This is also a non-standard directive.
Sitemap: gives the location of your sitemap file (you must use the complete URL of the file).
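
To see how a parser reads these directives, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the www.example.com URLs and the ExampleBot name are all made up for illustration and do not come from any real site. Note that this particular parser applies the first rule that matches, so the Allow line is placed before the broader Disallow line; other crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
Crawl-delay: 10
Request-rate: 1/20
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the given agent may crawl the given path."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html", "/private/press/launch.html"):
        print(path, "->", "allowed" if allowed(path) else "blocked")

    # crawl_delay() and request_rate() need Python 3.6+, site_maps() needs 3.8+.
    print("Crawl-delay:", parser.crawl_delay("ExampleBot"))
    rate = parser.request_rate("ExampleBot")
    print("Request-rate:", rate.requests, "page(s) per", rate.seconds, "seconds")
    print("Sitemaps:", parser.site_maps())
```

This parser only does plain prefix matching on paths, so it is not a good way to test rules that rely on the * wildcard inside a path.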


This is the robots.txt we can use on our site:

User-agent: *
Disallow: /cms/feed/
Disallow: */feed/*
Disallow: /feed
Disallow: /cms/wp-content/
Disallow: /cms/wp-plugins/
Disallow: */wp-content/*
Disallow: /cms/wp-content/plugins/
Disallow: /cms/index.php
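
Before uploading a file like this, it can help to check which paths the rules would catch. The following is only a rough sketch, assuming a crawler that treats * as "any run of characters" and matches every pattern from the start of the path; how a given crawler handles a pattern that begins with * is an assumption here, and the sample paths are made up. The helper names pattern_to_regex and is_blocked are hypothetical.

```python
import re

# The Disallow patterns from the file above.
DISALLOW = [
    "/cms/feed/",
    "*/feed/*",
    "/feed",
    "/cms/wp-content/",
    "/cms/wp-plugins/",
    "*/wp-content/*",
    "/cms/wp-content/plugins/",
    "/cms/index.php",
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


RULES = [pattern_to_regex(p) for p in DISALLOW]


def is_blocked(path):
    """Return True if any Disallow pattern matches from the start of path."""
    return any(rule.match(path) for rule in RULES)


if __name__ == "__main__":
    # Hypothetical sample paths.
    for path in ("/about/", "/feed", "/feedback", "/blog/post/feed/",
                 "/cms/index.php?p=1", "/wp-content/uploads/logo.png"):
        print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

One thing this makes visible is that Disallow: /feed is a prefix rule, so under these assumptions it would also block a path such as /feedback.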

January 17th, 2010 by