What is robots.txt?
robots.txt is a plain text file placed in the root directory of your website. It contains a set of rules for search engine spiders (crawlers).
Role of robots.txt
Robots.txt is mainly used to tell web spiders not to crawl the links it lists. One thing to keep in mind is that a robots.txt file cannot force a spider to crawl and index a page; crawling and indexing are entirely up to the spider. So no one can compel spiders to crawl their website, but anyone can block spiders from accessing part of, or even all of, their website.
How to Write robots.txt
To write a robots.txt file, follow the examples below.
1. Block spiders from crawling your entire website
To disallow all spiders from crawling your entire website, use the following format:
User-agent: *
Disallow: /
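The effect of these two lines can be checked locally with Python's standard `urllib.robotparser` module (a minimal sketch; the spider names and `example.com` URLs are only illustrations):

```python
from urllib import robotparser

# Parse the two-line robots.txt shown above.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# Every spider is blocked from every path.
print(rp.can_fetch("Googlebot", "https://example.com/page.html"))  # False
print(rp.can_fetch("Bingbot", "https://example.com/"))             # False
```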
2. Giving access to your website in robots.txt
To reverse this and allow access to everything, change it to either:
User-agent: *
Disallow:
Or
User-agent: *
Allow: /
Please note that an allow-everything robots.txt grants nothing that spiders would not do by default; its main benefit is avoiding 404 errors when spiders request the robots.txt file from your website.
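Either allow-everything variant can be verified the same way with `urllib.robotparser` (a sketch; the spider name and URL are placeholders):

```python
from urllib import robotparser

# An empty Disallow line means nothing is blocked.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# Any spider may fetch any path.
print(rp.can_fetch("Googlebot", "https://example.com/any/page.html"))  # True
```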
3. Block spiders from accessing certain files on your site
To block spiders from accessing certain directories or files on your website, create a robots.txt file like the one below:
User-agent: *
Disallow: /cgi-bin/
Disallow: /wusage/
Disallow: /textures/
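A quick sanity check of these rules with `urllib.robotparser` (a sketch; `example.com` and the sample paths are assumptions):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /wusage/",
    "Disallow: /textures/",
])

# Listed directories are blocked; everything else stays allowed.
print(rp.can_fetch("Googlebot", "https://example.com/cgi-bin/script"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/index.html"))      # True
```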
4. Block certain spiders from accessing your website
To block a certain spider from accessing your website, name it in the User-agent line (without quotes):
User-agent: SpiderName
Disallow: /
For example, to block Google's image crawler:
User-agent: Googlebot-Image
Disallow: /
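This per-spider rule can also be checked with `urllib.robotparser` (a sketch; the URL is just an illustration):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: Googlebot-Image", "Disallow: /"])

# Only the named spider is blocked; other spiders are unaffected.
print(rp.can_fetch("Googlebot-Image", "https://example.com/photo.jpg"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/photo.jpg"))        # True
```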