Webmasters - Webserver Usage statistics - greatstatistics.com

For Webmasters


You can be proud to have your website included in the worldwide website statistics report compiled by Greatstatistics.com's robot, "Hcrawl". To invite Hcrawl to crawl your site, or to block Hcrawl from crawling it, please follow the instructions below.

Information regarding our privacy policy can be found on the Privacy Policy page.

The Greatstatistics.com crawler (robot), which identifies itself as "Hcrawl" in the HTTP "User-agent" header field, uses a web-wide crawl strategy. It starts by crawling a list of known URLs from across the Internet, and then follows the local links it finds as it crawls. The major advantage of this method is that the disruption to the sites being crawled is negligible.
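The "follow local links" step can be illustrated with a short Python sketch. This is not Hcrawl's actual code. The names `LinkCollector` and `local_links` and the example.com URLs are assumptions made for illustration, and "local" is taken here to mean "on the same host as the page being crawled".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(page_url, html):
    """Return absolute URLs on the same host as page_url, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        # Assumption: a link is "local" when its host matches the page's host.
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found
```

A crawler built along these lines would add the returned URLs to its queue and repeat the process for each page it visits.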

Greatstatistics.com will not index anything you do not want indexed. To exclude links from crawling, we strictly follow the Standard for Robot Exclusion (SRE), developed by Martijn Koster at WebCrawler to allow content providers to control a robot's behavior on their sites.

Crawlers that follow this standard look for a file called "robots.txt" to find out which locations they may crawl. The robots.txt file is placed at the root of a site to direct the behavior of web crawling robots. Hcrawl fetches a copy of the robots.txt file before crawling a website. If you modify your robots.txt file while Hcrawl is crawling your site, please let us know so that Hcrawl can be instructed to fetch the updated file.
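As a rough illustration of this "fetch robots.txt first" behavior, the sketch below uses Python's standard urllib.robotparser module. It is not Hcrawl's implementation. The helper names `robots_url`, `fetch_rules` and `crawlable` are hypothetical, and `fetch_rules` makes a real network request when called.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

USER_AGENT = "Hcrawl"  # robot name as given in the text


def robots_url(page_url):
    """Return the robots.txt URL at the root of the site hosting page_url."""
    return urljoin(page_url, "/robots.txt")


def fetch_rules(page_url):
    """Download and parse the site's robots.txt (performs a network request)."""
    parser = RobotFileParser()
    parser.set_url(robots_url(page_url))
    parser.read()
    return parser


def crawlable(parser, urls):
    """Keep only the URLs that the parsed rules allow USER_AGENT to fetch."""
    return [url for url in urls if parser.can_fetch(USER_AGENT, url)]
```

In this sketch the rules are read once per site, so a change made to robots.txt during a crawl is only picked up when `fetch_rules` is called again.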

To exclude all robots, the robots.txt file should look like this:

User-agent: *
Disallow: /

To exclude just one directory and its subdirectories, say the /pictures/ directory, the file should look like this:
User-agent: *
Disallow: /pictures/

Webmasters can allow or disallow specific robots from visiting part or all of a website. To allow Hcrawl to visit while excluding all other robots, the robots.txt file should look like this:
User-agent: Hcrawl
Disallow:

User-agent: *
Disallow: /

To disallow Hcrawl from visiting while allowing all other robots, the robots.txt file should look like this:
User-agent: Hcrawl
Disallow: /
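
Before publishing a robots.txt file, you may want to check that it behaves as intended. The following Python sketch uses the standard urllib.robotparser module to test three rule sets of the kind described on this page. The robot name "OtherBot", the example.com URLs and the helper `allowed` are assumptions made for illustration, and other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hcrawl may visit everything; every other robot is excluded.
ONLY_HCRAWL = """\
User-agent: Hcrawl
Disallow:

User-agent: *
Disallow: /
"""

# Hcrawl is excluded; robots not named in the file are unaffected.
NO_HCRAWL = """\
User-agent: Hcrawl
Disallow: /
"""

# All robots are kept out of /pictures/ and its subdirectories.
NO_PICTURES = """\
User-agent: *
Disallow: /pictures/
"""


def allowed(robots_text, agent, url):
    """Return True if the rules in robots_text let agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/index.html"
    for name, rules in [("ONLY_HCRAWL", ONLY_HCRAWL), ("NO_HCRAWL", NO_HCRAWL)]:
        for agent in ("Hcrawl", "OtherBot"):
            print(name, agent, allowed(rules, agent, page))
```

Note that an empty "Disallow:" line allows everything for the named robot, and that the blank line between the two records in the first rule set is what separates them.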

For further information regarding robots, crawling, and robots.txt, visit the Web Robots Pages at www.robotstxt.org, an excellent source for the latest information on the Standard for Robot Exclusion.

© Copyright 2006-2009. All Rights Reserved. Powered by HIOX India