Googlebot
Googlebot is a search bot used by Google. It collects documents from the web to build a searchable index for the Google search engine.

If a webmaster wishes to restrict the information on their site available to Googlebot, or another well-behaved spider, they can do so with the appropriate directives in a robots.txt file, [http://www.google.com/support/webmasters/bin/answer.py?answer=33570&topic=8846 How do I request that Google not crawl parts or all of my site?] or by adding the appropriate robots meta tag to the web page. [http://www.google.com/support/webmasters/bin/answer.py?answer=33581&topic=8460 How do I prevent Googlebot from following links on my pages?]
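For example, a site owner might keep Googlebot (by name) out of part of a site with robots.txt directives like the following, or ask it not to follow the links on an individual page with the robots meta tag shown beneath them; the /private/ path is purely illustrative:

    User-agent: Googlebot
    Disallow: /private/

    <meta name="googlebot" content="nofollow" />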
Googlebot requests to web servers are discernible from their user-agent string, which contains 'Googlebot'.
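As a rough sketch of how a site operator might act on this, the following Python fragment flags requests whose user-agent contains 'Googlebot'; since the header can be forged, it also includes an assumed reverse-DNS check that resolves the requesting IP back to a Google-owned hostname (this helper is illustrative, not Google-provided tooling):

    import socket

    def looks_like_googlebot(user_agent: str) -> bool:
        # Googlebot announces itself with 'Googlebot' in the User-Agent header.
        return "Googlebot" in user_agent

    def verify_by_reverse_dns(ip: str) -> bool:
        # User-agent strings can be spoofed; resolving the requesting IP back
        # to a *.googlebot.com or *.google.com hostname is a stronger check.
        try:
            host = socket.gethostbyaddr(ip)[0]
        except OSError:
            return False
        return host.endswith((".googlebot.com", ".google.com"))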
Googlebot has two versions, deepbot and freshbot. Deepbot, the deep crawler, tries to follow every link on the web and download as many pages as it can for the Google indexers; it completes this process about once a month. Freshbot crawls the web looking for fresh content, visiting websites that change frequently at a rate according to how frequently they change. Currently Googlebot only follows HREF links and SRC links.

Googlebot discovers pages by harvesting all of the links on every page it finds. It then follows these links to other web pages. New web pages must be linked to from another known page on the web in order to be crawled and indexed.
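The following is a minimal sketch of that harvest-and-follow loop using only Python's standard library; the page limit and user-agent name are illustrative, and a real crawler would add robots.txt handling, politeness delays, and scheduling:

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import Request, urlopen

    class LinkHarvester(HTMLParser):
        # Collects every HREF and SRC link on a page, as described above.
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    self.links.append(urljoin(self.base_url, value))

    def crawl(seed, limit=10):
        # New pages are reached only through links from already-known pages.
        seen, frontier = set(), [seed]
        while frontier and len(seen) < limit:
            url = frontier.pop(0)
            if url in seen or not url.startswith("http"):
                continue
            seen.add(url)
            request = Request(url, headers={"User-Agent": "example-crawler"})
            try:
                page = urlopen(request, timeout=10).read().decode("utf-8", "replace")
            except (OSError, ValueError):
                continue
            harvester = LinkHarvester(url)
            harvester.feed(page)
            frontier.extend(harvester.links)
        return seen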
A problem which webmasters have often noted with Googlebot is that it takes up an enormous amount of bandwidth. This can cause websites to exceed their bandwidth limit and be taken down temporarily. This is especially troublesome for mirror sites which host many gigabytes of data. Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate. [https://www.google.com/webmasters/tools/docs/en/about.html]
See also
* Mediabot
* Robots Exclusion Standard
External links
* [http://www.google.com/bot.html Google's official Googlebot FAQ]