BotSeer

BotSeer is a Web-based information system and search tool that provides resources and services for research on Web robots and on trends in Robot Exclusion Protocol deployment and adherence. It was created and designed by [http://www.personal.psu.edu/yus115 Yang Sun], [http://www.personal.psu.edu/~igc2/ Isaac G. Councill], [http://www.personal.psu.edu/users/z/x/zxz127/ Ziming Zhuang] and C. Lee Giles.

BotSeer provides three major services: robots.txt search, robot bias analysis [Yang Sun, Z. Zhuang, I. Councill, C. L. Giles, "[http://www.personal.psu.edu/yus115/docs/sun_robotstxtbias.pdf Determining Bias to Search Engines from Robots.txt]," "Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI 2007)", 149–155, 2007.], and robot-generated log analysis. The prototype also lets users search roughly six thousand documentation and source code files from 18 open-source crawler projects. BotSeer serves as a resource for studying the regulation and behavior of Web robots, as well as a source of information on writing effective robots.txt files and crawler implementations. It is publicly available on the World Wide Web, hosted by the College of Information Sciences and Technology at the Pennsylvania State University. BotSeer has indexed and analyzed 2.2 million robots.txt files obtained from 13.2 million websites, along with a large Web server log of real-world robot behavior and the associated analysis. Its goal is to assist researchers, webmasters, Web crawler developers and others with research and information needs related to Web robots.
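As an illustration of the kind of bias such an analysis looks for, the following minimal sketch (not taken from BotSeer itself) uses Python's standard-library robots.txt parser on a hypothetical file that admits Googlebot everywhere while excluding all other robots; the file contents and robot names are assumptions made for the example.

<syntaxhighlight lang="python">
# Hypothetical robots.txt that is biased toward one crawler: Googlebot may
# fetch everything, while every other robot is excluded from the whole site.
from urllib import robotparser

SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT)

for bot in ("Googlebot", "msnbot", "ExampleBot"):
    allowed = parser.can_fetch(bot, "/index.html")
    print(f"{bot:>10} may fetch /index.html: {allowed}")

# Expected output:
#  Googlebot may fetch /index.html: True
#     msnbot may fetch /index.html: False
# ExampleBot may fetch /index.html: False
</syntaxhighlight>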

BotSeer has also set up a honeypot [http://www.v4d.net] to test the ethicality, performance and behavior of web crawlers.
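For context, here is a hedged sketch (not BotSeer's actual methodology) of the protocol adherence such a honeypot hopes to observe: a well-behaved crawler downloads robots.txt first, honors its Disallow rules, and waits out any Crawl-delay between requests. The function name, user-agent string, fallback delay and example host below are illustrative assumptions.

<syntaxhighlight lang="python">
# Sketch of a "polite" crawler that follows the Robot Exclusion Protocol,
# i.e. the behavior a crawler honeypot would hope to observe.  The names
# polite_fetch, ExampleBot and the 1-second fallback are illustrative only.
import time
import urllib.request
from urllib import robotparser
from urllib.parse import urljoin

def polite_fetch(base_url, paths, user_agent="ExampleBot"):
    rp = robotparser.RobotFileParser()
    rp.set_url(urljoin(base_url, "/robots.txt"))
    rp.read()                                  # download and parse robots.txt
    delay = rp.crawl_delay(user_agent) or 1.0  # honor Crawl-delay, else pause 1 s

    pages = {}
    for path in paths:
        url = urljoin(base_url, path)
        if not rp.can_fetch(user_agent, url):  # skip anything disallowed for us
            continue
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as response:
            pages[url] = response.read()
        time.sleep(delay)                      # wait between requests
    return pages

# Example use (the host is hypothetical):
# polite_fetch("http://www.example.com", ["/", "/about.html"])
</syntaxhighlight>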

References

* "[http://sg.us.biz.yahoo.com/ap/071128/tech_bit_internet_searches.html?.v=2 Webmasters May Shape Search Results]", Associated Press, November 28, 2007. Retrieved 2008-01-15.
* "[http://www.networkworld.com/news/2007/111507-google-favored.html Google favored by Web admins]", Network World, November 15, 2007. Retrieved 2007-12-19.

See also

* Robots Exclusion Standard
* Web crawler

External links

* [http://botseer.ist.psu.edu BotSeer website]
* [http://www.v4d.net Honeypot: Web Crawler Test]

