Grub (search engine)

Grub is an open source distributed search crawler platform. On July 27, 2007, Jimmy Wales announced that Wikia, Inc., the for-profit company developing the open source search engine Wikia Search, had acquired Grub from LookSmart. [[http://www.wikia.com/wiki/Search_Wikia_OSCON "Jimmy Wales and Wikia Release Open Source Distributed Web Crawler Tool"]. Wikia, Inc. press release, July 27, 2007.] The purchase price was $50,000. [[http://www.sec.gov/Archives/edgar/data/1077866/000119312507242517/d10q.htm#toc LookSmart SEC filing], 2007.]

The project was started in 2000 by Kord Campbell, Igor Stojanovski, and Ledio Ago in Oklahoma City. [[http://web.archive.org/web/20001209031600/www.grub.org/investors.html Grub Inc. Investors page as archived by Archive.org, December 2000.]] In 2003, LookSmart, Ltd. purchased undetermined copyright, patent, or trademark rights from Grub, Inc. for $1.3 million. [[http://www.sec.gov/Archives/edgar/data/1077866/000119312504214657/dex993.htm LookSmart SEC filing], 2003.] For a short time the original team continued working on the project and released several new versions of the software, albeit under a closed license.

The Grub project drew several controversies shortly after LookSmart acquired it. Grub had a slight tendency to ignore a few misconfigured robots.txt files on the sites it crawled. [citation needed] Even after the development team addressed these issues, a few webmasters continued to blame it for crawling their sites too often and for not respecting their robots.txt files. [citation needed]
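To illustrate what respecting robots.txt involves, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not Grub's code: the user-agent string `grub-client`, the sample rules, and the `example.com` URLs are placeholders assumed for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from http://<host>/robots.txt before requesting any other page.
SAMPLE_ROBOTS = """\
User-agent: grub-client
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a queryable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


rules = build_parser(SAMPLE_ROBOTS)
print(allowed(rules, "grub-client", "http://example.com/index.html"))
print(allowed(rules, "grub-client", "http://example.com/private/a.html"))
```

A crawler that runs this check before every request, and treats a missing or unreadable file conservatively, avoids the kind of complaint described above.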

Another issue was the closing of the source code base and the apparent failure to use the crawled data for anything useful, such as a searchable index of the crawled sites. Grub appears to have been used for a short time to seed the URL list for NetNanny, another LookSmart acquisition.

Grub operations were shut down in late 2005. The site was reactivated on July 27, 2007, and is currently being updated. The original developers are assisting with the new deployment and investigating the robots.txt issue to ensure that it does not recur.

Users of Grub can download the peer-to-peer grubclient software and let it run during computer idle time. The client indexes the URLs and sends them back to the main Grub server in a highly compressed form. The collective crawl could then, in theory, be used by an indexing system, such as the one proposed for Wikia Search. Grub can quickly build a large snapshot by asking each of thousands of clients to crawl and analyze a small portion of the web.
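The division of labor described above can be sketched roughly as follows. This is an illustrative outline and not Grub's actual protocol: the function names, the hash-based assignment of URLs to clients, the JSON record layout, and the use of zlib are all assumptions made for this example, and no real network crawling is performed.

```python
import hashlib
import json
import zlib


def assign_client(url: str, n_clients: int) -> int:
    """Map a URL to one of n_clients work units with a stable hash."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_clients


def partition(urls, n_clients):
    """Server side: split a URL list into one work unit per client."""
    units = [[] for _ in range(n_clients)]
    for url in urls:
        units[assign_client(url, n_clients)].append(url)
    return units


def pack_report(records):
    """Client side: serialize crawl records and compress them for upload."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return zlib.compress(payload, 9)


def unpack_report(blob):
    """Server side: decompress and decode a client's report."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical usage: four clients share 100 placeholder URLs.
urls = [f"http://example.com/page{i}" for i in range(100)]
units = partition(urls, 4)
report = pack_report([{"url": u, "status": 200} for u in units[0]])
print(len(units[0]), "URLs in unit 0;", len(report), "bytes uploaded")
```

Hashing the URL keeps each assignment stable across runs, so the same page is not handed to two clients, while compression keeps each client's upload small.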

Wikia has now released the entire Grub package under an open source software license. However, the old Grub clients are no longer functional. New clients can be found on the Wikia wiki.

External links

* [http://grub.org/ Official website]
* [http://search.wikia.com/wiki/Grub_Clients New clients]
* [http://search.wikia.com Search Wikia Project]


Wikimedia Foundation. 2010.
