- ICDL crawling
ICDL crawling is an open distributed web crawling technology based on Website Parse Template (WPT).
What is Website Parse Template?
Website Parse Template (WPT) is an XML-based open format which provides an HTML structure description of website pages. The WPT format allows web crawlers to generate Semantic Web RDF triples for web pages. WPT is compatible with existing Semantic Web concepts defined by the W3C (RDF and OWL) and with UNL specifications.
Distributed ICDL crawling
ICDL crawling parses website content according to the HTML structure templates represented in WPT files. Distributed crawling is carried out by an open source client/server application installed on volunteers' personal computers. After authentication, the application registers each PC as a Distributed Crawling node. The crawler periodically receives tasks from the management console to download specified websites, parse their content, and submit the results to the Parsed Content Storage. Crawling processes are activated when the user's computer is idle and the Internet connection is not in use.
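The node cycle described above (receive a task, download the pages, parse them against a template, submit the results) can be sketched roughly as follows. This is only an illustration under stated assumptions: the article does not specify the WPT schema, so the `<template>`/`<field>` XML below is a hypothetical stand-in, and the names `FieldExtractor`, `load_rules`, `parse_page`, `run_task`, the task dictionary layout, and the `fetch`/`submit` callables are all invented for the example. Downloading is passed in as a function so that the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical template. This is NOT the real WPT schema, which the article
# does not specify; it only maps (tag, class) pairs to RDF predicates.
TEMPLATE_XML = """
<template site="http://example.org">
  <field predicate="http://purl.org/dc/elements/1.1/title" tag="h1" class="headline"/>
  <field predicate="http://purl.org/dc/elements/1.1/creator" tag="span" class="author"/>
</template>
"""


class FieldExtractor(HTMLParser):
    """Collects the text of elements that match a (tag, class) rule.

    Simplification: an element nested inside a match with the same tag name
    would end the match early.
    """

    def __init__(self, rules):
        super().__init__()
        self.rules = rules    # {(tag, css_class): predicate}
        self.active = None    # (tag, predicate) of the element being read
        self.buffer = []
        self.found = []       # [(predicate, text)]

    def handle_starttag(self, tag, attrs):
        if self.active is not None:
            return
        for css_class in (dict(attrs).get("class") or "").split():
            predicate = self.rules.get((tag, css_class))
            if predicate:
                self.active = (tag, predicate)
                self.buffer = []
                return

    def handle_data(self, data):
        if self.active is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.active is not None and tag == self.active[0]:
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            self.found.append((self.active[1], text))
            self.active = None


def load_rules(template_xml):
    """Read the hypothetical template into a {(tag, class): predicate} map."""
    root = ET.fromstring(template_xml)
    return {(f.get("tag"), f.get("class")): f.get("predicate")
            for f in root.iter("field")}


def parse_page(url, html, rules):
    """Return (subject, predicate, object) triples for one page."""
    extractor = FieldExtractor(rules)
    extractor.feed(html)
    extractor.close()
    return [(url, predicate, text) for predicate, text in extractor.found]


def run_task(task, fetch, submit):
    """One node cycle: download each page in the task, parse it, submit the triples."""
    rules = load_rules(task["template"])
    for url in task["urls"]:
        submit(parse_page(url, fetch(url), rules))


# Usage with an in-memory page standing in for a real download.
PAGE = ('<html><body><h1 class="headline">Hello  World</h1>'
        '<span class="author">A. Writer</span></body></html>')
storage = []  # stands in for the Parsed Content Storage
run_task({"template": TEMPLATE_XML, "urls": ["http://example.org/a"]},
         fetch=lambda url: PAGE, submit=storage.extend)
for triple in storage:
    print(triple)
```

In a real node, `fetch` would perform the HTTP download and `submit` would send the triples to the server. The idle check and the authentication step mentioned above are left out of this sketch.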
Internet content parse results from several crawlers are compared by the management console to increase the accuracy of the crawling results. Crawling results can be stored and used by thematic and general search engines with different search algorithms, such as Google, Live, Yahoo!, and Froogle, to perform more accurate web search.
See also
*Website Parse Template
*Distributed web crawling
*Web search engine
*Web crawler
*OMFICA
External links
* [http://www.w3c.org World Wide Web Consortium]
* [http://www.google.com/about.html Google]
* [http://www.msn.com Live Search]
* [http://www.yahoo.com Yahoo!]
* [http://www.omfica.org OMFICA]
* [http://www.google.com/products Google Product Search]
Wikimedia Foundation. 2010.