Web-scraping software comparison
This article provides a basic feature comparison for several types of web-scraping software. Additional feature details are available from the individual products' websites and/or articles. This article is not all-inclusive or necessarily up to date. The comparisons are made on the stable versions of the software – not upcoming versions or beta releases – and without the use of any add-ons, extensions or external programs (unless specified in the footnotes).
Prices
The following companies are listed with their pricing and trial details.
Cross-platform
The following software packages have their operating system compatibility listed.
Features & Capabilities
The following table highlights several key features available in web-scraping software packages. Definitions of these features and capabilities are given below.
Features and Capabilities Definitions
* RSS Feed: The program can place the output into an RSS feed.
* Interact w/ Database: The program can read inputs from and write outputs to a database.
* Extract Links: The program can read in links as it looks for data.
* Write to XML: The program can write data into XML format.
* Download Files: The program will download files as well as scrape data from a web page.
* Anonymous Proxies: Sites sometimes block scrapes by blocking the IP address from which the user is scraping. Anonymous proxies allow the user to continue scraping the site by routing requests through different IP addresses.
* Export to spreadsheet: This indicates that the program can export the data scraped into a spreadsheet.
* Built-in Timer: When there is a built-in timer, the user can more easily scrape at a desired set time.
* Extract Table: The program can extract the data from a table on the scraped web page.
* Traverse pages/Fill forms: The program can navigate through web pages and fill in forms automatically.
* Standalone web-scraper robots: Once a robot has been created, the program scrapes automatically so the user does not have to run it manually.
* Server: The program can act much like a database server, allowing programs designed by the user to invoke it to scrape the needed data.
* Custom parsing: Uses customizable delimiters for parsing rather than relying on the DOM, giving greater versatility at the expense of greater complexity (see the sketch after this list).
* DOM parsing: Uses objects from the DOM to parse HTML.
* Visual Learning: Generate web extraction code/rules by visual demonstration, including a recording interface.
* Multi-thread: Scrape web data in multi-thread mode.
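To illustrate the difference between custom parsing and DOM parsing described above, the following is a minimal Python sketch; it is not taken from any of the compared products, and the sample HTML, the "price" field and the chosen delimiters are assumptions made purely for illustration.

```python
# Minimal sketch contrasting delimiter-based ("custom") parsing with
# DOM-style parsing; the sample document and field names are hypothetical.
import xml.etree.ElementTree as ET

SAMPLE_HTML = (
    '<html><body>'
    '<div class="product"><span class="price">19.99</span></div>'
    '</body></html>'
)

def custom_parse(html: str) -> str:
    # Custom parsing: split on literal delimiters around the wanted value.
    # Works on any text, even malformed HTML, but breaks if the markup changes.
    start = '<span class="price">'
    end = '</span>'
    return html.split(start, 1)[1].split(end, 1)[0]

def dom_parse(html: str) -> str:
    # DOM parsing: build a tree of element objects and query it.
    # Real-world HTML is rarely well-formed XML, so scrapers normally use
    # a tolerant HTML parser rather than ElementTree for this step.
    root = ET.fromstring(html)
    span = root.find(".//span[@class='price']")
    return span.text

print(custom_parse(SAMPLE_HTML))  # 19.99
print(dom_parse(SAMPLE_HTML))     # 19.99
```

Delimiter-based parsing tolerates broken markup but is tied to the exact surrounding text, whereas DOM-based parsing tolerates cosmetic changes to the page at the cost of first parsing the whole document into a tree.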
See also
* Screen scraping
* Web scraping
Notes
For software that does not incorporate a timer but can run on Linux, a cron job can be used to run the application on a predefined schedule. Additionally, most Windows scraping programs can be run with command-line options through the Windows Task Scheduler to achieve the same timer effect.
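As a rough illustration of the note above, the sketch below shows a hypothetical command-line scraper script that an external scheduler (cron or the Windows Task Scheduler) could invoke; the URL, output path and example schedule are assumptions, not taken from any listed product.

```python
# Hypothetical stand-alone scraper meant to be run by an external scheduler
# instead of a built-in timer, e.g. with a crontab entry such as
#   0 3 * * * /usr/bin/python3 /home/user/scrape.py
# or an equivalent Windows Task Scheduler task running the same command.
import sys
import urllib.request

def scrape(url: str, out_path: str) -> None:
    # Fetch the page and save it; a real scraper would parse and extract
    # fields here instead of storing the raw HTML.
    with urllib.request.urlopen(url) as response:
        html = response.read()
    with open(out_path, "wb") as f:
        f.write(html)

if __name__ == "__main__":
    # Defaults are placeholders; both can be overridden on the command line.
    url = sys.argv[1] if len(sys.argv) > 1 else "http://example.com/"
    out_path = sys.argv[2] if len(sys.argv) > 2 else "page.html"
    scrape(url, out_path)
```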