IronWebScraper
The C# Web Scraping Library
# C# framework for extracting clean, structured data from HTML web applications
# Useful for system migrations, populating search engines, competitive analysis and data mining
Powerful Scraping Engine Under Your Control
Just write a single C# web-scraper class to scrape thousands or even millions of web pages into C# class instances, JSON, or downloaded files. IronWebScraper allows you to code concise, linear workflows that simulate human browsing behavior. IronWebScraper runs your code as a swarm of virtual web browsers: massively parallel, yet polite and fault-tolerant.
 
 
Simple, Flexible Logic
IronWebScraper must be programmed to know how to handle each “type” of page it encounters. This is done concisely with CSS selectors or XPath expressions and can be fully customized in C#. This freedom lets you decide which pages within a website to scrape and what to do with the extracted data. Each parse method can be stepped through and inspected in the Visual Studio debugger.
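The example below is written in plain C# and does not use IronWebScraper. It makes the "one handler per page type" idea concrete: each page type maps to its own handler method, and the handlers extract data with XPath through the standard `System.Xml` classes. The `PageRouter` type, the page-type names, and the markup it expects are hypothetical. `XmlDocument` also accepts only well-formed XHTML. Treat this as an illustration of the pattern, not of the library's selector API.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical illustration (not IronWebScraper's API): route each page to a handler
// chosen by its "type", and extract fields with XPath via System.Xml.
// XmlDocument only parses well-formed XHTML, not arbitrary real-world HTML.
public static class PageRouter
{
    // Maps a hypothetical page type to the method that knows how to parse it.
    public static readonly Dictionary<string, Func<XmlDocument, List<string>>> Handlers =
        new Dictionary<string, Func<XmlDocument, List<string>>>
        {
            ["listing"] = ParseListing,
            ["article"] = ParseArticle,
        };

    // Listing pages: collect the href of every link inside <h2 class="title">.
    private static List<string> ParseListing(XmlDocument doc)
    {
        var links = new List<string>();
        foreach (XmlNode href in doc.SelectNodes("//h2[@class='title']/a/@href"))
            links.Add(href.Value);
        return links;
    }

    // Article pages: collect the trimmed text of the first <h1>, if any.
    private static List<string> ParseArticle(XmlDocument doc)
    {
        XmlNode h1 = doc.SelectSingleNode("//h1");
        return h1 == null ? new List<string>() : new List<string> { h1.InnerText.Trim() };
    }

    // Parses the page and dispatches it to the handler registered for its type.
    public static List<string> Handle(string pageType, string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        return Handlers[pageType](doc);
    }
}
```

For example, `PageRouter.Handle("listing", xhtml)` returns the `href` of each link under `<h2 class='title'>`. Adding a new page type means registering one more handler and leaving the others unchanged.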
 
 
Fast and Polite Behavior
IronWebScraper handles multithreading and web requests, allowing hundreds of concurrent threads without the developer needing to manage them. Politeness settings throttle requests, reducing the risk of excessive load on target web servers.
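Politeness can be sketched in plain C# as a per-host gate. A `SemaphoreSlim` caps how many requests may be in flight to one host. A second per-host lock spaces request starts roughly a minimum delay apart. This illustrates the concept under stated assumptions. It is not IronWebScraper's implementation or its settings. `PoliteGate`, `maxConcurrentPerHost`, and `minDelay` are hypothetical names, and the sketch assumes C# 9 / .NET 5 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-host politeness gate (not IronWebScraper's API): at most
// maxConcurrentPerHost requests run against one host at a time, and request
// starts to the same host are spaced roughly minDelay apart.
public sealed class PoliteGate
{
    private static readonly Stopwatch Clock = Stopwatch.StartNew();

    private readonly int _maxConcurrentPerHost;
    private readonly TimeSpan _minDelay;
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _slots = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _spacing = new();
    private readonly ConcurrentDictionary<string, TimeSpan> _lastStart = new();

    public PoliteGate(int maxConcurrentPerHost, TimeSpan minDelay)
    {
        _maxConcurrentPerHost = maxConcurrentPerHost;
        _minDelay = minDelay;
    }

    public async Task<T> RunAsync<T>(Uri url, Func<Task<T>> request)
    {
        string host = url.Host;
        // Concurrency cap: one semaphore per host, so other hosts are not held back.
        var slot = _slots.GetOrAdd(host, _ => new SemaphoreSlim(_maxConcurrentPerHost));
        await slot.WaitAsync();
        try
        {
            await WaitForTurnAsync(host);
            return await request();
        }
        finally
        {
            slot.Release();
        }
    }

    // Serializes start times per host so consecutive starts are spaced by about _minDelay.
    private async Task WaitForTurnAsync(string host)
    {
        var gate = _spacing.GetOrAdd(host, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            if (_lastStart.TryGetValue(host, out TimeSpan last))
            {
                TimeSpan wait = last + _minDelay - Clock.Elapsed;
                if (wait > TimeSpan.Zero)
                    await Task.Delay(wait);
            }
            _lastStart[host] = Clock.Elapsed;
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Both limits are keyed on `Uri.Host`. A slow or rate-sensitive site is therefore throttled without delaying requests to unrelated hosts.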
 
 
Create Virtual User Identities
IronWebScraper can use one or more “identities”: sessions that simulate real-world human requests. Each request can be assigned its own identity, user agent, cookies, logins, and even IP address, either programmatically or at random. Requests are automatically made unique by the combination of their URL, parse method, and POST variables.
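The uniqueness rule (URL + parse method + POST variables) can be sketched in plain C# as a canonical key fed into a `HashSet`. The key format and the `RequestDeduplicator` type are hypothetical. IronWebScraper's own de-duplication is internal to the library, and this is not its code. The sketch assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration (not IronWebScraper's internals): a request is identified
// by its URL, the name of the parse method that will handle it, and its POST variables.
public sealed class RequestDeduplicator
{
    private readonly HashSet<string> _seen = new();

    // Builds a canonical key. Every part is percent-escaped, so the ' ', '=' and '&'
    // separators cannot occur inside a part and distinct combinations cannot collide.
    public static string Key(string url, string parseMethod, IDictionary<string, string>? postVars)
    {
        string post = postVars == null
            ? ""
            : string.Join("&", postVars
                .OrderBy(kv => kv.Key, StringComparer.Ordinal) // variable order must not matter
                .Select(kv => Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
        return Uri.EscapeDataString(url) + " " + Uri.EscapeDataString(parseMethod) + " " + post;
    }

    // Returns true the first time a request is seen, false for any repeat.
    public bool TryAdd(string url, string parseMethod, IDictionary<string, string>? postVars = null)
        => _seen.Add(Key(url, parseMethod, postVars));
}
```

The POST variables are sorted before the key is built. As a result, `{q=x, page=2}` and `{page=2, q=x}` count as the same request, while the same URL sent to a different parse method counts as a new one.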
 
 
Action Replay
IronWebScraper caches responses so that developers can change their code “on the fly” and replay every previous request without contacting the internet. Every scrape job is autosaved and can be resumed after an exception or a power outage.
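Replay can be sketched in plain C# as a disk-backed response cache. The first request for a URL goes through a real fetch function, and the response body is written to disk. Later requests are answered from that file with no network call, even from a fresh process after a crash. `ReplayCache`, its file layout, and the `fetch` delegate are hypothetical stand-ins for whatever the library does internally. The sketch assumes .NET 5 or later for `SHA256.HashData` and `Convert.ToHexString`.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical disk-backed replay cache (not IronWebScraper's implementation).
// The first Get for a URL calls the real fetch delegate and stores the body;
// later Gets, even from a new process, replay the stored body without fetching.
public sealed class ReplayCache
{
    private readonly string _dir;
    private readonly Func<string, string> _fetch;

    public ReplayCache(string directory, Func<string, string> fetch)
    {
        _dir = directory;
        _fetch = fetch;
        Directory.CreateDirectory(_dir);
    }

    // File name = SHA-256 of the URL, so any URL maps to a safe, fixed-length name.
    private string PathFor(string url)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(url));
        return Path.Combine(_dir, Convert.ToHexString(hash) + ".html");
    }

    public string Get(string url)
    {
        string path = PathFor(url);
        if (File.Exists(path))
            return File.ReadAllText(path);   // replay: no network call
        string body = _fetch(url);           // first time: fetch for real
        File.WriteAllText(path, body);
        return body;
    }
}
```

In a real scraper, `fetch` would wrap an HTTP call such as `HttpClient`. Because it is a delegate, you can rerun changed parsing code against the saved pages without touching the network.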
 
 
Rapid Installation with Microsoft Visual Studio
IronWebScraper puts web scraping tools in your hands quickly with a Visual Studio installer. Whether you install directly from NuGet within Visual Studio or download the DLL, you’ll be set up in no time. Just one DLL and no dependencies.