You can use Web Crawler and Scraper for Files and Links (Apps category) for tasks such as crawling, deep crawling, deep linking, e-mail parsing, finding files, crawling and finding images, link parsing, working with HTML and PDF files, and general web crawling and scraping.
- Supported systems: Windows XP, Windows Vista, Windows 7, Windows 8 Desktop
- Requires: .NET 4
About Web Crawler and Scraper
Web Crawler can be used to collect links, e-mail addresses, images and files from a single webpage or an entire site.
Web Crawler has a simple and intuitive interface.
The crawler is multithreaded and optimized for performance. It scans the webpage based on MIME types and file extensions, so it can find hidden links.
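The application is a .NET program and its source is not shown here, so the following Python sketch only illustrates the general idea of matching resources by both file extension and MIME type. The function name matches_wanted and the choice to fall back on the server's Content-Type header are assumptions, not the product's actual logic.

```python
import mimetypes
import posixpath
from urllib.parse import urlparse


def matches_wanted(url, content_type, wanted_exts):
    """Return True if a resource looks like one of the wanted file types.

    Checks the extension in the URL path first, then falls back to the MIME
    type reported by the server (e.g. a Content-Type header value), so a link
    such as /download?id=42 served as application/pdf is still recognized.
    """
    wanted = {e.lower().lstrip(".") for e in wanted_exts}
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower().lstrip(".")
    if ext in wanted:
        return True
    if content_type:
        # Drop parameters such as "; charset=utf-8" before looking up the type
        mime = content_type.split(";")[0].strip().lower()
        for guessed in mimetypes.guess_all_extensions(mime):
            if guessed.lstrip(".") in wanted:
                return True
    return False
```

A link with no telling extension is the case the MIME check is meant to catch: matches_wanted("http://example.com/download?id=42", "application/pdf", ["pdf"]) is true even though the URL itself says nothing about PDF.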
The package includes two applications: a Windows Forms application and a newer WPF application with extended functionality. The “Deep crawl” feature lets the crawler search all pages linked from the selected website.
After crawling, the Web Crawler will save all links and e-mail addresses to the selected folder, along with all the crawled files.
The WPF crawler/scraper lets the user enter a regular expression to scrape the crawled webpages. The new application gives the user greater control over the crawling process.
How to use the Windows Forms crawler
At the top is a box for entering the URL to crawl. Underneath it is a box for the folder in which to save the crawled files. The last box is for the file extensions the crawler should look for. If the file extensions box is left empty, the program only looks for links and e-mail addresses on the page and saves them to the linkList.txt and emailList.txt files in the output folder.
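As a rough illustration of the links-and-e-mails mode, the Python sketch below collects href and src values (so image sources are included), finds e-mail addresses with a simple regular expression, and writes them to linkList.txt and emailList.txt. Only the two file names come from the description; the function and class names and the e-mail pattern are hypothetical, and the real application may extract and format these lists differently.

```python
import os
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Deliberately simple e-mail pattern (an assumption); real addresses vary more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects href/src attribute values, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def save_links_and_emails(html, base_url, out_dir):
    """Write unique links and e-mails to linkList.txt and emailList.txt."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    # dict.fromkeys removes duplicates while keeping first-seen order;
    # mailto: links are reported as e-mail addresses instead of links
    links = list(dict.fromkeys(
        link for link in parser.links if not link.startswith("mailto:")))
    emails = list(dict.fromkeys(EMAIL_RE.findall(html)))
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "linkList.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(links))
    with open(os.path.join(out_dir, "emailList.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(emails))
    return links, emails
```

Using Python's built-in HTMLParser rather than regular expressions for links keeps attribute parsing robust to quoting and whitespace differences.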
The application is primarily meant for crawling subpages, but it can crawl a whole website when the “deep crawl” option is checked. This option is very resource-intensive because it opens parallel connections to the server for better performance.
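Parallel connections are usually bounded so a crawl does not overwhelm the server or the local machine. The Python sketch below shows one common way to do that with a thread pool; fetch_all, its max_workers default of 4 and the injectable fetcher argument are hypothetical and are not taken from the application.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, max_workers=4, fetcher=fetch):
    """Fetch many URLs in parallel, with at most max_workers running at once.

    Returns a dict mapping each URL to its result, or to the exception it
    raised, so one failing page does not abort the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetcher, u): u for u in urls}
        for fut, url in futures.items():
            try:
                results[url] = fut.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

Raising max_workers speeds up a deep crawl but multiplies the load on the server, which is the trade-off behind the resource-intensive warning above.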
How to use the WPF crawler and scraper
The WPF application has an interface similar to that of the Windows Forms crawler/scraper, and its first three boxes work the same way. The last box is optional: it takes a regular expression that is matched against each crawled webpage. This can be used to search for phone numbers, names, locations, etc.
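To show what the optional regular-expression box does conceptually, here is a small Python sketch that applies one pattern to the text of several crawled pages. The function name scrape_matches and the phone-number pattern (North American style) are illustrative assumptions; the WPF application uses .NET regular expressions, whose syntax differs from Python's in some details.

```python
import re


def scrape_matches(pages, pattern):
    """Apply a user-supplied regular expression to each crawled page.

    pages maps URL -> page text; returns URL -> list of matched strings,
    omitting pages with no matches.
    """
    regex = re.compile(pattern)
    results = {}
    for url, text in pages.items():
        # group(0) is the whole match, even if the pattern contains groups
        found = [m.group(0) for m in regex.finditer(text)]
        if found:
            results[url] = found
    return results


# Example pattern for phone numbers such as (555) 123-4567 or 555-987-6543
PHONE_PATTERN = r"\(?\d{3}\)?[ -]?\d{3}-\d{4}"
```

Using finditer with group(0) rather than findall avoids a common surprise: findall returns only the captured groups when the pattern contains parentheses.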
The WPF application also has some support for AJAX calls. Its new engine gives more control over what is crawled and over the depth and scope of the crawl. The user can also set the number of concurrent threads the program uses to scrape webpages.
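Depth and scope limits are typically implemented as a breadth-first crawl that stops expanding pages past a maximum depth and skips links outside the starting site. The Python sketch below shows that idea; crawl, max_depth, same_host and the injected get_links function are hypothetical names, and the engine's real options are not documented here.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_depth=2, same_host=True):
    """Breadth-first crawl limited by depth and, optionally, by host.

    get_links(url) must return the URLs linked from that page; it is
    injected so the crawl logic can be tried without network access.
    Returns a dict mapping every discovered URL to the depth it was found at.
    """
    start_host = urlparse(start_url).netloc
    depth_of = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in get_links(url):
            if same_host and urlparse(link).netloc != start_host:
                continue  # link is outside the crawl scope
            if link not in depth_of:  # visit each page only once
                depth_of[link] = depth + 1
                queue.append(link)
    return depth_of
```

Breadth-first order guarantees each page is recorded at its shallowest depth, so the max_depth limit behaves predictably even when the same page is linked from several places.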
About the rating
It seems that only people who do not like the product or could not use it properly decide to rate it. If you like the application, you can help the developer by rating it up.