Spider

In 2003, I was asked to create an interesting tool: a spider that walks web pages and extracts data.

I analyzed how to achieve this and then implemented the tool in full. My focus was on parsing HTML, collecting the HTTP request parameters and their values, and extracting the data.

The tool can take either a page of links or a form as its source and build the possible combinations of request parameter values from it. The links within each retrieved page are then located, parsed, and appended to the list of pages to be processed.
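The original code is not shown here, so the following is only a minimal Python sketch of the idea, not the tool itself. It assumes the standard library's `html.parser` for parsing. The names `LinkAndFormParser`, `parameter_combinations`, `parse_page`, and `crawl` are hypothetical, as is the `fetch` callable that the caller supplies to download a page.

```python
from collections import deque
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class LinkAndFormParser(HTMLParser):
    """Collects link targets and form field values from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # absolute URLs found in <a href="...">
        self.fields = {}     # field name -> list of candidate values
        self._select = None  # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "select" and attrs.get("name"):
            self._select = attrs["name"]
            self.fields.setdefault(self._select, [])
        elif tag == "option" and self._select and attrs.get("value") is not None:
            self.fields[self._select].append(attrs["value"])
        elif tag == "input" and attrs.get("name"):
            self.fields.setdefault(attrs["name"], []).append(attrs.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def parameter_combinations(fields):
    """Return every combination of field values as a query string."""
    if not fields:
        return []
    names = sorted(fields)
    return [urlencode(list(zip(names, values)))
            for values in product(*(fields[name] for name in names))]


def parse_page(html, base_url):
    """Return (links, query strings) found in one page."""
    parser = LinkAndFormParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parameter_combinations(parser.fields)


def crawl(start_url, fetch, max_pages=50):
    """Walk pages breadth-first; `fetch` is an assumed function url -> HTML text."""
    pending = deque([start_url])
    seen = {start_url}
    processed = []
    while pending and len(processed) < max_pages:
        url = pending.popleft()
        links, _ = parse_page(fetch(url), url)
        processed.append(url)
        for link in links:
            if link not in seen:  # append each new page to the work list once
                seen.add(link)
                pending.append(link)
    return processed
```

The `seen` set keeps the spider from revisiting a page when two pages link to each other, and `max_pages` is an arbitrary safety limit added for this sketch.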

Configurations and results are saved as XML files. I also built a viewer for the results, in which they can be sorted on any attribute.
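The XML layout used by the tool is not given, so this Python sketch assumes a hypothetical one: a `<results>` root with one `<result>` element per record, where each field is stored as an attribute. The function names are likewise made up for illustration. It shows how such a file could be written and then sorted on any attribute, as a viewer might do.

```python
import xml.etree.ElementTree as ET


def results_to_xml(records):
    """Serialize a list of dicts as <results><result .../></results> (assumed layout)."""
    root = ET.Element("results")
    for record in records:
        ET.SubElement(root, "result", {key: str(value) for key, value in record.items()})
    return ET.tostring(root, encoding="unicode")


def sorted_results(xml_text, attribute):
    """Load the records back and sort them by the given attribute (as strings)."""
    root = ET.fromstring(xml_text)
    rows = [dict(element.attrib) for element in root.iter("result")]
    return sorted(rows, key=lambda row: row.get(attribute, ""))
```

Sorting here compares attribute values as strings, so numeric fields would need converting first. Writing the returned text to a file is left to the caller.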


Published by Bojan Bjelić

Working hard on a bunch of stuff, with a positive future above all. I blog mostly about software, productivity, and the digital world.
