Webbots, Spiders, and Screen Scrapers by Michael Schrenk (O’Reilly Media)

This book caught my attention immediately, because I’m working on a project based on a webbot, which I’m developing with PHP and cURL.

Since the first day I started using cURL (and many other PHP classes), I have wondered about the point of all these complex and extravagant classes and functions. The author takes a very different approach to classes, one much closer to mine than to the standard PHP approach. One example is LIB_http.php, which wraps some HTTP functions.

When I started coding my own webbot, I thought I was taking the wrong approach, since I preferred to parse the HTML with regular expressions instead of navigating the document tree. Reading this book, I discovered that the approach I used is the one the author suggests.
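To illustrate the regex style of parsing described above, here is a minimal sketch in Python. It is not code from the book (which uses PHP); the helper name `text_between` and the sample HTML are hypothetical and made up for this example.

```python
import re

# Hypothetical helper: return the text between the first occurrence of
# `start` and the next occurrence of `end`, or None if no such pair exists.
def text_between(html, start, end):
    # Escape the delimiters so they are matched literally, and use a
    # non-greedy group so the match stops at the first closing delimiter.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, html, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1) if match else None

# Made-up sample page used only for illustration.
sample = "<html><head><title>Example page</title></head><body></body></html>"
title = text_between(sample, "<title>", "</title>")
```

This kind of delimiter matching needs no parsed document tree, which is convenient for pages with sloppy markup, but it can break if the page layout changes.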

The book goes as far as describing how to operate a botnet, which scared me at first. Even after reading the whole book, I still have doubts about whether it is right to make some of this knowledge public in a book like this.

The only part of the book I did not like was the one about iMacros, since I like to see the whole code of everything I use.

I really liked the book, so I give it 5/5 and recommend it to everyone interested in the world of webbots and spiders.
