Information for Linux System Administration 

Regular Expressions: Web Scraping Is Easy


Web surfing is so simple that you can train your computer to do it for you.
Some authors and publishers like to check the "Amazon rank" of books on a weekly basis. Once you have found a book's page on Amazon, it's easy enough to bookmark that page, have a browser bring it up, and inspect the page for the "Sales Rank" value.

That's what most authors do. Suppose, though, that you don't want to wait for your browser to render everything that Amazon puts on the page, or, more importantly, that you want to feed the sales rank into your own algorithms for logging, charting, projecting, and so on. You need a computer program to extract the information that Amazon formatted and delivered for human reading.

The client-side Web programming that this requires turns out to be quite simple.
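
As a rough illustration of how little code this can take, here is a minimal Python sketch of the idea: fetch a page and pull one number out of it with a regular expression. The names `extract_sales_rank`, `fetch_page`, and `SALES_RANK_RE` are made up for this example, and the pattern assumes the page contains text shaped like "Sales Rank: 1,234" (possibly with a tag or two in between). The real markup is not shown here and changes over time, so expect to adjust the pattern.

```python
import re
from typing import Optional
from urllib.request import urlopen

# Assumed page shape: "Sales Rank", an optional colon, optional HTML tags,
# an optional "#", then a number with optional thousands separators.
# This is a guess for illustration, not Amazon's actual markup.
SALES_RANK_RE = re.compile(
    r"Sales\s+Rank:?\s*(?:<[^>]+>\s*)*#?(\d[\d,]*)",
    re.IGNORECASE,
)


def extract_sales_rank(html: str) -> Optional[int]:
    """Return the first sales rank found in html, or None if there is none."""
    match = SALES_RANK_RE.search(html)
    if match is None:
        return None
    # Drop thousands separators before converting: "1,234" -> 1234.
    return int(match.group(1).replace(",", ""))


def fetch_page(url: str) -> str:
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Offline demonstration on a made-up fragment; no network access needed.
    sample = "<b>Amazon.com Sales Rank:</b> 1,234 <br>"
    print(extract_sales_rank(sample))
```

In practice you would pass the result of `fetch_page` for your bookmarked URL to `extract_sales_rank` and log the number. Returning `None` rather than raising an error makes it easy to notice when the page layout has changed and the pattern no longer matches.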
 -Ray, February 24, 2003
Articles are owned by their authors.   © 2000-2012 Ray Yeargin