Web Scraping Proxy


Programmers often need to use information on Web pages as input to other programs. This is done by Web scraping: writing a program that simulates a person viewing a Web site with a browser. Such programs are often hard to write because it is difficult to determine which Web requests the simulation requires.

The Web Scraping Proxy (WSP) solves this problem by monitoring the flow of information between the browser and the Web site and emitting Perl LWP code fragments that can be used to write the Web scraping program. To use the WSP, a developer browses the site once with a browser configured to use the WSP as its proxy server. The developer then uses the emitted code as a template for a Perl program that accesses the site.
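
The idea of turning an observed request into a replayable code fragment can be illustrated with a minimal sketch. This is not WSP's code and does not reproduce its output format: WSP is a Perl program that emits Perl LWP fragments, while this sketch uses Python and emits a urllib snippet instead. The function name emit_fragment, its parameters, and the example.com URL are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode


def emit_fragment(method, url, headers=None, form=None):
    """Return Python source text that would replay one captured request.

    Hypothetical helper: a proxy would call something like this for each
    request it sees passing between the browser and the Web site.
    """
    lines = ["import urllib.request", ""]

    # Form fields become a URL-encoded body; requests without a form send no body.
    body = urlencode(form) if form else None
    data_expr = repr(body.encode("ascii")) if body is not None else "None"

    lines.append(
        f"req = urllib.request.Request({url!r}, data={data_expr}, method={method!r})"
    )
    # Copy the captured headers so the replay resembles the browser's request.
    for name, value in (headers or {}).items():
        lines.append(f"req.add_header({name!r}, {value!r})")

    lines.append("with urllib.request.urlopen(req) as resp:")
    lines.append("    page = resp.read()")
    return "\n".join(lines)


# Example: a login form submission as a proxy might have observed it.
print(emit_fragment(
    "POST",
    "http://example.com/login",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    form={"user": "alice"},
))
```

The printed text is itself a small program. As with the fragments WSP emits, it is meant as a template that the developer edits, for example by replacing fixed form values with variables, rather than as a finished scraper.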

For more information, see the paper "Web Scraping Proxy", Dr. Dobb's Journal 28(6), June 2003, 46-52. Copies of the paper are available from Dr. Dobb's Journal.


WSP is covered by the Common Public License. By downloading this software you agree to the terms of this license. The user name and password needed for download are provided at the bottom of this copy of the License.

WSP is implemented as a Perl program. Version 1.0 is available for download. For more information, see the README file.

Version 2.0 allows WSP to access Web sites via a proxy and adds support for client certificates. Version 2.0 is available for download. For more information, see the README file.


For more information, please contact Howard Katseff.