Dec 10 2012

Two handy Perl scripts

Published at 5:02 pm under code, resources

Here are two handy Perl scripts I’ve developed, one some years ago and the other a few weeks ago. The zip file perl-scripts-prp-and-csplit contains a “readme.txt”, the Perl scripts, and sample input files.

The prp.pl script reads many files and makes global search-and-replace edits to them in place, using search and replacement strings from a patterns file that you specify. I developed it (with some programming help when I got stuck) when I had to download three or four hundred pages from Web Crossing to produce an HTML snapshot of a workshop, with all the inline images, enclosures, and internal links translated from calls to Web Crossing into references to the local file system. Writing the patterns to do that was painstaking and thankless, so eventually we discontinued the whole effort, but the Perl script lives on and remains useful.
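The real prp.pl is in the zip file and isn’t reproduced here. As a rough illustration of the technique, here is a minimal Perl sketch of batch search-and-replace in place. It assumes a hypothetical patterns-file format of one rule per line, with the search string and its replacement separated by a tab, both treated as literal text rather than regular expressions; the actual script’s format is described in its readme.txt and may well differ. The names prp-sketch.pl, load_patterns, apply_patterns and edit_in_place are made up for this example.

```perl
#!/usr/bin/perl
# Minimal sketch of prp.pl-style batch search-and-replace (NOT the original).
# Hypothetical usage: perl prp-sketch.pl patterns.txt file1.htm file2.htm ...
# ASSUMED patterns-file format: one rule per line, "search<TAB>replace".
# Both sides are literal strings, not regular expressions.
use strict;
use warnings;

# Read the patterns file into an array ref of [search, replace] pairs.
sub load_patterns {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my @pairs;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip LF or CRLF line ending
        next unless $line =~ /\t/;         # skip lines without a tab separator
        my ($from, $to) = split /\t/, $line, 2;
        next unless length $from;          # an empty search string matches everywhere
        push @pairs, [ $from, $to ];       # $to may be '' (delete the match)
    }
    close $fh;
    return \@pairs;
}

# Apply every rule in file order as a global, literal substitution.
sub apply_patterns {
    my ($text, $pairs) = @_;
    for my $pair (@$pairs) {
        my ($from, $to) = @$pair;
        $text =~ s/\Q$from\E/$to/g;        # \Q...\E quotes regex metacharacters
    }
    return $text;
}

# Rewrite one file in place; returns 1 if it changed, 0 otherwise.
sub edit_in_place {
    my ($file, $pairs) = @_;
    open my $in, '<:raw', $file or die "Cannot read $file: $!\n";
    my $text = do { local $/; <$in> };     # slurp the whole file
    $text = '' unless defined $text;
    close $in;
    my $new = apply_patterns($text, $pairs);
    return 0 if $new eq $text;             # leave unchanged files alone
    open my $out, '>:raw', $file or die "Cannot write $file: $!\n";
    print {$out} $new;
    close $out or die "Cannot close $file: $!\n";
    return 1;
}

sub main {
    my ($patterns_file, @files) = @_;
    my $pairs = load_patterns($patterns_file);
    for my $file (@files) {
        print "edited $file\n" if edit_in_place($file, $pairs);
    }
}

# Run only when executed directly with a patterns file and at least one target.
main(@ARGV) if !caller && @ARGV >= 2;
```

Rules are applied in the order they appear, so a later rule sees the output of earlier ones. Quoting each search string with \Q...\E keeps characters such as ? and . in Web Crossing URLs from being read as regex metacharacters.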

Recently I wrote the csplit.pl script to process a file exported from Google Refine (soon to be OpenRefine). Refine can pull many web pages with its “Add column by fetching URLs” command. Once the data is in Refine, you can parse it, manipulate it, subset it, and generally slice and dice it. I wanted to write each resulting cell to its own file, taking the content from one column and the file name from another, so that each cell produces one file. Here’s the row template I used in Refine’s Templating export:

-#- {{(cells["blog-tag"].value)}}.htm
{{jsonize(cells["posts-rss-feed"].value)}}

The export produces one big file in which each record begins with a “-#-” delimiter line naming its target file, and csplit.pl can then chop that output into separate files with those names.
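As with prp.pl, the actual csplit.pl is in the zip file. The sketch below shows one way the splitting step could be done in Perl, assuming each record starts with a line of the form “-#- filename” (which is what the template above emits) and runs until the next such line. Anything before the first delimiter is discarded, slashes in names are replaced so files stay in the output directory, and a repeated name overwrites the earlier file. The script name csplit-sketch.pl, its optional output-directory argument, and the helper names split_records and write_records are hypothetical.

```perl
#!/usr/bin/perl
# Minimal sketch of csplit.pl-style splitting (NOT the original script).
# Hypothetical usage: perl csplit-sketch.pl refine-export.txt [output-dir]
# ASSUMED input: each record starts with a line "-#- <filename>" and runs
# until the next such line; text before the first delimiter is discarded.
use strict;
use warnings;

my $DELIM = qr/^-#-\s+(\S.*?)\s*$/;        # captures the target file name

# Split the export into [name, content] records.
sub split_records {
    my ($text) = @_;
    my (@records, $current);
    for my $line (split /^/m, $text) {     # split into lines, keeping newlines
        if ($line =~ $DELIM) {
            $current = [ $1, '' ];
            push @records, $current;
        }
        elsif ($current) {
            $current->[1] .= $line;        # content belongs to the open record
        }
    }
    return @records;
}

# Write each record to $dir/<name>; a repeated name overwrites the earlier file.
sub write_records {
    my ($dir, @records) = @_;
    for my $rec (@records) {
        my ($name, $content) = @$rec;
        $name =~ s{[/\\]}{_}g;             # keep output files inside $dir
        my $path = "$dir/$name";
        open my $out, '>:raw', $path or die "Cannot write $path: $!\n";
        print {$out} $content;
        close $out or die "Cannot close $path: $!\n";
    }
    return scalar @records;
}

sub main {
    my ($input, $dir) = @_;
    $dir = '.' unless defined $dir;
    open my $in, '<:raw', $input or die "Cannot read $input: $!\n";
    my $text = do { local $/; <$in> };     # slurp the whole export
    $text = '' unless defined $text;
    close $in;
    my $count = write_records($dir, split_records($text));
    print "wrote $count file(s) to $dir\n";
}

main(@ARGV) if !caller && @ARGV;
```

Since the template passes the content through jsonize(), each chunk will be a quoted JSON string literal; this sketch writes it out unchanged, so any unquoting would be a separate step.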
