Replacing XML::XPath with XML::LibXML in Perl
I’ve used XML::XPath whenever I need to process XML, including RSS, OPML, Apache configuration, sitemaps, and such. There are dedicated parsers for these I should certainly be using instead, but I find it easier just to use the one tool and build my own data structure around it. Is that a folly? Almost certainly yes! Is it robust? So far, yes.
I’ve seen people recommend XML::LibXML on sites like PerlMonks and mailing lists, so I thought I’d give it a try for a new personal project. It’s mostly a drop-in replacement, with familiar syntax:
#!/usr/bin/env perl
use strict;
use warnings;
use URI;
use XML::LibXML;
my $rss = URI->new('http://showfeed.rubenerd.com');
my $xml = XML::LibXML->load_xml(location => $rss);
foreach my $title ($xml->findnodes('//item/title')) {
print $title->to_literal(). "\n";
}
==> Rubenerd Show 414: The thingy stuff episode
==> Rubenerd Show 413: The Wheaty 2021 episode
==> Rubenerd Show 412: The wandering mug episode
==> Rubenerd Show 411: The FreeBSD cat(1) episode
==> Rubenerd Show 410: The apothecary coffee episode
==> [..]
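As a minimal sketch of building a data structure around the parser, the same XPath calls can fill an array of hashes. The inline feed, the example.com links, and the @episodes name are made up for illustration so it runs without a network connection:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical inline feed, standing in for a real RSS document.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First episode</title>
      <link>https://example.com/1</link>
    </item>
    <item>
      <title>Second episode</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>
XML

my $xml = XML::LibXML->load_xml(string => $sample);

# Collect each item into a hash reference, with XPath relative to the item.
my @episodes;
foreach my $item ($xml->findnodes('//item')) {
    push @episodes, {
        title => $item->findvalue('./title'),
        link  => $item->findvalue('./link'),
    };
}

print "$_->{title} <$_->{link}>\n" foreach @episodes;
```

Here findvalue returns the text of the matching node directly, which saves a separate to_literal call when only one value per field is expected.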
The load_xml method can accept a location such as a URL, a string, or IO in the form of a Perl file handle. Like XML::XPath though, you need an external module like LWP or LWP::Simple to fetch over HTTPS, then pass the response body in as a string:
#!/usr/bin/env perl
use strict;
use warnings;
use URI;
use LWP::Simple;
use XML::LibXML;
my $rss = URI->new('https://the.geekorium.com.au/index.xml');
my $content = get($rss);
my $xml = XML::LibXML->load_xml(string => $content);
print $xml->findnodes('/rss/channel/title')->to_literal(). "\n";
==> The Geekorium
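One thing the listing above doesn’t guard against: get from LWP::Simple returns undef when the download fails, and load_xml dies if the content isn’t well-formed. A rough sketch of handling both follows; parse_feed is a hypothetical helper name, and the inline strings stand in for downloaded content:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns a parsed document, or undef if the content
# is missing or is not well-formed XML.
sub parse_feed {
    my ($content) = @_;
    return unless defined $content;    # e.g. get() failed

    # load_xml throws on parse errors, so trap the exception with eval.
    my $xml = eval { XML::LibXML->load_xml(string => $content) };
    warn "Could not parse feed: $@" unless defined $xml;
    return $xml;
}

my $good   = parse_feed('<rss><channel><title>Example</title></channel></rss>');
my $broken = parse_feed('<rss><channel>');    # unclosed tags
my $empty  = parse_feed(undef);               # simulates a failed download

print $good->findvalue('/rss/channel/title'), "\n" if $good;
print "Broken feed was rejected\n" unless $broken;
print "Missing content was rejected\n" unless $empty;
```

Whether to warn, die, or skip quietly depends on the script; this only shows where the two failure points are.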
And finally, here we are using it with a local file:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;
my $file = 'feeds.opml';
open(my $filehandle, '<', $file) or die "Could not open, $!";
my $xml = XML::LibXML->load_xml(IO => $filehandle);
print $xml->findnodes('/opml/head/docs')->to_literal(). "\n";
close($filehandle);
==> http://dev.opml.org/spec2.html
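To compare the three sources side by side, here is a small sketch that loads the same document via location, string, and IO. The OPML content is invented and written to a temporary file for the demo; note that location takes a plain filename as well as a URL:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

# Hypothetical OPML content, written to a temporary file for the demo.
my $opml = '<opml version="2.0"><head><title>Example feeds</title></head><body/></opml>';
my ($tmp, $path) = tempfile(SUFFIX => '.opml', UNLINK => 1);
print {$tmp} $opml;
close($tmp);

# 1. location: a filename works as well as a URL
my $from_location = XML::LibXML->load_xml(location => $path);

# 2. string: content already in memory
my $from_string = XML::LibXML->load_xml(string => $opml);

# 3. IO: an open file handle, read as raw bytes
open(my $fh, '<', $path) or die "Could not open $path: $!";
binmode($fh);
my $from_io = XML::LibXML->load_xml(IO => $fh);
close($fh);

# Each document should report the same title.
foreach my $doc ($from_location, $from_string, $from_io) {
    print $doc->findvalue('/opml/head/title'), "\n";
}
```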
There’s a script in my lunchbox for those who want to tinker. But the best resource I’ve found is Grant McLean’s Perl XML::LibXML by Example. Check out the links to the packages to read their docs on MetaCPAN.