use XML::LibXML;
Input callbacks are used whenever XML::LibXML has to fetch something other than externally parsed entities. They are implemented as a callback stack on the Perl layer, analogous to libxml2's native callback stack.
The XML::LibXML::InputCallback class transparently registers the input callbacks for libxml2's parser processes.
The class has a function-oriented part and an object-oriented part. The function-oriented part manipulates libxml2's global callback stack; its functions serve as the interface to the callbacks on the C and XS layers. The object-oriented part implements operations on the "pseudo-localized" callback stack. Currently, you can register and de-register callbacks on the Perl layer and initialize them on a per-parser basis.
Callback Groups
The libxml2 input callbacks come in groups. One group contains a URI matcher (match), a data stream constructor (open), a data stream reader (read), and a data stream destructor (close). Callbacks can be manipulated only as a whole group, not individually.
The Parser Process
The parser process works on an XML data stream in which links to other resources can be embedded, for example links to external DTDs or XIncludes. Those resources are identified by URIs. The callback implementation of libxml2 assumes that one callback group can handle a certain number of URIs and a certain URI scheme. By default, callback handlers for file://*, file://*.gz, http://* and ftp://* are registered.
Callback groups in the callback stack are processed from top to bottom, meaning that callback groups registered later will be processed before the earlier registered ones.
While parsing the data stream, the libxml2 parser checks whether a registered callback group will handle a URI. If none will, the URI is interpreted as file://URI. To claim a URI, the match callback has to return '1'; handling of the URI is then passed to that callback group. Next, the URI is passed to the open callback, which should return a reference to the data stream if it opened the resource successfully, and '0' otherwise. If the stream was opened successfully, the read callback is called repeatedly until it returns an empty string. Finally, the close callback is called to close the stream.
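The lookup order and the match/open/read/close sequence described above can be sketched in plain Perl. This is only a simulation for illustration, not libxml2's or XML::LibXML's actual code: the names register_group and fetch, the 'demo:' scheme and the %resources hash are all assumptions made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical in-memory resources, keyed by URI.
my %resources = ( 'demo:greeting' => '<greeting>hello</greeting>' );

my @stack;    # index 0 is the top of the stack

# Newest group goes on top, so it is consulted first.
sub register_group { unshift @stack, $_[0] }

# Walk the stack top to bottom and drive the first group whose match
# callback claims the URI: open once, read until '', then close.
sub fetch {
    my ($uri, $chunk_size) = @_;
    for my $group (@stack) {
        my ($match, $open, $read, $close) = @$group;
        next unless $match->($uri);
        my $stream = $open->($uri) or return;    # false value: open failed
        my $data = '';
        while (1) {
            my $chunk = $read->($stream, $chunk_size);
            last unless defined $chunk && length $chunk;
            $data .= $chunk;
        }
        $close->($stream);
        return $data;
    }
    return;    # no group matched; libxml2 would fall back to file://URI
}

# Shared read callback: cut up to $len characters off the front of the buffer.
my $read_ref = sub { my ($ref, $len) = @_; substr($$ref, 0, $len, '') };

register_group([
    sub { $_[0] =~ /^demo:/ },
    sub { my $copy = $resources{ $_[0] }; defined $copy ? \$copy : 0 },
    $read_ref,
    sub { 1 },
]);
my $before = fetch('demo:greeting', 8);

# A group registered later sits on top and wins for the URIs it matches.
register_group([
    sub { $_[0] eq 'demo:greeting' },
    sub { my $copy = '<greeting>override</greeting>'; \$copy },
    $read_ref,
    sub { 1 },
]);
my $after = fetch('demo:greeting', 8);

print "$before\n$after\n";
```

In this sketch the first fetch is served by the only registered group, while the second is served by the group registered later, because that group now sits on top of the stack.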
Organisation of callback groups in XML::LibXML::InputCallback
Callback groups are implemented as a stack (a Perl array); each entry holds a reference to an array of the callbacks. To the libxml2 library, the XML::LibXML::InputCallback callback implementation appears as one single callback group. The Perl implementation, however, allows one to manage different callback stacks on a per libxml2-parser basis.
    my $input_callbacks = XML::LibXML::InputCallback->new();
    $input_callbacks->register_callbacks([ $match_cb1, $open_cb1, $read_cb1, $close_cb1 ]);
    $input_callbacks->register_callbacks([ $match_cb2, $open_cb2, $read_cb2, $close_cb2 ]);
    $input_callbacks->register_callbacks([ $match_cb3, $open_cb3, $read_cb3, $close_cb3 ]);
    $parser->input_callbacks( $input_callbacks );
    $parser->parse_file( $some_xml_file );
If you use the old callback interface through global callbacks, XML::LibXML::InputCallback will treat them with a lower priority than the ones registered using the new interface. The global callbacks will not override the callback groups registered using the new interface. Local callbacks are attached to a specific parser instance and are therefore treated with the highest priority. If the match callback of a locally registered callback group is identical to that of one of the callback groups registered using the new interface, that callback group will be replaced.
Users of the old callback implementation whose open callback returned a plain string will have to adapt their code to return a reference to that string after upgrading to version 1.59 or later. The new callback system can only deal with an open callback that returns a reference!
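One possible way to adapt such an open callback is sketched below. It assumes the old callback simply returned the document text; the %documents hash and the 'mem:' URIs are hypothetical stand-ins for wherever that text came from. The read callback then receives the reference and consumes the buffer chunk by chunk.

```perl
use strict;
use warnings;

# Hypothetical store standing in for the source of the document text.
my %documents = ( 'mem:config' => '<config><debug>1</debug></config>' );

# Old style (plain string, not accepted by the new interface):
#   sub open_uri { my $uri = shift; return $documents{$uri}; }

# New style: return a reference to a private copy of the string,
# or 0 if the resource cannot be opened.
sub open_uri {
    my $uri = shift;
    return 0 unless exists $documents{$uri};
    my $text = $documents{$uri};
    return \$text;
}

# Receives the reference from open_uri; returns up to $length characters
# and removes them from the buffer, so it yields '' once all is consumed.
sub read_uri {
    my ($text_ref, $length) = @_;
    return substr($$text_ref, 0, $length, '');
}

# Nothing to release for an in-memory string.
sub close_uri { return 1 }
```

Because open_uri copies the text before taking the reference, consuming the buffer in read_uri leaves the stored original untouched.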
    # Define the four callback functions
    sub match_uri {
        my $uri = shift;
        return $uri =~ /^myscheme:/; # trigger our callback group on 'myscheme' URIs
    }

    sub open_uri {
        my $uri = shift;
        my $handler = MyScheme::Handler->new($uri);
        return $handler;
    }

    # The returned $buffer will be parsed by the libxml2 parser
    sub read_uri {
        my $handler = shift;
        my $length = shift;
        my $buffer;
        read($handler, $buffer, $length);
        return $buffer; # $buffer will be an empty string '' if read() is done
    }

    # Close the handle associated with the resource.
    sub close_uri {
        my $handler = shift;
        close($handler);
    }

    # Register them with an instance of XML::LibXML::InputCallback
    my $input_callbacks = XML::LibXML::InputCallback->new();
    $input_callbacks->register_callbacks([ \&match_uri, \&open_uri, \&read_uri, \&close_uri ]);

    # Register the callback group at a parser instance
    $parser->input_callbacks( $input_callbacks );

    # $some_xml_file will be parsed using our callbacks
    $parser->parse_file( $some_xml_file );
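The example above relies on a MyScheme::Handler class that is not shown. As a self-contained variant, the following sketch serves a document from memory instead. The %store hash and the 'myscheme:note' URI are assumptions made up for illustration, and the open callback returns a reference to a string rather than a handler object.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical in-memory documents served under a made-up 'myscheme:' scheme.
my %store = ( 'myscheme:note' => '<note><to>reader</to></note>' );

my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks([
    # match: claim every URI in our scheme
    sub { $_[0] =~ /^myscheme:/ ? 1 : 0 },
    # open: return a reference to a private copy of the text, 0 on failure
    sub { my $text = $store{ $_[0] }; defined $text ? \$text : 0 },
    # read: hand back up to $length characters and drop them from the buffer
    sub { my ($ref, $length) = @_; substr($$ref, 0, $length, '') },
    # close: nothing to release for an in-memory string
    sub { 1 },
]);

my $parser = XML::LibXML->new();
$parser->input_callbacks($input_callbacks);

# The URI is resolved through the callback group registered above.
my $doc  = $parser->parse_file('myscheme:note');
my $root = $doc->documentElement->nodeName;
print "$root\n";
```

Keeping the resource as a string reference avoids the need for a file handle, so the read callback can use substr() instead of read().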
2002-2006, Christian Glahn.