    package DateTime::Format::Brief;

    use DateTime::Format::Builder (
        parsers => {
            parse_datetime => [
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                    params => [qw( year month day hour minute second )],
                },
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                    params => [qw( year month day )],
                },
            ],
        }
    );
Builder provides a number of methods, most of which you'll rarely, if ever, need. They're provided mainly to expose the module's innards to subclasses, or for when you need to do something slightly beyond what I expected.
When a simple single specification is given for a method, the method isn't given a single parser directly. It's given a wrapper that will call "on_fail" if the single parser returns "undef". The single parser must return "undef" so that a multiple parser can work nicely and actual errors can be thrown from any of the callbacks.
Similarly, a multiple parser will call "on_fail" only right at the end, once it has tried every parser it could.
"on_fail" (see later) is defined, by default, to throw an error.
Multiple parser specifications can also specify "on_fail" with a coderef as an argument in the options block. This will take precedence over the inheritable and overrideable method.
That said, don't throw real errors from callbacks in multiple parser specifications unless you really want parsing to stop right there and not try any other parsers.
In summary: calling a method will result in either a "DateTime" object being returned or an error being thrown (unless you've overridden "on_fail" or "create_method", or you've specified an "on_fail" key to a multiple parser specification).
Individual parsers (be they multiple parsers or single parsers) will return either the "DateTime" object or "undef".
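The default error behaviour described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the class name My::Format::Strict is hypothetical, and the sketch assumes the default "on_fail" has not been overridden.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# My::Format::Strict is a hypothetical class name used only for this sketch.
DateTime::Format::Builder->create_class(
    class   => 'My::Format::Strict',
    parsers => {
        parse_datetime => [
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

# A string matched by one of the specifications yields a DateTime object.
my $dt = My::Format::Strict->parse_datetime('20030716');

# A string that no specification matches makes the default on_fail throw,
# so the call is wrapped in eval to capture the error.
my $failed = eval { My::Format::Strict->parse_datetime('not a date'); 1 } ? 0 : 1;
my $error  = $@;
```

Wrapping the call in "eval" is one way to turn the thrown error into a value the caller can inspect; overriding "on_fail" is the alternative the text mentions.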
The precise set of keys and values varies according to parser type. There are some common ones though:
length is an optional parameter that can be used to specify that this particular regex is only applicable to strings of a certain fixed length. This can be used to make parsers more efficient. It's strongly recommended that any parser that can use this parameter does.
You may happily specify the same length twice. The parsers will be tried in order of specification.
You can also specify multiple lengths by giving it an arrayref of numbers rather than just a single scalar. If doing so, please keep the number of lengths to a minimum.
If any specifications without lengths are given and the particular length parser fails, then the non-length parsers are tried.
This parameter is ignored unless the specification is part of a multiple parser specification.
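As a rough sketch of how length might be used inside a multiple parser specification (the class name My::Format::ByLength is hypothetical, and the third, length-less specification is there only to illustrate the fallback):

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# My::Format::ByLength is a hypothetical class name used only for this sketch.
DateTime::Format::Builder->create_class(
    class   => 'My::Format::ByLength',
    parsers => {
        parse_datetime => [
            {
                # Only considered for 14-character input: YYYYMMDDhhmmss
                length => 14,
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                # Only considered for 8-character input: YYYYMMDD
                length => 8,
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
            {
                # No length given: tried when no length-specific parser succeeds.
                regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

my $long   = My::Format::ByLength->parse_datetime('20030716123456');
my $short  = My::Format::ByLength->parse_datetime('20030716');
my $dashed = My::Format::ByLength->parse_datetime('2003-07-16');
```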
label provides a name for the specification and is passed to some of the callbacks about to be mentioned.
on_match and on_fail are callbacks. Both routines will be called with parameters of:
input is the input to the parser (after any preprocessing callbacks).
label is the label of the parser if there is one.
self is the object on which the method has been invoked (which may just be a class name). Naturally, you can then invoke your own methods on it to get the information you want.
These routines will be called depending on whether the regex match succeeded or failed.
preprocess is a callback provided for cleaning up input prior to parsing. It's given a hash as arguments with the following keys:
input is the datetime string the parser was given (if using multiple specifications and an overall preprocess then this is the date after it's been through that preprocessor).
parsed is the state of parsing so far. Usually empty at this point unless an overall preprocess was given. Items may be placed in it and will be given to any postprocessor and "DateTime->new" (unless the postprocessor deletes it).
self, args, label as per on_match and on_fail.
The return value from the routine is what is given to the regex. Note that this is the last code stop before the match.
Note: mixing length and a preprocess that modifies the length of the input string is probably not what you meant to do. You probably meant to use the multiple parser variant of preprocess which is done before any length calculations. This "single parser" variant of preprocess is performed after any length calculations.
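A single-parser preprocess callback might look like the sketch below. The helper name "_trim_input" is hypothetical, and the callback is invoked by hand with the documented "input" and "parsed" keys purely to show what it returns. Because it changes the length of the input, the specification deliberately carries no length key.

```perl
use strict;
use warnings;

# Hypothetical preprocess callback: trims surrounding whitespace.
# It receives a hash of arguments and returns the string the regex will see.
sub _trim_input {
    my %args  = @_;
    my $input = $args{input};
    $input =~ s/^\s+|\s+$//g;
    return $input;
}

# The callback would be attached to a specification such as this one.
my $spec = {
    regex      => qr/^(\d{4})(\d\d)(\d\d)$/,
    params     => [qw( year month day )],
    preprocess => \&_trim_input,
};

# Calling the callback directly with the documented keys shows its effect.
my $cleaned = $spec->{preprocess}->( input => "  20030716 \n", parsed => {} );
```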
postprocess is the last code stop before "DateTime->new" is called. It's given the same arguments as preprocess. This allows it to modify the parsed parameters after the parse and before the creation of the object. For example, you might use:
    {
        regex       => qr/^(\d\d) (\d\d) (\d\d)$/,
        params      => [qw( year month day )],
        postprocess => \&_fix_year,
    }
where "_fix_year" is defined as:
    sub _fix_year {
        my %args = @_;
        my ( $date, $p ) = @args{qw( input parsed )};
        $p->{year} += $p->{year} > 69 ? 1900 : 2000;
        return 1;
    }
This will cause the two digit years to be corrected according to the cut off. If the year was '69' or lower, then it is made into 2069 (or 2045, or whatever the year was parsed as). Otherwise it is assumed to be 19xx. The DateTime::Format::Mail module uses code similar to this (only it allows the cut off to be configured and it doesn't use Builder).
Note: It is very important to return an explicit value from the postprocess callback. If the return value is false then the parse is taken to have failed. If the return value is true, then the parse is taken to have succeeded and "DateTime->new" is called.
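To make the return-value rule concrete, here is a small sketch that calls two postprocess callbacks by hand. "_fix_year" follows the definition given earlier; "_require_valid_month" is a hypothetical second callback added only to show a false return value, which would cause the parse to be treated as failed.

```perl
use strict;
use warnings;

# Same logic as the _fix_year callback described in the text.
sub _fix_year {
    my %args = @_;
    my $p    = $args{parsed};
    $p->{year} += $p->{year} > 69 ? 1900 : 2000;
    return 1;    # explicit true value: the parse is taken to have succeeded
}

# Hypothetical callback: returns an explicit false value for a bad month.
sub _require_valid_month {
    my %args = @_;
    my $p    = $args{parsed};
    return ( $p->{month} >= 1 && $p->{month} <= 12 ) ? 1 : 0;
}

# Simulate the parsed state a regex parser would hand to postprocess.
my %recent = ( year => 45, month => 7,  day => 16 );
my %older  = ( year => 79, month => 7,  day => 16 );
my %bad    = ( year => 79, month => 13, day => 16 );

my $ok_recent = _fix_year( input => '45 07 16', parsed => \%recent );
my $ok_older  = _fix_year( input => '79 07 16', parsed => \%older );
my $ok_bad    = _require_valid_month( input => '79 13 16', parsed => \%bad );
```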
See the documentation for the individual parsers for their valid keys.
Parsers at the time of writing are:
If the specification is a reference to a piece of code, be it a subroutine, anonymous, or whatever, then it's passed more or less straight through. The code should return "undef" in event of failure (or any false value, but "undef" is strongly preferred), or a true value in the event of success (ideally a "DateTime" object or some object that has the same interface).
This all said, I generally wouldn't recommend using this feature unless you have to.
Any time you see a callback being mentioned, you can, if you like, substitute an arrayref of coderefs rather than having the straight coderef.
Note that if the first element of the array is an arrayref, then you're specifying options.
preprocess lets you specify a preprocessor that is called before any of the parsers are tried. This lets you do things like strip off timezones or any unnecessary data. The most common use people have for it at present is to get the input date to a particular length so that the length is usable (DateTime::Format::ICal would use it to strip off the variable length timezone).
Arguments are as for the single parser preprocess variant with the exception that label is never given.
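The overall preprocess option could be used along these lines. This is a sketch under assumptions: the class name My::Format::Zulu and the helper "_strip_zulu" are hypothetical, and the time zone handling is only one possible policy. The options arrayref is given as the first element of the specification list.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# Hypothetical overall preprocessor: strips a trailing "Z" and records the
# time zone in the parsed state so that it reaches DateTime->new.
sub _strip_zulu {
    my %args = @_;
    my ( $date, $p ) = @args{qw( input parsed )};
    $p->{time_zone} = ( $date =~ s/Z$// ) ? 'UTC' : 'floating';
    return $date;    # the individual parsers see this fixed-length string
}

# My::Format::Zulu is a hypothetical class name used only for this sketch.
DateTime::Format::Builder->create_class(
    class   => 'My::Format::Zulu',
    parsers => {
        parse_datetime => [
            [ preprocess => \&_strip_zulu ],    # options block comes first
            {
                length => 14,
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                length => 8,
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

my $utc      = My::Format::Zulu->parse_datetime('20030716123456Z');
my $floating = My::Format::Zulu->parse_datetime('20030716');
```

Because the "Z" is removed before any length calculation, the length keys on the individual specifications still apply.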
on_fail should be a reference to a subroutine that is called if the parser fails. If this is not provided, the default action is to call "DateTime::Format::Builder::on_fail", or the "on_fail" method of the subclass of DTFB that was used to create the parser.
User calls parser:
my $dt = $class->parse_datetime($string);
If regex did match, then on_match is called with the same arguments as would be given to on_fail. The return value is similarly ignored, but we then move to step 4 rather than exiting the parser.
See the section on error handling regarding the "undef"s mentioned above.
User calls parser:
my $dt = $class->complex_parse($string);
If the callback modifies $p then a copy of $p is given to each of the individual parsers. This is so parsers won't accidentally pollute each other's workspace.
If a "DateTime" object was returned, we go straight back to the user.
If no appropriate parser was found, or the parser returned "undef", then we progress to step 3!
For each of those the single specification flow above is performed, and is given a copy of the output from the overall preprocessor.
If a real "DateTime" object is returned then we exit back to the user.
If no parser could parse, then an error is thrown.
See the section on error handling regarding the "undef"s mentioned above.
use DateTime::Format::Builder ( ... )
That can be (almost) equivalently written as:
    use DateTime::Format::Builder;
    DateTime::Format::Builder->create_class( ... );
The difference being that the first is done at compile time while the second is done at run time.
In the tutorial I said there were only two parameters at present. I lied. There are actually three of them.
parsers takes a hashref of methods and their parser specifications. See the DateTime::Format::Builder::Tutorial for details.
Note that if you define a subroutine of the same name as one of the methods you define here, an error will be thrown.
constructor determines whether and how to create a "new" function in the new class. If given a true value, a constructor is created. If given a false value, one isn't.
If given an anonymous sub or a reference to a sub then that is used as "new".
The default is 1 (that is, create a constructor using our default code which simply creates a hashref and blesses it).
If your class defines its own "new" method it will not be overwritten. If you define your own "new" and also tell Builder to define one an error will be thrown.
verbose takes a value. If the value is "undef", then logging is disabled. If the value is a filehandle then that's where logging will go. If it's a true value, then output will go to "STDERR".
Alternatively, call $DateTime::Format::Builder::verbose with the relevant value. Whichever value is given more recently is adhered to.
Be aware that verbosity is a global setting.
class is optional and specifies the name of the class in which to create the specified methods.
If using this method in the guise of "import" then this field will cause an error so it is only of use when calling as "create_class".
version is also optional and specifies the value to give $VERSION in the class. It's generally not recommended unless you're combining it with the class option. An "ExtUtils::MakeMaker" / "CPAN" compliant version specification is much better.
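Putting the parameters together, a run-time call might look like this sketch. The class name My::Format::Dated, the method name "parse_date", and the version string are all hypothetical values chosen for illustration.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# All names and values below are hypothetical, chosen only for this sketch.
DateTime::Format::Builder->create_class(
    class       => 'My::Format::Dated',
    version     => '0.01',
    constructor => 1,                     # ask Builder for a default "new"
    parsers     => {
        parse_date => {
            regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
            params => [qw( year month day )],
        },
    },
);

# The generated constructor returns an object of the new class,
# on which the generated parser method can be called.
my $formatter = My::Format::Dated->new;
my $dt        = $formatter->parse_date('2003-07-16');

# The version option is described as setting $VERSION in the class.
my $version = My::Format::Dated->VERSION;
```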
In addition to creating any of the methods it also creates a "new" method that can instantiate (or clone) objects.
The default action is to call "on_fail" in the event of a non-parse, but you can make it do whatever you want.
The single argument is the input string. The default action is to call "croak". Above, where I've said parsers or methods throw errors, this is the method that is doing the error throwing.
You could conceivably override this method to, say, return "undef".
my $parser = DateTime::Format::Builder->new;
If called as a method on an object (rather than as a class method), then it clones the object.
my $clone = $parser->new;
my $clone_of_clone = $clone->clone;
    $parser->parser(
        regex  => qr/^ (\d{4}) (\d\d) (\d\d) $/x,
        params => [qw( year month day )],
    );
The arguments given to "parser" are handed directly to "create_parser". The resultant parser is passed to "set_parser".
If called as an object method, it returns the object.
If called as a class method, it creates a new object, sets its parser and returns that object.
$parser->set_parser($coderef);
Note: this method does not take specifications. It also does not take anything except coderefs. Luckily, coderefs are what most of the other methods produce.
The method return value is the object itself.
my $code = $parser->get_parser;
my $dt = $parser->parse_datetime('1979 07 16');
The return value, if not a "DateTime" object, is whatever the parser wants to return. Generally this means that if the parse failed an error will be thrown.
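The object interface described above can be exercised end to end as in the following sketch. Note one assumption made here: because the /x modifier makes the regex ignore literal whitespace, the spaces between the fields are written as "[ ]" so that input such as '1979 07 16' matches.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# Create an empty parser object, then give it a single specification.
my $parser = DateTime::Format::Builder->new;
$parser->parser(
    regex  => qr/^ (\d{4}) [ ] (\d\d) [ ] (\d\d) $/x,
    params => [qw( year month day )],
);

# Parse with the object itself.
my $dt = $parser->parse_datetime('1979 07 16');

# A clone carries the same parser and can be used independently.
my $clone = $parser->clone;
my $dt2   = $clone->parse_datetime('2003 07 16');

# get_parser returns the underlying coderef that set_parser accepts.
my $code = $parser->get_parser;
```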
Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for writing the multi-length code (both one length with multiple parsers and single parser with multiple lengths), blame for the Regex custom constructor code, spotting a bug in Dispatch, and more much needed review.
Kellan Elliott-McCrea (KELLAN) for even more review, suggestions, DateTime::Format::W3CDTF and the encouragement to rewrite these docs almost 100%!
Claus Färber (CFAERBER) for having me get around to fixing the auto-constructor writing, providing the 'args'/'self' patch, and suggesting the multi-callbacks.
Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now supports.
Matthew McGillis for pointing out that "on_fail" overriding should be simpler.
Simon Cozens (SIMON) for saying it was cool.
perl, DateTime, DateTime::Format::Builder::Tutorial, DateTime::Format::Builder::Parser
I am also usually active on IRC as 'autarch' on "irc://irc.perl.org".
Please note that I am not suggesting that you must do this in order for me to continue working on this particular software. I will continue to do so, inasmuch as I have in the past, for as long as it interests me.
Similarly, a donation made in this way will probably not make me work on this software much more, unless I get so many donations that I can consider working on free software full time (let's all have a chuckle at that together).
To donate, log into PayPal and send money to autarch@urth.org, or use the button at <https://www.urth.org/fs-donation.html>.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)
The full text of the license can be found in the LICENSE file included with this distribution.