Perl Weekly Challenge: Week 259

Challenge 1:

Banking Day Offset

You are given a start date and an offset counter. Optionally, you are also given a list of bank holiday dates.

Given a number (of days) and a start date, return the number (of days) adjusted to take into account non-banking days. In other words: convert a banking day offset to a calendar day offset.

Non-banking days are:

a) Weekends
b) Bank holidays
Example 1
Input: $start_date = '2018-06-28', $offset = 3, $bank_holidays = ['2018-07-03']
Output: '2018-07-04'

Thursday bumped to Wednesday (3-day offset, with Tuesday 2018-07-03 a bank holiday)
Example 2
Input: $start_date = '2018-06-28', $offset = 3
Output: '2018-07-03'

Perl first for a change. I love calendar-related problems but I know there are many quirks and corner cases that make it perilous to do date handling on your own. Fortunately, Perl has a well-tested and comprehensive library of date- and time-related code in the DateTime module. We will also need to get dates from the input and convert them into DateTime objects. For that we can use the allied DateTime::Format::Strptime module. To use these we need to include the following at the top of the script:

use DateTime;
use DateTime::Format::Strptime;

To get $startDate from the first command-line argument as a DateTime object, DateTime::Format::Strptime is used like this; %F is the YYYY-MM-DD format for dates used in the examples.

my $strptime = DateTime::Format::Strptime->new(
    pattern => '%F',
    on_error => sub { die "Invalid date\n"; }
);

my $startDate = $strptime->parse_datetime(shift @ARGV);

The second command-line argument is the $offset, which is read in as a normal scalar.

my $offset = shift @ARGV;

The rest of the command-line arguments represent the bank holidays. They are also converted into DateTime objects.

my @bankHolidays = map { $strptime->parse_datetime($_) } @ARGV;

Initially, the end date is set to the start date.

my $endDate = $startDate;

Then while we have a positive offset, we add one day to the end date.

while ($offset > 0) {
    $endDate->add(days => 1);

If the day of the week (determined by DateTime's dow() method, which returns 1 for Monday through 7 for Sunday) is less than 6, i.e. a weekday, and the current end date is not a bank holiday, we reduce the offset by 1.

    my $dow = $endDate->dow;
    if ($dow > 0 && $dow < 6 && !grep { $_ == $endDate } @bankHolidays) {
        $offset--;
    }
}

When the offset reaches 0, we have the proper end date. We use say() and DateTime's ymd() method to print it in YYYY-MM-DD format.

say $endDate->ymd;

(Full code on Github.)
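As an aside, the same algorithm can also be sketched without any CPAN dependencies, using the core Time::Piece module. This is only a rough stand-alone variant; the banking_day_offset name is my own, not part of the solution above.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;   # exports the ONE_DAY constant

# Hypothetical helper, not part of the DateTime solution above.
sub banking_day_offset {
    my ($start, $offset, @holidays) = @_;
    my %holiday = map { $_ => 1 } @holidays;
    my $date = Time::Piece->strptime($start, '%Y-%m-%d');
    while ($offset > 0) {
        $date += ONE_DAY;
        # Time::Piece's day_of_week: 0 = Sunday .. 6 = Saturday
        my $dow = $date->day_of_week;
        $offset-- if $dow != 0 && $dow != 6 && !$holiday{ $date->ymd };
    }
    return $date->ymd;
}

print banking_day_offset('2018-06-28', 3, '2018-07-03'), "\n";   # 2018-07-04
print banking_day_offset('2018-06-28', 3), "\n";                 # 2018-07-03
```

Because strptime() returns a GMT-based Time::Piece, adding ONE_DAY (86400 seconds) never runs into daylight-saving surprises.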

Raku has a standard Date class we can use. It works very similarly to DateTime except the methods are named a little differently.

my $startDate = Date.new(@*ARGS.shift);
my $offset = @*ARGS.shift;
my @bankHolidays = @*ARGS.map({ Date.new($_) });
my $endDate = $startDate;

while ($offset > 0) {
    $endDate += 1;
    my $dow = $endDate.day-of-week;
    if ($dow > 0 && $dow < 6 && @bankHolidays.none == $endDate) {
        $offset--;
    }
}

say $endDate.yyyy-mm-dd;

(Full code on Github.)

Challenge 2:

Line Parser

You are given a line like below:

{%  id   field1="value1"    field2="value2"  field3=42 %}

Where

a) "id" can be \w+.
b) There can be 0 or more field-value pairs.
c) The names of the fields are \w+.
d) The values are either numbers, in which case we don't need double quotes, or strings, in
which case we need double quotes around them.

The line parser should return a structure like the one below:

{
    name => id,
    fields => {
        field1 => value1,
        field2 => value2,
        field3 => value3,
    }
}

It should be able to parse the following edge cases too:

{%  youtube title="Title \"quoted\" done" %}

and

{%  youtube title="Title with escaped backslash \\" %}

BONUS: Extend it to be able to handle multiline tags:

{% id  field1="value1" ... %}
LINES
{% endid %}

You should expect the following structure from your line parser:

{
    name => id,
    fields => {
        field1 => value1,
        field2 => value2,
        field3 => value3,
    }
    text => LINES
}

This is a golden opportunity to use one of Raku's most distinctive and powerful features—grammars.

Based on the details provided by the spec, we create a grammar called LineParser. The grammar consists of several token rules:

grammar LineParser {

TOP is the main entry point that matches the entire line pattern with {% and %} delimiters.

    token TOP { ^ '{%' \s+ <identifier> <field>* \s* '%}' $ }

identifier matches word characters for names.

    token identifier { \w+ }

field matches field definitions in the format name = value.

    token field { 
        \s+ <name=.identifier> \s* '=' \s* <value>
    }

value can be of two types: number matches a series of digits, and string matches quoted strings with escape sequences in conjunction with the content token.

    proto token value {*}
    token value:sym<number> { \d+ }
    token value:sym<string> {
        '"' <content> '"'
    }

    token content {
        [
            | <-[\"\\]>+
            | '\\"'
            | '\\\\'
        ]*
    }
}

Usually just parsing out the tokens is not enough for a grammar; there needs to be some post-processing to get the data into a usable form. The LineActions class defines what to do with the parsed components. The methods of this class have the same names as the tokens in LineParser and are called when a token of that type is parsed.

class LineActions {

The TOP method creates a Hash with the identifier name and field values. This will become the return value of the parser's parse() method.

    method TOP($/) {
        make {
            name => ~$<identifier>,
            fields => $<field>.map({
                ~$_<name> => $_<value>.made
            }).Hash
        }
    }

value:sym<number> converts matches to numbers.

    method value:sym<number>($/) { make +$/ }

value:sym<string> handles string values and processes escape sequences.

    method value:sym<string>($/) {
        make ~$<content>
            .subst(/'\\"'/, '"', :g)
            .subst(/'\\\\'/, '\\', :g)
    }
}

Once the grammar has been defined, in MAIN() we can create a LineActions object, attempt to parse the input line with LineParser, and output either the parsed structure or a failure message.

my $actions = LineActions.new;
my $match = LineParser.parse($line, :$actions);
say $match ?? $match.made !! 'Failed to parse line';

As a bonus, the spec asks us to handle a multi-line format. This involves making a few changes to the grammar.

The rule that was TOP is renamed to single-line and a new TOP rule is defined like this.

rule TOP { 
    | <single-line>
    | <multi-line>
}

The new multi-line token matches an opening tag with fields, arbitrary text content, and a closing tag with an 'end' prefix. Both tags must be on their own lines (note the \n requirements).

token multi-line {
    '{%' \s+ <identifier> <field>* \s* '%}' \n
    <text>
    '{%' \s+ 'end' <end-id=.identifier> \s* '%}' \n?
}

The new text token captures all content between the opening and closing tags. It matches any characters other than '{' so that it doesn't accidentally consume the closing tag.

token text {
    <-[{]>* # any chars not starting with {
}

The LineActions class also needs to be enhanced.

The TOP method has to accommodate multi-line as well as single-line parses.

method TOP($/) {
    make $<single-line> ?? $<single-line>.made !! $<multi-line>.made;
}

The new multi-line method validates that the opening and closing tags match up. It also includes the captured text in the returned output.

method multi-line($/) {
    die "Closing tag 'end$<end-id>' does not match opening tag '$<identifier>'"
        unless $<end-id> eq $<identifier>;

    make {
        name => ~$<identifier>,
        fields => $<field>.map({
            ~$_<name> => $_<value>.made
        }).Hash,
        text => ~$<text>
    }
}

(Full code on Github.)

Perl doesn't have grammars but it has something almost as good—the Parse::RecDescent module. It is called that because (like Raku) it produces recursive descent parsers.

This is what the grammar looks like:

my $grammar = <<'-EOT-';
    startrule: multi_line | single_line

    single_line: '{%' ws identifier field(s) '%}'
        {
            $return = "{\n"
                . "    name => $item{identifier},\n"
                . "    fields => {\n        "
                . join(",\n        ", @{$item{'field(s)'}})
                . "\n    }\n"
                . "}\n";
        }

    multi_line: '{%' ws identifier field(s) '%}' text '{%' ws /end\w+/ ws '%}'
        {
            $return = "{\n"
                . "    name => $item{identifier},\n"
                . "    fields => {\n        "
                . join(",\n        ", @{$item{'field(s)'}})
                . "\n    },\n"
                . "    text => $item{text}"
                . "}\n";
        }

    ws: /\s*/

    identifier: /\w+/
    {
        my $id = $item[1];
        $return = $id;
    }

    field: ws name ws '=' ws value
        { $return = "$item{ name } => $item{ value }"  }

    name: /\w+/

    value: number | string

    number: /\d+/   
        { $return = 0 + $item[1]; }

    string: /"/ content /"/
        { $return = $item{content}; }

    content: /(\\" | \\\\ | [^"])+/x

    text: /[^{]+/
-EOT-
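The driver code is elided here; as a rough sketch (not the author's actual script), a Parse::RecDescent grammar is instantiated with new() and invoked by calling its start rule as a method. The one-rule stand-in grammar below just keeps the example self-contained; the real script would pass the $grammar heredoc above instead.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Cut-down stand-in grammar so this sketch runs on its own; the real
# script would pass the $grammar heredoc shown above.
my $grammar = q{
    startrule: '{%' /\w+/ '%}'
        { $return = $item[2]; }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";
my $name   = $parser->startrule('{%  youtube  %}');
print defined $name ? "name => $name\n" : "Failed to parse line\n";
```

Parse::RecDescent skips whitespace before each terminal by default, which is why the stand-in grammar needs no explicit ws rule.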

(Full code on Github.)

Unlike Raku, parse actions are defined within the rules themselves, not in a separate class.
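For comparison, the single-line format (without the multi-line bonus) is also within reach of a plain Perl regex. The following is a rough sketch of that approach, not the Parse::RecDescent solution; parse_line is a name I made up.

```perl
use strict;
use warnings;

# Rough regex-based sketch of the single-line parser; parse_line is a
# hypothetical helper, not part of the Parse::RecDescent solution.
sub parse_line {
    my ($line) = @_;
    return undef unless $line =~
        /^\{%\s+(\w+)((?:\s+\w+\s*=\s*(?:"(?:\\.|[^"\\])*"|\d+))*)\s*%\}$/;
    my ($name, $rest) = ($1, $2);
    my %fields;
    while ($rest =~ /(\w+)\s*=\s*(?:"((?:\\.|[^"\\])*)"|(\d+))/g) {
        my ($field, $str, $num) = ($1, $2, $3);
        if (defined $num) {
            $fields{$field} = 0 + $num;     # unquoted numeric value
        }
        else {
            $str =~ s/\\(["\\])/$1/g;       # unescape \" and \\
            $fields{$field} = $str;
        }
    }
    return { name => $name, fields => \%fields };
}

my $parsed = parse_line('{%  id   field1="value1"    field2="value2"  field3=42 %}');
print "$parsed->{name}: $parsed->{fields}{field3}\n";   # id: 42
```

It handles the escaped-quote and escaped-backslash edge cases, but a regex like this grows unwieldy quickly, which is exactly why grammars and Parse::RecDescent earn their keep.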