Perl Weekly Challenge: Week 21

Challenge 1:

Write a script to calculate the value of e, also known as Euler's number and Napier's constant. Please checkout wiki page for more information.

All this talk of calculation sounds like hard work and we're lazy Perl programmers remember? Luckily I found a shortcut way back in week 4.

Let's start with Perl6 where 𝑒 is a builtin constant. So here is the solution as a super easy one liner.

perl6 -e '𝑒.say;'

(Full code on Github.)

As you can see, you can use the actual unicode character though a regular e is also allowed in case your editor only supports ascii.

Perl5 is almost as easy except you have to use a module Math::Bigrat (or the bigrat pragma as I have done here) and import its' e constant. By default it gives a ridiculously large amount of precision but you can get the same output as Perl6 by formatting it in a printf.

perl -Mbigrat=e -e 'printf("%.15f\n", e);'

(Full code on Github.)

To tell you the truth, calculating the value of 𝑒 is not that hard. It just involves a short recursive function. But a one-liner is much better don't you think?

Challenge 2:

Write a script for URL normalization based on rfc3986. This task was shared by Anonymous Contributor.

Once again, I was able to reuse a solution, this time from week 17, challenge 2.

There, for Perl5, I wrote a regular expression that matched the various required and optional parts of a URL. (I made one small change for this challenge; I added a named capture for the // which are found in many URL schemes.)

In week 17 all I needed to do was print the parts out. This time, I assembled the parts into a hash and then passed this hash to a series of functions.

sub lowerCase {
    my ($url) = @_;
    my $result = $url;

    $result->{scheme} =~ tr/A-Z/a-z/;
    $result->{host} =~ tr/A-Z/a-z/;

    return $result;
}

sub capitalizeEscape {
    my ($url) = @_;
    my $result = $url;

    for my $part (qw/ userinfo path query fragment /) {
        $result->{$part} =~ s/($pct_encoded)/\U$1/g;
    }

    return $result;
}

sub decodeUnreserved {
    my ($url) = @_;
    my $result = $url;

    for my $part (qw/ userinfo path query fragment /) {
        while ($result->{$part} =~ /($pct_encoded)/g) {
            my $pct = $1;
            my $hex = $pct;
            $hex =~ s/%//;
            if (hex($hex) =~ /$unreserved/) {
                my $decoded = chr(hex($hex));
                $result->{$part} =~ s/$pct/$decoded/;
            }
        }
    }

    return $result;
}

sub removeDefaultPort {
    my ($url) = @_;
    my $result = $url;

    if ($result->{port} eq '80') {
        $result->{port} = undef;
    }

    return $result;
}

Each of these functions makes a copy of the hash, applies some normalizing transformation (I only implemented the semantic-preserving transformations from the RFC. There are others.) to one or more values of the copy and then returns it. This means we can chain all the functions together in a pipeline like this:

say reassemble(
    decodeUnreserved(
        capitalizeEscape(
            removeDefaultPort(
                lowerCase(
                    parse(
                        shift
                    )
                )
            )
        )
    )
);

The reassemble() function takes the now normalized parts of the url and recombines them. It looks like this:

sub reassemble {
    my ($url) = @_;

    my $result = "$url->{scheme}:";
    $result .= $url->{slashes} // q{};
    $result .= $url->{userinfo} // q{};
    if (defined $url->{userinfo}) {
        $result .= '@';
    }
    $result .= $url->{host} // q{};
    if (defined $url->{port}) {
        $result .= ":$url->{port}";
    }
    $result .= $url->{path} // q{};
    $result .= "?$url->{query}" // q{};
    $result .= "#$url->{fragment}" // q{};

    return $result;
}

(Full code on Github.)

In the Perl6 version of week 17's challenge, I used a grammar and I reused it for this challenge (with an extra token for // as with Perl5.) Grammars are a special kind of class which make it much easier to create and handle highly structured regular expressions. Instead of creating a function pipeline, I chose to do the transformations in the grammars companion action class. This uses an event-driven approach where every time a token in the grammar is matched, the method with the same name in the action class is called.

class URLAction {
    method Scheme($/) {
        make($/.Str.lc);  # lowerCase
    }

    method Host($/) {
        make($/.Str.lc);
    }

    method Userinfo($/) {
        make(self.processPctEncoded($/.Str));  # decodeUnreserved and 
    }                                          # capitalizeEscape (see below)

    method Port($/) {
        if ($/.Str ~~ '80') {  # removeDefaultPort
            make(Nil);
        }
    }

    method Path($/) {
        make(self.processPctEncoded($/.Str));
    }

    method Query($/) {
        make(self.processPctEncoded($/.Str));
    }

    method Fragment($/) {
        make(self.processPctEncoded($/.Str));
    }
}

You can see in each of these methods we use the Match classes .make method. This is because when you parse with a grammar, it creates a tree of read-only match objects. The action class creates a similar tree except each of its' nodes has a space for some arbitrary data of your own. .make inserts that data into the node. It can be accessed again via the .made method.

Now here is a point where I ran into difficulties. In fact it took me so long to figure this out, I missed the deadline for this task. (I got the others in on time though.)

The problem is that the logical place to do transformations like decodeUnreserved and `capitalize is in an action class method that would look something like this:

method PctEncoded($/) {
    # do some transformations
    make("transformed token");
}

But PctEncoded is token that lives way down in the lower levels of the parse tree. It makes up part of other tokens which are in turn part of other tokens (in hindsight these composites should be rules not tokens but it doesn't make a practical difference I don't think.) Each of these would need its' own action class method in order to call PctEncoded.made and all that boilerplate code would bloat up the class.

Instead I a wrote method to be called on all the high-level parts that can contain PctEncoded values which makes a copy of that part, rescans it for PctEncoded tokens, does the transforms and then substitutes the transformed tokens back into the copy. This copy is then returned and the method for the high-level part can .make it.

I am almost 100% sure that this is the wrong way to go about this but, hey, it worked (eventually.)

method processPctEncoded($part) {
    my $processed = $part;
    for $part ~~ m:g/ \% $<HexOctet> = ( <[ 0..9 A..F a..f ]> ** 2 ) /
    -> $pct {
        my $hex = $pct<HexOctet>.Str.uc;
        if :16($hex) ~~ / <[ A..Z a..z 0..9 \- . _ \~ ]> / { # Unreserved
            $processed = $processed.subst($pct.Str, :16($hex).chr);
        } else {
            $processed = $processed.subst($pct.Str, "%$hex");
        }
    }
    return $processed;
}

Ideally I should have been able to reuse the HexOctet and Unreserved tokens from the grammar but there doesn't seem to be any way to access the whole grammar; actions only get the little piece which was actually matched. So I simply cut and pasted them in.

The last part of the Perl6 solution I want to show you is the equivalent to the reassemble() function from Perl5. The top-level rule in a grammar is called, appropriately enough, TOP. We can make an action method for it in the action class and that's the perfect place to put the URL back together again. It looks like this:

method TOP($/) {
    my $result = $<Scheme>.made ~ ':';
    if $/<Slashes> {
        $result ~= $<Slashes>;
    }

    if $/<Userinfo> {
        $result ~= $<Userinfo>.made ~ '@';
    }

    if $/<Host> {
        $result ~= $<Host>.made;
    }

    if $/<Port.made> {
        $result ~= ':' ~ $<Port>.made;
    }

    $result ~= $<Path>.made;

    if $/<Query> {
        $result ~= '?' ~ $<Query>.made;
    }

    if $<Fragment> {
        $result ~= '#' ~ $<Fragment>.made;
    }

    make($result);
}

(Full code on Github.)