### Perl Weekly Challenge: Week 14

This week I once again fell into my habit of not completing the challenge until the last minute. Happily, Mohammad has moved the deadline forward to midnight (7pm my time) to help laggards like me.

#### Challenge 1:

I must admit I didn't understand the referenced Wikipedia page, but after googling a bit I found this page, which made things clearer. Generating the sequence is really simple. Here's a Perl5 script that prints the first 100 Van Eck numbers:

```perl
use strict;
use warnings;
use feature qw/ say /;

sub backtrack {
    my ($n, $vanEcks) = @_;

    # Scan backwards for the previous occurrence of $n and return the distance.
    for (my $i = scalar @{$vanEcks} - 1; $i > 0; $i--) {
        if ($vanEcks->[$i - 1] == $n) {
            return scalar @{$vanEcks} - $i;
        }
    }
    return 0;    # $n has not been seen before
}

my @vanEcks = (0);
my $n = 0;

while (scalar @vanEcks < 100) {
    push @vanEcks, $n;
    $n = backtrack($n, \@vanEcks);
}

say join ", ", @vanEcks;
```

(Full code on Github.)

And this is the Perl6 version:

```perl6
sub backtrack($n, @vanEcks) {
    loop (my $i = @vanEcks.elems - 1; $i > 0; $i--) {
        if @vanEcks[$i - 1] == $n {
            return @vanEcks.elems - $i;
        }
    }
    return 0;
}

multi sub MAIN() {
    my @vanEcks = (0);
    my $n = 0;

    while @vanEcks.elems < 100 {
        @vanEcks.push($n);
        $n = backtrack($n, @vanEcks);
    }

    @vanEcks.join(", ").say;
}
```

(Full code on Github.)

I've got to say I'm not happy with this one. It's just a translation from Perl5 and could probably be done in a more idiomatic way. For instance, I tried using gather and take, but I couldn't get it to work in the limited time I had; something with lazy sequences might also be appropriate, I don't know. I'll definitely be asking the experts for advice on this.
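For what it's worth, here is one way a gather/take version might look. This is my own untested sketch, not a polished solution (the `van-eck` sub and the `%last-seen` hash are names I made up); as a bonus, remembering the last position of each value in a hash avoids rescanning the whole sequence backwards for every term:

```perl6
sub van-eck() {
    my %last-seen;    # value => index where it last appeared
    my $n = 0;
    my $i = 0;
    gather loop {
        take $n;
        # Next term: distance to the previous occurrence of $n, or 0 if new.
        my $next = %last-seen{$n}:exists ?? $i - %last-seen{$n} !! 0;
        %last-seen{$n} = $i;
        $n = $next;
        $i++;
    }
}

say van-eck()[^100].join(", ");
```

Because gather produces a lazy Seq, indexing with `[^100]` should only force the first 100 terms.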

#### Challenge 2:

Using only the official postal (2-letter) abbreviations for the 50 U.S. states, write a script to find the longest English word you can spell. Here is the list of U.S. state abbreviations as per the Wikipedia page. This challenge was proposed by team member Neil Bowers.

Once again Wikipedia caused a problem. The state abbreviations are on that page alright, but they are in an awkward, uncopy-and-pastable table format. Perl, the savior of lazy programmers, to the rescue! I wrote this little script with the aid of some handy CPAN modules to scrape the page and extract the relevant data.

```perl
use strict;
use warnings;
use feature qw/ say /;
use English qw/ -no_match_vars /;    # provides $OS_ERROR

use LWP::Simple;
use HTML::TableParser;

use constant FEDERAL_STATE => 1;    # column index of the "Federal state" column
use constant USPS          => 5;    # column index of the USPS abbreviation

my @abbrevs;

sub row {
    my ($columns) = $_[2];
    if ($columns->[FEDERAL_STATE] =~ /State/) {
        push @abbrevs, lc $columns->[USPS];
    }
}
```

This function is a callback called for every row of a table processed by `HTML::TableParser`.

The U.S. Postal Service deals with several types of jurisdiction, but the specs say we should only look at the 50 states. So no District of Columbia, Puerto Rico, U.S. Armed Forces or Johnston Atoll. At first I did a simple comparison of the Federal State column of the table with the word "State", but I was surprised to get only 46 results. This is because four states are technically commonwealths (Kentucky, Massachusetts, Pennsylvania and Virginia, FYI) and are recorded as "State (Commonwealth)", so I used a regex instead.

```perl
my $page =
    get('https://en.wikipedia.org/wiki/List_of_U.S._state_abbreviations') //
    die "Couldn't fetch page: $OS_ERROR\n";

my $tableParser = HTML::TableParser->new(
    [
        {
            id  => 'DEFAULT',
            row => \&row,
        },
    ],
);
```

The table doesn't have an id (bad Wikipedia!) but luckily it is the only one on the page so we can use the value `DEFAULT`.

```perl
$tableParser->parse($page);

say 'my @abbrevs = qw/ ', (join q{ }, @abbrevs), ' /;';
```

(Full code on Github.)

As a bonus, my script formats the output as proper Perl code ready to be pasted directly into my code. I could also have made it wrap at 80 columns but I'm not quite that lazy. :-)

This is the actual Perl5 code that solves the challenge. I was really running low on time at this point, so I'm sure it is not algorithmically optimal. But it isn't overly slow on my system.

```perl
sub makeWords {
    my ($list) = @_;

    open my $wordList, '<', $list or die "$list: $OS_ERROR\n";
    local $RS;    # slurp mode: read the whole file at once
    my $contents = <$wordList>;
    close $wordList;
    return sort
        map  { lc }
        grep { /^[[:alpha:]]+$/ }
        split /\n/,
        $contents;
}
```

This is basically the function I used in challenge 8. We slurp in the list of words, convert them to lower case, and weed out any words that have spaces or punctuation in them.

```perl
my $wordList = shift // '/usr/share/dict/words';

my @words = makeWords($wordList);
my %spelled = map { $_ => $_ } @words;

my @abbrevs = qw/
    al ak az ar ca co ct de fl ga hi id il in ia ks ky la me md ma mi mn ms mo
    mt ne nv nh nj nm ny nc nd oh ok or pa ri sc sd tn tx ut vt va wa wv wi wy
/;

# Strike out each abbreviation found in a word; a word that ends up as the
# empty string was spelled entirely from abbreviations.
for my $word (@words) {
    for my $abbrev (@abbrevs) {
        if ($spelled{$word} eq q{}) {
            last;
        }
        if ($word =~ /$abbrev/) {
            $spelled{$word} =~ s/$abbrev//;
        }
    }
}

my $answer = q{};
for my $word (sort grep { $spelled{$_} eq q{} } keys %spelled) {
    if (length $word >= length $answer) {
        $answer = $word;
    }
}
say $answer;
```

(Full code on Github.)

The second for loop is more complicated than strictly necessary because I wanted to keep the list in sorted order. There are several words that tie for longest, and I wanted to make sure the alphabetically last of these was returned each time.
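To make that tie-breaking concrete, here's a tiny self-contained Perl6 sketch with a made-up word list: iterating in ascending sorted order and comparing with >= means the alphabetically last of the longest words is the one left standing.

```perl6
my @candidates = <co corn mane mono>;    # made-up data: "mane" and "mono" tie at 4 letters

my $answer = q{};
for @candidates.sort -> $word {
    # >= (not >) lets a later, alphabetically larger word of equal length win.
    $answer = $word if $word.chars >= $answer.chars;
}
say $answer;    # "mono", not "mane"
```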

This is the Perl6 version. It ran noticeably slower than Perl5, but it is not readily apparent to me why, even though it is nearly a straight translation. (I got a modest speedup by using `.contains` instead of a regex match in the first loop.)

An issue I should deal with, which might have some bearing on the matter, is that my Rakudo installation is over a year old. Perhaps the regex engine has been optimized much further in newer versions?

```perl6
sub makeWords(Str $wordList) {
    return $wordList.IO.lines
        .grep({ /^<:alpha>+$/ })
        .map({ .lc })
        .sort;
}

multi sub MAIN(
    Str $wordList = '/usr/share/dict/words'
) {

    my @words = makeWords($wordList);
    my %spelled = @words.map({ $_ => $_ });

    my @abbrevs = qw/
        al ak az ar ca co ct de fl ga hi id il in ia ks ky la me md ma mi mn ms mo
        mt ne nv nh nj nm ny nc nd oh ok or pa ri sc sd tn tx ut vt va wa wv wi wy
    /;

    for @words -> $word {
        for @abbrevs -> $abbrev {
            if %spelled{$word} eq q{} {
                last;
            }
            if $word.contains($abbrev) {
                %spelled{$word} ~~ s/$abbrev//;
            }
        }
    }

    my $answer = q{};
    for %spelled.keys.grep({ %spelled{$_} eq q{} }).sort -> $word {
        if $word.chars >= $answer.chars {
            $answer = $word;
        }
    }
    $answer.say;
}
```

In case you were wondering, the answer according to `/usr/share/dict/words` on my system is "moorland" (Missouri-Oregon-Louisiana-North Dakota).