Perl Weekly Challenge: Week 161

Challenge 1:

Abecedarian Words

An abecedarian word is a word whose letters are arranged in alphabetical order. For example, "knotty" is an abecedarian word, but "knots" is not. Output or return a list of all abecedarian words in the dictionary, sorted in decreasing order of length.

Optionally, using only abecedarian words, leave a short comment in your code to make your reviewer smile.

In Raku, first we read in the dictionary:

my @words = "dictionary.txt".IO.lines;

An array is created to hold any abecedarian words we find.

my @abcde;

Now for every word in the dictionary, we break it down into individual characters with .comb(), sort them in alphabetical order and join them up again. This reconstituted word is compared with the original and if they match, it means the word is abecedarian so we add it to our list.

for @words -> $word {
    if $word eq $word.comb.sort.join {
        @abcde.push($word);
    }
}

Once we have the full list of abcedarian words, we sort it in decreasing order of length as the spec requires and print it.

for @abcde.sort({ $^b.chars <=> $^a.chars }) -> $word {
    say $word;
}

(Full code on Github.)

Here's the Perl version.

my @abcde;

Considerably more work has to be done to read in the dictionary and check it for abecedarian words.

open my $dictionary, '<', 'dictionary.txt' or die "$OS_ERROR\n";
while (my $line = <$dictionary>) {
    chomp $line;
    my $word = $line;
    if ($word eq join q{}, sort split //, $word) {
        push @abcde, $word;
    }
}
close $dictionary;

The sorting and printing the list part is about the same as Raku though.

for my $word (sort { length $b <=> length $a } @abcde) {
    say $word;
}

(Full code on Github.)

The optional part of this challenge was fun. I came up with the following abecedarian sentences:

A deers' deep bellow at nosy chimps: "Moo!" it do cry, "Be ill o dirty fuzz foots!"
Guy airs hip hop art film. It flops in city. His boss bins it.
Boy abhors forty cent eggs; cost is crux. Cops chip in, buy a box for him.
Hippy most ably aces dirty cello ditty. Accept or boo?

Challenge 2:

Pangrams

A pangram is a sentence or phrase that uses every letter in the English alphabet at least once. For example, perhaps the most well known pangram is:

the quick brown fox jumps over the lazy dog

Using the provided dictionary, so that you don't need to include individual copy, generate at least one pangram.

Your pangram does not have to be a syntactically valid English sentence (doing so would require far more work, and a dictionary of nouns, verbs, adjectives, adverbs, and conjunctions). Also note that repeated letters, and even repeated words, are permitted.

BONUS: Constrain or optimize for something interesting (completely up to you), such as:

Shortest possible pangram (difficult)
Pangram which contains only abecedarian words (see challenge 1)
Pangram such that each word "solves" exactly one new letter. For example, such a pangram might begin with (newly solved letters in bold):
    a ah hi hid die ice tea ...
    What is the longest possible pangram generated with this method? (All solutions will contain 26 words, so focus on the letter count.)
Pangrams that have the weirdest (PG-13) Google image search results
Anything interesting goes!

Oof! This was probably the hardest challenge in the history of the weekly challenge. Atleast I spent more time on it than I have on any previous problem.

Partly, this might be because I did not do the basic form of the challenge but the bonus. The spec is quite flexible about this so what I chose was to try and find the shortest abecedarian pangram.

So I started by taking some Raku code from challenge 1 and changing it slightly.

my @words = "dictionary.txt".IO.lines;
my %wordlists;

for @words -> $word {
    my @letters = $word.comb;
    if $word eq @letters.sort.join && (@letters.categorize({ $_ })).values.all == 1 {
        for @letters -> $letter {
            %wordlists{$letter}.push($word);
        }
    }
}

Instead of one list of abecedarian words, I have a list for every letter of the alphabet. If a word contains a particular letter—regardless of position—I put it the list for that letter. The idea was that I could guarantee a pangram, if I had one word from every list. Actually because I am trying to get the shortest pangram, my result didn't use every list but more on that in a bit. The other thing I did was to weed out words that had more than one occurrence of a letter Any resulting pangram would be longer if it contained duplicate letters. I did this by using the .categorize() method to make a hash whose keys are letters and whose values are the number of times that letter occurs. If all the values of that hash are 1, I know the word does not contain duplicates.

Next I inspected my word lists. I noticed that they varied markedly in size. For instance for 'q' and 'z' there was only one word each, 'qt' and 'chintz'. I could already tell I was not going to be able to achieve a pangram of maximum shortness (Which would be 26 charactars long.) if I included both of these words as they both contain 't'. I spent a lot of time trying to see if there was a way around this but eventually I had to accept that my result would have to contain both these words. I hoped to atleast reduce the number of these interword duplicates by creating an array by indexing the word lists sorted in order of length, shortest to longest, like this:

my @order = ('a' .. 'z').sort({ %wordlists{$^a}.elems <=> %wordlists{$^b}.elems });

Armed with the word lists and order, I proceeded to find a a pangram in a function called, as you might expect, makePangram().

my @pangram = makePangram(%wordlists, @order);

makePangram() starts by allotting arrays to hold the pangram itself, and the letters which have already been used.

sub makePangram(%wordlists, @order) {
    my @pangram = ();
    my @used;

Then it enters an infinite loop.

    loop {

In each iteration of the loop, it takes the first letter in @order which will be the key of %wordlists that contains the shortest list of words.

        my $letter = @order[0];

An array is created that takes the word list for that letter and sorts it shortest word to longest word. I wasted a lot of time here by forgetting to include | which 'flattens' a list. Without it I ended up with an array of one list instead of an array of words.

        my @choices = (| %wordlists{$letter}).sort({ $^a.chars <=> $^b.chars });

$word stores our potential candidate for inclusion in the pangram.

        my $word = q{};

Now we go through the list of words and see if they contain letters that have already been used. I did this by splitting each word into letters and comparing it to @used with the nifty set intersection or ∩ operator. If the number of matches is less than the current $best amount, it becomes the candidate. $best starts of at 26, as the longest possible word is one which has each letter of the alphabet once. In hindsight, it should have been called $worst because we really want as few matches as possible, ideally 0 which would mean the order entirely consists of letters which have not already been used in the pangram.

        my $best = 26;
        for @choices -> $choice {
            my $matches = ($choice.comb ∩ @used).elems; 
            if  $matches <= $best {
                $best = $matches;
                $word = $choice;
            }
        }

By the time, we have gone through the list, we have the best (or least worst) possible $word so it is added to the pangram.

        @pangram.push($word);

The letters used in $word are added to @used.

        @used.push(| $word.comb);

The letters in @used are removed from @order.

        @order = @order.grep({ $_ ne @used.any; });

If there are no letters left in @order we must have a pangram. We can break out of the loop and return it.

        if @order.elems == 0 {
            last;
        }
    }

    return @pangram;
}

Finally, back in MAIN() we print the pangram and its' length.

say "'", @pangram.join(q{ }), "' has ", count(@pangram), ' letters.';

The length required a little extra code which I bundled into a function called count().

sub count(@pangram) {
    return [+] @pangram.map({ $_.chars});
}

(Full code on Github.)

This just counts the characters in each word in the pangram and sums them using the [+] operator.

Because Perl does not have all the bells and whistles of Raku, lots of extra code needed to be written.

This function returns true if all the letters in a word are unique or false if there are duplicates.

sub allUnique {
    my ($word) = @_;
    my %count;
    for my $letter (split //, $word) {
        $count{$letter}++;
    }


    return (scalar grep { $count{$_} > 1} keys %count) == 0;
}

This one does an intersection between two lists and returns it as a list. It took me quite a while to get this working properly.

sub intersect {
    my ($first, $second) = @_;
    my %count;

    foreach (@{$first}, @{$second}) {
        $count{$_}++;
    }

    my @intersection = sort grep { $count{$_} > 1 } keys %count;

    return wantarray ? @intersection : \@intersection;
}

And this one removes any items in the first list that are in the second list and returns the newly editted first list.

sub remove {
    my ($first, $second) = @_;

    for my $letter (@{$second}) {
        @{$first} = grep {$_ ne $letter; } @{$first};
    }

    return wantarray ? @{$first} : $first;
}

These last two functions use wantarray to make sure they return the right value (list or scalar) for the context the function is called in.

Finally, this function counts all the characters in a pangram and returns the total.

sub count {
    my ($pangram) = @_;
    my $total = 0;

    for my $word (@{$pangram}) {
        $total += length $word;
    }

    return $total;
}

Equipped with all this we can copy the algorithm used in the Raku solution.

sub makePangram {
    my ($wordlists, $order) = @_;
    my @pangram = ();
    my @used;

    while(1) {
        my $letter = $order->[0];
        my @choices = sort { (length $a) <=> (length $b) } @{$wordlists->{$letter}};
        my $word = q{};

        my $best = 26;
        my @matches;
        for my $choice (@choices) {
            @matches = @{intersect([ split //, $choice ], \@used)};
            if (scalar @matches <= $best) {
                $best = scalar @matches;
                $word = $choice;
            }
        }

        push @pangram, $word;
        push @used, remove([split //, $word], \@used);
        $order = remove($order, \@used);

        if (! scalar @{$order}) {
            last;
        }
    }

    return wantarray ? @pangram : \@pangram;
}

my %wordlists;

open my $dictionary, '<', 'dictionary.txt' or die "$OS_ERROR\n";
while (my $line = <$dictionary>) {
    chomp $line;
    my $word = $line;
    my @letters = split //, $word;
    if ($word eq (join q{}, sort @letters) && allUnique($word)) {
        for my $letter (@letters) {
            push @{$wordlists{$letter}}, $word;
        }
    }
}
close $dictionary;

my @order = sort { scalar @{$wordlists{$a}} <=> scalar @{$wordlists{$b}} } ('a' .. 'z');

my $pangram = makePangram(\%wordlists, \@order);

say "'", (join q{ }, @{$pangram}), "' has ", count($pangram), ' letters.';

(Full code on Github.)

The final result of my exertions was this:

'qt chintz joy know iv flux begs pry ad m' has 31 letters.

...which is not bad. (26 is the theoretical minimum.) I keep wondering if I could done better with a better algorithm. (Thr phrase "topological sort" is floating around my brain.) but it would probably have taken me another week to figure that out so I'm happy with what I've got.