Perl Weekly Challenge: Week 370

Challenge 1:

Popular Word

You are given a string paragraph and an array of the banned words.

Write a script to return the most popular word that is not banned. It is guaranteed there is at least one word that is not banned and the answer is unique. The words in paragraph are case-insensitive and the answer should be in lowercase. The words can not contain punctuation symbols.

Example 1
Input: $paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
    @banned = ("hit")
Output: "ball"

After removing punctuation and converting to lowercase, the word "hit" appears 3 times, and "ball" appears 2 times.
Since "hit" is on the banned list, we ignore it.
Example 2
Input: $paragraph = "Apple? apple! Apple, pear, orange, pear, apple, orange."
    @banned = ("apple", "pear")
Output: "orange"

"apple"  appears 4 times.
"pear"   appears 2 times.
"orange" appears 2 times.

"apple" and "pear" are both banned.
Even though "orange" has the same frequency as "pear", it is the only non-banned word with the highest frequency.
Example 3
Input: $paragraph = "A. a, a! A. B. b. b."
    @banned = ("b")
Output: "a"

"a" appears 4 times.
"b" appears 3 times.

The input has mixed casing and heavy punctuation.
The normalised, "a" is the clear winner, since "b" is banned, "a" is the only choice.
Example 4
Input: $paragraph = "Ball.ball,ball:apple!apple.banana"
    @banned = ("ball")
Output: "apple"

Here the punctuation acts as a delimiter.
"ball"   appears 3 times.
"apple"  appears 2 times.
"banana" appears 1 time.
Example 5
Input: $paragraph = "The dog chased the cat, but the dog was faster than the cat."
    @banned = ("the", "dog")
Output: "cat"

"the" appears 4 times.
"dog" appears 2 times.
"cat" appears 2 times.

"chased", "but", "was", "faster", "than" appear 1 time each.
"the" is the most frequent but is banned.
"dog" is the next most frequent but is also banned.
The next most frequent non-banned word is "cat".

First we use the words in @banned to make a Set for efficient lookup. In the process, the words are converted to lower case en masse with the hyper operator ».lc

my $bannedWords = @banned».lc.Set;

We will also need to keep track of how many times each word occurs. We do this in a Hash where the keys are words, and the values are their frequency.

my %wordCounts;

Now we need to extract all the words from $paragraph. Although Raku does have a .words() method, it is not what we want because it is basically the equivalent of .split(/\s+/) which means the resulting words may have punctuation etc. in them. .spit(/W+/) results in only alphanumeric words.

Each word...

for $paragraph.lc.split(/\W+/) -> $word {

...is counted nf it is not an element of () the banned word set.

    if $word ∉ $bannedWords {
        %wordCounts{$word}++;
    }
}

In order to find the most frequent word, we first allot storage for the current best candidate and its' count.

my $bestWord = q{};
my $bestCount = -∞;

Then we go through %wordcounts and find the word that occurs most frequently...

for %wordCounts.kv -> $word, $count {
    if $count > $bestCount {
        $bestWord = $word;
        $bestCount = $count;
    }
}

...and print it out.

say $bestWord;

(Full code on Github.)

This is the Perl version that works the same way as in Raku. I thought we might have to add code because Perl doesn't support Sets natively but the workaround was surprisingly simple.

my %bannedWords = map { lc $_ => 1 } @banned;
my %wordCounts;

for my $word ( split /\W+/, lc $paragraph ) {
    unless (exists $bannedWords{$word}) {
        $wordCounts{$word}++;
    }
}

my $bestWord = '';
my $bestCount = -"Inf";

while (my ($word, $count) = each %wordCounts) {
    if ( $count > $bestCount ) {
        $bestWord = $word;
        $bestCount = $count;
    }
}

say $bestWord;

(Full code on Github.)

Challenge 2:

Scramble String

You are given two strings A and B of the same length.

Write a script to return true if string B is a scramble of string A otherwise return false.

String B is a scramble of string A if A can be transformed into B by a single (recursive) scramble operation.

A scramble operation is:

- If the string consists of only one character, return the string.
- Divide the string X into two non-empty parts.
- Optionally, exchange the order of those parts.
- Optionally, scramble each of those parts.
- Concatenate the scrambled parts to return a single string.
Example 1
Input: $str1 = "abc", $str2 = "acb"
Output: true

"abc"
split: ["a", "bc"]
split: ["a", ["b", "c"]]
swap: ["a", ["c", "b"]]
concatenate: "acb"
Example 2
Input: $str1 = "abcd", $str2 = "cdba"
Output: true

"abcd"
split: ["ab", "cd"]
swap: ["cd", "ab"]
split: ["cd", ["a", "b"]]
swap: ["cd", ["b", "a"]]
concatenate: "cdba"
Example 3
Input: $str1 = "hello", $str2 = "hiiii"
Output: false

A fundamental rule of scrambled strings is that they must be anagrams.
Example 4
Input: $str1 = "ateer", $str2 = "eater"
Output: true

"ateer"
split: ["ate", "er"]
split: [["at", "e"], "er"]
swap: [["e", "at"], "er"]
concatenate: "eater"
Example 5
Input: $str1 = "abcd", $str2 = "bdac"
Output: false

In the MAIN() function we can immediately dispose of a few false conditions; if the two strings are of different lengths or not anagrams, they cannot be scrambles so we needn't proceed further.

if $str1.chars != $str2.chars || $str1.comb.sort !~~ $str2.comb.sort {
    say False;
} else {

Otherwise we pass the two strings to the isScramble() function with will return True or False depending on if they are scrambles or not. The result is printed with say().

    say isScramble($str1, $str2);
}

The isScramble() function uses a recursive divide and conquer approach to determine if $str2 is a scramble of $str1.

sub isScramble(Str $str1, Str $str2) {

As a recursive function, it is important that we have a base case that will halt execution. If the two strings are (or have become due to the next steps) we just return True.

    if $str1 eq $str2 {
        return True;
    }

If not we have to split the strings. We do so in a loop to try every possible split position.

    my $n = $str1.chars;
    for 1 ..^ $n -> $i {

For each split the code extracts the left and right parts of $str1, and similarly for $str2.

        my $str1A = $str1.substr(0, $i);
        my $str1B = $str1.substr($i, $n - $i);

        my $str2A = $str2.substr(0, $i);
        my $str2B = $str2.substr($i, $n - $i);

isScramble() is called again with the first parts of the two strings as its parameters and then one more time the second parts. If both calls return True we can return True.

        if isScramble($str1A, $str2A) && isScramble($str1B, $str2B) {
            return True;
        }

The same procedure is done again but this time with the order of the split parts of $str2 swapped.

        my $str2AS = $str2.substr($n - $i, $i);
        my $str2BS = $str2.substr(0, $n - $i);

        if isScramble($str1A, $str2AS) && isScramble($str1B, $str2BS) {
            return True;
        }
    }

If we haven't returned True by the point, we don't have a scramble so we return False.

    return False;
}

(Full code on Github.)

With the recent release of Ubuntu Linux 26.04 LTS, I was finally able to upgrade to Perl version 5.40.1. The first big difference I've noticed is that the true and false constants are not experimental any more.

The script itself follows the Raku version closely.

sub isScramble($str1, $str2) {

    if ($str1 eq $str2) {
        return true;
    }

    my $n = length $str1;
    for my $i (1 .. $n - 1) {
        my $str1A = substr $str1, 0, $i;
        my $str1B = substr $str1, $i, $n - $i;

        my $str2A = substr $str2, 0, $i;
        my $str2B = substr $str2, $i, $n - $i;

        if (isScramble($str1A, $str2A) && isScramble($str1B, $str2B)) {
            return true;
        }

        my $str2AS = substr $str2, $n - $i, $i;
        my $str2BS = substr $str2, 0, $n - $i;

        if (isScramble($str1A, $str2AS) && isScramble($str1B, $str2BS)) {
            return true;
        }
    }

    return false;
}

my ($str1, $str2) = @ARGV;

if (length($str1) != length($str2) ||
join(q{}, sort split //, $str1) ne join(q{}, sort split //, $str2)) {
    say "false";
} else {
    say isScramble($str1, $str2) ? "true" : "false";
}

(Full code on Github.)