Perl Weekly Challenge: Week 376

Challenge 1:

Chessboard Squares

You are given two coordinates of a square on 8x8 chessboard.

Write a script to find the given two coordinates have the same colour.

8 W B W B W B W B
7 B W B W B W B W
6 W B W B W B W B
5 B W B W B W B W
4 W B W B W B W B
3 B W B W B W B W
2 W B W B W B W B
1 B W B W B W B W
  a b c d e f g h
Example 1
Input: $c1 = "a7", $c2 = "f4"
Output: true
Example 2
Input: $c1 = "c1", $c2 = "e8"
Output: false
Example 3
Input: $c1 = "b5", $c2 = "h2"
Output: false
Example 4
Input: $c1 = "f3", $c2 = "h1"
Output: true
Example 5
Input: $c1 = "a1", $c2 = "g8"
Output: false

Because the colors of the squares of a chessboard alternate in a regular way it is easy given the coordinate of a square to determine its' color. In fact we can do it in one line of Raku or Perl code. As my one-liner actually contains two statements, I've split it for clarity.

First we need to get the input into the script. The first line takes two coordinates as command-line arguments, .join()s them together and then splits them into an array of individual characters with .comb(). As I write this, it occurs to me that I could have used ^2 instead of 0..1 and saved 2 characters but it's too late now. I'll try and remember for next time.

my @a = (@*ARGS[0..1].join).comb;

Now we take the numeric value of the 1st element which is a letter with .ord() and add the value of the 2nd element (which is already a number) to it. If the resulting value is even, the square is white or if it is odd, the square is black. We can determine if the value is odd or even by using the % 2 (modulus 2) operator.

The same procedure is followed for the 3rd and 4th elements. The two results are compared with == and if they are both even or both odd, True is output otherwise False.

say (@a[0].ord + @a[1]) % 2 == (@a[2].ord + @a[3]) % 2

(Full code on Github.)

We can do the same thing in Perl but unfortunately booleans are still not automatically stringified so we have to add ? "true" : "false" at the end.

my @a = split//, join q{}, @ARGV[0,1];
say (((ord($a[0]) + $a[1]) % 2 == (ord($a[2]) + $a[3]) % 2) ? "true" : "false")

(Full code on Github.)

Challenge 2:

Doubled Words

You are given a string (which may contain embedded newlines) which is taken from a page on a website. The string will not contain brackets qw{ [ ] }.

Write a script that will find doubled words (such as “this this”) and highlight (wrap in brackets) each doubled word.

The script should:

- Work across lines, even finding situations where a word at the end of
one line is repeated at the beginning of the next.

- Find doubled words despite capitalization differences, such as with
'The the...', as well as allow differing amounts of whitespace (spaces,
tabs, newlines, and the like) to lie between the words.

- Find doubled words even when separated by HTML tags. For example, to
make a word bold: '...it is <B>very</B> very important...'. Only show
lines containing doubled words.

Adapted from Mastering Regular Expressions, Third Edition by Jeffrey E. F. Friedl

Example 1
Input: $str = "you're given the job of checking the pages on a\nweb server for doubled words (such as 'this this'), a common problem\nwith documents subject to heavy editing."

Output: "web server for doubled words (such as '[this] [this]'), a common problem"
Example 2
Input: $str = "Find doubled words despite capitalization differences, such as with 'The\nthe...', as well as allow differing amounts of whitespace (spaces,\ntabs, newlines, and the like) to lie between the words."

Output: "Find doubled words despite capitalization differences, such as with '[The]\n[the]...', as well as allow differing amounts of whitespace (spaces,"
Example 3
Input: $str = "to make a word bold: '...it is <B>very</B> very important...'."

Output: "to make a word bold: '...it is <B>[very]</B> [very] important...'."
Example 4
Input: $str = "Perl officially stands for Practical Extraction and Report Language, except when it doesn't."

Output: ""
Example 5
Input: $str = "There's more than one one way to do it.\nEasy things should be easy and hard things should be possible."

Output: "There's more than [one] [one] way to do it."

The ascription to Friedl's book (which I own and is great btw) gives away how we should solve this problem. Regular Expressions are very powerful but advanced usages require quite a bit of thought. I spent more time on this challenge than any in recent memory.

Because I am more familiar with Perl regular expressions, I did Perl first. The core of the script is a single use of the s/// (substitution) operator. I used the /x flag so I could spread it out over several lines for legibility.

$str =~ s/

The first part of the operator is a regular expression search pattern. Because we are using the /i flag, the search will be case-insensitive. The /g flag makes the search global and /s flag treats the entire input as one continuous string so matches can occur accross line boundaries.

The first thing to match is a sequence of alphanumeric characters (the character class \w) preceded by a word boundary (\b). This is captured by the surrounding parentheses for future use. I'm not exactly sure why we needed the preceding \b but it would not work properly without it.

    \b (\w+)

Prospective doubled words can be separated by whitespace \s. The spec says we should also treat HTNL tags as separators. The following regular expression matches whitespace or tags. All this is also captured. Note that the expression for an HTML tag is also in parentheses but this is only to group it together. We signal to the regex engine that this group is not for capturing by putting ?: before the operning parenthesis.

    ( \s | (?: \s? <[^>]+> \s? )+ )

Then we try and match another instance of the first match represented by the back reference \1. This too is captured.

    (\1)

Now for the second part of the operator. We have three matches: a word, intervening whitespace, and the words' double. Because we have the /e flag, we can interpolate them into a string as backreferences $1,$2, and $3. We enclose the doubled words, i.e. $1 and $3 in square brackets. This string replaces whatever was previously in that position in $str.

/
    "\[$1]$2\[$3]"

In Perl, flags to the s/// operator go at the end whereas in Raku they go after the s.

/igxe;

Now we only want to print lines that contain matches so first we split() $str into lines and with grep() filter any line that contains [; or we could have looked for ] but we don't need both. The matched lines are join()ed together with newlines and the whole thing is output with say().

say join qq{\n}, grep { /[\[]/ } split /\n/, $str; 

(Full code on Github.)

For Raku, we are employing the same method but Raku regular expressions have a somewhat different syntax I'm not 100% used to yet so there may be better ways to do some of this.

Because function parameters are immutable by default in Raku, we can't modify $str so the first order of business is to copy it to another variable.

my $text = $str;

Although Raku does have a S/// operator I'm using the String classes' .subst() method to achieve the same thing.

$text = $text.subst(

The first argument is a regular expression search pattern as in the Perl version.

    /

One annoying thing about Raku regular expressions is that some flags go before the expression and some go within it and it isn't always clear to me which is which. If you are using .subst() it's even worse; some flags go after as a separate parameter as we shall see shortly.

        :i 

\b in Raku is <?wb>. I don't know why they had to make it more verbose like that. Yet another pet peeve is that if you want to use a reference to a capture group such as $0 within the same regular expression, you have to assign it to a variable first and then reference that variable. Luckily I recalled this from PWC 328,

        <?wb> (\w+) {} :my $m = $0; 

This is the part that deals with separators.

        (\s || [\s? '<' <-[>]>+ '>' \s?]+)

And this is the second instance of the doubled word.

        ($m)
    /,

The substitution is done in the same way as Perl.

    { "\[$0]$1\[$2]" },

As I wrote earlier, some flags are defined as additional parameters at the end of .subst(). There is no need for equivalents to the \x and \e flags as those are default behaviors for Raku regular expressions.

    :g
);

Printing matched lines only is done in the same way as in Perl but shortcut methods like .lines() and .contains() plus left to right method chaining make it a lot more readable in my opinion.

$text.lines.grep({ .contains(q{[}) }).join("\n").say;

(Full code on Github.)