Perl Weekly Challenge: Week 24

Challenge 1:

Create a smallest script in terms of size that on execution doesn’t throw any error. The script doesn’t have to do anything special. You could even come up with smallest one-liner.

The smallest useful Perl script I can think of is this.

perl -V

(Full code on Github.)

It gives you lots of useful information about the configuration of your perl5 installation. The Rakudo interpreter for Perl6 has the same switch.

perl6 -V

(Full code on Github.)

The output is easily machine-parseable, which can be handy at times.
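For Perl5, a single setting can also be requested with perl -V:name, which prints a line of the form name='value'; and is even easier to consume from a script. Here is a minimal sketch of that idea. The helper name parse_config_output is my own invention, and the regex assumes the single-quoted form shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn lines such as  version='5.30.0';
# into a hash of name => value. Lines in any other shape are skipped.
sub parse_config_output {
    my ($text) = @_;
    my %config;
    for my $line (split /\n/, $text) {
        if ($line =~ /\A (\w+) = '(.*)' ; \s* \z/x) {
            $config{$1} = $2;
        }
    }
    return %config;
}

# $^X holds the path of the perl binary running this script.
my $perl   = $^X;
my $output = `"$perl" -V:version`;
my %config = parse_config_output($output);
print "$_ => $config{$_}\n" for sort keys %config;
```

From inside a Perl program the core Config module exposes the same values through %Config, so shelling out like this is only worth it when the caller is something other than Perl.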

Challenge 2:

Create a script to implement full text search functionality using Inverted Index. According to Wikipedia:

In computer science, an inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.

Here is a nice example of Inverted Index.
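To make the forward-versus-inverted distinction in that definition concrete, here is a minimal Perl5 sketch. The file names and word lists are made up for illustration, and invert is just a name I picked; it is not part of my solution.

```perl
use strict;
use warnings;

# Forward index: document => the words it contains (made-up sample data).
my %forward = (
    'a.txt' => [qw(perl raku)],
    'b.txt' => [qw(perl index)],
);

# Invert it: word => list of documents containing that word.
# Documents are visited in sorted order so each list comes out sorted.
sub invert {
    my (%fwd) = @_;
    my %inverted;
    for my $doc (sort keys %fwd) {
        push @{ $inverted{$_} }, $doc for @{ $fwd{$doc} };
    }
    return %inverted;
}

my %inverted = invert(%forward);
print "$_: @{ $inverted{$_} }\n" for sort keys %inverted;
# index: b.txt
# perl: a.txt b.txt
# raku: a.txt
```

Answering "which documents mention perl?" is then a single hash lookup instead of a scan over every document.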

This specification is a bit vague, so I'm not entirely sure my solutions do the right thing. What I did was read each file given on the command line line by line, split each line into words, and for each word make an index entry noting the file and the line number within that file where it appears. Then I printed the index out. Here's the Perl5 code:

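use 5.010;
use strict;
use warnings;
use English qw( -no_match_vars );   # provides $OS_ERROR

# Add every word in $filename to the index, recording file and line number.
sub process {
    my ($filename, $index) = @_;
    open my $fh, '<', $filename or die "$filename: $OS_ERROR\n";
    my $lineno = 0;
    while (my $line = <$fh>) {
        $lineno++;
        # A line that starts with a non-word character makes split return a
        # leading empty string, so filter empty fields out.
        my @words = grep { length } split /\W+/msx, $line;
        for my $word (@words) {
            push @{$index->{$word}}, { document => $filename, line => $lineno };
        }
    }
    close $fh or die "$filename: $OS_ERROR\n";
}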

my %index;

for my $file (@ARGV) {
    process($file, \%index);
}

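# Print each word followed by the places it occurs, ordered by file and line.
for my $word (sort keys %index) {
    say $word;
    for my $entry (sort { $a->{document} cmp $b->{document}
                          or $a->{line} <=> $b->{line} } @{$index{$word}}) {
        say q{ }, $entry->{document}, ' - ', $entry->{line};
    }
}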

(Full code on Github.)
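My script stops at printing the index, but with an index of this shape the "search" part is little more than a hash lookup. Here is a minimal sketch, assuming the same word => [ { document, line }, ... ] structure as above. The search function and the sample data are hypothetical and not in my submitted code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the "document:line" locations at which
# every one of @terms appears.
sub search {
    my ($index, @terms) = @_;
    my %hits;
    for my $term (@terms) {
        my %seen;    # count each location at most once per term
        for my $entry (@{ $index->{$term} // [] }) {
            my $where = "$entry->{document}:$entry->{line}";
            $hits{$where}++ if !$seen{$where}++;
        }
    }
    # Keep only locations matched by all terms (sorted as plain strings).
    my @found = sort grep { $hits{$_} == @terms } keys %hits;
    return @found;
}

# Made-up sample index in the same shape the script builds.
my %index = (
    perl => [ { document => 'a.txt', line => 1 }, { document => 'b.txt', line => 3 } ],
    raku => [ { document => 'b.txt', line => 3 } ],
);
print "$_\n" for search(\%index, 'perl', 'raku');   # prints b.txt:3
```

Matching per line is a design choice; collecting only $entry->{document} instead would answer "which files contain all of these words?".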

And here is the Perl6 version:

sub process($filename, %index) {
    my $lineno = 0;

    for $filename.IO.lines -> $line {
        $lineno++;
        for $line.words -> $word {
            %index{$word}.push({ document => $filename, line => $lineno });
        }
    }
}

sub MAIN(*@ARGS) {
    my %index;

    for @*ARGS -> $filename {
        process($filename, %index);
    }

    for %index.keys.sort -> $word {
        $word.say;
        for %index{$word}.sort({ %^a{'document'} cmp %^b{'document'} }) -> %entry {
            say q{ }, %entry{'document'}, ' - ', %entry{'line'};
        }
    }
}

(Full code on Github.)

I ran into a couple of difficulties which weren't huge but caused me to miss the deadline with this particular solution. One is that I apparently didn't understand the Perl6 equivalent of @ARGV. I knew it was @*ARGS, but once you define a MAIN() function, its signature also has to accept the command-line arguments, otherwise the script just prints a usage message. The * in *@ARGS marks a "slurpy" parameter which eats up all the remaining arguments. Without it, the function would ask for a single parameter which is an array. With that hurdle cleared, the body of the function can use @*ARGS. Note that the two are different things despite looking like the same characters swapped around: *@ARGS is a slurpy parameter that happens to be named @ARGS, while @*ARGS is the global (dynamic) argument list, so the loop could just as well have iterated over the parameter @ARGS. How confusing!

Another little gotcha is that you can't use a bareword as a Hash key inside curly braces, i.e. %entry{'line'} not %entry{line}. The angle-bracket form %entry<line> quotes the key for you.

And speaking of hashes, Perl6 has a possibly more appropriate datatype called the BagHash which I could have used. I say possibly because I don't know how optimized it is (Rakudo has gotten faster with each release but it's still nowhere near as fast as Perl5.) But my intuition tells me a more specific datatype is going to be better than a generic one. Unfortunately, in the limited time I had, I was not able to get it working, so that's an experiment I will have to leave for another time.

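On the plus side, once again we see that Perl6 has nice convenience methods like .words and .lines that, while not strictly necessary, help avoid a lot of boilerplate code. One difference to keep in mind is that .words splits on whitespace only, so unlike the \W+ split in the Perl5 version, punctuation stays attached to the words.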