Perl Weekly Challenge: Week 32

Week 32 Challenge 1:

Count instances

Create a script that either reads standard input or one or more files specified on the command-line. Count the number of times and then print a summary, sorted by the count of each entry.

So with the following input in file example.txt

  apple
  banana
  apple
  cherry
  cherry
  apple

the script would display something like:

  apple     3
  cherry    2
  banana    1

For extra credit, add a -csv option to your script, which would generate:

  apple,3
  cherry,2
  banana,1

This is the kind of text-processing task for which Perl is second to none and the kind of script I've written many times so it did not take me long to come up with this. Normally I don't use modules in these challenges (apart from English to make special variable names less cryptic) but as the extra credit involves option processing I used Getopt::Long rather than a home brewed solution.

my $csv = undef;

GetOptions('csv' => \$csv);

my @contents;
if (scalar @ARGV) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "$OS_ERROR\n";
        local $INPUT_RECORD_SEPARATOR = undef;
        push @contents, split /\n/, <$fh>;
        close $fh;
    }
} else {
    local $INPUT_RECORD_SEPARATOR = undef;
    push @contents, split /\n/, <STDIN>;
}

First we treat all the command-line arguments as filenames and read in the files they represent. Rather than read them line by line, we set the special variable $INPUT_RECORD_SEPARATOR or $RS to undef to slurp up the entire file in one go and then split it into lines in memory.

If there are no arguments we read from Standard Input instead but the rest of the procedure is the same.

my %totals;
for my $item (@contents) {
    chomp $item;
    $totals{$item}++;
}

Next we go through all the data we acquired in step one and add it into a hash, treating the value of each line as a key into that hash, and the number of times it occurs as the value of that key. In a 'real' production script we would probably want to trim any excess whitespace at the beginning or end of the item, standardize capitalization, case, spelling etc. but for this task we don't need any of that.

if ($csv) {
    for my $total (sort keys %totals) {
        say "$total,$totals{$total}";
    }
} else {
    my $width =
        length ((sort {length $b <=> length $a} keys %totals)[0]);

    for my $total (sort keys %totals) {
        printf("% -*s  %s\n", $width, $total, $totals{$total});
 }
}

(Full code on Github.)

Now we can sort the hash in order of descending values and if we are in CSV mode, just print the keys and values separated by a comma. For our non-CSV mode, we want the output to line up in nice columns. So first we sort the keys again to find the length of the longest one and then use that to pad the rest of the keys out to the same length with spaces.

The Raku version is basically the same but you can see that Raku has a number of affordances (IO.lines is one of my favorites) that make the code more compact.

sub MAIN(
    Bool :$csv, #= output results in CSV format
    *@files
) {
    my %totals;

    if @files.elems {
        for @files -> $file {
            $file.IO.lines.map({ %totals{$_}++; });
        }
    } else {
        $*IN.lines.map({ %totals{$_}++; });
    }

    if $csv {
        %totals.keys.sort.map({ say "$_,%totals{$_}"; });
    } else {
        my $width =
            %totals.keys.sort({$^b.chars <=> $^a.chars}).first.chars;

        %totals.keys.sort.map({
            printf("% -*s  %s\n", $width, $_, %totals{$_});
        });
    }
}

(Full code on Github.)

Week 30 Challenge 2:

ASCII bar chart

Write a function that takes a hashref where the keys are labels and the values are integer or floating point values. Generate a bar graph of the data and display it >to stdout.

The input could be something like:

  $data = { apple => 3, cherry => 2, banana => 1 };
  generate_bar_graph($data);

And would then generate something like this:

  apple  | ############
  cherry | ########
  banana | ####

If you fancy then please try this as well: (a) the function could let you specify whether the chart should be ordered by (1) the labels, or (2) the values.

use constant SCALE => 4;

sub generate_bar_graph {
    my ($data, $bylabels) = @_;

    my @labels = sort { $data->{$b} <=> $data->{$a}} keys %{$data};
    my $smallest = $data->{$labels[$#labels]};

    if (defined $bylabels) {
        @labels = sort @labels;
    }

    my $width = length ((sort {length $b <=> length $a} @labels)[0]);
    my $bar_graph = q{};

    for my $label (@labels) {
        my $bar = ($data->{$label} / $smallest) * SCALE;
        if ($data->{$label} % $smallest) {
            $bar += SCALE / 2;
        }
        $bar_graph .= sprintf("% -*s | %s\n", $width, $label, '#' x $bar);
    }

    return $bar_graph;
}

(Full code on Github.)

The key issue I wanted to handle when writing this code was the scale of the bars in the graph. In the example the values were all single digits but what if they were bigger numbers? I decided to make the smallest value "one unit" (expressed as '####') and express all the other values in terms of that unit. If the value is not a whole number of units, I added '##' to the bar. This is by no means perfect (what if a value is 0 or negative? for instance) but it makes the graph look a little better.

Other than that it is simple. We sort the data by value, re-sort by the lexicographical order of the labels if wanted and then print the label and a bar representing its' value for each piece of data.

The Raku version is just a translation of the Perl so there is nothing about it that bears commenting really.

sub generate_bar_graph(%data, Bool $bylabels = False) {
    constant $SCALE = 4;

    my @labels = %data.keys.sort({ %data{$^b} <=> %data{$^a} });
    my $smallest = %data{@labels[@labels.end]};

    if ($bylabels) {
        @labels = @labels.sort;
    }

    my $width = @labels.sort({$^b.chars <=> $^a.chars}).first.chars;
    my $bar_graph = q{};

    for @labels -> $label {
        my $bar = (%data{$label} / $smallest) * $SCALE;
        if %data{$label} !%% $smallest {
            $bar += $SCALE / 2;
        }
        $bar_graph ~= sprintf("% -*s | %s\n", $width, $label, '#' x $bar);
    }

    return $bar_graph;
}

(Full code on Github.)