
I regularly need to write tiny programs that analyze output logs. The output logs don't have a consistent format, so I typically choose Perl for these tasks. The latest such program was simple enough that I figured I'd try my hand at using Haskell instead.

The short story is that I could quickly write a Haskell program that achieved the goal. Yay! However, the performance was ~8x slower than a comparable Perl implementation. With a lot of effort I got the Haskell version down to only 2x slower. Most of the optimization was guided by guesses that the performance difference came from how each line was being read from the file. I couldn't determine much using GHC's profiler.

I still have two questions after all this:
- Can I get a Haskell implementation as fast as the Perl one?
- What do I need to do to get GHC's profiler to provide me usable information? Telling me that 98% of the time was in "main" is not very enlightening ;-)

All the code and data for this little experiment I've placed here:
http://tothepowerofdisco.com/repo/sum_optimization/

My first, and shortest, version is SumTiny.hs. The optimized version is SumFast.hs.

The long version, for the curious:

The (cleaned up) data was a 78k line file consisting of lines like the following:

framerate (prev == no pts): 15
framerate (delta): 25
framerate (invalid timebase): 12.5
... and so on.

The need was for a program that calculated 1.0 / framerate for each line and produced the sum. Easy, no?

My straightforward Haskell solution was:

-------------------------------------------------------------------
import Text.Regex.Posix

main = do
    f_lines <- readFile "test.out" >>= return . lines
    let duration = foldl add_line 0.0 f_lines
        add_line sum line =
            let [[_, m]]  = line =~ "([0-9.]+)"
                framerate = read m
                delta     = 1.0 / framerate
            in sum + delta
    putStrLn $ "Duration (sec): " ++ show duration
-------------------------------------------------------------------

Just for fun I decided to compare it to a Perl implementation:

-------------------------------------------------------------------
#!/usr/bin/perl
my @f_lines = split(/\n/, `cat test.out`);
my $sum = 0.0;
foreach (@f_lines) {
    /([0-9.]+)/;
    my $delta = 1.0 / $1;
    $sum += $delta;
}
print("Duration (sec): ", $sum, "\n");
-------------------------------------------------------------------

(I'm sure there are other ways to write this same program in both languages.)

I was pretty happy with how the Haskell implementation compared to the Perl one just in terms of looks, though I think the Perl version is easier to understand at the point where the regex match is extracted.

For fun I compared the performance of the two (64-bit Linux on a 2.4 GHz Core2Duo):

$ time perl ./Sum.pl
Duration (sec): 3155.62666666438

real    0m0.121s
user    0m0.103s
sys     0m0.016s

$ time ./SumTiny
Duration (sec): 3155.626666664377

real    0m1.099s
user    0m1.073s
sys     0m0.009s

Youch! ~1s is fast enough that I don't care, but I'm still curious why there is an 8x performance difference. Profiling with manual cost center annotations (see SumTinyProf.hs) indicated that the "do" expression main is bound to was responsible for 90% of the time. Which isn't revealing at all!

Some experimenting led me to an implementation that is only 2x slower (SumFast.hs). Quick notes on the changes I made:
- Used ByteStrings instead of String.
- Used hGetLine instead of reading the entire file lazily and splitting on "\n".
- Fused the foldl' with the loop that reads each line from the file.
- Used bang patterns to make the fused loop strict in the accumulator argument.

I think the largest performance gain came from changing how each line was read from the file. The lazy read and split seemed very slow compared to a loop that used hGetLine.
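To give a feel for what those changes look like together, here is a rough sketch along those lines. This is not the actual SumFast.hs: the hand-rolled loop and the digit filter that stands in for the regex are simplifications that only handle lines shaped like the sample data above.

-------------------------------------------------------------------
{-# LANGUAGE BangPatterns #-}
-- Sketch only: strict ByteString lines read with hGetLine, the fold
-- fused into the read loop, and the accumulator forced with a bang
-- pattern.
import qualified Data.ByteString.Char8 as B
import System.IO

main :: IO ()
main = do
    h <- openFile "test.out" ReadMode
    let loop !acc = do
            eof <- hIsEOF h
            if eof
                then return acc
                else do
                    line <- B.hGetLine h
                    -- Keep only digits and '.', standing in for the
                    -- original "([0-9.]+)" regex on this data.
                    let numeric   = B.filter (\c -> c == '.' || (c >= '0' && c <= '9')) line
                        framerate = read (B.unpack numeric) :: Double
                    loop (acc + 1.0 / framerate)
    total <- loop 0.0
    putStrLn $ "Duration (sec): " ++ show total
-------------------------------------------------------------------

The point is only that each line is consumed strictly as it is read, instead of building a lazy list of Strings first.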
--
-Corey O'Connor