Fast mutable arrays of ByteString?

15 Apr 2009

      Hello all,

I have been rewriting a small utility program of mine from C to Haskell
for fun.  This tool reads lines from stdin or from files, shuffles them
one or more times using the Fisher-Yates algorithm, and outputs the
result to stdout.

Since this algorithm is based on in-place updates, I've been storing my
strings in a mutable array in the ST monad.  Since it's holding strings
I could not use an unboxed array.  The resulting program works fine and
seems to run at a decent speed, even though it is much slower than the
original C version, slightly more so than I expected.

While trying to optimize it using profiling, and playing with the number
of shuffling passes, I noticed that this operation was responsible for a
significant amount of the runtime, much more so than with the C version.
I also noticed that the %GC time was around 56%.

In order to do more tests, I wrote another version of this program which
keeps the strings in a pure and immutable array, and stores the indices
of this array in an unboxed mutable ST array.  The shuffling is then
done on this indices array instead of the strings array.

This version runs much faster and only spends ~21% of its time in the
garbage collector, at the cost of consuming more memory for the indices
array.

I'm attaching both versions of the code to this e-mail, and I'd be
curious to hear about any possible improvements to it, and whether the
performance of STArray of ByteString I'm observing corresponds to
people's expectations.

Thanks in advance,
Maxime Henrion

Maxime Henrion

tags

participants (1)