
Hello all, I have been rewriting a small utility program of mine from C to Haskell for fun. This tool reads lines from stdin or from files, shuffles them one or more times using the Fisher-Yates algorithm, and outputs the result to stdout. Since this algorithm is based on in-place updates, I've been storing my strings in a mutable array in the ST monad. Since it's holding strings I could not use an unboxed array. The resulting program works fine and seems to run at a decent speed, even though it is much slower than the original C version, slightly more so than I expected. While trying to optimize it using profiling, and playing with the number of shuffling passes, I noticed that this operation was responsible for a significant amount of the runtime, much more so than with the C version. I also noticed that the %GC time was around 56%. In order to do more tests, I wrote another version of this program which keeps the strings in a pure and immutable array, and stores the indices of this array in an unboxed mutable ST array. The shuffling is then done on this indices array instead of the strings array. This version runs much faster and only spends ~21% of its time in the garbage collector, at the cost of consuming more memory for the indices array. I'm attaching both versions of the code to this e-mail, and I'd be curious to hear about any possible improvements to it, and whether the performance of STArray of ByteString I'm observing corresponds to people's expectations. Thanks in advance, Maxime Henrion
participants (1)
-
Maxime Henrion