Is Haskell capable of matching C in string processing performance?

22 Jan 2010

Recently I've been working on a library for generating large JSON[1]
documents quickly. Originally I started writing it in Haskell, but
quickly encountered performance problems. After exhausting my (meager)
supply of optimization ideas, I rewrote some of it in C, with dramatic
results. Namely, the C solution is

* 7.5 times faster than the fastest Haskell I could write (both using
raw pointer arrays)
* 14 times faster than a somewhat functional version (uses monads, but
no explicit IO)
* more than 30 times faster than fancy functional solutions with
iteratees, streams, etc.

I'm wondering if string processing is simply a Haskell weak point,
performance-wise. The problem involves many millions of very small
(<10 character, usually) strings -- the C solution can copy directly
from string literals into a fixed buffer and flush it occasionally,
while even the fastest Haskell version has a lot of overhead from
copying around arrays.
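
To make the comparison concrete, here is a minimal sketch of the kind of
fixed-buffer writer I mean. It is not my actual library code: the names
(OutBuf, outbuf_put, PUT_LIT) and the 4096-byte buffer size are made up
for illustration, and error handling is reduced to a return code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size, chosen only for illustration. */
#define BUF_SIZE 4096

typedef struct {
    FILE  *out;            /* destination stream */
    size_t used;           /* bytes currently held in data[] */
    size_t flushed;        /* total bytes already written to out */
    char   data[BUF_SIZE]; /* fixed buffer, never reallocated */
} OutBuf;

static void outbuf_init(OutBuf *b, FILE *out)
{
    assert(out != NULL);
    b->out = out;
    b->used = 0;
    b->flushed = 0;
}

/* Write the buffered bytes to the stream. Returns 0 on success, -1 on error. */
static int outbuf_flush(OutBuf *b)
{
    if (b->used > 0) {
        if (fwrite(b->data, 1, b->used, b->out) != b->used)
            return -1;
        b->flushed += b->used;
        b->used = 0;
    }
    return 0;
}

/* Append len bytes, flushing first only when they would not fit. */
static int outbuf_put(OutBuf *b, const char *s, size_t len)
{
    if (len > BUF_SIZE - b->used && outbuf_flush(b) != 0)
        return -1;
    if (len > BUF_SIZE) {
        /* Too large to buffer at all: write it straight through. */
        if (fwrite(s, 1, len, b->out) != len)
            return -1;
        b->flushed += len;
        return 0;
    }
    memcpy(b->data + b->used, s, len);
    b->used += len;
    return 0;
}

/* For string literals the length is known at compile time, so no strlen. */
#define PUT_LIT(b, lit) outbuf_put((b), (lit), sizeof(lit) - 1)
```

Emitting a token is then something like PUT_LIT(&buf, "true"), followed
by one outbuf_flush(&buf) at the end. In the common case each small
string costs a bounds check and a memcpy, with no allocation.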

Dons suggested I was "doing it wrong", so I'm posting on -cafe in the
hopes that somebody can tell me how to get better performance without
resorting to C.

Here's the fastest Haskell version I could come up with. It discards
all error handling, validation, and correctness in the name of
performance, but still can't get anywhere near C:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423

[1] http://json.org/

John Millikin
