Is Haskell capable of matching C in string processing performance?

Recently I've been working on a library for generating large JSON[1] documents quickly. Originally I started writing it in Haskell, but quickly encountered performance problems. After exhausting my (meager) supply of optimization ideas, I rewrote some of it in C, with dramatic results. Namely, the C solution is * 7.5 times faster than the fastest Haskell I could write (both using raw pointer arrays) * 14 times faster than a somewhat functional version (uses monads, but no explicit IO) * >30 times faster than fancy functional solutions with iteratees, streams, etc I'm wondering if string processing is simply a Haskell weak point, performance-wise. The problem involves many millions of very small (<10 character, usually) strings -- the C solution can copy directly from string literals into a fixed buffer and flush it occasionally, while even the fastest Haskell version has a lot of overhead from copying around arrays. Dons suggested I was "doing it wrong", so I'm posting on -cafe in the hopes that somebody can tell me how to get better performance without resorting to C. Here's the fastest Haskell version I could come up with. It discards all error handling, validation, and correctness in the name of performance, but still can't get anywhere near C: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423 [1] http://json.org/

jmillikin:
Here's the fastest Haskell version I could come up with. It discards all error handling, validation, and correctness in the name of performance, but still can't get anywhere near C: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423
Thanks for posting the code. You're not using bytestrings?? They were invented to deal with the problem of [Char] being a poor structure for large scale string processing, and you should have no problem getting C-like string performance. http://www.cse.unsw.edu.au/~dons/papers/CSL06.html -- Don

Don Stewart wrote:
jmillikin:
Here's the fastest Haskell version I could come up with. It discards all error handling, validation, and correctness in the name of performance, but still can't get anywhere near C: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423
Thanks for posting the code.
You're not using bytestrings??
They were invented to deal with the problem of [Char] being a poor structure for large scale string processing, and you should have no problem getting C-like string performance.
In my limited experience, ByteStrings are great for reading data, but not that good for writing data that is being generated on the fly. For writing, good old difference lists or the Builder monoid / Put monad from Data.Binary seem to be best. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

Correct me if I'm wrong, but ByteStrings can't contain non-ASCII
values, right? I'm looking for something like this pseudo-C:
typedef void (*Callback)(const uint32_t *chars, size_t n_chars, void *);
WriterState *new_state (Callback, void *);
I tried using the Text type, but its conversions to Ptr Word16 are all
O(n) -- not much better than String.
On Thu, Jan 21, 2010 at 22:28, Don Stewart
jmillikin:
Here's the fastest Haskell version I could come up with. It discards all error handling, validation, and correctness in the name of performance, but still can't get anywhere near C: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423
Thanks for posting the code.
You're not using bytestrings??
They were invented to deal with the problem of [Char] being a poor structure for large scale string processing, and you should have no problem getting C-like string performance.
http://www.cse.unsw.edu.au/~dons/papers/CSL06.html
-- Don

Excerpts from John Millikin's message of Fri Jan 22 19:40:58 +0200 2010:
Correct me if I'm wrong, but ByteStrings can't contain non-ASCII values, right? I'm looking for something like this pseudo-C:
typedef void (*Callback)(const uint32_t *chars, size_t n_chars, void *); WriterState *new_state (Callback, void *);
I tried using the Text type, but its conversions to Ptr Word16 are all O(n) -- not much better than String.
Are you using unicode on the C side with wchar_t? You can have utf-8 inside ByteStrings. - Taru Karttunen

I'm using UCS-4, with uint32_t.
On Sat, Jan 23, 2010 at 03:43, Taru Karttunen
Excerpts from John Millikin's message of Fri Jan 22 19:40:58 +0200 2010:
Correct me if I'm wrong, but ByteStrings can't contain non-ASCII values, right? I'm looking for something like this pseudo-C:
typedef void (*Callback)(const uint32_t *chars, size_t n_chars, void *); WriterState *new_state (Callback, void *);
I tried using the Text type, but its conversions to Ptr Word16 are all O(n) -- not much better than String.
Are you using unicode on the C side with wchar_t?
You can have utf-8 inside ByteStrings.
- Taru Karttunen _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

The following snippet of code ran ~ 33% faster than yours on my computer (GHC 6.10.4, OSX):
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy as L
import System.IO
null_str = S.pack "null"
main = withBinaryFile "out2.json" WriteMode $ \h -> do
hPutStr h "["
L.hPutStr h . L.fromChunks . replicate 5000000 $ null_str
hPutStr h "]"
The following snippet ran in roughly the same speed (which should not be surprising, as it is essentially the same code):
import Control.Monad
import qualified Data.ByteString.Char8 as S
import System.IO
null_str = S.pack "null"
main = withBinaryFile "out3.json" WriteMode $ \h -> do
hPutStr h "["
replicateM_ 5000000 (S.hPutStr h null_str)
hPutStr h "]"
This C snippet ran ~ 6 times faster than the Haskell snippets:
#include
Recently I've been working on a library for generating large JSON[1] documents quickly. Originally I started writing it in Haskell, but quickly encountered performance problems. After exhausting my (meager) supply of optimization ideas, I rewrote some of it in C, with dramatic results. Namely, the C solution is
* 7.5 times faster than the fastest Haskell I could write (both using raw pointer arrays) * 14 times faster than a somewhat functional version (uses monads, but no explicit IO) * >30 times faster than fancy functional solutions with iteratees, streams, etc
I'm wondering if string processing is simply a Haskell weak point, performance-wise. The problem involves many millions of very small (<10 character, usually) strings -- the C solution can copy directly from string literals into a fixed buffer and flush it occasionally, while even the fastest Haskell version has a lot of overhead from copying around arrays.
Dons suggested I was "doing it wrong", so I'm posting on -cafe in the hopes that somebody can tell me how to get better performance without resorting to C.
Here's the fastest Haskell version I could come up with. It discards all error handling, validation, and correctness in the name of performance, but still can't get anywhere near C: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423
[1] http://json.org/ _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

It seems to me this indicates that the big expense here is the call into the I/O system.
So let's make fewer I/O calls: import Control.Monad import qualified Data.ByteString.Char8 as S import System.IO null_str1 = S.concat $ take 1000 $ repeat $ S.pack "null" n1 = 5000000 `div` 1000 main = withBinaryFile "out3.json" WriteMode $ \h -> do hPutStr h "[" replicateM_ n1 (S.hPutStr h null_str1) hPutStr h "]" --- this is 10x faster. Whether this is cheating or not depends on what John actually wants to do. Tom

There's no such thing as "cheating", though that particular code won't
work for my purposes because it assumes the output is merely a stream
of "null". Fine for the benchmark, but not extractable to the full
problem.
I wonder: is Handle known to be particularly slow? This code only has
to work on Linux and BSD, so if using (for example) a POSIX fd would
be much faster, it could bring the Haskell version much closer to C.
On Fri, Jan 22, 2010 at 07:30, Tom Nielsen
It seems to me this indicates that the big expense here is the call into the I/O system.
So let's make fewer I/O calls:
import Control.Monad import qualified Data.ByteString.Char8 as S import System.IO
null_str1 = S.concat $ take 1000 $ repeat $ S.pack "null"
n1 = 5000000 `div` 1000
main = withBinaryFile "out3.json" WriteMode $ \h -> do hPutStr h "[" replicateM_ n1 (S.hPutStr h null_str1) hPutStr h "]" --- this is 10x faster. Whether this is cheating or not depends on what John actually wants to do.
Tom

jmillikin:
There's no such thing as "cheating", though that particular code won't work for my purposes because it assumes the output is merely a stream of "null". Fine for the benchmark, but not extractable to the full problem.
I wonder: is Handle known to be particularly slow? This code only has to work on Linux and BSD, so if using (for example) a POSIX fd would be much faster, it could bring the Haskell version much closer to C.
Just make sure you're using the same data types and IO methods as in C, and you'll get the same performance. For serializing/writing to packed data, Data.Binary or cereal are a good choice for building bytestrings efficiently, which in turn can be output quickly via bytestring IO. -- Don

I would say that counts as cheating because it assumes that knowledge of the input in advance. However, I wonder how it would perform if there were a "reChunk" function that lazily built a new lazy ByteString by merging smaller chunks together --- i.e., it would keep pullings chunks from the ByteString until it reached some threshold size, merge them into a single strict ByteString chunk, and then recursively continue processing the rest of the lazy ByteString in this manner. Cheers, Greg On Jan 22, 2010, at 7:30 AM, Tom Nielsen wrote:
It seems to me this indicates that the big expense here is the call into the I/O system.
So let's make fewer I/O calls:
import Control.Monad import qualified Data.ByteString.Char8 as S import System.IO
null_str1 = S.concat $ take 1000 $ repeat $ S.pack "null"
n1 = 5000000 `div` 1000
main = withBinaryFile "out3.json" WriteMode $ \h -> do hPutStr h "[" replicateM_ n1 (S.hPutStr h null_str1) hPutStr h "]" --- this is 10x faster. Whether this is cheating or not depends on what John actually wants to do.
Tom

Ironically, there's a TODO comment about that in the source of
Data.ByteString.Lazy, just below 'copy':
http://hackage.haskell.org/packages/archive/bytestring/0.9.0.4/doc/html/src/...
-- TODO defrag func that concatenates block together that are below a threshold
-- defrag :: ByteString -> ByteString
2010/1/23 Gregory Crosswhite
I would say that counts as cheating because it assumes that knowledge of the input in advance. However, I wonder how it would perform if there were a "reChunk" function that lazily built a new lazy ByteString by merging smaller chunks together --- i.e., it would keep pullings chunks from the ByteString until it reached some threshold size, merge them into a single strict ByteString chunk, and then recursively continue processing the rest of the lazy ByteString in this manner.
Cheers, Greg
On Jan 22, 2010, at 7:30 AM, Tom Nielsen wrote:
It seems to me this indicates that the big expense here is the call into the I/O system.
So let's make fewer I/O calls:
import Control.Monad import qualified Data.ByteString.Char8 as S import System.IO
null_str1 = S.concat $ take 1000 $ repeat $ S.pack "null"
n1 = 5000000 `div` 1000
main = withBinaryFile "out3.json" WriteMode $ \h -> do hPutStr h "[" replicateM_ n1 (S.hPutStr h null_str1) hPutStr h "]" --- this is 10x faster. Whether this is cheating or not depends on what John actually wants to do.
Tom
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Eugene Kirpichov Web IR developer, market.yandex.ru
participants (7)
-
Don Stewart
-
Eugene Kirpichov
-
Gregory Crosswhite
-
Heinrich Apfelmus
-
John Millikin
-
Taru Karttunen
-
Tom Nielsen