
Using ByteStrings and the C calls does indeed speed things up a bit, but not much:

real 0m6.053s
user 0m1.480s
sys  0m4.550s

For your interest:
The original version (with Strings and openFile): http://hpaste.org/73803
Faster (with Strings and c_open): http://hpaste.org/73802
Even faster (with ByteStrings and c_open): http://hpaste.org/73801

The problem may be that even with ByteStrings, we are stuck using show, and thus Strings, at some point. Ideas?
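One way to avoid `show` for the file contents is to render the integer with `Data.ByteString.Builder.intDec`, which writes the decimal digits straight into a byte buffer. This is a minimal sketch that assumes bytestring >= 0.10.2, which is newer than the bytestring bundled with GHC 7.4.2, so it is not what the timings above used. The helper name `renderInt` is hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: render an Int as decimal ASCII digits directly
-- into a strict ByteString. intDec writes the digits into the Builder's
-- buffer, so no intermediate String is produced (unlike show).
-- Assumes bytestring >= 0.10.2 for Data.ByteString.Builder and BL.toStrict.
renderInt :: Int -> B.ByteString
renderInt = BL.toStrict . BB.toLazyByteString . BB.intDec
```

The result can be written with `B.hPut`, or passed as a buffer to `c_write`; with a `Handle`, `BB.hPutBuilder` can also write the `Builder` directly. For numbers this short, the Builder's own buffer allocation may cost about as much as `show`, so this would need measuring.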
From: johan.tibell@gmail.com
Date: Mon, 27 Aug 2012 13:48:27 -0700
Subject: Re: I/O overhead in opening and writing files
To: arc38813@hotmail.com
CC: glasgow-haskell-users@haskell.org
On Mon, Aug 27, 2012 at 1:43 PM, J Baptist wrote:

I'm looking into high-performance I/O, particularly on a tmpfs (in-memory) filesystem. This involves creating lots of little files. Unfortunately, it seems that Haskell's performance in this area is not comparable to that of C. I assume that this is because of the overhead involved in opening and closing files. Some cursory profiling confirmed this: most of the program's runtime is taken by openFile, hPutStr, and hClose.
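For reference, a loop of the kind described here might look like the following. This is a hypothetical reconstruction, not the code from the original post. The name `writeFilesHandle` and the naming scheme (file i is `prefix ++ show i`) are assumptions for illustration.

```haskell
import Control.Monad (forM_)
import System.IO (IOMode (WriteMode), hClose, hPutStr, openFile)

-- Hypothetical benchmark loop: create n files named prefix ++ show i
-- (i = 0 .. n-1), each holding the decimal form of i. Every file goes
-- through the full Handle layer: openFile builds a Handle (with buffering
-- and text-encoding state), hPutStr encodes a String produced by show,
-- and hClose flushes the Handle and closes the descriptor.
writeFilesHandle :: FilePath -> Int -> IO ()
writeFilesHandle prefix n =
  forM_ [0 .. n - 1] $ \i -> do
    h <- openFile (prefix ++ show i) WriteMode
    hPutStr h (show i)
    hClose h
```

Beyond the `open(2)` call itself, GHC's `openFile` also does per-file bookkeeping, such as an `fstat` on the new descriptor and its own single-writer lock check. This may account for part of the gap relative to C.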
I thought that it might be faster to call the C library functions exposed as foreign imports in System.Posix.Internals, and thereby cut out some of Haskell's overhead. This indeed improved performance, but the program is still nearly twice as slow as the corresponding C program.
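A minimal sketch of the System.Posix.Internals route follows. It assumes a POSIX system, where `CFilePath` is a `CString`, and uses the `c_open`, `c_write` and `c_close` bindings and `o_*` flags that `base` exports. That module is internal, so its contents may change between GHC releases. The names `writeOneRaw` and `writeFilesRaw` are hypothetical, and this is not the code from the original post.

```haskell
import Control.Monad (forM_, unless)
import Data.Bits ((.|.))
import Foreign.C.String (withCString, withCStringLen)
import Foreign.Ptr (castPtr)
import System.Posix.Internals (c_close, c_open, c_write, o_CREAT, o_TRUNC, o_WRONLY)

-- Hypothetical: write the decimal form of i into the file prefix ++ show i
-- using raw file descriptors instead of Handles. POSIX only: there
-- CFilePath = CString, so withCString can marshal the path.
writeOneRaw :: FilePath -> Int -> IO ()
writeOneRaw prefix i = do
  -- The path is still marshalled String -> CString (foreign/locale encoding).
  fd <- withCString (prefix ++ show i) $ \path ->
          c_open path (o_WRONLY .|. o_CREAT .|. o_TRUNC) 0o644
  unless (fd >= 0) $ ioError (userError ("c_open failed for file " ++ show i))
  -- The contents are still produced by show, then copied into a C buffer.
  ok <- withCStringLen (show i) $ \(buf, len) -> do
          written <- c_write fd (castPtr buf) (fromIntegral len)
          return (fromIntegral written == len)
  _ <- c_close fd
  unless ok $ ioError (userError ("short write for file " ++ show i))

-- Run writeOneRaw for i = 0 .. n-1.
writeFilesRaw :: FilePath -> Int -> IO ()
writeFilesRaw prefix n = forM_ [0 .. n - 1] (writeOneRaw prefix)
```

This skips the Handle machinery entirely, but both the path and the contents still pass through `String` and are re-encoded into C buffers for every file.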
I ran some benchmarks. I wrote a program to create 500,000 files on a tmpfs filesystem and write an integer into each of them. I did this in C, using open, and twice in Haskell, once using openFile and once using c_open. Here are the results:
C program, using open and friends (gcc 4.4.3):
real 0m4.614s
user 0m0.380s
sys  0m4.200s

Haskell, using System.IO.openFile and friends (ghc 7.4.2):
real 0m14.892s
user 0m7.700s
sys  0m6.890s

Haskell, using System.Posix.Internals.c_open and friends (ghc 7.4.2):
real 0m7.372s
user 0m2.390s
sys  0m4.570s
My question is: why is this so slow? Could the culprit be the marshaling necessary to pass the parameters to the foreign functions? If I'm calling the low-level function c_open anyway, shouldn't performance be closer to C's? Does anyone have suggestions for how to improve this?
If anyone is interested, I can provide the code I used for these benchmarks.
Please do. You can paste them at http://hpaste.org/
Could you try using the Data.ByteString API? I don't have the code in front of me, so I don't know if the System.Posix API uses Strings. If it does, that's most likely the issue.
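Along the lines of this suggestion, here is a sketch that keeps both the file name and the contents as `ByteString`s and hands them to the System.Posix.Internals calls. It assumes POSIX, where `CFilePath` is a `CString`. `writeFileRawBS` is a hypothetical name. `B.useAsCString` copies the path to append the terminating NUL but does no text encoding. `unsafeUseAsCStringLen` exposes the contents buffer to `write` without copying.

```haskell
import Control.Monad (unless)
import Data.Bits ((.|.))
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Foreign.Ptr (castPtr)
import System.Posix.Internals (c_close, c_open, c_write, o_CREAT, o_TRUNC, o_WRONLY)

-- Hypothetical: create (or truncate) the file named by a ByteString path
-- and write a ByteString into it with raw POSIX calls. No String is involved.
writeFileRawBS :: B.ByteString -> B.ByteString -> IO ()
writeFileRawBS path contents = do
  -- useAsCString copies the bytes and appends a NUL; no encoding step.
  fd <- B.useAsCString path $ \cpath ->
          c_open cpath (o_WRONLY .|. o_CREAT .|. o_TRUNC) 0o644
  unless (fd >= 0) $ ioError (userError "c_open failed")
  -- unsafeUseAsCStringLen exposes the ByteString's buffer without copying;
  -- acceptable here because write(2) only reads from it.
  ok <- BU.unsafeUseAsCStringLen contents $ \(buf, len) -> do
          written <- c_write fd (castPtr buf) (fromIntegral len)
          return (fromIntegral written == len)
  _ <- c_close fd
  unless ok $ ioError (userError "short write")
```

For integer contents, `show` is still needed here unless the digits are rendered directly into a ByteString. Whether this closes the gap with C would need to be measured.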
-- Johan