I'm looking into high-performance I/O, particularly on a tmpfs (in-memory) filesystem. This involves creating lots of little files. Unfortunately, it seems that Haskell's performance in this area is not comparable to that of C. I assume that this is because of the overhead involved in opening and closing files. Some cursory profiling confirmed this: most of the program's runtime is taken up by openFile, hPutStr, and hClose.
I thought that it might be faster to call the C library functions exposed as foreign imports in System.Posix.Internals, and thereby cut out some of Haskell's overhead. This did improve performance, but the program is still about 1.6 times slower than the corresponding C program.
I ran some benchmarks. I wrote a program that creates 500,000 files on a tmpfs filesystem and writes an integer into each of them. I did this once in C, using open, and twice in Haskell, using openFile and c_open. Here are the results:
C program, using open and friends (gcc 4.4.3)
real 0m4.614s
user 0m0.380s
sys 0m4.200s
Haskell, using System.IO.openFile and friends (ghc 7.4.2)
real 0m14.892s
user 0m7.700s
sys 0m6.890s
Haskell, using System.Posix.Internals.c_open and friends (ghc 7.4.2)
real 0m7.372s
user 0m2.390s
sys 0m4.570s
My question is: why is this so slow? Could the culprit be the marshaling necessary to pass the parameters to the foreign functions? If I'm calling the low-level function c_open anyway, shouldn't performance be closer to C? Does anyone have suggestions for how to improve this?
If anyone is interested, I can provide the code I used for these benchmarks.