
> > You might consider bypassing the Handle interface and going to the bare metal using the Posix library, which will cut down on the overhead in openFile.
> That's what I was fearing. Is the conversion from Haskell Strings to C strings a performance problem?

Haskell Strings are a common performance bottleneck; for example when serving files in the Haskell web server I avoided the conversion to Haskell Strings altogether by reading/writing arrays of bytes (see the paper for details).

But it sounds like in your case you need to open lots of (small?) files. What do you do with the contents of the files?

Cheers,
Simon
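
To illustrate the "arrays of bytes" approach mentioned above, here is a minimal sketch that reads a file through a raw buffer with hGetBuf, so no Haskell String is ever built for the file contents. It is not the web server's code: the helper name countBytes and the 4096-byte buffer size are assumptions made for the example. It also still goes through a Handle and takes the file name as a String, so it only addresses the cost of converting the contents, not the cost of opening the file.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO (Handle, IOMode (ReadMode), hGetBuf, withBinaryFile)

-- | Hypothetical helper: count the bytes in a file by reading it through a
-- fixed-size raw buffer. The contents are never turned into a String.
countBytes :: FilePath -> IO Int
countBytes path =
  withBinaryFile path ReadMode $ \h ->
    allocaBytes bufSize $ \buf -> loop h buf 0
  where
    bufSize = 4096  -- arbitrary buffer size chosen for this sketch

    loop :: Handle -> Ptr Word8 -> Int -> IO Int
    loop h buf total = do
      n <- hGetBuf h buf bufSize      -- returns 0 once end of file is reached
      if n == 0
        then return total
        else loop h buf $! total + n  -- keep the running total evaluated
```

In GHCi, countBytes "some/file" returns the size of the file in bytes. A real program would process the buffer in place of just counting.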

"Simon Marlow"
> > > You might consider bypassing the Handle interface and going to the bare metal using the Posix library, which will cut down on the overhead in openFile.
> > That's what I was fearing. Is the conversion from Haskell Strings to C strings a performance problem?
> Haskell Strings are a common performance bottleneck; for example when serving files in the Haskell web server I avoided the conversion to Haskell Strings altogether by reading/writing arrays of bytes (see the paper for details).
I was curious to see if this is also the case here. Therefore I just pasted the GHC implementation of openFile into Peter's suspicious module ('openFile' obtained from http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/libraries/base/GHC/Handle.... hope this was the right one?) to be able to also profile the GHC internal openfile code.

Here are the relevant parts of the resulting output of the profiler:

COST CENTRE          MODULE     %time %alloc

withCString'         MailStore   39.1   19.7
f1                   MailStore   26.1   40.9
f9                   MailStore   21.7    8.8
getBuffer            MailStore    4.3    0.1
f6.2                 MailStore    4.3    4.0
f6                   MailStore    4.3    2.3
f6.3                 MailStore    0.0    1.6
allocateBuffer       MailStore    0.0   19.4

...

COST CENTRE          MODULE     no.  entries  %time %alloc  %time %alloc

f6.1                 MailStore  361        0    0.0    0.1   43.5   41.2
openFile             MailStore  362     1154    0.0    0.1   43.5   41.1
openFile'            MailStore  365     1154    0.0    0.0   43.5   40.9
withCString'         MailStore  367        0   39.1   19.7   39.1   19.7
openFd               MailStore  371     1154    0.0    0.7    4.3   20.9
mkFileHandle         MailStore  372     1154    0.0    0.3    4.3   20.2
initBufferState      MailStore  387     1154    0.0    0.0    0.0    0.0
newFileHandle        MailStore  376     1154    0.0    0.1    0.0    0.3
handleFinalizer      MailStore  377        0    0.0    0.1    0.0    0.2
flushWriteBufferOnly MailStore  389     1154    0.0    0.0    0.0    0.0
getBuffer            MailStore  373     1154    4.3    0.1    4.3   19.6
allocateBuffer       MailStore  374     1154    0.0   19.4    0.0   19.5
newEmptyBuffer       MailStore  375        0    0.0    0.1    0.0    0.1

...

The cost centre "f6.1" is the location of the recurring call of "openFile". As you can see almost all of the time is spent in the function "withCString" translating Haskell strings representing the file names to the C representation.

I knew that Haskell strings are bad, but I really did not expect them to cause such a huge time penalty ...

Cheers,

Matthias

-- 
Matthias Neubauer                                       |
Universität Freiburg, Institut für Informatik           | tel +49 761 203 8060
Georges-Köhler-Allee 79, 79110 Freiburg i. Br., Germany | fax +49 761 203 8052
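
For readers wondering what the translation of a file name to its C representation involves, the following is a simplified stand-in, not GHC's actual withCString. A String is a linked list of Char, so marshalling it means walking the list and writing each character into a freshly allocated, NUL-terminated buffer on every call. The names withCStringSketch and roundTrip are made up for this example, and the sketch assumes the name contains only 8-bit characters. It shows the shape of the work only and makes no claim about why the profile above attributes so much time to it.

```haskell
import Foreign.C.String (CString, castCharToCChar, peekCString)
import Foreign.Marshal.Array (allocaArray, pokeArray0)

-- | Simplified stand-in for withCString (an assumption for illustration,
-- not the library's code). Assumes only 8-bit characters in the input.
withCStringSketch :: String -> (CString -> IO a) -> IO a
withCStringSketch s action =
  allocaArray (length s + 1) $ \buf -> do     -- first pass: measure the list
    pokeArray0 0 buf (map castCharToCChar s)  -- second pass: copy, then add NUL
    action buf

-- | Marshal a String to C and read it straight back, to show the sketch works.
roundTrip :: String -> IO String
roundTrip s = withCStringSketch s peekCString
```

If the same file name is opened repeatedly, one option under these assumptions is to marshal it once and reuse the C string, so the list is not traversed again for every open.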
participants (2)
- Matthias Neubauer
- Simon Marlow