[Haskell-cafe] Re: Processing of large files

3 Nov 2004


On 2004-11-02, Peter Simons wrote:
> ...
> John Goerzen writes:
>> ...
>>> ...
>>> Read and process the file in blocks:
>>> ...
>> I don't think that would really save much memory [...]
> Given that the block-oriented approach has constant space
> requirements, I am fairly confident it would save memory.
Perhaps a bit, but not a significant amount.
> ...
>> ...
>> and in fact, would likely just make the code a lot more
>> complex. It seems like a simple wrapper around
>> hGetContents over a file that uses block buffering would
>> suffice.
> Either your algorithm can process the input in blocks or it
> cannot. If it can, it doesn't make one bit a difference if
> you do I/O in blocks, because your algorithm processes
> blocks anyway. If your algorithm is *not* capable of
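To make the constant-space claim concrete, here is a minimal sketch (not the code elided from the quoted message) that folds a strict accumulator over fixed-size blocks read with `Data.ByteString.hGetSome`. The names `foldBlocks` and `countLines`, the 64 KiB block size, and the scratch file name are illustrative assumptions. Only the current block is live at any time, so memory use is bounded by the block size rather than by the file's length.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import System.IO

-- Assumed block size (64 KiB); any fixed size gives constant space.
blockSize :: Int
blockSize = 64 * 1024

-- Hypothetical helper: fold a strict accumulator over successive
-- fixed-size blocks of a handle. Only the current block is held in
-- memory, so space use does not grow with the length of the file.
foldBlocks :: (a -> B.ByteString -> a) -> a -> Handle -> IO a
foldBlocks step = go
  where
    go !acc h = do
      block <- B.hGetSome h blockSize
      if B.null block
        then return acc                 -- hGetSome returns empty only at EOF
        else go (step acc block) h

-- Hypothetical example algorithm: count newline bytes (10 = '\n').
countLines :: FilePath -> IO Int
countLines path = withFile path ReadMode $ \h -> do
  hSetBinaryMode h True
  foldBlocks (\n block -> n + B.count 10 block) 0 h

main :: IO ()
main = do
  let path = "blocks-demo.txt"          -- hypothetical scratch file
  writeFile path (unlines (map show [1 .. 100000 :: Int]))
  countLines path >>= print             -- prints 100000
```

This only works for algorithms that can be phrased as a fold over blocks, which is exactly the distinction drawn in the quoted reply.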
Yes it does.  If you don't set block buffering, GHC will call read()
separately for *every* single character.  (I've straced stuff!)  This is
a huge performance penalty for large files.  It's a lot more efficient
if you set block buffering in your input, even if you are using interact
and lines or words to process it.
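The point about buffering can be sketched as follows, assuming a file read through the ordinary lazy `hGetContents` after an explicit `hSetBuffering` call. The helper `countWords`, the 64 KiB buffer size, and the demo file name are hypothetical. The Haskell report leaves the default buffering mode implementation-dependent, so setting it explicitly avoids relying on whatever the runtime picks; `evaluate` forces the result before `withFile` closes the handle.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: count the words in a file using lazy
-- hGetContents, after explicitly requesting block buffering.
-- The 64 KiB size is an assumption; BlockBuffering Nothing lets
-- the runtime choose a size instead.
countWords :: FilePath -> IO Int
countWords path = withFile path ReadMode $ \h -> do
  hSetBuffering h (BlockBuffering (Just (64 * 1024)))
  contents <- hGetContents h
  -- Force the whole computation before withFile closes the handle.
  evaluate (length (words contents))

main :: IO ()
main = do
  let path = "buffering-demo.txt"       -- hypothetical scratch file
  writeFile path (unwords (replicate 50000 "word"))
  countWords path >>= print             -- prints 50000
```

With `interact`, the analogous step would be `hSetBuffering stdin (BlockBuffering Nothing)` before calling `interact`.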

-- John