
Paul Moore wrote:
Haskell handles this with laziness. The canonical example is counting characters in a file, where you just grab the whole file and use length. An imperative programmer's intuition says that this wastes huge amounts of memory compared to reading character by character and incrementing a count. Lazy I/O means that no more than 1 character needs to be in RAM at any one time, without the programmer needing to do the bookkeeping.
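For concreteness, the character-counting example might look something like the minimal sketch below. The name countChars is made up for illustration. In practice GHC reads the file in buffer-sized chunks rather than literally one character at a time, but the resident data should still stay small, because each part of the string becomes garbage as soon as length has walked past it.

```haskell
import System.IO

-- Count the characters in a file via lazy I/O.
-- hGetContents returns the contents lazily; length consumes the
-- string as it is read, so the consumed prefix can be collected.
countChars :: FilePath -> IO Int
countChars path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- Force the count before withFile closes the handle.
    return $! length contents
```

The `$!` matters here: without it the count would still be an unevaluated thunk when the handle is closed, and the lazy read would be cut short.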
Indeed, I had *this* conversation with Mr C++ as well... He proudly showed off a 3-page alphabet soup of C++ which allows him to do bit-level processing of a file as if it's really a collection of bits. And I said that in my program, I just grab a list of bytes and convert it into a list of bits. And he was like "wow - that's going to waste a heck of a lot of RAM..." But using the magic of getContents... actually no, it isn't. ;-)
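Roughly what I mean, as a sketch rather than my actual code: the names byteToBits, bytesToBits and readBits are invented here, and the most-significant-bit-first ordering is just an assumption.

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)
import System.IO

-- Expand one byte into its eight bits, most significant bit first
-- (the bit order is an assumption made for this sketch).
byteToBits :: Word8 -> [Bool]
byteToBits w = [testBit w i | i <- [7, 6 .. 0]]

-- Lazily turn a list of bytes into a list of bits.
bytesToBits :: [Word8] -> [Bool]
bytesToBits = concatMap byteToBits

-- Read stdin lazily as bytes and present it as a list of bits.
-- In binary mode each Char is in the range 0..255, so the
-- conversion to Word8 loses nothing.
readBits :: IO [Bool]
readBits = do
  hSetBinaryMode stdin True
  s <- getContents
  return (bytesToBits (map (fromIntegral . fromEnum) s))
```

As long as whatever consumes the bit list does so incrementally and doesn't hang on to the head of it, only the part currently being looked at needs to be in memory.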
If lazy I/O were publicised in this way, as separation of concerns (I/O and processing) with the compiler and language handling the work of minimising memory use and avoiding unnecessary I/O, then the message might get through better. However, the only article I've ever seen taking this approach (http://blogs.nubgames.com/code/?p=22) didn't seem to get a good reception in the Haskell community, sparking comments that hGetContents and similar functions had a number of issues which made them "bad practice". The result was to leave me with a feeling that separating I/O and processing in Haskell really was hard, but I never quite understood why...
So I guess that leaves me with the question: is separating I/O and processing really the right thing to do (in terms of memory usage and performance) in Haskell, and if so, why isn't it advertised more?
It's something I use all the time... Of course, as soon as you want to scan the data *twice*... well, if you do it in the obvious way, the data stays reachable after the first scan, so the GC can't free it and will hold who knows how many MB (or even GB) of it in RAM ready for you to scan it the second time. I have a vague recollection of somebody muttering something about ByteStrings and memory-mapped files...?
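One way round the two-scan problem that doesn't need ByteStrings or mmap is to fold both scans into a single pass, so nothing has to be kept around for a second traversal. A minimal sketch, with made-up names (Counts, countBoth), counting characters and newlines together:

```haskell
import Data.List (foldl')

-- Strict pair of running totals: characters seen, newlines seen.
data Counts = Counts !Int !Int

-- Compute both counts in one traversal, so the consumed part of
-- the input never needs to be retained for a second scan.
countBoth :: String -> (Int, Int)
countBoth = finish . foldl' step (Counts 0 0)
  where
    step (Counts c l) ch =
      Counts (c + 1) (if ch == '\n' then l + 1 else l)
    finish (Counts c l) = (c, l)
```

Compare with the obvious version, (length s, length (filter (== '\n') s)), where s has to stay alive in full until the second length has run. The strict fields are there so the totals don't pile up as unevaluated thunks. This only helps when the two scans can be merged, of course; if the second scan depends on the result of the first, you're back to either holding the data or reading the file again.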