
On Sunday 15 July 2007, Paul Moore wrote:
On 15/07/07, Andrew Coppin wrote:
I guess because in most normal programming languages you can do I/O anywhere you damn like, it doesn't occur to most programmers that it's possible to make a separation. (Most seem to realise that, e.g., mixing business logic with GUI code is a Bad Thing, though...)
Hmm, I would speculate (I have no hard data, in other words...) that it's more the case that in imperative languages, you do I/O throughout the program, because that defers the I/O (which is slow) to the last possible moment, and it allows you to reuse memory buffers.
People's intuition about performance and memory usage says that delaying I/O is good, and "separating" I/O and logic (which is taken to mean slurping data in all at once, and then processing it) is memory intensive and risks doing unnecessary I/O.
Haskell handles this with laziness. The canonical example is counting characters in a file: you just grab the whole file and use length. An imperative programmer's intuition says that this wastes huge amounts of memory compared to reading character by character and incrementing a count. Lazy I/O means that no more than one character needs to be in RAM at any one time, without the programmer needing to do the bookkeeping.
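Concretely, the whole program is just this (a toy sketch; "input.txt" is a made-up name):

main :: IO ()
main = do
  s <- readFile "input.txt"  -- lazy: nothing is read yet
  print (length s)           -- characters stream through as length demands them

The processing (length) never mentions I/O, and the I/O (readFile) never mentions processing; laziness glues the two together.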
If lazy I/O were publicised in this way, as separation of concerns (I/O and processing) with the compiler and language handling the work of minimising memory use and avoiding unnecessary I/O, then the message might get through better. However, the only article I've ever seen taking this approach (http://blogs.nubgames.com/code/?p=22) didn't seem to get a good reception in the Haskell community, sparking comments that hGetContents and similar functions had a number of issues which made them "bad practice". The result was to leave me with the feeling that separating I/O and processing in Haskell really was hard, but I never quite understood why...
Because hGetContents only buys you laziness /if you use it lazily/. And laziness is, technically, a denotational property, but it is a very operational-feeling denotational property. And operational reasoning is difficult in imperative languages and gets really, really hard in lazy functional languages. And the article you cite falls flat on its face in trying to be lazy:
readWithIncludes :: String -> IO [String]
readWithIncludes f = do
  s <- readFile f
  ss <- mapM expandIncludes (lines s)   -- mapM runs every action up front
  return (concat ss)

expandIncludes :: String -> IO [String]
expandIncludes s =
  if isInclude s                        -- isInclude, includeFile: the
    then readWithIncludes (includeFile s)  -- article's own helpers
    else return [s]
That's calling mapM, a strict function, on the result of lines s --- an arbitrarily long list.

More generally, I suspect the Haskell community has a collective memory of stream I/O, back when this sort of thing used to be /really, really important/, because your program had type [Response] -> [Request] and if it wasn't lazy enough in its argument, you'd get a deadlock --- and that deadlock had nothing whatsoever to do with the result of applying your function to total arguments, so reasoning about it required abandoning every Haskeller's instinct to reason about functions only over total (or even finite total) arguments. interact takes a function with a type eerily similar to [Response] -> [Request], which means its argument has all the same problems. Laziness is great and everything --- but it's a lot of work, even in Haskell.
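To put the mapM point concretely: a lazy variant can be built on unsafeInterleaveIO, the same mechanism hGetContents itself uses. A sketch, not a recommendation:

import System.IO.Unsafe (unsafeInterleaveIO)

-- Like mapM, but defers the actions for the tail of the list
-- until that part of the result is actually demanded.
lazyMapM :: (a -> IO b) -> [a] -> IO [b]
lazyMapM _ []     = return []
lazyMapM f (x:xs) = do
  y  <- f x
  ys <- unsafeInterleaveIO (lazyMapM f xs)
  return (y : ys)

Swap that in for mapM and the article's readWithIncludes streams again; but now each include file is read at whatever moment the consumer demands that part of the list, which is exactly the sort of operational reasoning I'm saying is hard.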
So I guess that leaves me with the question: is separating I/O and processing really the right thing to do (in terms of memory usage and performance) in Haskell, and if so, why isn't it advertised more? (And for extra credit, please explain why the article I quoted above didn't make more of an impact in the Haskell community... :-))
Jonathan Cast
http://sourceforge.net/projects/fid-core
http://sourceforge.net/projects/fid-emacs