
On Sunday 15 July 2007, Paul Moore wrote:
On 15/07/07, Andrew Coppin wrote:
I guess because in most normal programming languages you can do I/O anywhere you damn like, it doesn't occur to most programmers that it's possible to make a separation. (Most seem to realise that, e.g., mixing business logic with GUI code is a Bad Thing, though...)
Hmm, I would speculate (I have no hard data, in other words...) that it's more the case that in imperative languages, you do I/O throughout the program, because that defers the I/O (which is slow) to the last possible moment, and it allows you to reuse memory buffers.
People's intuition about performance and memory usage says that delaying I/O is good, and "separating" I/O and logic (which is taken to mean slurping data in all at once, and then processing it) is memory intensive and risks doing unnecessary I/O.
Haskell handles this with laziness. The canonical example is counting characters in a file: you just grab the whole file and use length. An imperative programmer's intuition says that this wastes huge amounts of memory compared to reading character by character and incrementing a count. Lazy I/O means that no more than one character needs to be in RAM at any one time, without the programmer needing to do the bookkeeping.
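Concretely, the whole program is just this (a toy sketch; "input.txt" is a made-up name):

main :: IO ()
main = do
  s <- readFile "input.txt"  -- lazy: nothing is read yet
  print (length s)           -- characters stream through as length demands them

The processing (length) never mentions I/O, and the I/O (readFile) never mentions processing; laziness glues the two together.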
If lazy I/O were publicised in this way, as separation of concerns (I/O and processing) with the compiler and language handling the work of minimising memory use and avoiding unnecessary I/O, then the message might get through better. However, the only article I've ever seen taking this approach (http://blogs.nubgames.com/code/?p=22) didn't seem to get a good reception in the Haskell community, sparking comments that hGetContents and similar functions had a number of issues which made them "bad practice". The result was to leave me with the feeling that separating I/O and processing in Haskell really was hard, but I never quite understood why...
Because hGetContents only buys you laziness /if you use it lazily/. And laziness is, technically, a denotational property, but it is a very operational-feeling denotational property. And operational reasoning is difficult in imperative languages and gets really, really hard in lazy functional languages. And the article you cite falls flat on its face in trying to be lazy:
readWithIncludes :: String -> IO [String]
readWithIncludes f = do
  s <- readFile f
  ss <- mapM expandIncludes (lines s)   -- mapM runs every action up front
  return (concat ss)

expandIncludes :: String -> IO [String]
expandIncludes s =
  if isInclude s                        -- isInclude, includeFile: the
    then readWithIncludes (includeFile s)  -- article's own helpers
    else return [s]
That's calling mapM, a strict function, on the result of lines s --- an arbitrarily long list.

More generally, I suspect the Haskell community has a collective memory of stream I/O, back when this sort of thing used to be /really, really important/, because your program had type [Response] -> [Request] and if it wasn't lazy enough in its argument, you'd get a deadlock --- and that deadlock had nothing whatsoever to do with the result of applying your function to total arguments, so reasoning about it required abandoning every Haskeller's instinct to reason about functions only over total (or even finite total) arguments. interact takes a function with a type eerily similar to [Response] -> [Request], which means its argument has all the same problems. Laziness is great and everything --- but it's a lot of work, even in Haskell.
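To put the mapM point concretely: a lazy variant can be built on unsafeInterleaveIO, the same mechanism hGetContents itself uses. A sketch, not a recommendation:

import System.IO.Unsafe (unsafeInterleaveIO)

-- Like mapM, but defers the actions for the tail of the list
-- until that part of the result is actually demanded.
lazyMapM :: (a -> IO b) -> [a] -> IO [b]
lazyMapM _ []     = return []
lazyMapM f (x:xs) = do
  y  <- f x
  ys <- unsafeInterleaveIO (lazyMapM f xs)
  return (y : ys)

Swap that in for mapM and the article's readWithIncludes streams again; but now each include file is read at whatever moment the consumer demands that part of the list, which is exactly the sort of operational reasoning I'm saying is hard.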
So I guess that leaves me with the question: is separating I/O and processing really the right thing to do (in terms of memory usage and performance) in Haskell, and if so, why isn't it advertised more? (And for extra credit, please explain why the article I quoted above didn't make more of an impact in the Haskell community... :-))
Jonathan Cast
http://sourceforge.net/projects/fid-core
http://sourceforge.net/projects/fid-emacs