
Hello all, I've been spending some time recently learning Haskell. I am not a newbie in programming, and I've had some experience with lisp in the past, but I have not really studied functional programming. I am mostly seeking feedback regarding style, and glaring mistakes that I may make in some toy examples, but I am not sure if this is the proper place to ask. In case it is though, here is an attempt (not yet complete) at the word count program: http://pastebin.com/JZw22QGA Thanks in advance! -- Panagiotis Koutsourakis

Hi, Panagiotis. The wc program is a good choice for a learning program, but it reveals a long-standing weakness in Haskell: Lazy-IO. I think you will find that your program will get an error if you try to run wc on many files. You will open too many files at once. There is a solution called IterateeIO. It's extremely elegant and very good Haskell, but quite challenging to understand. So far, I have not found a good tutorial. Luck for you the best example program is an implementation of wc! I posted it in hpaste.org. (A Haskell-specific version of pasteBin.)
Cheers, David ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com On Jun 24, 2011, at 7:48 AM, Panagiotis Koutsourakis wrote:
Hello all,
I've been spending some time recently learning Haskell. I am not a newbie in programming, and I've had some experience with lisp in the past, but I have not really studied functional programming. I am mostly seeking feedback regarding style, and glaring mistakes that I may make in some toy examples, but I am not sure if this is the proper place to ask.
In case it is though, here is an attempt (not yet complete) at the word count program:
Thanks in advance!
-- Panagiotis Koutsourakis
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Hi David,
Thanks very much for your reply. I'll take a look at IterateIO and I
didn't know about hpaste.org, so i'll check it out as well.
On Fri, Jun 24, 2011 at 2:43 PM, David Place
Hi, Panagiotis.
The wc program is a good choice for a learning program, but it reveals a long-standing weakness in Haskell: Lazy-IO. I think you will find that your program will get an error if you try to run wc on many files. You will open too many files at once.
There is a solution called IterateeIO. It's extremely elegant and very good Haskell, but quite challenging to understand. So far, I have not found a good tutorial.
Luck for you the best example program is an implementation of wc!
I posted it in hpaste.org. (A Haskell-specific version of pasteBin.)
Cheers, David
____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com
On Jun 24, 2011, at 7:48 AM, Panagiotis Koutsourakis wrote:
Hello all,
I've been spending some time recently learning Haskell. I am not a newbie in programming, and I've had some experience with lisp in the past, but I have not really studied functional programming. I am mostly seeking feedback regarding style, and glaring mistakes that I may make in some toy examples, but I am not sure if this is the proper place to ask.
In case it is though, here is an attempt (not yet complete) at the word count program:
Thanks in advance!
-- Panagiotis Koutsourakis
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
-- Panagiotis Koutsourakis

Hi David,
On Fri, 24 Jun 2011 08:43:59 -0400
David Place
Hi, Panagiotis.
The wc program is a good choice for a learning program, but it reveals a long-standing weakness in Haskell: Lazy-IO.
Is it really a weakness of Haskell or just something one has to get used to it? That is a question I would like to get opinions from the experts! It is for sure something a beginner stumbles upon. E.g. I (as a beginner) stumbled upon Lazy IO but the guys out here helped me a great deal to get over it, and I'm sure they will help me again if I'll stumble again (which might happen). -- Manfred

On Fri, Jun 24, 2011 at 11:10 AM, Manfred Lotz
Hi David,
On Fri, 24 Jun 2011 08:43:59 -0400 David Place
wrote: Hi, Panagiotis.
The wc program is a good choice for a learning program, but it reveals a long-standing weakness in Haskell: Lazy-IO.
Is it really a weakness of Haskell or just something one has to get used to it? That is a question I would like to get opinions from the experts!
No, the only weakness (imo) with lazy io is a beginners understanding of its implications. There are options, most notably iteratees. Lazy IO, much like lazy evaluation, are definitely different than what you might expect from other languages, but I think it's a huge mistake, especially when you are just starting out, to categorize them as weaknesses.

Is it really a weakness of Haskell or just something one has to get used to it? That is a question I would like to get opinions from the experts!
The problem is that everything you call out to may or may not be lazy, and there is really no way to know it. For a programming language that made great strides in safety in many other areas, this seems to me like a huge oversight, perhaps even the number one mistake. (I'm not an expert, but I've been with the language for a decade and have completed a few Real Projects in it) --vk

On Jun 24, 2011, at 2:10 PM, Manfred Lotz wrote:
Is it really a weakness of Haskell or just something one has to get used to it? That is a question I would like to get opinions from the experts!
I would say that it is a weakness in the original handling of IO in Haskell. The non-determinism of Lazy IO is just plain bad. The solution doesn't require any extensions, though. It's just a different way of programming in Haskell.
____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

David Place wrote:
Hi, Panagiotis.
The wc program is a good choice for a learning program, but it reveals a long-standing weakness in Haskell: Lazy-IO. I think you will find that your program will get an error if you try to run wc on many files. You will open too many files at once.
There is a solution called IterateeIO. It's extremely elegant and very good Haskell, but quite challenging to understand. So far, I have not found a good tutorial.
Personally, I think that lazy IO is not a weakness and works extremely well for prototyping. In contrast, I find IterateeIO to be ugly and non-idiomatic Haskell, since it exposes implementation details and is not built on an algebraic "combinators & laws" approach. Being challenging to understand is not a substitute for beauty. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Jun 25, 2011, at 5:39 AM, Heinrich Apfelmus wrote:
Personally, I think that lazy IO is not a weakness and works extremely well for prototyping. In contrast, I find IterateeIO to be ugly and non-idiomatic Haskell, since it exposes implementation details and is not built on an algebraic "combinators & laws" approach. Being challenging to understand is not a substitute for beauty.
Interesting, I appreciate your point of view. My experience with Lazy IO is that its non-detereminism breaks down the algebraic/equational intuition. In order to manage handle resources, I must have in mind too many operational details. I feel that iteratee IO encapsulates these in nice compositional combinators. I agree that it's not very beautiful, though. My feeling is that it is not quite finished. Perhaps there are a couple of more insights that will make it really nice.

Interesting, I appreciate your point of view. My experience with Lazy IO is that its non-detereminism breaks down the algebraic/equational intuition. In order to manage handle resources, I must have in mind too many operational details. I feel that iteratee IO encapsulates these in nice compositional combinators. I agree that it's not very beautiful, though. My feeling is that it is not quite finished. Perhaps there are a couple of more insights that will make it really nice.
I think laziness (IO and otherwise) would be perfect if only it was visible somehow. Perhaps it could be signified by a type in some future Haskell? --vk

On Jun 25, 2011, at 10:14 AM, Vesa Kaihlavirta wrote:
I think laziness (IO and otherwise) would be perfect if only it was visible somehow. Perhaps it could be signified by a type in some future Haskell?
Laziness is visible in Haskell. Everything is lazy by default. Anything that is strict has a separate annotation.

David Place wrote:
Heinrich Apfelmus wrote:
Personally, I think that lazy IO is not a weakness and works extremely well for prototyping. In contrast, I find IterateeIO to be ugly and non-idiomatic Haskell, since it exposes implementation details and is not built on an algebraic "combinators & laws" approach. Being challenging to understand is not a substitute for beauty.
Interesting, I appreciate your point of view. My experience with Lazy IO is that its non-determinism breaks down the algebraic/equational intuition. In order to manage handle resources, I must have in mind too many operational details.
Well, one could say that it's the fault of the operating system for not doing garbage collection on file handles (c.f. Oberon).
I feel that iteratee IO encapsulates these in nice compositional combinators.
What about the combinator withFile :: FilePath -> (String -> a) -> IO a withFile name f = bracket (openFile name ReadMode) hClose $ \h -> evaluate . f =<< hGetContents h ? It gives you the same thing as Iteratees - a way to apply a function to the contents of a file - without the need to rewrite all the existing list functions like map , lines , words , and so on.
I agree that it's not very beautiful, though. My feeling is that it is not quite finished. Perhaps there are a couple of more insights that will make it really nice.
I'm all for beauty in programming. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Jun 26, 2011, at 8:25 AM, Heinrich Apfelmus wrote:
What about the combinator
withFile :: FilePath -> (String -> a) -> IO a withFile name f = bracket (openFile name ReadMode) hClose $ \h -> evaluate . f =<< hGetContents h
? It gives you the same thing as Iteratees - a way to apply a function to the contents of a file - without the need to rewrite all the existing list functions like map , lines , words , and so on.
How would you, for instance, implement the program for counting all the words in a list of files that Oleg describes in his message?
Nested calls to withFile would require too many open handles. ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

On Sun, 26 Jun 2011 09:27:47 -0400
David Place
On Jun 26, 2011, at 8:25 AM, Heinrich Apfelmus wrote:
What about the combinator
withFile :: FilePath -> (String -> a) -> IO a withFile name f = bracket (openFile name ReadMode) hClose $ \h -> evaluate . f =<< hGetContents h
? It gives you the same thing as Iteratees - a way to apply a function to the contents of a file - without the need to rewrite all the existing list functions like map , lines , words , and so on.
When I stumbled upon lazy IO newbie-wise I was pointed to withFile resp. bracket by Daniel Fischer and now that I know how to do it it seems fine to me. It also alerted me to pay more attention to lazyness as this is a Haskell immanent thingie.
How would you, for instance, implement the program for counting all the words in a list of files that Oleg describes in his message?
I'm not quite sure if it boils down to the same problem structure-wise but I had to scan thru some 4000 small xml files, and I did it like this (after Daniel lifted me up...): import qualified System.IO.UTF8 as U ... dt <- scanDir ccbd let xmlfiles = filter (\x -> "xml" `isSuffixOf` x) dt -- insert xml contents into two maps s, and ht. (s,ht) <- foldM insertXml (M.empty,M.empty) xmlfiles ... and insertXml (stat, m) xf = U.withBinaryFile xf ReadMode (\handle -> do ct <- getXmlContent xf handle ... some code... return $! (stat',m')) and getXmlContent xf inh = do xml <- U.hGetContents inh let content = parseXMLDoc xml ... -- Manfred

On Jun 26, 2011, at 1:29 PM, Manfred Lotz wrote:
When I stumbled upon lazy IO newbie-wise I was pointed to withFile resp. bracket by Daniel Fischer and now that I know how to do it it seems fine to me. It also alerted me to pay more attention to lazyness as this is a Haskell immanent thingie.
Of course, it is possible to use hGetContents with withFile. You can still get into trouble because hGetContents is unsafe. Beginners get tripped up trying to do something like the following getting unexpected results. (i remember I did.) print10 = do contents <- withFile "/usr/share/dict/words" ReadMode (\h -> hGetContents h) print $ take 10 contents So, you have to do this keeping in mind a rather procedural model of the evaluation of the lazy data structures. I feel this is not very declarative or intuitive. print10' = do h <- openFile "/usr/share/dict/words" ReadMode contents <- hGetContents h print $ take 10 contents hClose h Iteratee IO provides a declarative way to do this that is safe, compositional and efficient. But not yet very pretty. In haskell cafe, John Lato says that he is working on that. ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

On Sun, 26 Jun 2011 15:02:07 -0400
David Place
On Jun 26, 2011, at 1:29 PM, Manfred Lotz wrote:
When I stumbled upon lazy IO newbie-wise I was pointed to withFile resp. bracket by Daniel Fischer and now that I know how to do it it seems fine to me. It also alerted me to pay more attention to lazyness as this is a Haskell immanent thingie.
Of course, it is possible to use hGetContents with withFile. You can still get into trouble because hGetContents is unsafe. Beginners get tripped up trying to do something like the following getting unexpected results. (i remember I did.)
print10 = do contents <- withFile "/usr/share/dict/words" ReadMode (\h -> hGetContents h) print $ take 10 contents
So, you have to do this keeping in mind a rather procedural model of the evaluation of the lazy data structures. I feel this is not very declarative or intuitive.
print10' = do h <- openFile "/usr/share/dict/words" ReadMode contents <- hGetContents h print $ take 10 contents hClose h
Iteratee IO provides a declarative way to do this that is safe, compositional and efficient. But not yet very pretty. In haskell cafe, John Lato says that he is working on that.
Thanks for showing this. I have to admit that I'm as a beginner not in a position to judge about the merits of Iteratee IO versus the standard way. Just wanted to point out that the way I work thru the XML files in my particular task seems to work fine. I will watch what happens to Iteratee IO. Is there any problem in the code snippet I pasted? If so I would like to get a hint of course. -- Thanks, Manfred

On Jun 26, 2011, at 4:59 PM, Manfred Lotz wrote:
Is there any problem in the code snippet I pasted? If so I would like to get a hint of course.
There really isn't enough code in your snippet to be able to say. The rule is that if you return a lazy data structure from withFile you may find the file has been closed before you have read the data. If your parseXMLDoc is strict then you will get away with it. I feel that isn't very nice. ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

On Sun, 26 Jun 2011 17:32:02 -0400
David Place
On Jun 26, 2011, at 4:59 PM, Manfred Lotz wrote:
Is there any problem in the code snippet I pasted? If so I would like to get a hint of course.
There really isn't enough code in your snippet to be able to say. The rule is that if you return a lazy data structure from withFile you may find the file has been closed before you have read the data. If your parseXMLDoc is strict then you will get away with it. I feel that isn't very nice.
What I had learned (and it seems to work fine) is to force the result which gets done here in return $! insertXml (stat, m) xf = U.withBinaryFile xf ReadMode (\handle -> do ct <- getXmlContent xf handle ... some code... return $! (stat',m')) So I hope that is the one and important recipe to add when using withBinaryFile. -- Manfred

On Mon, Jun 27, 2011 at 11:57 AM, Manfred Lotz
On Sun, 26 Jun 2011 17:32:02 -0400 David Place
wrote: On Jun 26, 2011, at 4:59 PM, Manfred Lotz wrote:
Is there any problem in the code snippet I pasted? If so I would like to get a hint of course.
There really isn't enough code in your snippet to be able to say. The rule is that if you return a lazy data structure from withFile you may find the file has been closed before you have read the data. If your parseXMLDoc is strict then you will get away with it. I feel that isn't very nice.
What I had learned (and it seems to work fine) is to force the result which gets done here in return $!
insertXml (stat, m) xf = U.withBinaryFile xf ReadMode (\handle -> do ct <- getXmlContent xf handle ... some code... return $! (stat',m'))
So I hope that is the one and important recipe to add when using withBinaryFile.
The operator ($!) isn't recursive, so all that it does is force the tuple you're returning, not the contents of the tuple. Internally it calls 'seq', which only forces its argument enough to determine which constructor it was called with. Sometimes this can be enough to do what you want, but not usually. And certainly not in this case. Antoine
-- Manfred
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

On Mon, 27 Jun 2011 12:39:42 +0900
Antoine Latter
On Mon, Jun 27, 2011 at 11:57 AM, Manfred Lotz
wrote: On Sun, 26 Jun 2011 17:32:02 -0400 David Place
wrote: On Jun 26, 2011, at 4:59 PM, Manfred Lotz wrote:
Is there any problem in the code snippet I pasted? If so I would like to get a hint of course.
There really isn't enough code in your snippet to be able to say. The rule is that if you return a lazy data structure from withFile you may find the file has been closed before you have read the data. If your parseXMLDoc is strict then you will get away with it. I feel that isn't very nice.
What I had learned (and it seems to work fine) is to force the result which gets done here in return $!
insertXml (stat, m) xf = U.withBinaryFile xf ReadMode (\handle -> do ct <- getXmlContent xf handle ... some code... return $! (stat',m'))
So I hope that is the one and important recipe to add when using withBinaryFile.
The operator ($!) isn't recursive, so all that it does is force the tuple you're returning, not the contents of the tuple.
Hmm, you are right.
Internally it calls 'seq', which only forces its argument enough to determine which constructor it was called with. Sometimes this can be enough to do what you want, but not usually. And certainly not in this case.
I tried now: getXmlContent :: String -> Handle -> IO Ctan getXmlContent xf inh = do xml <- U.hGetContents inh let content = xml `deepseq` parseXMLDoc xml case content of ... deepseq really ensures parseXmlDoc gets the full file as a string. It is unclear to me how this could be done without deepseq. -- Manfred

On Monday 27 June 2011, 11:15:13, Manfred Lotz wrote:
I tried now:
getXmlContent :: String -> Handle -> IO Ctan getXmlContent xf inh = do xml <- U.hGetContents inh let content = xml `deepseq` parseXMLDoc xml case content of ...
deepseq really ensures parseXmlDoc gets the full file as a string.
It is unclear to me how this could be done without deepseq.
To ensure that the entire file is read, you can seq on the length, xml <- U.hGetContents inh let content = parseXMLDoc xml length xml `seq` case content of ... that might be faster than deepseq (or it might make no difference, deepseq is cheap on Chars). In general, it is often good to avoid deepseq if that would traverse large data structures which are already evaluated (consider modifying a Map or something similar, if you force it with deepseq, each alteration is O(size), so it's worth some effort to find a way to only force the modified parts to have the alterations be O(log size)).

On Mon, 27 Jun 2011 11:49:37 +0200
Daniel Fischer
On Monday 27 June 2011, 11:15:13, Manfred Lotz wrote:
I tried now:
getXmlContent :: String -> Handle -> IO Ctan getXmlContent xf inh = do xml <- U.hGetContents inh let content = xml `deepseq` parseXMLDoc xml case content of ...
deepseq really ensures parseXmlDoc gets the full file as a string.
It is unclear to me how this could be done without deepseq.
To ensure that the entire file is read, you can seq on the length,
xml <- U.hGetContents inh let content = parseXMLDoc xml length xml `seq` case content of ...
Thanks a lot for the suggestion. Works fine. I only tried deepseq because my first try with seq let content = xml `seq` parseXMLDoc xml didn't work. -- Manfred

On Jun 27, 2011, at 9:58 AM, Manfred Lotz wrote:
To ensure that the entire file is read, you can seq on the length,
xml <- U.hGetContents inh let content = parseXMLDoc xml length xml `seq` case content of ...
Thanks a lot for the suggestion. Works fine.
I only tried deepseq because my first try with seq
let content = xml `seq` parseXMLDoc xml
didn't work.
You can call me squeamish, but using a pure function (length) for its side-effects seems kind of kludge-y. :-) ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

On Mon, 27 Jun 2011 10:51:03 -0400
David Place
On Jun 27, 2011, at 9:58 AM, Manfred Lotz wrote:
To ensure that the entire file is read, you can seq on the length,
xml <- U.hGetContents inh let content = parseXMLDoc xml length xml `seq` case content of ...
Thanks a lot for the suggestion. Works fine.
I only tried deepseq because my first try with seq
let content = xml `seq` parseXMLDoc xml
didn't work.
You can call me squeamish, but using a pure function (length) for its side-effects seems kind of kludge-y. :-)
Yes, I admit it looks strange. :-) -- Manfred

On Jun 27, 2011, at 9:58 AM, Manfred Lotz wrote:
Thanks a lot for the suggestion. Works fine.
I only tried deepseq because my first try with seq
let content = xml `seq` parseXMLDoc xml
didn't work.
Have you considered using ByteStrings for your application. There, hGetContents is strict. Also, you will benefit from more efficient string operations.

On Mon, 27 Jun 2011 11:35:24 -0400
David Place
On Jun 27, 2011, at 9:58 AM, Manfred Lotz wrote:
Thanks a lot for the suggestion. Works fine.
I only tried deepseq because my first try with seq
let content = xml `seq` parseXMLDoc xml
didn't work.
Have you considered using ByteStrings for your application. There, hGetContents is strict. Also, you will benefit from more efficient string operations.
Yes, I thought about it but didn't dig deeper because Text.XML.Light wants a string as input and I didn't find a conversion. But now I found toString in Text.XML.Light.UTF8 so I tried it. It doesn't make much difference in performance because the xml files are pretty small. However, using the strict functions makes the code a bit less esoteric. :-) -- Thanks, Manfred

On Jun 27, 2011, at 5:15 AM, Manfred Lotz wrote:
deepseq really ensures parseXmlDoc gets the full file as a string.
It is unclear to me how this could be done without deepseq.
When I started to be truly comfortable programming in Haskell, I found the constant desire to use deepseq just went away. ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

David Place wrote:
On Jun 26, 2011, at 8:25 AM, Heinrich Apfelmus wrote:
What about the combinator
withFile :: FilePath -> (String -> a) -> IO a withFile name f = bracket (openFile name ReadMode) hClose $ \h -> evaluate . f =<< hGetContents h
? It gives you the same thing as Iteratees - a way to apply a function to the contents of a file - without the need to rewrite all the existing list functions like map , lines , words , and so on.
How would you, for instance, implement the program for counting all the words in a list of files that Oleg describes in his message?
Nested calls to withFile would require too many open handles.
Good point, but this actually shows that withFile should be even lazier. In particular: * The file should not be opened until the string is demanded. * The file should be closed as soon as the string has been demanded in full. As before, the idea is that resource usage is pushed into the operational semantics and file handles are treated as if they were ordinary lazy values. It's just that the operating system doesn't play along very well and demands explicit resource management. Again, let me stress that the biggest drawback of the Iteratee approach is that you have to rewrite the consumer and cannot reuse ordinary list functions like length , words , lines , and so on. But these functions are already lazy, so why not make use of this. (You don't lose anything, every Iteratee can be rewritten as an ordinary function String -> IO a by using `seq` in corresponding places.) Here one possibility for a lazier version of withFile' : -- an even lazier version of withFile withFile' :: FilePath -> (String -> IO a) -> IO a withFile' name f = do fin <- newIORef (return ()) let close = readIORef fin >>= id open = do putStrLn "open" h <- openFile name ReadMode writeIORef fin (do putStrLn "close"; hClose h) lazyRead h finally (unsafeInterleaveIO open >>= f >>= evaluate) close where lazyRead h = hIsEOF h >>= \b -> if b then do putStrLn "close"; hClose h; return [] else do c <- hGetChar h cs <- unsafeInterleaveIO $ lazyRead h return (c:cs) withFiles :: [FilePath] -> (String -> IO a) -> IO a withFiles [x] f = withFile' x f withFiles (x:xs) f = withFile' x $ \s -> let f' t = f (s ++ t) in withFiles xs f' test = withFiles (replicate 200 "Test.hs") (return . length) >>= print Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

Heinrich Apfelmus wrote:
Again, let me stress that the biggest drawback of the Iteratee approach is that you have to rewrite the consumer and cannot reuse ordinary list functions like length , words , lines , and so on. But these functions are already lazy, so why not make use of this. (You don't lose anything, every Iteratee can be rewritten as an ordinary function String -> IO a by using `seq` in corresponding places.) Here one possibility for a lazier version of withFile' :
Let me just add my voice in support of Heinrich. There has been a lot of FUD lately about lazy IO being "unsafe" or even "evil". This is just not so. When used correctly, there is nothing unsafe about lazy IO. It's a matter of taste in the end, I suppose, but to me lazy IO is the most natural way of doing IO in Haskell. True, there are some subtleties. These stem from the fact that, like Iteratees and any other current approach, they involve superimposing operational semantics on top of Haskell. I find that code written using lazy IO is clearer, more natural, and easier to write than when written using Iteratees/Enumeratees, even in more complicated cases when you need to do some work to get the laziness to match up. I have great respect for Oleg's enlightening research on Iteratees. But for practical real-life programming, I choose lazy IO over Iteratees when I have the choice. Whether you are using lazy IO or Iteratees, we really need some better higher-level combinators with simpler semantics for more of the common use cases. Then it won't really matter. And yes, that is possible with Iteratees too. See for, example, Text.XML.Enumerator.Parse in Michael Snoyman's xml-enumerator package. (But he conspicuously omits the ugly type signatures in his sample usage. Those could use some wrapping.) Regards, Yitz

On Mon, 27 Jun 2011 12:54:51 +0300
Yitzchak Gale
Heinrich Apfelmus wrote:
Again, let me stress that the biggest drawback of the Iteratee approach is that you have to rewrite the consumer and cannot reuse ordinary list functions like length , words , lines , and so on. But these functions are already lazy, so why not make use of this. (You don't lose anything, every Iteratee can be rewritten as an ordinary function String -> IO a by using `seq` in corresponding places.) Here one possibility for a lazier version of withFile' :
Very nice. I hope you don't mind if I use it? I've been watching this thread closely, because one of my concerns for my first project is making sure the file IO gets handled right. It's processing a set of files in order (typical Unix command line stuff), and I came up with two requirements: 1) Only the file currently being processed should be open. 2) Don't read the entire file into memory at once. You seem to have hit both of them.
Let me just add my voice in support of Heinrich.
Ditto.
Whether you are using lazy IO or Iteratees, we really need some better higher-level combinators with simpler semantics for more of the common use cases. Then it won't really matter.
And that was my conclusion last night. Iteratees provide generalized
tools that don't seem to have obvious ways to do common
operations. While that kind of thing is needed, I suspect that over
90% of file processing (at least for command line arguments) could be
handled by things like Heinrich's withFiles and the obvious variants
of the HOF list functions, like:
fileMap :: (String -> a) -> [FilePath] -> [a]
fileFoldl' :: (a -> String -> a) -> a -> [FilePath] -> a
The user doesn't care whether it's using lazy IO or iteratees, so
long as it has the proper semantics.

Mike Meyer wrote:
Yitzchak Gale wrote:
Heinrich Apfelmus wrote:
Here one possibility for a lazier version of withFile' :
Very nice. I hope you don't mind if I use it?
Sure, go ahead. (I hereby put it into the public domain.)
Whether you are using lazy IO or Iteratees, we really need some better higher-level combinators with simpler semantics for more of the common use cases. Then it won't really matter.
And that was my conclusion last night. Iteratees provide generalized tools that don't seem to have obvious ways to do common operations. While that kind of thing is needed, I suspect that over 90% of file processing (at least for command line arguments) could be handled by things like Heinrich's withFiles and the obvious variants of the HOF list functions, like:
fileMap :: (String -> a) -> [FilePath] -> [a] fileFoldl' :: (a -> String -> a) -> a -> [FilePath] -> a
The user doesn't care whether it's using lazy IO or iteratees, so long as it has the proper semantics.
Indeed. It might be worthwhile to make a package on Haskell that provides these functions, or even to put them into the base libraries. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com On Jun 27, 2011, at 3:50 AM, Heinrich Apfelmus wrote:
Good point, but this actually shows that withFile should be even lazier. In particular:
* The file should not be opened until the string is demanded. * The file should be closed as soon as the string has been demanded in full.
Suppose the file is only partially demanded as in the case I quoted earlier. print10 = do contents <- withFile "/usr/share/dict/words" ReadMode (\h -> hGetContents h) print $ take 10 contents The file would never be closed.

David Place wrote:
Suppose the file is only partially demanded as in the case I quoted earlier.
print10 = do contents <- withFile "/usr/share/dict/words" ReadMode (\h -> hGetContents h) print $ take 10 contents
The file would never be closed.
I tried it - it does close the file. Thanks, Yitz

On Jun 27, 2011, at 8:33 AM, Yitzchak Gale wrote:
David Place wrote:
Suppose the file is only partially demanded as in the case I quoted earlier.
print10 = do contents <- withFile "/usr/share/dict/words" ReadMode (\h -> hGetContents h) print $ take 10 contents
The file would never be closed.
I tried it - it does close the file.
Thanks, Yitz
II'm not sure what you mean. Did you create a new version of withFile that behaves as Heinrich suggested? If so, please post its definition. ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

David Place wrote:
Suppose the file is only partially demanded as in the case I quoted earlier.
print10 = do contents <- withFile "/usr/share/dict/words" ReadMode (\h -> hGetContents h) print $ take 10 contents
The file would never be closed.
I wrote:
I tried it - it does close the file.
II'm not sure what you mean. Did you create a new version of withFile that behaves as Heinrich suggested? If so, please post its definition.
I just loaded Heinrich's withFile' in GHCi and ran it: *WithFile> withFile' "/usr/share/dict/words" (print . take 10 . words) open ["A","A's","AOL","AOL's","Aachen","Aachen's","Aaliyah","Aaliyah's","Aaron","Aaron's"] close

On Jun 27, 2011, at 1:45 PM, Yitzchak Gale wrote:
David Place wrote:
Suppose the file is only partially demanded as in the case I quoted earlier.
print10 = do contents <- withFile "/usr/share/dict/words" ReadMode (\h -> hGetContents h) print $ take 10 contents
The file would never be closed.
I wrote:
I tried it - it does close the file.
II'm not sure what you mean. Did you create a new version of withFile that behaves as Heinrich suggested? If so, please post its definition.
I just loaded Heinrich's withFile' in GHCi and ran it:
*WithFile> withFile' "/usr/share/dict/words" (print . take 10 . words) open ["A","A's","AOL","AOL's","Aachen","Aachen's","Aaliyah","Aaliyah's","Aaron","Aaron's"] close
Thanks. I understand now.

David Place wrote:
On Jun 27, 2011, at 3:50 AM, Heinrich Apfelmus wrote:
Good point, but this actually shows that withFile should be even lazier. In particular:
* The file should not be opened until the string is demanded. * The file should be closed as soon as the string has been demanded in full.
Suppose the file is only partially demanded as in the case I quoted earlier.
print10 = do contents <- withFile "/usr/share/dict/words" ReadMode (\h -> hGetContents h) print $ take 10 contents
The file would never be closed.
Note that my version of withFile has a different type and replaces hGetContents entirely. Regardless, in both cases the intention is that the file contents is evaluated completely *inside* the scope of withFile . In particular, a correct way to write your example would be print10 = withFile' "/usr/share/dict/words" $ print . take 10 (This is completely analogous to how Iteratees require you to process a file. The difference is that Iteratees expresses demand for the characters inside a file via >>= whereas lazy IO expresses demand via `seq` .) Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Jun 27, 2011, at 9:32 AM, Heinrich Apfelmus wrote:
Note that my version of withFile has a different type and replaces hGetContents entirely.
Regardless, in both cases the intention is that the file contents is evaluated completely *inside* the scope of withFile . In particular, a correct way to write your example would be
print10 = withFile' "/usr/share/dict/words" $ print . take 10
(This is completely analogous to how Iteratees require you to process a file. The difference is that Iteratees expresses demand for the characters inside a file via >>= whereas lazy IO expresses demand via `seq` .)
So, the type of your withFile would be withFile :: FilePath -> IOMode -> (Handle -> IO ()) -> IO () completely prohibiting return values from withFile? Sounds good. In that case, you don't need to change the behavior of withFile at all, just its type. It's hGetContents that bothers me. I have found that friends who I have convinced to learn Haskell always trip over this right away. They want to write a simple program that processes a file and immediately get tangled up with lazy IO and looking up the definition of deepseq. It's kind of embarrassing. Sadly, they never pick creating beautiful, incremental lazy algorithms with sharing as their "Hello World" projects. ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

David Place wrote:
It's hGetContents that bothers me. I have found that friends who I have convinced to learn Haskell always trip over this right away. They want to write a simple program that processes a file and immediately get tangled up with lazy IO and looking up the definition of deepseq. It's kind of embarrassing.
There's nothing to be embarrassed about. Haskell is lazy by default. That is certainly much different than what most people are used to. So it needs to be explained to beginners. It's the same for IO laziness as for pure laziness. I'd much rather explain that, which fits in with the whole style of Haskell, than to have to explain Iteratees to a newcomer. In any case, you can't expect the switch from Python to Haskell to go nearly as smoothly as the switch from Perl to Python. The paradigm shift is much bigger. -Yitz

On Jun 27, 2011, at 1:55 PM, Yitzchak Gale wrote:
There's nothing to be embarrassed about. Haskell is lazy by default. That is certainly much different than what most people are used to. So it needs to be explained to beginners. It's the same for IO laziness as for pure laziness.
:-) It's not the difficulty of understanding laziness that I find embarrassing. My Ph.D Computer Science friends that I have introduced Haskell to understand that very well. It's their "You've got to be kidding" looks when I tell them they have to call 'length' for effect or some such to get their simple programs to work. They understand Lazy IO and think that it is broken. :-) ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

David Place wrote:
So, the type of your withFile would be
withFile :: FilePath -> IOMode -> (Handle -> IO ()) -> IO ()
completely prohibiting return values from withFile? Sounds good. In that case, you don't need to change the behavior of withFile at all, just its type.
Actually, the type is withFile :: FilePath -> (String -> IO a) -> IO a There is still a return value, but withFile makes sure that it's evaluated to WHNF. If you choose a = (), then you're fine, otherwise it's not entirely foolproof.
It's hGetContents that bothers me. I have found that friends who I have convinced to learn Haskell always trip over this right away. They want to write a simple program that processes a file and immediately get tangled up with lazy IO and looking up the definition of deepseq. It's kind of embarrassing. Sadly, they never pick creating beautiful, incremental lazy algorithms with sharing as their "Hello World" projects.
That's why I usually only teach readFile and writeFile in the beginning. By the way, from a Haskell point of view, it's the operating system that is being unreasonable here, not the other way round. :D Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Jun 28, 2011, at 8:39 AM, Heinrich Apfelmus wrote:
There is still a return value, but withFile makes sure that it's evaluated to WHNF. If you choose a = (), then you're fine, otherwise it's not entirely foolproof.
Yes, I was wondering about this. Since it is still possible to get in trouble in exactly the same way, why is this an improvement over withFile? If you return () from withFile, you have the same benefit. ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

David Place wrote:
On Jun 28, 2011, at 8:39 AM, Heinrich Apfelmus wrote:
There is still a return value, but withFile makes sure that it's evaluated to WHNF. If you choose a = (), then you're fine, otherwise it's not entirely foolproof.
Yes, I was wondering about this. Since it is still possible to get in trouble in exactly the same way, why is this an improvement over withFile? If you return () from withFile, you have the same benefit.
The improvement is that my variant of withFile absorbs a few more moving parts, namely calling hGetContents and forcing the result. It is still possible to write programs that return unexpected _|_, but they violate a clear conceptual guideline ("only fully evaluated values may escape the scope of withFile "). Granted, Iteratees make it impossible to write such programs, but they come with the terrible price of code duplication. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

On Jun 29, 2011, at 4:30 AM, Heinrich Apfelmus wrote:
It is still possible to write programs that return unexpected _|_, but they violate a clear conceptual guideline ("only fully evaluated values may escape the scope of withFile ").
I wonder. If that is the behavior you desire, why not just call deepseq on the value before you return it to guarantee it.
Granted, Iteratees make it impossible to write such programs, but they come with the terrible price of code duplication.
I don't really understand this objection, though. Won't any iteratee library provide all those functions? The user shouldn't have to write them. In this way, it is certainly no worse than the duplication brought about by ByteStrings. Also, isn't that the point of the ListLike class? ____________________ David Place Owner, Panpipes Ho! LLC http://panpipesho.com d@vidplace.com

David Place wrote:
On Jun 29, 2011, at 4:30 AM, Heinrich Apfelmus wrote:
It is still possible to write programs that return unexpected _|_, but they violate a clear conceptual guideline ("only fully evaluated values may escape the scope of withFile ").
I wonder. If that is the behavior you desire, why not just call deepseq on the value before you return it to guarantee it.
Hm. The idea was that while evaluating the return value in full is a sufficient requirement, it is not a necessary requirement. The return value may contain unevaluated expressions as long as the input stream is being forced. Example: do s <- hGetContents h let x = take 10 s `deepseq` [reverse . drop 3 $ take 7 s] evaluate x hClose h Evaluating x to weak head normal form will force the first 10 characters of the input, but x will still contain unevaluated expressions. (This is what Iteratees do: they force the input without forcing the return value.) But you are probably right, it's better to give guarantees. The behavior above can still be simulated with a broken deepseq instance. data Lazy a = Lazy a data NFData (Lazy a) where rnf _ = () -- broken on purpose This way, breaking the guarantee takes more effort than not breaking it, as it should be.
Granted, Iteratees make it impossible to write such programs, but they come with the terrible price of code duplication.
I don't really understand this objection, though. Won't any iteratee library provide all those functions? The user shouldn't have to write them. In this way, it is certainly no worse than the duplication brought about by ByteStrings. Also, isn't that the point of the ListLike class?
I think the ListLike class demonstrates very well that there will always be useful list functions that are not provided by an existing API. :) Also note that the ListLike class from the ListLike package doesn't work for Iteratees, they have to provide their own functions and so on. At some point, the count of different implementations for, holy lambda, *lazy lists* simply becomes ridiculous. ;) Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

Okay, I'm going to give some more stylistic notes.
8. let combined = parseArgs Args
You don't need this one.
Lines 9 - 11 let sep = separateFiles args fl = head (fst sep) showResultsFile fl (snd sep)
Why not: let (f, s) = separateFiles args showResultsFile (head f) s
Lines 19-20 processFile fl flags = mapM dispatch [(flag, fl) | flag <- flags]
Why isn't this on one line? Its not particularly long.
Lines 25-31 dispatch (fl, fn) = do handle <- openFile fl ReadMode contents <- hGetContents handle let cnt = getCounter fn res = cnt contents hClose handle return res
Cleaner is: dispatch (fl, fn) = do contents <- withFile fl ReadMode hGetContents return $ getCounter fn contents However also note that you are defining a function of the form f :: (a, b) -> c rather than the better f :: a -> b -> c Naturally, I see that the reason is that you are calling it from mapM. However, I would define dispatch as a curried function, and then use `curry dispatch` in your mapM. It makes dispatch much cleaner.
Lines 36-44 a bunch of putStrLn
You could also use mapM_ putStrLn [list of string to put out], removing duplicate uses of putStrLn here.

Thank you very much Jared. Much appreciated.
On Fri, Jun 24, 2011 at 4:51 PM, Jared Hance
Okay, I'm going to give some more stylistic notes.
8. let combined = parseArgs Args
You don't need this one.
Lines 9 - 11 let sep = separateFiles args fl = head (fst sep) showResultsFile fl (snd sep)
Why not:
let (f, s) = separateFiles args showResultsFile (head f) s
Lines 19-20 processFile fl flags = mapM dispatch [(flag, fl) | flag <- flags]
Why isn't this on one line? Its not particularly long.
Lines 25-31 dispatch (fl, fn) = do handle <- openFile fl ReadMode contents <- hGetContents handle let cnt = getCounter fn res = cnt contents hClose handle return res
Cleaner is:
dispatch (fl, fn) = do contents <- withFile fl ReadMode hGetContents return $ getCounter fn contents
However also note that you are defining a function of the form
f :: (a, b) -> c
rather than the better
f :: a -> b -> c
Naturally, I see that the reason is that you are calling it from mapM. However, I would define dispatch as a curried function, and then use `curry dispatch` in your mapM. It makes dispatch much cleaner.
Lines 36-44 a bunch of putStrLn
You could also use mapM_ putStrLn [list of string to put out], removing duplicate uses of putStrLn here.
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
-- Panagiotis Koutsourakis

On Fri, Jun 24, 2011 at 11:51 PM, Jared Hance
Lines 25-31 dispatch (fl, fn) = do handle <- openFile fl ReadMode contents <- hGetContents handle let cnt = getCounter fn res = cnt contents hClose handle return res
Cleaner is:
dispatch (fl, fn) = do contents <- withFile fl ReadMode hGetContents return $ getCounter fn contents
I had thought hGetContents was lazy IO. If so, will that even work? The handle to the file will be closed before much of the file is even read. Antoine

I had thought hGetContents was lazy IO. If so, will that even work? The handle to the file will be closed before much of the file is even read.
Antoine
My guess is that the code I gave, which honestly is bad because I forgot about readFile, is probably lazy rather than strict because according to [0], readFile is lazy despite the fact that it doesn't require explicit hClose. Sorry, looks like I accidently hit r instead of L so it got taken off the mailing list. [0] http://book.realworldhaskell.org/read/io.html Look at that--the guts of the program take up only two lines! readFile returned a lazy String, which we stored in inpStr. We then took that, processed it, and passed it to writeFile for writing. 5 comments
participants (11)
-
Antoine Latter
-
Clint Moore
-
Daniel Fischer
-
David Place
-
Heinrich Apfelmus
-
Jared Hance
-
Manfred Lotz
-
Mike Meyer
-
Panagiotis Koutsourakis
-
Vesa Kaihlavirta
-
Yitzchak Gale