
I stumbled across a problem with IO and strictness that I could fix, but I can't understand why the fix works. I've compressed it down into a program which simply computes the number of lines in a file. Here is a version that doesn't work:

    module Main where

    import System.IO
    import System.Environment

    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      let cs = unlines $ lines c
      hClose h
      putStrLn $ show $ length cs

    main :: IO ()
    main = do
      args <- getArgs
      process_file (args !! 0)

This will return a length of 0 lines for any input file. Obviously, the "let" is not being evaluated strictly (nor would we expect it to be), so that when the evaluation is requested, the file is already closed and the length of the list of lines is 0 (though I might have expected an error). I then tried this:

    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      let cs = id $! lines c    -- try to strictly evaluate the let binding
      hClose h
      putStrLn $ show $ length cs

which also failed exactly as the previous version did (i.e. always returning 0). Then I gave up on "let" and did this:

    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      cs <- return $! lines c
      hClose h
      putStrLn $ show $ length cs

This works. However, I don't understand why this version works and the previous version doesn't. Can anyone walk me through the evaluation? Also, is there a way to make "let" strict?

TIA,
Mike

On Thu, Jul 12, 2007 at 09:22:09PM -0700, Michael Vanier wrote:
I stumbled across a problem with IO and strictness that I could fix, but I can't understand why the fix works. I've compressed it down into a program which simply computes the number of lines in a file. Here is a version that doesn't work:
module Main where
    import System.IO
    import System.Environment

    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      let cs = unlines $ lines c
      hClose h
      putStrLn $ show $ length cs
The problem is that you're closing the file twice. When you call any function of the getContents family, you hand that function the responsibility for closing the file, and it will not do so before the contents are no longer needed. Don't call hClose yourself, or Bad Things will happen.

If you get rid of hClose, laziness will not hurt you - in fact it will help you, by allowing your program to run in constant space.

Stefan
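For illustration, a minimal sketch of the hClose-free version Stefan describes, keeping the names from the original post; the handle is closed behind the scenes once the lazy stream has been drained:

    module Main where

    import System.IO
    import System.Environment

    -- Count the lines of a file without calling hClose ourselves:
    -- hGetContents owns the handle and closes it when the stream is drained.
    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      putStrLn $ show $ length $ lines c   -- demand drives the reading

    main :: IO ()
    main = do
      args <- getArgs
      process_file (head args)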

That makes sense. Thanks!

Mike

Stefan O'Rear wrote:
The problem is that you're closing the file twice. When you call any function of the getContents family, you assign to that function the responsibility to close the file, no sooner than it is no longer needed. Don't call hClose yourself, Bad Things will happen.
Care to elaborate on "bad things"? (I.e., will this just crash the program with an error, or will it do something more serious?) I must admit, I thought closing such a file was simply a no-op.

I can't remember why exactly, but somewhere or other I wrote some code that does this. (Basically I want to shut the file without reading all of it, so I can reopen it and start reading from the beginning again.) Is there a sane way to do this? Or am I going to have to start playing with explicit reads and writes? (Obviously I could just hang on to *all* of the input stream returned from getContents - but that could be quite large. The current way gives me low memory usage...)

On Fri, Jul 13, 2007 at 07:59:22PM +0100, Andrew Coppin wrote:
Stefan O'Rear wrote:
The problem is that you're closing the file twice. When you call any function of the getContents family, you assign to that function the responsibility to close the file, no sooner than it is no longer needed. Don't call hClose yourself, Bad Things will happen.
Care to elaborate on "bad things"? (I.e., will this just crash the program with an error, or will it do something more serious?) I must admit, I thought closing such a file was simply no-op.
If you close the file, the stream will suddenly end. I believe silent data corruption is worse than a crash :) (currently, hGetContents also truncates on I/O error, but that's much less common and syslog will tell you about it anyway)
I can't remember why exactly, but somewhere or other I wrote some code that does this. (Basically I want to shut the file without reading all of it, so I can reopen it and start reading from the beginning again.) Is there a sane way to do this? Or am I going to have to start playing with explicit reads and writes? (Obviously I could just hang on to *all* of the input stream returned from getContents - but that could be quite large. The current way gives me low memory usage...)
Did you actually try it?

getContents is *lazy*. It doesn't read any more of the file than it has to; if it ends up reading the whole file, that's because you obviously DO need the whole file. This is also why closing a file after getContentsing it is Bad - getContents can't keep reading after you've closed the handle.

Stefan
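As a small illustration of that laziness (the file name here is just a placeholder), only the demanded prefix of the file is ever read, no matter how large the file is:

    main :: IO ()
    main = do
      c <- readFile "big.txt"   -- placeholder file name
      -- Only as much of the file as 'take 10' demands is actually read;
      -- the handle belongs to the lazy stream and is closed when the stream
      -- is fully consumed or garbage collected, not by us.
      mapM_ putStrLn (take 10 (lines c))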

Stefan O'Rear wrote:
On Fri, Jul 13, 2007 at 07:59:22PM +0100, Andrew Coppin wrote:
Care to elaborate on "bad things"? (I.e., will this just crash the program with an error, or will it do something more serious?) I must admit, I thought closing such a file was simply no-op.
If you close the file, the stream will suddenly end. I believe silent data corruption is worse than a crash :)
Agreed...! o_O
I can't remember why exactly, but somewhere or other I wrote some code that does this. (Basically I want to shut the file without reading all of it, so I can reopen it and start reading from the beginning again.) Is there a sane way to do this? Or am I going to have to start playing with explicit reads and writes? (Obviously I could just hang on to *all* of the input stream returned from getContents - but that could be quite large. The current way gives me low memory usage...)
Did you actually try it?
I am not 100% sure about this. I wrote code that had an execution path that does this, but I don't know if I ever actually *exercised* it... (The idea was to read, say, 10% of the file, and then close it and start reading from the beginning again.) How about redundantly closing the file after you've already read all of it anyway? Is that dangerous?

The problem is that you're closing the file twice. When you call any function of the getContents family, you assign to that function the responsibility to close the file, no sooner than it is no longer needed. Don't call hClose yourself, Bad Things will happen.
If you close the file, the stream will suddenly end. I believe silent data corruption is worse than a crash :) (currently, hGetContents also truncates on I/O error, but that's much less common and syslog will tell you about it anyway)
Why can't hClose be more... um... lazy?
Tim Newsham http://www.thenewsh.com/~newsham/

Hello Michael, Friday, July 13, 2007, 8:22:09 AM, you wrote:
    cs <- return $! lines c
    hClose h
    putStrLn $ show $ length cs
This works. However, I don't understand why this version works and the previous version doesn't. Can anyone walk me through the evaluation? Also, is there a way to make "let" strict?
first, it may be written also as
    return $! lines c
    hClose h
    let cs = lines c
    putStrLn $ show $ length cs
the key is that return is an *action* and should be executed before doing the next action, hClose. '$!' means that the argument should be evaluated before calling 'return', so 'return $!' is a sort of idiom that forces immediate evaluation of its argument (but be cautious! for a list it evaluates only up to the first cons cell, so your program can still report a wrong count! the right solution is to force evaluation of the whole list, by calculating its 'length' or 'last', for example)

as usual, i suggest you read http://haskell.org/haskellwiki/IO_inside

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com
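A sketch of the fix Bulat suggests, forcing the whole list rather than just its head by demanding its length before the handle is closed (same names as the original program):

    import System.IO

    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      let cs = lines c
      -- 'length cs' walks the whole spine, so the entire file is read
      -- before hClose runs; '$!' on 'lines c' alone stops at the first cons cell.
      n <- return $! length cs
      hClose h
      putStrLn $ show n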

On Thu, Jul 12, 2007 at 09:22:09PM -0700, Michael Vanier wrote:
I stumbled across a problem with IO and strictness that I could fix, but I can't understand why the fix works. I've compressed it down into a program which simply computes the number of lines in a file. Here is a version that doesn't work:
snip snip snip
This will return a length of 0 lines for any input file. Obviously, the "let" is not being evaluated strictly (nor would we expect it to be), so that when the evaluation is requested, the file is already closed and the length of the list of lines is 0 (though I might have expected an error). I then tried this:
    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      let cs = id $! lines c    -- try to strictly evaluate the let binding
      hClose h
      putStrLn $ show $ length cs
Calling hClose after hGetContents is the root of the problem, but this is a good example for understanding seq better too. To quote Chris Kuklewicz:
To quote John Meacham:
| A quick note,
| x `seq` x
| is always exactly equivalent to x. the reason being that your seq
| would never be called to force x unless x was needed anyway.
|
| I only mention it because for some reason this realization did not hit
| me for a long time and once it did a zen-like understanding of seq
| (relative to the random placement and guessing method I had used
| previously) suddenly was bestowed upon me.
I remember this anecdote because when I first read it, a zen-like understanding of seq suddenly was bestowed upon /me/. Maybe it should be in the docs. :-)
A little corollary is (id $!) = id:

    id $! x = x `seq` id x = x `seq` x = x = id x

Remember, things defined with let don't get forced when execution in IO goes past that clause in the do block. If you want to force something at a particular time between IO actions, return $! should work, or Control.Exception.evaluate. I think with return $! strictness analysis might end up evaluating things earlier than you requested, but I haven't found any examples that actually get that to happen.

Brandon
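A small sketch of the Control.Exception.evaluate route Brandon mentions; evaluate is an IO action, so the forcing is sequenced between the surrounding actions rather than left in a let binding (it still only reaches WHNF, as discussed in the next message):

    import Control.Exception (evaluate)
    import System.IO

    process_file :: FilePath -> IO ()
    process_file filename = do
      h  <- openFile filename ReadMode
      c  <- hGetContents h
      -- evaluate sequences the forcing as an IO action; like 'return $!'
      -- it only reaches weak head normal form, here the first cons cell.
      cs <- evaluate (lines c)
      hClose h
      putStrLn $ show $ length cs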

Brandon Michael Moore wrote:
Calling hClose after hGetContents is the root of the problem, but this is a good example for understanding seq better too.
To further this end, I'll "take issue" :) with the final version that has been tested to work, and show that it still won't work.

First, the program in question:

    import System.IO
    import System.Environment

    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      cs <- return $! lines c
      hClose h
      putStrLn $ show $ length cs

It will give a wrong answer for a file large enough. The short explanation is that seq, or its friend $!, does not evaluate its argument (lines c) entirely; it only evaluates "so much". The jargon is "weak head normal form", but concretely here are some examples: for Int, it evaluates until you have an actual number, which is all good and expected; for lists, it evaluates only until the first cons cell emerges (let's say we know the list will be non-empty). It will not hunt down the rest of the list. (Moreover, it will not even hunt down what's in the cons cell, e.g., the details of the first item of the list. But this is not too important for now.)

It still happens to give the right answer for a file small enough, thanks to buffering. So here is a chronicle of execution, with confusing details - confusing because two wrongs conspire to make a right, almost:

0. Open the file, hGetContents. Remember that block buffering with a pretty large buffer is the default.

1. $! evaluates lines c to the first cons cell. To do that, latent code (the jargon is "thunk") installed by hGetContents is invoked and it reads something. It is in block buffering mode, so it reads a blockful at a time. The first cons cell will only emerge when the first line break is found, so it reads blocks until a block contains a line break. But it does not read more blocks. Whatever has been read will be accessible to cs - maybe not immediately in the form of a list of strings: part of it is already in that form, the other part is in the form of buffer content plus a thunk to convert the buffer to a list of strings just in time. That thunk intermingles code from the lines function and hGetContents. Perhaps you don't need to know that much. The bottom line is that cs has access to one or more blocks' worth of stuff, which may or may not be the whole file. Exactly how much is defined by: as many blocks as it takes to contain the first line break.

2. Close the file. Henceforth no further reading is possible. cs still has access to whatever was read in the above step; it is already in memory and can't be lost. But cs has no access to whatever is not in memory; it does not exist.

3. Count the number of lines accessible to cs.

As examples, here are some scenarios:

A. The whole file fits into the buffer. You will get the correct count.

B. Five lines plus a little bit more fit into the buffer. The answer is six.

C. The first line is very long, or the buffer is very small. The answer is one or two, depending on whether the line break falls in the middle or at the boundary of the buffer.

To test these scenarios, you can fudge the buffer size and have fun:

    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      hSetBuffering h (BlockBuffering (Just 20))
      c <- hGetContents h
      cs <- return $! lines c
      hClose h
      putStrLn $ show $ length cs

There are two conclusions you can draw:

For a task satisfied by a single pass, where the task traverses the whole file unconditionally: let go of control. Use hGetContents and don't bother to hClose yourself (it will be closed just in time).

For a task requiring several passes, where you want the whole file read "here and now": seq won't cut it. Some people use "return $! length c" for that. There are also other ways. Consider Data.ByteString.

What about a task satisfied by a single pass that does not necessarily traverse the whole file? Automatic close won't kick in. You will hClose yourself, but where to put it is a long story.
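A sketch of the several-passes case, using the "return $! length c" trick mentioned above to pull the whole file into memory before closing (the function name and output format here are illustrative):

    import System.IO

    -- Read the whole file "here and now", then make two passes over it.
    two_passes :: FilePath -> IO ()
    two_passes filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      _ <- return $! length c   -- forces every character, so the whole file is read
      hClose h                  -- safe now: nothing remains behind the handle
      putStrLn $ "lines: " ++ show (length (lines c))
      putStrLn $ "chars: " ++ show (length c)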

Albert,

Thanks for the very detailed reply! That's the great thing about this mailing list.

I find your description of seq somewhat disturbing. Is this behavior documented in the API? I can't find it there. It suggests that perhaps there should be a really-truly-absolutely-I-mean-right-now-seq function that evaluates the first argument strictly no matter what (not that this should be something that gets used very frequently). Or are there reasons why this is not feasible?

Sorry to belabor this. Learning to think lazily is IMO one of the hardest aspects of learning Haskell.

Mike

On Fri, Jul 13, 2007 at 04:29:12PM -0700, Michael Vanier wrote:
Albert,
Thanks for the very detailed reply! That's the great thing about this mailing list.
I find your description of seq somewhat disturbing. Is this behavior documented in the API? I can't find it there. It suggests that perhaps there should be a really-truly-absolutely-I-mean-right-now-seq function that evaluates the first argument strictly no matter what (not that this should be something that gets used very frequently). Or are there reasons why this is not feasible?
Sorry to belabor this. Learning to think lazily is IMO one of the hardest aspects of learning Haskell.
Can you clarify what you mean by really-truly-absolutely-I-mean-right-now-seq?

The entire specification of seq is in http://haskell.org/onlinereport/basic.html#sect6.2:

    The function seq is defined by the equations:

      seq ⊥ b = ⊥
      seq a b = b, if a ≠ ⊥

In particular, seq is not defined in terms of evaluation. Even in a lazy model, you cannot assume any *order* of evaluation, only that both arguments will be demanded if the whole expression is. *if* the whole expression is.

Seq is uniform - for ALL data types on the lhs, it evaluates it to WHNF (seeing the top constructor or lambda). A recursive seq would not be uniform, and would require a type class or just specialization.

If you can see why (x `seq` x) is redundant, you probably understand laziness.

Perhaps it would help to see a definition of seq in Haskell?

    class Eval a where
      seq :: a -> b -> b

    instance Eval (a,b) where
      seq (x,y) b = b

    instance Eval [a] where
      seq []     b = b
      seq (x:xs) b = b

    instance Eval (Maybe a) where
      seq Nothing  b = b
      seq (Just x) b = b

    instance Eval (Either a b) where
      seq (Left x)  b = b
      seq (Right x) b = b

    instance Eval Bool where
      seq True  b = b
      seq False b = b

    instance Eval Int where
      ...
      seq (-1) b = b
      seq 0    b = b
      seq 1    b = b
      ...

?

Stefan
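For concreteness, here is the WHNF behaviour at a GHCi prompt: seq is satisfied as soon as the outermost cons cell exists, while anything that walks the spine hits the undefined tail.

    ghci> (1 : undefined) `seq` "fine"
    "fine"
    ghci> length (1 : undefined)
    *** Exception: Prelude.undefined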

On 7/13/07, Stefan O'Rear wrote:
    instance Eval Int where
      ...
      seq (-1) b = b
      seq 0    b = b
      seq 1    b = b
      ...
I don't think you need all these cases. In fact, you can write

    instance Eval Int where
      seq 0 b = b
      seq _ b = b

which, in GHC -O2, desugars to

    Main.$f1 :: {Main.Eval GHC.Base.Int}
    [GlobalId]
    [Arity 2 NoCafRefs Str: DmdType U(A)S]
    Main.$f1 = \ (@ b_adV) (ds_dxz :: GHC.Base.Int) (a_adi :: b_adV) ->
        case ds_dxz of wild_Xd { GHC.Base.I# ds1_dxA -> a_adi }

As a matter of fact, the function

    seq' :: Int -> b -> b
    seq' n a = Prelude.seq n a

desugars to

    Main.seq' :: forall b_ade. GHC.Base.Int -> b_ade -> b_ade
    [GlobalId]
    [Arity 2 NoCafRefs Str: DmdType U(A)S]
    Main.seq' = \ (@ b_adX) (n_adg :: GHC.Base.Int) (a_adh :: b_adX) ->
        case n_adg of tpl_Xf { GHC.Base.I# a1_sxX -> a_adh }

which is the same as before. For those not familiar with GHC core, the above corresponds to

    seq :: Int -> b -> b
    seq n a = case n of I# _ -> a

and the DmdType (created by the strictness analyser) tells us that the first argument is unboxed and absent ("U(A)"), so indeed its use is strict. It also tells us that the second argument is strict ("S"). Although we don't evaluate it to WHNF, we *always* return it (unless, of course, our Int is _|_), so it is needed to produce the result of the function regardless of the other argument. I'm also tempted to note that the second argument is indeed strict, as seq n _|_ = _|_ for every n.

HTH,

-- 
Felipe.

On Fri, Jul 13, 2007 at 09:19:58PM -0300, Felipe Almeida Lessa wrote:
On 7/13/07, Stefan O'Rear wrote:

    instance Eval Int where
      ...
      seq (-1) b = b
      seq 0    b = b
      seq 1    b = b
      ...
I don't think you need all these cases. In fact, you can write
    instance Eval Int where
      seq 0 b = b
      seq _ b = b

[snip long explanation of why this is correct]
I thought of that, but decided that it would just obscure things, so I wrote out an infinite sequence of cases instead. This way it reinforces the symmetry of the instances and doesn't depend on an (often lacking) understanding of the pattern strictness rules. Both are equally valid. Stefan

Stefan,

Thanks for your comments, as always.

What I meant by really-truly-absolutely-I-mean-right-now-seq is something that would evaluate its argument as far as it is possible to do so, i.e. something that forces strict evaluation of an argument. That's what I thought seq did, but now I see I was wrong; it only "goes one deep" as it were. In fact, as you say, seq is not defined in terms of evaluation; all that it guarantees is that its first argument is either (a) bottom, in which case the result of the entire seq is bottom, or (b) not bottom. To do so it has to evaluate the first argument only far enough to show bottom-ness or not, which is not strict evaluation as I understand it. So am I right in saying that Haskell has no way to force strict evaluation? Or am I confused as to the correct definition of "strict"?

Mike

On 7/13/07, Michael Vanier wrote:
Stefan,
Thanks for your comments, as always.
What I meant by really-truly-absolutely-I-mean-right-now-seq is something that would evaluate its argument as far as it is possible to do so i.e. something that forces strict evaluation of an argument. That's what I thought seq did, but now I see I was wrong; it only "goes one deep" as it were. In fact, as you say, seq is not defined in terms of evaluation; all that it guarantees is that its first argument is either (a) bottom, in which the result of the entire seq is bottom, or (b) not bottom. To do so it has to evaluate the first argument only far enough to show bottom-ness or not, which is not strict evaluation as I understand it. So am I right in saying that Haskell has no way to force strict evaluation? Or am I confused as to the correct definition of "strict"?
A function f is strict if f _|_ = _|_ -- seq, as defined in Haskell, is strict in its first argument. So seq does force strict evaluation. Even if you define "strict evaluation" as evaluating a structure completely or returning _|_, you can certainly force strict evaluation in Haskell -- but you can't write a polymorphic function that does so, unless you use type class overloading.

Cheers,
Tim

-- 
Tim Chevalier * catamorphism.org * Often in error, never in doubt
"'There are no atheists in foxholes' isn't an argument against atheism, it's an argument against foxholes." -- James Morrow

On 7/13/07, Michael Vanier wrote:
What I meant by really-truly-absolutely-I-mean-right-now-seq is something that would evaluate its argument as far as it is possible to do so i.e. something that forces strict evaluation of an argument.
What you want is to reduce something to normal form. We have the rnf function (from the NFData class) in Control.Parallel.Strategies for this purpose, see http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Parallel-... and the must-read (IMHO) paper at http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/strategies.html

HTH,

-- 
Felipe.
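A sketch of normal-form forcing applied to the line-counting example; it is written here against Control.DeepSeq's deepseq (the later home of NFData), on the assumption that rnf from the Control.Parallel.Strategies of the time plays the same role:

    import Control.DeepSeq (deepseq)
    import System.IO

    process_file :: FilePath -> IO ()
    process_file filename = do
      h <- openFile filename ReadMode
      c <- hGetContents h
      let cs = lines c
      -- deepseq reduces cs all the way to normal form (every line, every
      -- character), so nothing is left unread when the handle is closed.
      cs `deepseq` hClose h
      putStrLn $ show $ length cs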

On Fri, 2007-07-13 at 17:20 -0700, Michael Vanier wrote:
Stefan,
Thanks for your comments, as always.
What I meant by really-truly-absolutely-I-mean-right-now-seq is something that would evaluate its argument as far as it is possible to do so i.e. something that forces strict evaluation of an argument. That's what I thought seq did, but now I see I was wrong; it only "goes one deep" as it were. In fact, as you say, seq is not defined in terms of evaluation; all that it guarantees is that its first argument is either (a) bottom, in which the result of the entire seq is bottom, or (b) not bottom. To do so it has to evaluate the first argument only far enough to show bottom-ness or not, which is not strict evaluation as I understand it. So am I right in saying that Haskell has no way to force strict evaluation? Or am I confused as to the correct definition of "strict"?
There's no polymorphic reduce-to-normal-form function, though one could be implemented as a primitive. There are two typeclasses for this: the DeepSeq library that floats around, and NFData in Control.Parallel.Strategies (with method rnf - wonder where they got that...). This is not unreasonable. seq being polymorphic is a significant source of woe (it was not always so).
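A sketch of the kind of class Derek means (the hand-rolled "DeepSeq library that floats around"); the names here are illustrative rather than that library's actual API:

    -- A recursive analogue of seq. As Stefan noted, it can't be a single
    -- uniform primitive like seq, so it goes through a type class.
    class DeepSeq a where
      deepSeq :: a -> b -> b

    instance DeepSeq Char where
      deepSeq c b = c `seq` b

    instance DeepSeq a => DeepSeq [a] where
      deepSeq []     b = b
      deepSeq (x:xs) b = x `deepSeq` (xs `deepSeq` b)

    -- For example, forcing a whole String (all cons cells and characters):
    forceString :: String -> b -> b
    forceString = deepSeq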

On Saturday 14 July 2007 11:29, Michael Vanier wrote:
Albert,
Thanks for the very detailed reply! That's the great thing about this mailing list.
I find your description of seq somewhat disturbing. Is this behavior documented in the API? I can't find it there.
As I understand it, seq is well-behaved, though there is the x `seq` x = x trickiness and maybe some trickiness around undefined. The disturbing behaviour is a consequence of the inherent unsafeness of hGetContents. seq is being used to try and mitigate some of this unsafeness, but that's not really the purpose of seq, so you still have to be careful.
It suggests that perhaps there should be a really-truly-absolutely-I-mean-right-now-seq function that evaluates the first argument strictly no matter what (not that this should be something that gets used very frequently). Or are there reasons why this is not feasible?
Albert mentioned "return $! length c", which forces the whole lot because it's all needed to calculate the length. You could also look at Control.Parallel.Strategies; I think it has classes and functions that let you define how to reduce to something other than WHNF.
Sorry to belabor this. Learning to think lazily is IMO one of the hardest aspects of learning Haskell.
Like any unsafe operation, if you can convince yourself that it will actually be safe, go ahead and use it. But as Albert has pointed out, there can be hidden complexity in proving this to yourself. There have been discussions on haskell-cafe previously about the risks of getContents, safe alternatives, whether it should be renamed unsafeGetContents, etc. Sorry I can't dredge up exact dates/subjects; maybe someone else can recall.

Daniel
participants (12): Albert Y. C. Lai, Andrew Coppin, Brandon Michael Moore, Bulat Ziganshin, Daniel McAllansmith, Derek Elkins, Felipe Almeida Lessa, Matthew Brecknell, Michael Vanier, Stefan O'Rear, Tim Chevalier, Tim Newsham