
On 07 April 2006 22:38, Andy Gill wrote:
On Apr 7, 2006, at 3:59 AM, Rene de Visser wrote:
Hello,
As deepSeq has a non local effect, I think it requires a non-local source transformation to implement it. One option would be for the compiler to create a second deepSeq version of every function definition.
e.g.
If the user defines a function f
f x = g h x
then the compile creates an additional function !!f
!!f x = temp `seq` temp where temp = !!g !!h x
which uses the compiler generated functions !!g and !!h.
It looks like library writers are increasingly doing this manually. Creating a strict and non strict version of a number of the functions provided. This would automate that.
Rene.
It depend on the semantics of deepSeq. If deepSeq just performs seq on all constructors recursively, then that can be implemented as a runtime primitive. If deepSeq is making all embedded partial applications strict, then yes this might be a non-local effect.
What are the semantics of !!(\ x -> ...)?
I am calling for the version of deepSeq/strict that evaluates all thunks, but does not strictify the arguments to partial application, because - I believe this is straightforward to implement
It's not *completely* straightforward to implement, at least in GHC, and at least if you want to implement it in a modular way (i.e. without touching lots of different parts of the system). The obvious way to "add a bit to a closure" is to use the LSB of the info pointer, which currently is always 0. However, that means masking out this bit every time you want to get the info pointer of a closure, which means lots of changes to the runtime. The price seems pretty high. An alternative is to have two info tables for every constructor, one normal one and one "deepSeq'd", and the normal one probably needs to point to the deepSeq'd version. This doesn't require masking out any bits, but it does increase code size (one extra info table + entry code for every constructor, except possibly those that don't contain any pointer fields), and one extra field in a constructor's info table. Plus associated cache pollution. Yet another alternative is to store fully evaluated data in a segregated part of the heap. The garbage collector could do this - indeed we already do something similar, in that data that has no pointer fields is kept separate. Checking the "deepSeq" bit on a closure is then more complicated - but this has the advantage that only the GC and storage manager are affected. None of these solutions is as simple and self-contained as I'd like :-( Cheers, Simon

On Mon, Apr 10, 2006 at 10:10:18AM +0100, Simon Marlow wrote:
It's not *completely* straightforward to implement, at least in GHC, and at least if you want to implement it in a modular way (i.e. without touching lots of different parts of the system).
The obvious way to "add a bit to a closure" is to use the LSB of the info pointer, which currently is always 0. However, that means masking out this bit every time you want to get the info pointer of a closure, which means lots of changes to the runtime. The price seems pretty high.
An alternative is to have two info tables for every constructor, one normal one and one "deepSeq'd", and the normal one probably needs to point to the deepSeq'd version. This doesn't require masking out any bits, but it does increase code size (one extra info table + entry code for every constructor, except possibly those that don't contain any pointer fields), and one extra field in a constructor's info table. Plus associated cache pollution.
Yet another alternative is to store fully evaluated data in a segregated part of the heap. The garbage collector could do this - indeed we already do something similar, in that data that has no pointer fields is kept separate. Checking the "deepSeq" bit on a closure is then more complicated - but this has the advantage that only the GC and storage manager are affected.
None of these solutions is as simple and self-contained as I'd like :-(
it is unlikely it will even be possible to implement in jhc without radical changes to its internals. there is just no where to attach a bit to, and even if there were, there is no generic way to evaluate something to WHNF, or even a concept of WHNF in final grin. (grin code can look inside unevaluated closures, hopefully making the thunk non-updatable) John -- John Meacham - ⑆repetae.net⑆john⑈

On Apr 10, 2006, at 2:25 AM, John Meacham wrote:
On Mon, Apr 10, 2006 at 10:10:18AM +0100, Simon Marlow wrote:
It's not *completely* straightforward to implement, at least in GHC, and at least if you want to implement it in a modular way (i.e. without touching lots of different parts of the system).
The obvious way to "add a bit to a closure" is to use the LSB of the info pointer, which currently is always 0. However, that means masking out this bit every time you want to get the info pointer of a closure, which means lots of changes to the runtime. The price seems pretty high.
An alternative is to have two info tables for every constructor, one normal one and one "deepSeq'd", and the normal one probably needs to point to the deepSeq'd version. This doesn't require masking out any bits, but it does increase code size (one extra info table + entry code for every constructor, except possibly those that don't contain any pointer fields), and one extra field in a constructor's info table. Plus associated cache pollution.
Yet another alternative is to store fully evaluated data in a segregated part of the heap. The garbage collector could do this - indeed we already do something similar, in that data that has no pointer fields is kept separate. Checking the "deepSeq" bit on a closure is then more complicated - but this has the advantage that only the GC and storage manager are affected.
None of these solutions is as simple and self-contained as I'd like :-(
it is unlikely it will even be possible to implement in jhc without radical changes to its internals. there is just no where to attach a bit to, and even if there were, there is no generic way to evaluate something to WHNF, or even a concept of WHNF in final grin. (grin code can look inside unevaluated closures, hopefully making the thunk non-updatable)
I do not understand. - (A) I'm calling for a recursive descent function that does seq. I could write it in Haskell, for any specific type. How is seq implemented jhs? - (B) Once we have this recursive function, I'm advocating for an optimization which will make it cheap. Why can't we just steal a bit in the (GHC) info table, rather than mess with LSB of pointers, or have two info tables? Yes, in grin this information would need to be used at compile time but the resulting code would be considerably faster. A deepSeq is a gift to the compiler from the programmer. Andy Gill

You're assuming some particular representation where there are bits to steal. I don't like this at all. I think tying deepSeq to some particular implementation techniques is a reall *BAD* idea. Any function that is not defineable in (pure) Haskell should be viewed with utmost suspicion. The seq function is one of these. At least seq has simple denotational semantics, which can't be said for deepSeq. I say, put deepSeq in a type class (which is what I've done when I need it). -- Lennart Andy Gill wrote:
On Apr 10, 2006, at 2:25 AM, John Meacham wrote:
On Mon, Apr 10, 2006 at 10:10:18AM +0100, Simon Marlow wrote:
It's not *completely* straightforward to implement, at least in GHC, and at least if you want to implement it in a modular way (i.e. without touching lots of different parts of the system).
The obvious way to "add a bit to a closure" is to use the LSB of the info pointer, which currently is always 0. However, that means masking out this bit every time you want to get the info pointer of a closure, which means lots of changes to the runtime. The price seems pretty high.
An alternative is to have two info tables for every constructor, one normal one and one "deepSeq'd", and the normal one probably needs to point to the deepSeq'd version. This doesn't require masking out any bits, but it does increase code size (one extra info table + entry code for every constructor, except possibly those that don't contain any pointer fields), and one extra field in a constructor's info table. Plus associated cache pollution.
Yet another alternative is to store fully evaluated data in a segregated part of the heap. The garbage collector could do this - indeed we already do something similar, in that data that has no pointer fields is kept separate. Checking the "deepSeq" bit on a closure is then more complicated - but this has the advantage that only the GC and storage manager are affected.
None of these solutions is as simple and self-contained as I'd like :-(
it is unlikely it will even be possible to implement in jhc without radical changes to its internals. there is just no where to attach a bit to, and even if there were, there is no generic way to evaluate something to WHNF, or even a concept of WHNF in final grin. (grin code can look inside unevaluated closures, hopefully making the thunk non-updatable)
I do not understand.
- (A) I'm calling for a recursive descent function that does seq. I could write it in Haskell, for any specific type. How is seq implemented jhs?
- (B) Once we have this recursive function, I'm advocating for an optimization which will make it cheap. Why can't we just steal a bit in the (GHC) info table, rather than mess with LSB of pointers, or have two info tables?
Yes, in grin this information would need to be used at compile time but the resulting code would be considerably faster. A deepSeq is a gift to the compiler from the programmer.
Andy Gill
_______________________________________________ Haskell-prime mailing list Haskell-prime@haskell.org http://haskell.org/mailman/listinfo/haskell-prime

On Mon, Apr 10, 2006 at 02:40:44PM -0700, Andy Gill wrote:
it is unlikely it will even be possible to implement in jhc without radical changes to its internals. there is just no where to attach a bit to, and even if there were, there is no generic way to evaluate something to WHNF, or even a concept of WHNF in final grin. (grin code can look inside unevaluated closures, hopefully making the thunk non-updatable)
I do not understand.
- (A) I'm calling for a recursive descent function that does seq. I could write it in Haskell, for any specific type. How is seq implemented jhs?
it is true, if you can express it in core, then jhc can implement it. but there is no way to express 'the bit you want to steal' in core. indeed there is no where to steal a bit from. there isn't a thunk type at runtime in jhc. so implementing deepSeq in the traditinal way is fine, optimizing it in the way you describe is not.
- (B) Once we have this recursive function, I'm advocating for an optimization which will make it cheap. Why can't we just steal a bit in the (GHC) info table, rather than mess with LSB of pointers, or have two info tables?
Yes, in grin this information would need to be used at compile time but the resulting code would be considerably faster. A deepSeq is a gift to the compiler from the programmer.
actually, it may be slower in jhc. reducing to normal form is not necessarily a win in jhc, and if you do do it, you want to evaluate a function as _early_ as possible, as in, as close to its definition point. 'evals' are inlined to have a branch for every possible way a value could come about, since you could deepseq pretty much any type of object, you would end up with huge expanded evals, but worse, it would cause all these thunks to be updatable when they didn't need to be. imagine a use of a thunk, rather than evaluate it to WHNF, jhc will just pull the components right out of the unevaluated thunk, if at some point in the past it might have been deepsequed, you suddenly need a case statement to determine whether it has been evaluated or not. knowing something has not been evaluated is just as valuable as knowing it has definitely been. for a quick example, imagine you have this function and it is not optimized away in core for some reason or another.
mktup x y = (y,x)
foo x = case x of (x,y) -> bar x y main = let ... in foo (mktup a b)
now we convert to grin
fmktup x y = do return (CTuple y x)
ffoo x = do y <- eval x update x y (CTuple a b) <- y bar a b
main = do ... x <- Store (Fmktup a b) ffoo x
things starting with capital letters are tags, parenthesized things are nodes on the heap. now, we do eval-expansion in ffoo, points to analysis determines a suspended Fmktup may be passed into ffoo.
ffoo x = do x' <- fetch x y <- case x' of (Fmktup a b) -> fmktup a b update x y (CTuple a b) <- y bar a b
now, the case-of-case code motion (the y scrutinization is treated as a simple case pulls the code into ffoo
ffoo x = do x' <- fetch x case x' of (Fmktup a b) -> y <- fmktup a b update x y (CTuple a b) <- y bar a b
now, points-to analysis showed that nothing scrutinized this memory location looking for a CTuple, therefor the update is uneeded. also, fmktup is trivial so it is inlined
ffoo x = do x' <- fetch x case x' of (Fmktup a b) -> y <- Return (CTuple b a) -- inlined fmktup (CTuple a b) <- y bar a b
wrich trivially simplifise too
ffoo x = do x' <- fetch x case x' of (Fmktup a b) -> bar b a
ffoo x = do (Fmktup a b) <- fetch x bar b a
notice, there is no longer a concept of WHNF, Fmktup effectivly has become the head normal form of that call due to standard optimization, this type of transformation might happen to some suspended versions of mktup, but not others, there is no way to tell from the heap location itself whether it should be evaluated into WHNF or if uses are going to pull its arguments right out of its closure or not. deepseqing this case would only hurt performance as it would mean we couldn't get rid of the 'update' or 'check if it is already a tuple' case. of course, if mktup were some expensive call, then the opposite might be true. in any case, deepseq is not always a win. the above ffoo actually has another optimization waiting, arity raising, since it knows x is always a pointer to a Fmktup, it pulls the arguments out and passes 'a and b' to ffoo directly. since all function calls are explicit in grin, we just go through and modify each call site accordingly. John -- John Meacham - ⑆repetae.net⑆john⑈

Hello John, Tuesday, April 11, 2006, 2:43:49 AM, you wrote:
true. in any case, deepseq is not always a win.
don't forget that Andy don't plan to apply deepSeq to any expression. in his program, there is a LARGE datastructure with a couple of unevaluated thunks what may be simplified by call to deepSeq. your example is based exclusively on the syntax transformations of source code, i think that in his program the logic is so complex that such syntax transformations is entirely impossible anyway i think that the easisest way for Andy to get what he need is to write ghc-specific `deepSeq` implementation that should just walk unevaluated parts of datastructure and evaluate them all. as i understand, he don't need to evaluate arguments of partially applied functions - there is just no such beasts in his data -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (5)
-
Andy Gill
-
Bulat Ziganshin
-
John Meacham
-
Lennart Augustsson
-
Simon Marlow