
On Mon, Apr 10, 2006 at 02:40:44PM -0700, Andy Gill wrote:
it is unlikely it will even be possible to implement in jhc without radical changes to its internals. there is just no where to attach a bit to, and even if there were, there is no generic way to evaluate something to WHNF, or even a concept of WHNF in final grin. (grin code can look inside unevaluated closures, hopefully making the thunk non-updatable)
I do not understand.
- (A) I'm calling for a recursive descent function that does seq. I could write it in Haskell, for any specific type. How is seq implemented jhs?
it is true, if you can express it in core, then jhc can implement it. but there is no way to express 'the bit you want to steal' in core. indeed there is no where to steal a bit from. there isn't a thunk type at runtime in jhc. so implementing deepSeq in the traditinal way is fine, optimizing it in the way you describe is not.
- (B) Once we have this recursive function, I'm advocating for an optimization which will make it cheap. Why can't we just steal a bit in the (GHC) info table, rather than mess with LSB of pointers, or have two info tables?
Yes, in grin this information would need to be used at compile time but the resulting code would be considerably faster. A deepSeq is a gift to the compiler from the programmer.
actually, it may be slower in jhc. reducing to normal form is not necessarily a win in jhc, and if you do do it, you want to evaluate a function as _early_ as possible, as in, as close to its definition point. 'evals' are inlined to have a branch for every possible way a value could come about, since you could deepseq pretty much any type of object, you would end up with huge expanded evals, but worse, it would cause all these thunks to be updatable when they didn't need to be. imagine a use of a thunk, rather than evaluate it to WHNF, jhc will just pull the components right out of the unevaluated thunk, if at some point in the past it might have been deepsequed, you suddenly need a case statement to determine whether it has been evaluated or not. knowing something has not been evaluated is just as valuable as knowing it has definitely been. for a quick example, imagine you have this function and it is not optimized away in core for some reason or another.
mktup x y = (y,x)
foo x = case x of (x,y) -> bar x y main = let ... in foo (mktup a b)
now we convert to grin
fmktup x y = do return (CTuple y x)
ffoo x = do y <- eval x update x y (CTuple a b) <- y bar a b
main = do ... x <- Store (Fmktup a b) ffoo x
things starting with capital letters are tags, parenthesized things are nodes on the heap. now, we do eval-expansion in ffoo, points to analysis determines a suspended Fmktup may be passed into ffoo.
ffoo x = do x' <- fetch x y <- case x' of (Fmktup a b) -> fmktup a b update x y (CTuple a b) <- y bar a b
now, the case-of-case code motion (the y scrutinization is treated as a simple case pulls the code into ffoo
ffoo x = do x' <- fetch x case x' of (Fmktup a b) -> y <- fmktup a b update x y (CTuple a b) <- y bar a b
now, points-to analysis showed that nothing scrutinized this memory location looking for a CTuple, therefor the update is uneeded. also, fmktup is trivial so it is inlined
ffoo x = do x' <- fetch x case x' of (Fmktup a b) -> y <- Return (CTuple b a) -- inlined fmktup (CTuple a b) <- y bar a b
wrich trivially simplifise too
ffoo x = do x' <- fetch x case x' of (Fmktup a b) -> bar b a
ffoo x = do (Fmktup a b) <- fetch x bar b a
notice, there is no longer a concept of WHNF, Fmktup effectivly has become the head normal form of that call due to standard optimization, this type of transformation might happen to some suspended versions of mktup, but not others, there is no way to tell from the heap location itself whether it should be evaluated into WHNF or if uses are going to pull its arguments right out of its closure or not. deepseqing this case would only hurt performance as it would mean we couldn't get rid of the 'update' or 'check if it is already a tuple' case. of course, if mktup were some expensive call, then the opposite might be true. in any case, deepseq is not always a win. the above ffoo actually has another optimization waiting, arity raising, since it knows x is always a pointer to a Fmktup, it pulls the arguments out and passes 'a and b' to ffoo directly. since all function calls are explicit in grin, we just go through and modify each call site accordingly. John -- John Meacham - ⑆repetae.net⑆john⑈