
Questions over questions arise in my quest to understand GHC's code generator. In this nice little piece of STG $w$snewPArray = NO_CCS[] \r[ww w] case newIntArray# [ww realWorld#] of wild { (#,#) s2# mba# -> case -# [ww 1] of y { DEFAULT -> let { wild1 = NO_CCS I#! [ww]; } in case ># [0 y] of wild2 { True -> PArray [wild1 mba#]; False -> let { $wgo2 = NO_CCS[] \r[y1 y2] let { a = NO_CCS PArray! [wild1 mba#]; } in case w of wild3 { I# e# -> case writeIntArray# [mba# y1 e# y2] of s2#1 { DEFAULT -> case ==# [y1 y] of wild11 { True -> (#,#) [s2#1 a]; False -> case +# [y1 1] of stg_c2aL { DEFAULT -> $wgo2 stg_c2aL s2#1 }; } }; }; } in case $wgo2 0 s2# of wild3 { (#,#) ds r -> r; }; } }; }; I am wondering whether there is a particular reason why the optimiser doesn't pull the (1) a = NO_CCS PArray! [wild1 mba#]; and the (2) case w of wild3 { I# e# -> out off the $wgo2 loop - or at least push (1) down into the True branch of `case ==# [y1 y] of'. I would say that (1) in this form pointlessly allocates heap like mad, but is only using one of the many copies of `a', namely the one allocated in the final iteration and returned to the caller. As for (2), the loop would be nice and straight if that unboxing where outside of the loop - as it is, we break the pipeline once per iteration it seems (if the branch prediction isn't very clever). Or do I misunderstand something here, or is there maybe some magic after STG that gets rid of the stuff? I attach the Haskell source, which I compiled with ghc -O -c PArrays.hs -fglasgow-exts -ddump-stg (This is 4.08.1 of course.) Also if somebody is looking at the attached source, I was wondering why, when I use the commented out code in `newPArray', I get a lot worse code (the STG code is in a comment at the end of the file). In particular, the lambda abstraction is not inlined, whereas `fill' gets inlined into the code of which the dump is above. Is it because the compiler has a lot harder time with explicit recursion than with fold/build? If so, the right RULES magic should allow me to do the same for my own recursively defined combinators, shouldn't it? Cheers, Manuel PS: Otherwise, it is quite impressive what the RULES and inliner do to the foldr that produced the above code.
participants (1)
-
Manuel M. T. Chakravarty