
I can never resist messages like these, even when I'm meant to be doing other things. It's very helpful when people offer fairly precise performance-bug reports. Thanks! | I am wondering whether there is a particular reason why the | optimiser doesn't pull the | | (1) a = NO_CCS PArray! [wild1 mba#]; This one is a definite bug. It turns out that the head of the before-ghci-branch doesn't have this bug, so I'm disinclined to investigate it further. | (2) case w of wild3 { | I# e# -> | | As for (2), the loop would be nice and straight if that | unboxing where outside of the loop - as it is, we break the | pipeline once per iteration it seems This one is a bit harder. Basically we want to make a wrapper for a recursive function if it's sure to evaluate its free variables. In fact the 'liberate-case' pass (which isn't switched on in 4.08) is meant to do just this. It's in simplCore/LiberateCase.lhs, and it's not very complicated. I've just tried it and it doesn't seem to have the desired effect, but I'm sure that's for a boring reason. If anyone would like to fix it, go ahead! (You can't just say '-fliberate-case' on the command line to make it go; you have to add -fliberate-case at a sensible point to the minusOflags in driver/Main.hs.) Incidentally, you'll find that -ddump-simpl gives you a dump that is pretty close to STG and usually much more readable. Most performance bugs show up there. -dverbose-simpl gives you more clues about what is happening where. | Also if somebody is looking at the attached source, I was | wondering why, when I use the commented out code in | `newPArray', I get a lot worse code (the STG code is in a | comment at the end of the file). In particular, the lambda | abstraction is not inlined, whereas `fill' gets inlined into | the code of which the dump is above. Is it because the | compiler has a lot harder time with explicit recursion than | with fold/build? If so, the right RULES magic should allow | me to do the same for my own recursively defined | combinators, shouldn't it? I couldn't figure out exactly what you meant. The only commented out code is STG code. Maybe send a module with the actual source you are bothered about. S

Simon Peyton-Jones
I can never resist messages like these, even when I'm meant to be doing other things.
That's good to know ;-)
It's very helpful when people offer fairly precise performance-bug reports. Thanks!
Thanks for the prompt reply! To SimonM, too.
| I am wondering whether there is a particular reason why the | optimiser doesn't pull the | | (1) a = NO_CCS PArray! [wild1 mba#];
This one is a definite bug. It turns out that the head of the before-ghci-branch doesn't have this bug, so I'm disinclined to investigate it further.
No problem, I can always build a new ghc from CVS.
| (2) case w of wild3 { | I# e# -> | | As for (2), the loop would be nice and straight if that | unboxing where outside of the loop - as it is, we break the | pipeline once per iteration it seems
This one is a bit harder. Basically we want to make a wrapper for a recursive function if it's sure to evaluate its free variables.
In fact the 'liberate-case' pass (which isn't switched on in 4.08) is meant to do just this. It's in simplCore/LiberateCase.lhs, and it's not very complicated. I've just tried it and it doesn't seem to have the desired effect, but I'm sure that's for a boring reason. If anyone would like to fix it, go ahead!
Ok - something for a rainy weekend, I guess...
Incidentally, you'll find that -ddump-simpl gives you a dump that is pretty close to STG and usually much more readable. Most performance bugs show up there. -dverbose-simpl gives you more clues about what is happening where.
Good to know. I just wasn't sure whether I wouldn't miss out on some optimisations.
| Also if somebody is looking at the attached source, I was | wondering why, when I use the commented out code in | `newPArray', I get a lot worse code (the STG code is in a | comment at the end of the file). In particular, the lambda | abstraction is not inlined, whereas `fill' gets inlined into | the code of which the dump is above. Is it because the | compiler has a lot harder time with explicit recursion than | with fold/build? If so, the right RULES magic should allow | me to do the same for my own recursively defined | combinators, shouldn't it?
I couldn't figure out exactly what you meant. The only commented out code is STG code. Maybe send a module with the actual source you are bothered about.
Appended the version of the module where the code is activated (it was commented out with --) - you'll find the problematic code by searching for (**) in the code. It generates $w$snewPArray = \ ww :: Int# w :: Int -> case newIntArray# @ RealWorld ww realWorld# of wild { (# s2#, mba# #) -> case $wsimpleGen ww (\ i :: Int -> case i of wild1 { I# i# -> case w of wild2 { I# e# -> __coerce (ST RealWorld ()) (\ s# :: (State# RealWorld) -> case writeIntArray# @ RealWorld mba# i# e# s# of s2#1 { __DEFAULT -> (# s2#1, () #) }) } }) s2# of wild1 { (# new_s, r #) -> let { a :: Int a = $wI# ww } in (# a, (__coerce ByteArray# mba#) #) } } One thing that is not nice is in the lambda abstraction based to $wsimpleGen. There is one lambda, then two unboxing operations and another lambda for the state variables s#. These two nested lambda's become in STG stg_c1Op = NO_CCS[] \r[i] case i of wild1 { I# i# -> case w of wild2 { I# e# -> let { stg_c1Sv = NO_CCS[] \r[s#] case writeIntArray# [mba# i# e# s#] of s2#1 { DEFAULT -> (#,#) [s2#1 ()] }; } in stg_c1Sv; }; }; So, stg_c1Sv is allocated and immediately entered. I would have hoped that the two lambda abstractions are merged into something like this: (\ i :: Int s# :: (State# RealWorld) -> case i of wild1 { I# i# -> case w of wild2 { I# e# -> case writeIntArray# @ RealWorld mba# i# e# s# of s2#1 { __DEFAULT -> (# s2#1, () #) }) } }) Maybe it is the __coerce, which prevents this from happening? Also, I am wondering why in the case of the foldr (fill mpa e) (return $ unsafeFreezeMPArray mpa) [0..n-1]) the function `fill' gets inlined into the foldr loop, but in the above, the lambda abstraction gets not inlined into $wsimpleGen. Maybe it is because $wsimpleGen itself gets not inlined... Cheers, Manuel
participants (2)
-
Manuel M. T. Chakravarty
-
Simon Peyton-Jones