Hiya, I've been looking at some grin output and it seems that tags have been dropped in favour of indirect calls. That is, 'eval' now does an indirect call instead of inspecting the node tag. Is there a reason for this? -- Cheers, Lemmih
On Sat, Dec 06, 2008 at 08:23:48AM +0100, Lemmih wrote:
I've been looking at some grin output and it seems that tags have been dropped in favour of indirect calls. That is, 'eval' now does an indirect call instead of inspecting the node tag. Is there a reason for this?
There were a couple reasons that functionality was added, Mainly it came down to flexibility of implementation. It provides a straightforward path for separate compilation. It is not intended that indirect calls be used always for evaluation, it just provides a universal fall-back case when direct execution is not appropriate for various reasons. Although jhc is a whole program optimizer, some support for separate compilation would be very convinient, if nothing else it will be needed for dynamic code loading at run-time. Another reason had to do with the points-to analysis, I found myself "chasing my tail" a lot of the time, in the face of features such as arbitrary impredicativity and GADTs, it became harder and harder to ensure the points-to analysis would produce an accurate and useful result in all possible cases, being able to fall back on an indirect evaluation in those cases simplified things dramatically. improving the front end, and improving the accuracy of the point-to analysis could be decoupled and no longer had to be done in lock-step. The dynamic nature of haskell programs also played a part, when looking at a histogram of the number of targets a given heap location could evaluate to it was very tail-heavy, I found that by far, exactly 1 was the most common possibility. Furthermore, after a point, the size of the jump table in the cache killed the advantage of the direct call over an indirect one. So all in all, it made sense to concentrate on a couple cases, the direct case, when you can reduce the number of possibilities to just a couple choices, and the fallback case where an indirect call is used. Note that where the "sweet spot" lies for optimal performance in terms of direct vs indirect execution is rather CPU architecture dependent. http://www.complang.tuwien.ac.at/forth/threading/ contains some test code for testing various threading methods in a forth implementation, which is somewhat relevant at least in demonstrating how different CPUs behave. John -- John Meacham - ⑆repetae.net⑆john⑈
On Sun, Dec 7, 2008 at 11:26 AM, John Meacham
On Sat, Dec 06, 2008 at 08:23:48AM +0100, Lemmih wrote:
I've been looking at some grin output and it seems that tags have been dropped in favour of indirect calls. That is, 'eval' now does an indirect call instead of inspecting the node tag. Is there a reason for this?
There were a couple reasons that functionality was added, Mainly it came down to flexibility of implementation. It provides a straightforward path for separate compilation. It is not intended that indirect calls be used always for evaluation, it just provides a universal fall-back case when direct execution is not appropriate for various reasons. Although jhc is a whole program optimizer, some support for separate compilation would be very convinient, if nothing else it will be needed for dynamic code loading at run-time. Another reason had to do with the points-to analysis, I found myself "chasing my tail" a lot of the time, in the face of features such as arbitrary impredicativity and GADTs, it became harder and harder to ensure the points-to analysis would produce an accurate and useful result in all possible cases, being able to fall back on an indirect evaluation in those cases simplified things dramatically. improving the front end, and improving the accuracy of the point-to analysis could be decoupled and no longer had to be done in lock-step. The dynamic nature of haskell programs also played a part, when looking at a histogram of the number of targets a given heap location could evaluate to it was very tail-heavy, I found that by far, exactly 1 was the most common possibility. Furthermore, after a point, the size of the jump table in the cache killed the advantage of the direct call over an indirect one. So all in all, it made sense to concentrate on a couple cases, the direct case, when you can reduce the number of possibilities to just a couple choices, and the fallback case where an indirect call is used. Note that where the "sweet spot" lies for optimal performance in terms of direct vs indirect execution is rather CPU architecture dependent.
http://www.complang.tuwien.ac.at/forth/threading/
contains some test code for testing various threading methods in a forth implementation, which is somewhat relevant at least in demonstrating how different CPUs behave.
What about the missed optimization opportunities? In the GRIN paper, Boquist shows how 'foldl (+) 0 [1..n]' could be compiled to a strict loop with no allocations. Without points-to analysis (which seems to be completely disable at this point) most of the important optimizations are blocked and the resulting program is far less efficient than it should be. -- Cheers, Lemmih
On Sun, Dec 07, 2008 at 11:43:58AM +0100, Lemmih wrote:
What about the missed optimization opportunities? In the GRIN paper, Boquist shows how 'foldl (+) 0 [1..n]' could be compiled to a strict loop with no allocations. Without points-to analysis (which seems to be completely disable at this point) most of the important optimizations are blocked and the resulting program is far less efficient than it should be.
No, it only falls back to indirect evaluation when the standard grin transformation would have not been able to improve matters much, as in, when it would have ended up a large cache-killing case scrutininization anyway. Many cases can be directly transformed into loops just like the original Grin paper. The idea was to turn the points-to analysis from a fundamental method of correctness into a powerful optimization primitive. (but not strictly necessary for compilation). I went through several iterations of 'points-to' analysis as the jhc internals evolved. This continual retooling of the analysis was something I was hoping to avoid in the future by allowed the indirect fallback. John -- John Meacham - ⑆repetae.net⑆john⑈
On Sun, Dec 7, 2008 at 11:52 AM, John Meacham
On Sun, Dec 07, 2008 at 11:43:58AM +0100, Lemmih wrote:
What about the missed optimization opportunities? In the GRIN paper, Boquist shows how 'foldl (+) 0 [1..n]' could be compiled to a strict loop with no allocations. Without points-to analysis (which seems to be completely disable at this point) most of the important optimizations are blocked and the resulting program is far less efficient than it should be.
No, it only falls back to indirect evaluation when the standard grin transformation would have not been able to improve matters much, as in, when it would have ended up a large cache-killing case scrutininization anyway. Many cases can be directly transformed into loops just like the original Grin paper. The idea was to turn the points-to analysis from a fundamental method of correctness into a powerful optimization primitive. (but not strictly necessary for compilation). I went through several iterations of 'points-to' analysis as the jhc internals evolved. This continual retooling of the analysis was something I was hoping to avoid in the future by allowed the indirect fallback.
From looking at Grin.EvalInline, it would seem the code has been disabled. The grin output of 'sum' example backs this up as 'eval' is never inlined.
-- Cheers, Lemmih
On Sun, Dec 07, 2008 at 12:02:31PM +0100, Lemmih wrote:
No, it only falls back to indirect evaluation when the standard grin transformation would have not been able to improve matters much, as in, when it would have ended up a large cache-killing case scrutininization anyway. Many cases can be directly transformed into loops just like the original Grin paper. The idea was to turn the points-to analysis from a fundamental method of correctness into a powerful optimization primitive. (but not strictly necessary for compilation). I went through several iterations of 'points-to' analysis as the jhc internals evolved. This continual retooling of the analysis was something I was hoping to avoid in the future by allowed the indirect fallback.
From looking at Grin.EvalInline, it would seem the code has been disabled. The grin output of 'sum' example backs this up as 'eval' is never inlined.
It is entirely likely I disabled it while debugging another part of the code for fault isolation. The new runtime was designed to support more of an 'compilation by transformation' sort of style of development. Accepting higher level calls to eval as well as direct inlined code. I believe I last had it set to use 'eval' in all cases except when it can be proved there was only a single code path, in which case it should call that function directly (or perform the loop directly). But I don't recall exactly, I was working on front end type system issues so switched off all the transformations that wern't 100% reliable. -- John Meacham - ⑆repetae.net⑆john⑈
participants (2)
-
John Meacham -
Lemmih