Cost of Overloading vs. HOFs

Hello,

The GHC users guide says overloading "is death to performance if left to linger in an inner loop", and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing, presumably). I'm not sure why that should be, but has anyone else noticed this?

If so, maybe this advice should be added to the user guide, especially if your function repeatedly uses just one method from a class? (Or maybe not, if it's nonsense :-)

I guess having done this there would be little to gain by using the SPECIALIZE pragma, though (unless GHC also specialises HOFs).

Regards -- Adrian Hey
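
For concreteness, a minimal sketch of the two styles being compared. The Tree type, the function names, and the little test in main are illustrative only, not the AVL library's actual code:

    module Main where

    data Tree a = Leaf | Node (Tree a) a (Tree a)

    -- Overloaded version: the Ord constraint means a dictionary is passed
    -- in, and 'compare' is selected from it.
    memberOvl :: Ord a => a -> Tree a -> Bool
    memberOvl _ Leaf = False
    memberOvl x (Node l y r) =
      case compare x y of
        LT -> memberOvl x l
        EQ -> True
        GT -> memberOvl x r

    -- HOF version: the comparison function is taken once as an explicit
    -- argument, and the local worker just applies it.
    memberHOF :: (a -> a -> Ordering) -> a -> Tree a -> Bool
    memberHOF cmp x = go
      where
        go Leaf = False
        go (Node l y r) =
          case cmp x y of
            LT -> go l
            EQ -> True
            GT -> go r

    main :: IO ()
    main = do
      let insert z Leaf = Node Leaf z Leaf
          insert z t@(Node l y r)
            | z < y     = Node (insert z l) y r
            | z > y     = Node l y (insert z r)
            | otherwise = t
          tree = foldr insert Leaf [5, 3, 8, 1, 4 :: Int]
      print (memberOvl 4 tree, memberHOF compare (4 :: Int) tree)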

Hi Adrian
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
A HOF is one box; the Ord dictionary is an 8-box tuple containing HOF boxes. When you invoke compare out of Ord, you are taking something out of the tuple, whereas when you use the HOF it's directly there. This is also the reason you get a performance decrease moving from a 1-element class to a 2-element class.

Thanks

Neil
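
To make that concrete, here is a rough sketch in ordinary Haskell (not real GHC Core; the field names are made up and only two of the eight members are shown) of what the dictionary translation amounts to: the class becomes a record of methods, the constraint becomes an explicit record argument, and every use of compare becomes a field selection:

    -- A hypothetical stand-in for the real Ord dictionary.
    data OrdDict a = OrdDict
      { dCompare  :: a -> a -> Ordering
      , dLessThan :: a -> a -> Bool
        -- the remaining Ord methods (and the Eq superclass) would live here too
      }

    ordDictInt :: OrdDict Int
    ordDictInt = OrdDict { dCompare = compare, dLessThan = (<) }

    -- elem' :: Ord a => a -> [a] -> Bool   is translated to roughly:
    elem' :: OrdDict a -> a -> [a] -> Bool
    elem' _ _ []     = False
    elem' d x (y:ys) =
      case dCompare d x y of   -- a selection out of the tuple on every call
        EQ -> True
        _  -> elem' d x ys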

Neil Mitchell wrote:
Hi Adrian
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
A HOF is one box; the Ord dictionary is an 8-box tuple containing HOF boxes. When you invoke compare out of Ord, you are taking something out of the tuple, whereas when you use the HOF it's directly there.
Well I can understand why overloading might be slow compared to a direct call (presumably this is what you get from specialisation). But I wouldn't have thought this additional indirection cost of method lookup was very significant compared with the HOF approach (at least not after everything was in cache). IOW I would have expected HOFs to be just about as deathly to performance as (unspecialised) overloading, but it seems this isn't the case.
This is also the reason you get a performance decrease moving from a 1-element class to a 2-element class.
Why is that? Does GHC pass just the single method rather than a one-entry dictionary in such cases?

Regards -- Adrian Hey

On Fri, May 04, 2007 at 09:02:03PM +0100, Neil Mitchell wrote:
Hi Adrian
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
A HOF is one box; the Ord dictionary is an 8-box tuple containing HOF boxes. When you invoke compare out of Ord, you are taking something out of the tuple, whereas when you use the HOF it's directly there.
This is also the reason you get a performance decrease moving from a 1-element class to a 2-element class.
Whatever happened to OccurAnal?

Stefan

On Fri, 2007-05-04 at 19:28 +0100, Adrian Hey wrote:
Hello,
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
One might hope that in this case we could hoist the extraction of the dictionary members outside the inner loop. Duncan
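
A sketch of doing that hoisting by hand, in a made-up little function rather than the AVL code: the partial application of compare is bound once, outside the loop, and the inner loop merely closes over the resulting function.

    count :: Ord a => a -> [a] -> Int
    count x = go
      where
        cmp = compare x        -- dictionary member selected once, out here
        go []     = 0
        go (y:ys) = case cmp y of
                      EQ -> 1 + go ys
                      _  -> go ys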

Duncan Coutts wrote:
One might hope that in this case we could hoist the extraction of the dictionary members outside the inner loop.
This possibility had crossed my mind too. If HOFs really are faster (for whatever reason) then it should be possible for a compiler to do this automatically. Regards -- Adrian Hey

Adrian Hey wrote:
Duncan Coutts wrote:
One might hope that in this case we could hoist the extraction of the dictionary members outside the inner loop.
This possibility had crossed my mind too. If HOFs really are faster (for whatever reason) then it should be possible for a compiler to do this automatically.
Once everything is in terms of dictionaries, this should even be part of ``full laziness''. Wolfram

Duncan Coutts wrote:
On Fri, 2007-05-04 at 19:28 +0100, Adrian Hey wrote:
Hello,
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
One might hope that in this case we could hoist the extraction of the dictionary members outside the inner loop.
In theory at least, the liberate-case transformation (turned on with -O2) does this sort of thing. (BTW Simon: it looks like -fliberate-case-threshold was replaced by -fspec-threshold, but the change wasn't propagated to the docs.) Cheers, Simon

Does anyone know what became of Dictionary-free Overloading by Partial Evaluation (http://web.cecs.pdx.edu/%7Empj/pubs/pepm94.html)? Is it impractical for some reason?
On 5/4/07, Adrian Hey wrote:
Hello,
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
If so, maybe this advice should be added to the user guide, especially if your function repeatedly uses just one method from a class?
(or maybe not if it's nonsense :-)
I guess having done this there would be little to gain by using the SPECIALIZE pragma though (unless ghc also specialises HOFs).
Regards -- Adrian Hey

On Fri, May 04, 2007 at 03:07:41PM -0700, Conal Elliott wrote:
Does anyone know what became of Dictionary-free Overloading by Partial Evaluation http://web.cecs.pdx.edu/%7Empj/pubs/pepm94.html? Is it impractical for some reason?
jhc also uses a dictionary free approach, doing a case directly on the type parameter. The nice thing about this is that _all_ methods can be determined by a single case evaluation, because finding out the right instance for any method will determine the right instance for all other methods too for a given type. The standard case-of-known-value optimization takes care of later scrutinizations (method lookups) on the same type.

John

-- John Meacham - ⑆repetae.net⑆john⑈
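
A rough sketch of that idea in GHC Haskell, using a GADT as a stand-in for jhc's type arguments (the names and the two-type universe are made up):

    {-# LANGUAGE GADTs #-}

    -- A value standing in for the hidden type argument.
    data TypeRep a where
      IntRep    :: TypeRep Int
      DoubleRep :: TypeRep Double

    -- What something like  plus :: Num a => a -> a -> a  becomes:
    -- one case on the type picks the instance, and with it every method.
    plus :: TypeRep a -> a -> a -> a
    plus IntRep    x y = x + y     -- stands for the Int-specific addition
    plus DoubleRep x y = x + y     -- stands for the Double-specific addition

    neg :: TypeRep a -> a -> a
    neg IntRep    x = negate x
    neg DoubleRep x = negate x

    -- After inlining, a second case on the same TypeRep is removed by the
    -- ordinary case-of-known-value optimisation, as described above.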

Cool. You know which types to consider because jhc is a whole-program compiler?

Given the whole program, why not monomorphize, and inline away all of the dictionaries?

- Conal
On 5/4/07, John Meacham wrote:
On Fri, May 04, 2007 at 03:07:41PM -0700, Conal Elliott wrote:
Does anyone know what became of Dictionary-free Overloading by Partial Evaluation http://web.cecs.pdx.edu/%7Empj/pubs/pepm94.html? Is it impractical for some reason?
jhc also uses a dictionary free approach, doing a case directly on the type parameter.
The nice thing about this is that _all_ methods can be determined by a single case evaluation, because finding out the right instance for any method will determine the right instance for all other methods too for a given type. The standard case-of-known-value optimization takes care of later scrutinizations (method lookups) on the same type.
John
-- John Meacham - ⑆repetae.net⑆john⑈

On Fri, May 04, 2007 at 05:06:10PM -0700, Conal Elliott wrote:
Cool. You know which types to consider because jhc is a whole-program compiler?
Given the whole program, why not monomorphize, and inline away all of the dictionaries?
Because compilation might take forever - it is possible to write a program that uses a statically unbounded number of types. See Data.Sequence for a non-contrived example of this. (In fairness the issue won't appear because Data.Sequence doesn't use overloading, but in general this is a problem.) Stefan
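
A small sketch of the kind of thing meant here, using a made-up non-regular type rather than Data.Sequence itself: nsize at Nested a calls nsize at Nested (a, a), and so on, so full monomorphisation would need a statically unbounded family of instantiations.

    data Nested a = Nil | Cons a (Nested (a, a))

    -- The type signature is required: this is polymorphic recursion.
    nsize :: Nested a -> Int
    nsize Nil         = 0
    nsize (Cons _ ns) = 1 + 2 * nsize ns

    example :: Int
    example = nsize (Cons 'x' (Cons ('a', 'b') Nil))   -- evaluates to 3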

On Sat, May 05, 2007 at 10:29:54AM +1000, Matthew Brecknell wrote:
Stefan O'Rear:
Data.Sequence doesn't use overloading
Data.Sequence uses overloading for subtree size annotations. The structural recursion seems to make it quite awkward to express size annotations without overloading.
Ah yes, I'd forgotten about that use. I was thinking of the fact that Data.Sequence is monomorphic in the measurement type. Thanks for the correction. Stefan
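
For reference, a sketch of the size-annotation overloading being referred to, loosely modelled on Data.Sequence's internals (the shapes are simplified and the names illustrative): because the tree nests its element type, asking a subtree for its size has to work uniformly over plain elements and over nodes of elements, which is what the class provides.

    class Sized a where
      size :: a -> Int

    newtype Elem a = Elem a              -- a single sequence element
    data Node a = Node2 Int a a          -- cached size, then children
                | Node3 Int a a a

    instance Sized (Elem a) where
      size _ = 1

    instance Sized (Node a) where
      size (Node2 n _ _)   = n
      size (Node3 n _ _ _) = n

    data FingerTree a
      = Empty
      | Single a
      | Deep Int [a] (FingerTree (Node a)) [a]   -- the nesting: a becomes Node a

    treeSize :: Sized a => FingerTree a -> Int
    treeSize Empty          = 0
    treeSize (Single x)     = size x
    treeSize (Deep n _ _ _) = n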

On Fri, May 04, 2007 at 05:06:10PM -0700, Conal Elliott wrote:
Cool. You know which types to consider because jhc is a whole-program compiler?
Yes, though I have since redone the back end, so there is no technical reason for it to be whole-program any more; you can just benefit from more optimizations that way. The old backend required global knowledge to compile at all.
Given the whole program, why not monomorphize, and inline away all of the dictionaries?
Well that is the thing: there are no dictionaries at all, the only extra arguments passed in are the types. So

    f :: forall a . (Foo a, Baz a, Bar a Int, Bred a) => a -> a

is still only passed the single hidden argument of 'a', since a case on it will determine what method to use:

    (+) a x y = case a of
        Int   -> plusInt x y
        Float -> plusFloat x y

and so forth. 'a' here is a type, of kind '*'. Since the method lookup is done explicitly via the case statement, it can be optimized via standard transformations in nice ways.

John
-- John Meacham - ⑆repetae.net⑆john⑈

On Fri, 2007-05-04 at 17:06 -0700, Conal Elliott wrote:
Cool. You know which types to consider because jhc is a whole-program compiler?
Given the whole program, why not monomorphize, and inline away all of the dictionaries?
That's what Felix does: typeclasses, no dictionaries. The whole-program analyser resolves typeclasses, inlines whilst resolving typeclasses, monomorphises whilst resolving typeclasses, and inlines again whilst resolving typeclasses.

Indirect dispatch can't be eliminated if the method is wrapped in a polymorphic closure. Monomorphisation alone can't eliminate second-order polymorphism. [But Felix doesn't support that anyhow]

-- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net
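
An illustration of that limitation in Haskell terms (a toy example, not Felix): once a method is captured under an existential, monomorphisation has nothing left to specialise on, and the dispatch stays indirect.

    {-# LANGUAGE ExistentialQuantification #-}

    -- The element type is hidden; only the captured Show method remains.
    data Showable = forall a. Show a => MkShowable a

    render :: Showable -> String
    render (MkShowable x) = show x   -- which 'show' runs depends on the value

    items :: [Showable]
    items = [MkShowable (42 :: Int), MkShowable "hello", MkShowable True]

    main :: IO ()
    main = mapM_ (putStrLn . render) items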

| The GHC users guide says overloading "is death to performance if
| left to linger in an inner loop" and one thing I noticed while
| playing about with the AVL lib was that using a HOF and passing
| the (overloaded) compare function as an explicit argument at the
| start seemed to give a noticeable performance boost (compared with
| dictionary passing presumably).

GHC does try to optimise the task of selecting a method out of a dictionary. However, optimisation is a delicate art, especially when you are looking at inner loops. It may well be that GHC screws up because of some apparently minor change. The thing to do is to look at the Core (-ddump-simpl) for your inner loop and see what the difference is.

(Even then, it's not always straightforward to design optimisations that hit your case without harming someone else. But it's usually fun.)

Simon
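
As a concrete starting point for that kind of investigation, a small sketch (the module and function names are made up): specialise the overloaded function at the type the inner loop actually uses, then compare the Core produced by something like "ghc -O2 -ddump-simpl Lookup.hs" with and without the pragma.

    module Lookup (lookupBy) where

    lookupBy :: Ord k => k -> [(k, v)] -> Maybe v
    lookupBy _ []               = Nothing
    lookupBy k ((k', v) : rest) =
      case compare k k' of
        EQ -> Just v
        _  -> lookupBy k rest

    -- Ask GHC to generate (and use) an Int-specialised copy as well.
    {-# SPECIALIZE lookupBy :: Int -> [(Int, v)] -> Maybe v #-}
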
participants (11)
- Adrian Hey
- Conal Elliott
- Duncan Coutts
- John Meacham
- kahl@cas.mcmaster.ca
- Matthew Brecknell
- Neil Mitchell
- Simon Marlow
- Simon Peyton-Jones
- skaller
- Stefan O'Rear