Cost of Overloading vs. HOFs

Hello,

The GHC users guide says overloading "is death to performance if left to linger in an inner loop", and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing, presumably). I'm not sure why that should be, but has anyone else noticed this?

If so, maybe this advice should be added to the user guide, especially if your function repeatedly uses just one method from a class? (Or maybe not, if it's nonsense :-)

I guess having done this there would be little to gain by using the SPECIALIZE pragma, though (unless GHC also specialises HOFs).

Regards -- Adrian Hey
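
For concreteness, a minimal sketch of the two styles being compared. The Tree type, the function names, and the little test in main are illustrative only, not the AVL library's actual code:

    module Main where

    data Tree a = Leaf | Node (Tree a) a (Tree a)

    -- Overloaded version: the Ord constraint means a dictionary is passed
    -- in, and 'compare' is selected from it.
    memberOvl :: Ord a => a -> Tree a -> Bool
    memberOvl _ Leaf = False
    memberOvl x (Node l y r) =
      case compare x y of
        LT -> memberOvl x l
        EQ -> True
        GT -> memberOvl x r

    -- HOF version: the comparison function is taken once as an explicit
    -- argument, and the local worker just applies it.
    memberHOF :: (a -> a -> Ordering) -> a -> Tree a -> Bool
    memberHOF cmp x = go
      where
        go Leaf = False
        go (Node l y r) =
          case cmp x y of
            LT -> go l
            EQ -> True
            GT -> go r

    main :: IO ()
    main = do
      let insert z Leaf = Node Leaf z Leaf
          insert z t@(Node l y r)
            | z < y     = Node (insert z l) y r
            | z > y     = Node l y (insert z r)
            | otherwise = t
          tree = foldr insert Leaf [5, 3, 8, 1, 4 :: Int]
      print (memberOvl 4 tree, memberHOF compare (4 :: Int) tree)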

Hi Adrian
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
A HOF is one box; the Ord dictionary is an 8-box tuple containing HOF boxes. When you invoke compare out of Ord, you are taking something out of the tuple, whereas when you use the HOF it's directly there. This is also the reason you get a performance decrease moving from a 1-element class to a 2-element class.

Thanks

Neil
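
To make that concrete, here is a rough sketch in ordinary Haskell (not real GHC Core; the field names are made up and only two of the eight members are shown) of what the dictionary translation amounts to: the class becomes a record of methods, the constraint becomes an explicit record argument, and every use of compare becomes a field selection:

    -- A hypothetical stand-in for the real Ord dictionary.
    data OrdDict a = OrdDict
      { dCompare  :: a -> a -> Ordering
      , dLessThan :: a -> a -> Bool
        -- the remaining Ord methods (and the Eq superclass) would live here too
      }

    ordDictInt :: OrdDict Int
    ordDictInt = OrdDict { dCompare = compare, dLessThan = (<) }

    -- elem' :: Ord a => a -> [a] -> Bool   is translated to roughly:
    elem' :: OrdDict a -> a -> [a] -> Bool
    elem' _ _ []     = False
    elem' d x (y:ys) =
      case dCompare d x y of   -- a selection out of the tuple on every call
        EQ -> True
        _  -> elem' d x ys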

Neil Mitchell wrote:
Hi Adrian
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
A HOF is one box; the Ord dictionary is an 8-box tuple containing HOF boxes. When you invoke compare out of Ord, you are taking something out of the tuple, whereas when you use the HOF it's directly there.
Well I can understand why overloading might be slow compared to a direct call (presumably this is what you get from specialisation). But I wouldn't have thought this additional indirection cost of method lookup was very significant compared with the HOF approach (at least not after everything was in cache). IOW I would have expected HOFs to be just about as deathly to performance as (unspecialised) overloading, but it seems this isn't the case.
This is also the reason you get a performance decrease moving from a 1-element class to a 2-element class.
Why is that? Does GHC pass just the single method rather than a one-entry dictionary in such cases?

Regards -- Adrian Hey

On Fri, May 04, 2007 at 09:02:03PM +0100, Neil Mitchell wrote:
Hi Adrian
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
A HOF is one box; the Ord dictionary is an 8-box tuple containing HOF boxes. When you invoke compare out of Ord, you are taking something out of the tuple, whereas when you use the HOF it's directly there.
This is also the reason you get a performance decrease moving from a 1-element class to a 2-element class.
Whatever happened to OccurAnal?

Stefan

On Fri, 2007-05-04 at 19:28 +0100, Adrian Hey wrote:
Hello,
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
One might hope that in this case we could hoist the extraction of the dictionary members outside the inner loop. Duncan
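
A sketch of doing that hoisting by hand, in a made-up little function rather than the AVL code: the partial application of compare is bound once, outside the loop, and the inner loop merely closes over the resulting function.

    count :: Ord a => a -> [a] -> Int
    count x = go
      where
        cmp = compare x        -- dictionary member selected once, out here
        go []     = 0
        go (y:ys) = case cmp y of
                      EQ -> 1 + go ys
                      _  -> go ys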

Duncan Coutts wrote:
One might hope that in this case we could hoist the extraction of the dictionary members outside the inner loop.
This possibility had crossed my mind too. If HOFs really are faster (for whatever reason) then it should be possible for a compiler to do this automatically. Regards -- Adrian Hey

Adrian Hey wrote:
Duncan Coutts wrote:
One might hope that in this case we could hoist the extraction of the dictionary members outside the inner loop.
This possibility had crossed my mind too. If HOFs really are faster (for whatever reason) then it should be possible for a compiler to do this automatically.
Once everything is in terms of dictionaries, this should even be part of ``full laziness''. Wolfram

Duncan Coutts wrote:
On Fri, 2007-05-04 at 19:28 +0100, Adrian Hey wrote:
Hello,
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
One might hope that in this case we could hoist the extraction of the dictionary members outside the inner loop.
In theory at least, the liberate-case transformation (turned on with -O2) does this sort of thing. (BTW Simon: it looks like -fliberate-case-threshold was replaced by -fspec-threshold, but the change wasn't propagated to the docs.) Cheers, Simon

Does anyone know what became of Dictionary-free Overloading by Partial Evaluation (http://web.cecs.pdx.edu/%7Empj/pubs/pepm94.html)? Is it impractical for some reason?
On 5/4/07, Adrian Hey wrote:
Hello,
The GHC users guide says overloading "is death to performance if left to linger in an inner loop" and one thing I noticed while playing about with the AVL lib was that using a HOF and passing the (overloaded) compare function as an explicit argument at the start seemed to give a noticeable performance boost (compared with dictionary passing presumably).
I'm not sure why that should be, but has anyone else noticed this?
If so, maybe this advice should be added to the user guide, especially if your function repeatedly uses just one method from a class?
(or maybe not if it's nonsense :-)
I guess having done this there would be little to gain by using the SPECIALIZE pragma though (unless ghc also specialises HOFs).
Regards -- Adrian Hey

On Fri, May 04, 2007 at 03:07:41PM -0700, Conal Elliott wrote:
Does anyone know what became of Dictionary-free Overloading by Partial Evaluation http://web.cecs.pdx.edu/%7Empj/pubs/pepm94.html? Is it impractical for some reason?
jhc also uses a dictionary free approach, doing a case directly on the type parameter. The nice thing about this is that _all_ methods can be determined by a single case evaluation, because finding out the right instance for any method will determine the right instance for all other methods too for a given type. The standard case-of-known-value optimization takes care of later scrutinizations (method lookups) on the same type.

John

-- John Meacham - ⑆repetae.net⑆john⑈
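
A rough sketch of that idea in GHC Haskell, using a GADT as a stand-in for jhc's type arguments (the names and the two-type universe are made up):

    {-# LANGUAGE GADTs #-}

    -- A value standing in for the hidden type argument.
    data TypeRep a where
      IntRep    :: TypeRep Int
      DoubleRep :: TypeRep Double

    -- What something like  plus :: Num a => a -> a -> a  becomes:
    -- one case on the type picks the instance, and with it every method.
    plus :: TypeRep a -> a -> a -> a
    plus IntRep    x y = x + y     -- stands for the Int-specific addition
    plus DoubleRep x y = x + y     -- stands for the Double-specific addition

    neg :: TypeRep a -> a -> a
    neg IntRep    x = negate x
    neg DoubleRep x = negate x

    -- After inlining, a second case on the same TypeRep is removed by the
    -- ordinary case-of-known-value optimisation, as described above.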

Cool. You know which types to consider because jhc is a whole-program compiler?

Given the whole program, why not monomorphize, and inline away all of the dictionaries?

- Conal
On 5/4/07, John Meacham wrote:
On Fri, May 04, 2007 at 03:07:41PM -0700, Conal Elliott wrote:
Does anyone know what became of Dictionary-free Overloading by Partial Evaluation http://web.cecs.pdx.edu/%7Empj/pubs/pepm94.html? Is it impractical for some reason?
jhc also uses a dictionary free approach, doing a case directly on the type parameter.
The nice thing about this is that _all_ methods can be determined by a single case evaluation, because finding out the right instance for any method will determine the right instance for all other methods too for a given type. The standard case-of-known-value optimization takes care of later scrutinizations (method lookups) on the same type.
John
-- John Meacham - ⑆repetae.net⑆john⑈

On Fri, May 04, 2007 at 05:06:10PM -0700, Conal Elliott wrote:
Cool. You know which types to consider because jhc is a whole-program compiler?
Given the whole program, why not monomorphize, and inline away all of the dictionaries?
Because compilation might take forever - it is possible to write a program that uses a statically unbounded number of types. See Data.Sequence for a non-contrived example of this. (In fairness the issue won't appear because Data.Sequence doesn't use overloading, but in general this is a problem.) Stefan
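
A small sketch of the kind of thing meant here, using a made-up non-regular type rather than Data.Sequence itself: nsize at Nested a calls nsize at Nested (a, a), and so on, so full monomorphisation would need a statically unbounded family of instantiations.

    data Nested a = Nil | Cons a (Nested (a, a))

    -- The type signature is required: this is polymorphic recursion.
    nsize :: Nested a -> Int
    nsize Nil         = 0
    nsize (Cons _ ns) = 1 + 2 * nsize ns

    example :: Int
    example = nsize (Cons 'x' (Cons ('a', 'b') Nil))   -- evaluates to 3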

On Sat, May 05, 2007 at 10:29:54AM +1000, Matthew Brecknell wrote:
Stefan O'Rear:
Data.Sequence doesn't use overloading
Data.Sequence uses overloading for subtree size annotations. The structural recursion seems to make it quite awkward to express size annotations without overloading.
Ah yes, I'd forgotten about that use. I was thinking of the fact that Data.Sequence is monomorphic in the measurement type. Thanks for the correction. Stefan
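
For reference, a sketch of the size-annotation overloading being referred to, loosely modelled on Data.Sequence's internals (the shapes are simplified and the names illustrative): because the tree nests its element type, asking a subtree for its size has to work uniformly over plain elements and over nodes of elements, which is what the class provides.

    class Sized a where
      size :: a -> Int

    newtype Elem a = Elem a              -- a single sequence element
    data Node a = Node2 Int a a          -- cached size, then children
                | Node3 Int a a a

    instance Sized (Elem a) where
      size _ = 1

    instance Sized (Node a) where
      size (Node2 n _ _)   = n
      size (Node3 n _ _ _) = n

    data FingerTree a
      = Empty
      | Single a
      | Deep Int [a] (FingerTree (Node a)) [a]   -- the nesting: a becomes Node a

    treeSize :: Sized a => FingerTree a -> Int
    treeSize Empty          = 0
    treeSize (Single x)     = size x
    treeSize (Deep n _ _ _) = n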

On Fri, May 04, 2007 at 05:06:10PM -0700, Conal Elliott wrote:
Cool. You know which types to consider because jhc is a whole-program compiler?
Yes, though I have since redone the back end, so there is no technical reason for it to be whole-program any more; you can just benefit from more optimizations that way. The old backend required global knowledge to compile at all.
Given the whole program, why not monomorphize, and inline away all of the dictionaries?
Well that is the thing: there are no dictionaries at all, the only extra arguments passed in are the types. So

    f :: forall a . (Foo a, Baz a, Bar a Int, Bred a) => a -> a

is still only passed the single hidden argument of 'a', since a case on it will determine what method to use:

    (+) a x y = case a of
        Int   -> plusInt x y
        Float -> plusFloat x y

and so forth. 'a' here is a type, of kind '*'. Since the method lookup is done explicitly via the case statement, it can be optimized via standard transformations in nice ways.

John
-- John Meacham - ⑆repetae.net⑆john⑈

On Fri, 2007-05-04 at 17:06 -0700, Conal Elliott wrote:
Cool. You know which types to consider because jhc is a whole-program compiler?
Given the whole program, why not monomorphize, and inline away all of the dictionaries?
That's what Felix does: typeclasses, no dictionaries. The whole-program analyser resolves typeclasses, inlines whilst resolving typeclasses, monomorphises whilst resolving typeclasses, and inlines again whilst resolving typeclasses.

Indirect dispatch can't be eliminated if the method is wrapped in a polymorphic closure. Monomorphisation alone can't eliminate second-order polymorphism. [But Felix doesn't support that anyhow]

-- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net
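
An illustration of that limitation in Haskell terms (a toy example, not Felix): once a method is captured under an existential, monomorphisation has nothing left to specialise on, and the dispatch stays indirect.

    {-# LANGUAGE ExistentialQuantification #-}

    -- The element type is hidden; only the captured Show method remains.
    data Showable = forall a. Show a => MkShowable a

    render :: Showable -> String
    render (MkShowable x) = show x   -- which 'show' runs depends on the value

    items :: [Showable]
    items = [MkShowable (42 :: Int), MkShowable "hello", MkShowable True]

    main :: IO ()
    main = mapM_ (putStrLn . render) items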

| The GHC users guide says overloading "is death to performance if
| left to linger in an inner loop" and one thing I noticed while
| playing about with the AVL lib was that using a HOF and passing
| the (overloaded) compare function as an explicit argument at the
| start seemed to give a noticeable performance boost (compared with
| dictionary passing presumably).

GHC does try to optimise the task of selecting a method out of a dictionary. However, optimisation is a delicate art, especially when you are looking at inner loops. It may well be that GHC screws up because of some apparently minor change. The thing to do is to look at the Core (-ddump-simpl) for your inner loop and see what the difference is.

(Even then, it's not always straightforward to design optimisations that hit your case without harming someone else. But it's usually fun.)

Simon
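
As a concrete starting point for that kind of investigation, a small sketch (the module and function names are made up): specialise the overloaded function at the type the inner loop actually uses, then compare the Core produced by something like "ghc -O2 -ddump-simpl Lookup.hs" with and without the pragma.

    module Lookup (lookupBy) where

    lookupBy :: Ord k => k -> [(k, v)] -> Maybe v
    lookupBy _ []               = Nothing
    lookupBy k ((k', v) : rest) =
      case compare k k' of
        EQ -> Just v
        _  -> lookupBy k rest

    -- Ask GHC to generate (and use) an Int-specialised copy as well.
    {-# SPECIALIZE lookupBy :: Int -> [(Int, v)] -> Maybe v #-}
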
participants (11)
- Adrian Hey
- Conal Elliott
- Duncan Coutts
- John Meacham
- kahl@cas.mcmaster.ca
- Matthew Brecknell
- Neil Mitchell
- Simon Marlow
- Simon Peyton-Jones
- skaller
- Stefan O'Rear