
Hi, I want to benchmark GHC vs some other Haskell compilers; what flags should I use? I just want whichever package of flags is "good optimisation" - nothing unsafe, nothing that changes strictness properties, and a set of flags that is generally applicable - but slow compilation is fine.
As far as I can tell, the user guide says:
* At the moment, -O2 is unlikely to produce better code than -O.
* When we want to go for broke, we tend to use -O2 -fvia-C
From this I guess the answer is "-O2 -fvia-C"? I just wanted to check this information is still current before I start giving anyone any benchmark numbers.
(For reference I'm on Windows, x86 Pentium 4/Athlon, if that makes a difference.)
Thanks
Neil

"Neil Mitchell"
I want to benchmark GHC vs some other Haskell compilers, what flags should I use?
[...] I guess the answer is "-O2 -fvia-C"?
I tend to use -O2, but haven't really tested it against plain -O.
From what I've seen, -fvia-C is sometimes faster and sometimes slower; I tend to cross my fingers and hope the compiler uses sensible defaults on the current architecture.
One thing that IME makes a difference is -funbox-strict-fields. It's probably better to use pragmas for this, though. Another thing to consider is garbage collection RTS flags, those can sometimes make a big difference. -k -- If I haven't seen further, it is by standing in the footprints of giants

ketil+haskell:
"Neil Mitchell"
writes: I want to benchmark GHC vs some other Haskell compilers, what flags should I use?
[...] I guess the answer is "-O2 -fvia-C"?
I tend to use -O2, but haven't really tested it against plain -O.
From what I've seen, -fvia-C is sometimes faster and sometimes slower; I tend to cross my fingers and hope the compiler uses sensible defaults on the current architecture.
One thing that IME makes a difference is -funbox-strict-fields. It's probably better to use pragmas for this, though. Another thing to consider is garbage collection RTS flags, those can sometimes make a big difference.
All this and more on the under-publicised Performance wiki, http://haskell.org/haskellwiki/Performance -- Don

Hi
One thing that IME makes a difference is -funbox-strict-fields. It's probably better to use pragmas for this, though. Another thing to consider is garbage collection RTS flags, those can sometimes make a big difference.
I _don't_ want to speed up a particular program by modifying it, I want to take a set of existing programs which are treated as black boxes, and compile them all with the same flags. I don't want to experiment to see which flags give the best particular result on a per program basis, or even for the benchmark as a whole, I just want to know what the "standard recommendation" is for people who want fast code but not to understand anything.
All this and more on the under-publicised Performance wiki, http://haskell.org/haskellwiki/Performance
It's a very good resource, and I've read it before :) Another way to put my question: the wiki says "Of course, if a GHC compiled program runs slower than the same program compiled with another Haskell compiler, then it's definitely a bug" - in this sentence, what does the command line look like in the GHC-compiled case? Thanks Neil

These days -O2, which invokes the SpecConstr pass, can have a big effect, but only on some programs. Simon

Hello Simon, Thursday, October 19, 2006, 6:40:54 PM, you wrote:
These days -O2, which invokes the SpecConstr pass, can have a big effect, but only on some programs.
it also enables -optc-O2. so, answering Neil's question: -O2 -funbox-strict-fields
(sidenote to SPJ: a -funbox-simple-strict-fields flag might be a good way to make this a _safe_ optimisation)
the RTS option -A10m may also be helpful (even with 6.6), so you could run each program twice - with and without this option - and select the better run.
btw, writing this message i thought that a -fconvert-strings-to-ByteStrings option would give a significant boost to many programs without rewriting them :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Thu, 2006-10-19 at 21:10 +0400, Bulat Ziganshin wrote:
btw, writing this message i thought that a -fconvert-strings-to-ByteStrings option would give a significant boost to many programs without rewriting them :)
This kind of data refinement has a side condition on the strictness of the function. You need to know that your list function is strict in the spine and elements before it can be swapped to a version that operates on packed strings. If it's merely strict in the spine then one could switch to lazy arrays. There's also the possibility to use lists that are strict in the elements. It'd be an interesting topic to research, to see if this strictness analysis and transformation could be done automatically, and indeed if it is properly meaning preserving. Duncan
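As a concrete, hand-written illustration of the refinement Duncan describes (countSpaces is an invented example, not code from the thread): a function that is strict in both the spine and the elements of its String argument can be swapped for a ByteString version giving the same result.

  import qualified Data.ByteString.Char8 as B

  -- Strict in the list's spine (length forces it) and in the elements
  -- (each Char is compared against ' '), so this refinement is safe here.
  countSpaces :: String -> Int
  countSpaces = length . filter (== ' ')

  -- The refined version, operating on a packed string.
  countSpaces' :: B.ByteString -> Int
  countSpaces' = B.count ' '

  -- countSpaces "a b c" == 2 == countSpaces' (B.pack "a b c")

Doing this automatically is exactly the analysis-and-transformation problem Duncan raises; done by hand it is only valid under the strictness side condition he states.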

Bulat Ziganshin wrote:
Hello Simon,
Thursday, October 19, 2006, 6:40:54 PM, you wrote:
These days -O2, which invokes the SpecConstr pass, can have a big effect, but only on some programs.
it also enables -optc-O2. so, answering Neil's question:
-O2 -funbox-strict-fields
(sidenote to SPJ: -funbox-simple-strict-fields may be a good way to _safe_ optimization)
I'm not sure that -funbox-strict-fields always improves performance, even if you only do it on Ints for example. If you end up pulling out those fields and passing the Int to a lazy function, the Int will be re-boxed each time, leading to more allocation. This is the reason that -funbox-strict-fields isn't on by default, and why I recommend using {-# UNPACK #-} pragmas. I like -O2 -fliberate-case-threshold=30 but anything other than -O2 is really just guesswork (i.e. we haven't made any systematic measurements). You might also like -funfolding-use-threshold=50. Cheers, Simon
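A minimal sketch of the pragma approach Simon recommends (the Point type and norm function are invented for illustration): {-# UNPACK #-} asks GHC to store a strict field's representation directly in the constructor, so unpacking is opted into per field rather than globally with -funbox-strict-fields.

  -- Hypothetical example type; both Doubles are stored unboxed inside Point.
  data Point = Point {-# UNPACK #-} !Double
                     {-# UNPACK #-} !Double

  -- The fields are read back as ordinary Doubles; they are only re-boxed if
  -- they escape to a lazy context, which is the risk Simon mentions above.
  norm :: Point -> Double
  norm (Point x y) = sqrt (x * x + y * y)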

Hello Simon, Friday, October 20, 2006, 1:38:39 PM, you wrote:
-O2 -funbox-strict-fields
I'm not sure that -funbox-strict-fields always improves performance,
we are searching for a solution that improves performance ON AVERAGE
I like
-O2 -fliberate-case-threshold=30
but anything other than -O2 is really just guesswork (i.e. we haven't made any systematic measurements). You might also like -funfolding-use-threshold=50.
it's something like a super-O2 optimisation, with a trade-off between speed and program size -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Simon,
Friday, October 20, 2006, 1:38:39 PM, you wrote:
-O2 -funbox-strict-fields
I'm not sure that -funbox-strict-fields always improves performance,
we are searching for a solution that improves performance ON AVERAGE
Not really: would you turn on an optimisation that makes 100 programs 1% better, and 1 program 200% worse? I'm being slightly hypocritical because we already do that kind of thing with full laziness. But the rule of thumb should be that -O/-O2 never takes risks that might seriously hurt performance. I'd say -funbox-strict-fields is too much of a risk to have on by default, but it's a difficult call to make. Measurements over lots of programs would help us decide - the nofib suite isn't much good here because most of the programs are from a time before strict fields were invented. Cheers, Simon

Hello Simon, Friday, October 20, 2006, 3:12:29 PM, you wrote:
-O2 -funbox-strict-fields
I'm not sure that -funbox-strict-fields always improves performance,
we are searching for a solution that improves performance ON AVERAGE
Not really: would you turn on an optimisation that makes 100 programs 1% better, and 1 program 200% worse?
sorry, Simon, but that is what Neil asked for. i don't mean that programmers should use this switch on their own programs without any testing -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi
So to summarise this thread: "compile with -O2", unless you want to start looking at specific programs and checking how individual flags perform, etc.
Thanks,
Neil

On Fri, Oct 20, 2006 at 10:38:39AM +0100, Simon Marlow wrote:
I'm not sure that -funbox-strict-fields always improves performance, even if you only do it on Ints for example. If you end up pulling out those fields and passing the Int to a lazy function, the Int will be re-boxed each time, leading to more allocation. This is the reason that -funbox-strict-fields isn't on by default, and why I recommend using {-# UNPACK #-} pragmas.
the happy medium I found in jhc was to always unbox any fields whose representation was smaller or equal to a pointer. It seems to work well.
another worthwhile optimization that benefits from this is unboxing all enums. so
  data Bool = False | True
desugars into
  data Bool = Bool# Int#
  False = Bool# 0#
  True = Bool# 1#
this gives a lot of things the CPR quality that is so good for optimization and means they can be unboxed in data constructors.
a small problem is that the compiler now doesn't know that Int#'s pulled out of a Bool# can only take on the values 0# or 1#, which is useful for things like case-alternative elimination. I am not sure how to best handle it, but right now I am thinking of supporting 'restricted unboxed integers' somehow...
John -- John Meacham - ⑆repetae.net⑆john⑈

John Meacham wrote:
On Fri, Oct 20, 2006 at 10:38:39AM +0100, Simon Marlow wrote:
I'm not sure that -funbox-strict-fields always improves performance, even if you only do it on Ints for example. If you end up pulling out those fields and passing the Int to a lazy function, the Int will be re-boxed each time, leading to more allocation. This is the reason that -funbox-strict-fields isn't on by default, and why I recommend using {-# UNPACK #-} pragmas.
the happy medium I found in jhc was to always unbox any fields whose representation was smaller or equal to a pointer. It seems to work well.
The Clean compiler does it also (including Doubles), and it seems to work well. Re-boxing small/basic types introduces a small penalty, but specializing polymorphic functions for basic types usually helps. Lazy and higher-order function applications are (relatively) slow anyway.
another worthwhile optimization that benefits from this is unboxing all enums.
so data Bool = False | True
desugars into
data Bool = Bool# Int#
False = Bool# 0# True = Bool# 1#
this gives a lot of things the CPR quality that is so good for optimization and means they can be unboxed in data constructors.
a small problem is that the compiler now doesn't know that Int#'s pulled out of a Bool# can only take on the values 0# or 1#, which is useful for things like case alternative elimination. I am not sure how to best handle it, but right now I am thinking of supporting 'restricted unboxed integers' somehow...
John
The (unboxed) Bool type in Clean is also implemented as an (unboxed) Int; other enums are not unboxed, however. I found that using only the values 0 and 1 for booleans, as Clean does, can be a pain when wrapping foreign C calls, since C treats any non-zero value as True. Arjen
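A minimal sketch of the wrapping issue Arjen mentions (is_ready is an invented C function, not from the thread): rather than comparing the returned CInt against 1, normalise it with toBool, which treats any non-zero value as True.

  {-# LANGUAGE ForeignFunctionInterface #-}
  import Foreign.C.Types (CInt)
  import Foreign.Marshal.Utils (toBool)  -- toBool is (/= 0)

  -- Hypothetical C function returning 0 for false and any non-zero for true.
  foreign import ccall unsafe "is_ready" c_is_ready :: IO CInt

  isReady :: IO Bool
  isReady = fmap toBool c_is_ready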

John Meacham wrote:
On Fri, Oct 20, 2006 at 10:38:39AM +0100, Simon Marlow wrote:
I'm not sure that -funbox-strict-fields always improves performance, even if you only do it on Ints for example. If you end up pulling out those fields and passing the Int to a lazy function, the Int will be re-boxed each time, leading to more allocation. This is the reason that -funbox-strict-fields isn't on by default, and why I recommend using {-# UNPACK #-} pragmas.
the happy medium I found in jhc was to always unbox any fields whose representation was smaller or equal to a pointer. It seems to work well.
Good idea.
another worthwhile optimization that benefits from this is unboxing all enums.
so data Bool = False | True
desugars into
data Bool = Bool# Int#
False = Bool# 0# True = Bool# 1#
Right, this occurred to me too. Alternatively we could have the strictness analyser represent a strict enumeration by Int# (I believe there's a ticket for this). I think when we discussed this for GHC the conclusion was that the latter was probably easier to implement, because we'd have to re-architect more of GHC to handle a data type with a representation that had a different number of constructors from the source data type (currently the difference between source and representation data types is only handled on a per-constructor basis). Still, I prefer the Bool# solution because it seems to expose more to the simplifier. Cheers, Simon

On Mon, Oct 23, 2006 at 01:06:48PM +0100, Simon Marlow wrote:
Right, this occurred to me too. Alternatively we could have the strictness analyser represent a strict enumeration by Int# (I believe there's a ticket for this).
I think when we discussed this for GHC the conclusion was that the latter was probably easier to implement, because we'd have to re-architect more of GHC to handle a data type with a representation that had a different number of constructors from the source data type (currently the difference between source and representation data types is only handled on a per-constructor basis).
Still, I prefer the Bool# solution because it seems to expose more to the simplifier.
Yeah, I can probably get some better numbers on how often this transformation is used, but for the cases of Bool and Ordering it is _a lot_. most functions that return Bool do so in a CPR fashion, and most that take a Bool are strict in it, so all of these get unboxed. Of course, many of these probably would have been inlined away anyway, so it is hard to say what the actual effect on performance is. but the code sure looks a lot nicer on inspection. :)
An alternative to representing them as Int#'s, which I considered for a while, was:
  data Bool = False | True
==>
  data Bool = Bool# Bool#
  data unboxed Bool# = False# | True#
  False = Bool# False#
  True = Bool# True#
(and at code generation False# just becomes 0 and True# just becomes 1)
this would make a new unboxed type, isomorphic to a subset of the integers, but a distinct type from them. the advantage being type safety and not inhibiting optimizations that could benefit from knowing that the unboxed bool type can only take on two values. Also, I imagine it might be a bit easier to see what is going on when reading core.
in the end I decided to go with the plain old Int# route, but I am not fully committed to staying with that choice..
John -- John Meacham - ⑆repetae.net⑆john⑈

Hello Ketil, Thursday, October 19, 2006, 11:05:48 AM, you wrote:
One thing that IME makes a difference is -funbox-strict-fields. It's probably better to use pragmas for this, though. Another thing to consider is garbage collection RTS flags, those can sometimes make a big difference.
yes, it's better to unbox individual fields. i had a program where this flag led to a significant memory usage increase, from something like this:
  data T1 = T1 ... -- many fields
  data T2 = T2 !T1 !T1 !T1
  make t1 = T2 t1 t1 t1
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
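A concrete, hypothetical version of Bulat's sketch (the field list of T1 is invented here): with -funbox-strict-fields every strict T1 field of T2 is unpacked into the T2 constructor, so the three fields become three separate copies of T1's contents instead of three pointers to one shared t1, and heap usage can go up rather than down.

  data T1 = T1 !Int !Int !Int !Int !Int   -- stand-in for "many fields"
  data T2 = T2 !T1 !T1 !T1

  make :: T1 -> T2
  make t1 = T2 t1 t1 t1
  -- without unboxing: T2 holds three pointers to the single shared t1
  -- with unboxing:    T2 holds three full copies of t1's fields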

Hello Neil, Wednesday, October 18, 2006, 10:49:37 PM, you wrote:
* At the moment, -O2 is unlikely to produce better code than -O.
the ghc manual is full of text that was written 10 years or more ago :)
* When we want to go for broke, we tend to use -O2 -fvia-C
From this I guess the answer is "-O2 -fvia-C"? I just wanted to check
just "-O2" does the same -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (10):
- Arjen van Weelden
- Bulat Ziganshin
- dons@cse.unsw.edu.au
- Duncan Coutts
- John Meacham
- Ketil Malde
- Neil Mitchell
- Simon Marlow
- Simon Marlow
- Simon Peyton-Jones