
Which of these is likely to go faster?

  type Quad = (Bool,Bool)

  foo (r,t) = let x = if r ... y = if t ... in ...

  data Quad = BL | BR | TL | TR

  foo q = let x = if q == TL || q == TR ... y = if q == BR || q == TR ... in ...

(Unless somebody has a better idea...)

I'm hoping that the latter one will be more strict / use less space. But I don't truly know...

On 09/08/07, Andrew Coppin
Which of these is likely to go faster?
type Quad = (Bool,Bool)
foo (r,t) = let x = if r ... y = if t ... in ...
data Quad = BL | BR | TL | TR
foo q = let x = if q == TL || q == TR ... y = if q == BR || q == TR ... in ...
(Unless somebody has a better idea...)
I'm hoping that the latter one will be more strict / use less space. But I don't truly know...
Sounds like the perfect candidate for a benchmark, then! Another tool for your toolbox:

  {-# OPTIONS -funbox-strict-fields #-}

  data Quad = Quad !Bool !Bool

  foo True True  = ...
  foo True False = ...
  ... etc...

The GHC option just causes GHC to unbox primitive types when they're strict in the data type, and the bangs cause them to be strict.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
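For concreteness, here is one minimal sketch of such a benchmark, using only System.CPUTime from base; the workload, the iteration count and the helper names are invented purely for illustration, and a real measurement would want -O2 and several runs:

  import Data.List (foldl')
  import System.CPUTime (getCPUTime)

  data Quad = BL | BR | TL | TR deriving Eq

  -- The two encodings under discussion, doing the same trivial work.
  fooTuple :: (Bool, Bool) -> Int
  fooTuple (r, t) = (if r then 1 else 0) + (if t then 2 else 0)

  fooQuad :: Quad -> Int
  fooQuad q = (if q == TL || q == TR then 1 else 0)
            + (if q == BR || q == TR then 2 else 0)

  -- Force a result to WHNF and report the CPU time taken (in picoseconds).
  time :: String -> Int -> IO ()
  time label x = do
    start <- getCPUTime
    x `seq` return ()
    end <- getCPUTime
    putStrLn (label ++ ": " ++ show x ++ " (" ++ show (end - start) ++ " ps)")

  main :: IO ()
  main = do
    let n = 1000000 :: Int
    time "tuple" (foldl' (+) 0 [ fooTuple (odd i, even i)            | i <- [1..n] ])
    time "quad"  (foldl' (+) 0 [ fooQuad  (if odd i then TL else BR) | i <- [1..n] ])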

On Thu, Aug 09, 2007 at 07:12:12PM +0100, Sebastian Sylvan wrote:
{-#OPTIONS -funbox-strict-fields #-}
data Quad = Quad !Bool !Bool
foo True True  = ...
foo True False = ...
... etc...
The GHC option just causes GHC to unbox primitive types when they're strict in the data type, and the bangs cause them to be strict.
Unfortunately, Bool is not a sufficiently primitive type for that to work. Stefan

On 09/08/07, Stefan O'Rear
On Thu, Aug 09, 2007 at 07:12:12PM +0100, Sebastian Sylvan wrote:
{-#OPTIONS -funbox-strict-fields #-}
data Quad = Quad !Bool !Bool
foo True True  = ...
foo True False = ...
... etc...
The GHC option just causes GHC to unbox primitive types when they're strict in the data type, and the bangs cause them to be strict.
Unfortunately, Bool is not a sufficiently primitive type for that to work.
Ah good point. Well I'd guess a Word8 would do (might be faster to use a Word32 or Word64 depending on your machine though?). -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862
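A sketch of what that could look like, with the two flags packed by hand into a strict Word8 field; the module layout and helper names are invented. (Word8 is a single-constructor wrapper around an unboxed word, so unlike Bool the strict field should actually get unboxed.)

  {-# OPTIONS -funbox-strict-fields #-}
  module Quad (Quad, mkQuad, isRight, isTop) where

  import Data.Bits ((.&.), (.|.))
  import Data.Word (Word8)

  -- One strict, unboxable Word8 field holding both flags.
  data Quad = Quad !Word8

  -- Bit 0 = "right", bit 1 = "top" (an arbitrary assignment for this sketch).
  mkQuad :: Bool -> Bool -> Quad
  mkQuad r t = Quad ((if r then 1 else 0) .|. (if t then 2 else 0))

  isRight, isTop :: Quad -> Bool
  isRight (Quad w) = w .&. 1 /= 0
  isTop   (Quad w) = w .&. 2 /= 0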

Stefan O'Rear wrote:
On Thu, Aug 09, 2007 at 07:12:12PM +0100, Sebastian Sylvan wrote:
{-#OPTIONS -funbox-strict-fields #-}
data Quad = Quad !Bool !Bool
foo True True  = ...
foo True False = ...
... etc...
The GHC option just causes GHC to unbox primitive types when they're strict in the data type, and the bangs cause them to be strict.
Unfortunately, Bool is not a sufficiently primitive type for that to work.
Stefan
Don't ya just hate it when that happens? (I.e., you say something and sound all clever, and then an expert points out that "actually, no".) Happens to me all the time... heh.

OOC, in what way is Bool not "primitive enough"? You mean because it's an algebraic data type, rather than a bunch of bits in the machine? For that matter, just how much space does such a type typically use?

(Questions, questions, so many questions...)

On Thu, Aug 09, 2007 at 09:27:23PM +0100, Andrew Coppin wrote:
OOC, in what way is Bool not "primitive enough"? You mean because it's an algebraic data type, rather than a bunch of bits in the machine? For that matter, just how much space does such a type typically use?
Yes.

  data Bool = False | True

In general, GHC doesn't do "unboxing". Instead it has a simpler and more general approach, where it passes the fields of a single-constructor type instead of the type itself; this is as good as true unboxing in most of the interesting cases:

  data Int    = I# Int#
  data Float  = F# Float#
  data Double = D# Double#
  data Char   = C# Char#
  data Ptr    = Ptr Addr#
  ...

but not always:

  data Bool    = False | True
  data Integer = S# Int# | J# ByteArray# Int#

As far as actual heap usage goes, GHC creates single static values for all 0-argument constructors; so all Bool WHNFs are one of two addresses, one for True and one for False. But GHC isn't quite smart enough for the -funbox-strict-fields mechanism to understand this...
(Questions, questions, so many questions...)
I like answering them. :) Stefan
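A small illustration of the single-constructor point above, assuming -funbox-strict-fields; the type names are made up for this sketch:

  {-# OPTIONS -funbox-strict-fields #-}

  -- Double is a single-constructor wrapper (data Double = D# Double#),
  -- so these strict fields can be stored as raw Double# values:
  data Point = Point !Double !Double

  -- Bool has two constructors, so the best GHC can do here is store
  -- pointers to the static True/False closures:
  data Flags = Flags !Bool !Bool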

Stefan O'Rear wrote:
On Thu, Aug 09, 2007 at 09:27:23PM +0100, Andrew Coppin wrote:
OOC, in what way is Bool not "primitive enough"? You mean because it's an algebraic data type, rather than a bunch of bits in the machine? For that matter, just how much space does such a type typically use?
Yes.
data Bool = False | True
In general, GHC doesn't do "unboxing". Instead it has a simpler and more general approach, where it passes the fields of a single-constructor type instead of the type itself; this is as good as true unboxing in most of the interesting cases:
data Int    = I# Int#
data Float  = F# Float#
data Double = D# Double#
data Char   = C# Char#
data Ptr    = Ptr Addr#
...
I see. (I think!)
but not always:
data Bool    = False | True
data Integer = S# Int# | J# ByteArray# Int#
As far as actual heap usage goes, GHC creates single static values for all 0-argument constructors; so all Bool WHNFs are one of two addresses, one for True and one for False. But GHC isn't quite smart enough for the -funbox-strict-fields mechanism to understand this...
Right. So a Bool is a 32 or 64 bit quantity. (Rather like Smalltalk...) That presumably means that a (Double,Double) is going to be a thunk that evaluates to a (,) pointing to two thunks that evaluate to pointers... IOW, something like 3 pointers' worth of space.

Whereas my Quad object is going to be a pointer to one of 4 values... so it looks like Quads save space. (And they're more strict.) OTOH, I'm not sure what the time penalty is like...

It would be nice if there were some general mechanism for turning a bunch of Bool flags into a single machine word. E.g., if I did a

  data Foo = Foo {flagX, flagY, flagZ :: !Bool}

and it ends up that a Foo value is just a single machine word, and GHC picks which bit each flag is... I guess if you want that at present you'd have to code it all by hand. Hmm, I think this might work out better than my current Quad thing... I could do something like

  type Quad = Word8

  foo q = let x = if testBit 0 q ... y = if testBit 1 q ...

That should be quite fast... (?)
(Questions, questions, so many questions...)
I like answering them. :)
Heh. I'll have to pester you more often. :-P PS. Somewhere they should write a page entitled "Optimisations that GHC does (and doesn't) do"... PPS. Hmm. Might be out of date fast? ;-)
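A sketch of the hand-coded Word8 idea from a few paragraphs up, with made-up helper names; note that Data.Bits actually takes the value first and the bit index second (testBit q 0), a point that comes up again later in the thread:

  import Data.Bits (setBit, testBit)
  import Data.Word (Word8)

  type Quad = Word8

  -- Bit 0 = "right", bit 1 = "top"; an arbitrary assignment for this sketch.
  mkQuad :: Bool -> Bool -> Quad
  mkQuad r t = (if r then (`setBit` 0) else id)
             . (if t then (`setBit` 1) else id)
             $ 0

  foo :: Quad -> (Bool, Bool)
  foo q = let x = testBit q 0
              y = testBit q 1
          in (x, y)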

On Thu, Aug 09, 2007 at 10:19:59PM +0100, Andrew Coppin wrote:
Right. So a Bool is a 32 or 64 bit quantity. (Rather like Smalltalk...)
That presumably means that a (Double,Double) is going to be a thunk that evaluates to a (,) pointing to two thunks that evaluate to pointers... IOW, something like 3 pointers' worth of space.
I like pretty pictures.

  +-----------+     +---------+        +--------+
  | somewhere |---->| (,) tag |    /-->| D# tag |
  +-----------+     +---------+    |   +--------+
                    | fst     |----/   | Value  |
                    +---------+        |        |
                    | snd     |----\   +--------+
                    +---------+    |
                                   |   +--------+
                                   \-->| D# tag |
                                       +--------+
                                       | Value  |
                                       |        |
                                       +--------+

That's how much space a (Double,Double) NF uses.
Whereas my Quad object is going to be a pointer to one of 4 values... so it looks like Quads save space. (And they're more strict.) OTOH, I'm not sure what the time penalty is like...
Probably none. The STG-machine was designed to make user-defined algebraic types very fast.
It would be nice if there were some general mechanism for turning a bunch of Bool flags into a single machine word. E.g., if I did a
data Foo = Foo {flagX, flagY, flagZ :: !Bool}
and it ends up that a Foo value is just a single machine word, and GHC picks which bit each flag is... I guess if you want that at present you'd have to code it all by hand. Hmm, I think this might work out better than my current Quad thing... I could do something like
type Quad = Word8
foo q = let x = if testBit 0 q ... y = if testBit 1 q ...
That should be quite fast... (?)
Probably. I wound up doing something similar with vty, to considerable gain. (I did however use .&. instead of testBit - probably makes no difference, but I'm reminded of the (^2) being much slower than join(*) case...)
(Questions, questions, so many questions...)
I like answering them. :)
Heh. I'll have to pester you more often. :-P
:)
PS. Somewhere they should write a page entitled "Optimisations that GHC does (and doesn't) do"...
Good idea! Maybe it could be fit into the GHC Performance Resource somehow? (http://www.haskell.org/haskellwiki/Performance/GHC)
PPS. Hmm. Might be out of date fast? ;-)
That's what wikis are for :) Stefan

On 8/10/07, Stefan O'Rear
Good idea! Maybe it could be fit into the GHC Performance Resource somehow? (http://www.haskell.org/haskellwiki/Performance/GHC)
From the wiki: "Since GHC (http://www.haskell.org/haskellwiki/GHC) doesn't have any credible competition in the performance department these days it's hard to say what overly-slow means"
Whoa, some wiki editor's been smoking a little too many illicit substances recently :-O I guess what is meant is "since GHC doesn't have any credible competition in the performance department these days relative to other Haskell compilers, it's hard to say what overly-slow means"? (but that's not quite how it reads ;-) )

On Fri, Aug 10, 2007 at 02:08:42PM +0800, Hugh Perkins wrote:
On 8/10/07, Stefan O'Rear
wrote: Good idea! Maybe it could be fit into the GHC Performance Resource somehow? (http://www.haskell.org/haskellwiki/Performance/GHC)
From the wiki: "Since GHC (http://www.haskell.org/haskellwiki/GHC) doesn't have any credible competition in the performance department these days it's hard to say what overly-slow means"
Whoa, some wiki editor's been smoking a little too many illicit substances recently :-O
stefan@stefans:/usr/local/src/Agda-1.0.2/src$ gcc Import.hs
/usr/bin/ld:Import.hs: file format not recognized; treating as linker script
/usr/bin/ld:Import.hs:1: syntax error
collect2: ld returned 1 exit status
stefan@stefans:/usr/local/src/Agda-1.0.2/src$
I guess what is meant is "since GHC doesn't have any credible competition in the performance department these days relative to other Haskell compilers, it's hard to say what overly-slow means"? (but that's not quite how it reads ;-) )
Personally, I think of:

  GHC's purpose:         To implement Haskell
  GHC's competition:     Nyhc, Jhc, Hugs, Hbc, Shc, ... a few more maybe ...

  Haskell's purpose:     To be a generally cool language
  Haskell's competition: C++, SML, ... hundreds of thousands more and I make no assertion of a representative sample ...

Stefan

On 8/10/07, Stefan O'Rear
Haskell's purpose: To be a generally cool language Haskell's competition: C++, SML, ... hundreds of thousands more and I make no assertion of a representative sample ...
Well, C++ is not really competitive with Haskell, because C++ does not have a GC, and it's trivial to corrupt the stack/heap. Wrt imperative languages, it probably makes more sense to compare Haskell with imperative languages that do have a GC and for which it's near impossible to accidentally corrupt the stack/heap. You'll find by the way that the imperative GC'd, stack/heap protected languages run *significantly* faster for many (not all I guess?) algorithms and applications. This will change with threading of course, but still if you've got a 1024-core Niagara 2012 machine, and the Haskell algorithm runs 65536 times as slowly as a single-core imperative GC'd language program, you're not going to see a significant speed-up ;-)

On 8/10/07, Hugh Perkins
On 8/10/07, Stefan O'Rear
wrote: Haskell's purpose: To be a generally cool language Haskell's competition: C++, SML, ... hundreds of thousands more and I make no assertion of a representative sample ...
Well, C++ is not really competitive with Haskell, because C++ does not have a GC, and it's trivial to corrupt the stack/heap.
Beg to differ. I offer the following proof by contradiction. :-)

In my current job, I had a version-1 implementation in Python which had severe performance problems, and was not amenable to concurrency. (The Python interpreter has a global lock, so you can only execute Python bytecodes from one thread at a time. :-() The natural alternative implementation language was C++, but I argued successfully that a Haskell implementation would be significantly easier to make concurrent.

Saying that it's trivial to corrupt the stack/heap in C++ is a bit like saying it's easy to fall off a bicycle. Sure it is, but there are also well understood techniques for avoiding doing so. :-) In C++ that I write, I almost never use bare pointers. Using auto_ptr, shared_ptr, etc. handles most of the memory management issues. When they don't, one can usually make an analogous class to manage the lifetime for you.

cheers,
Tom

--
Dr Thomas Conway
drtomc@gmail.com

Silence is the perfectest herald of joy:
I were but little happy, if I could say how much.

hughperkins:
You'll find by the way that the imperative GC'd, stack/heap protected languages run *significantly* faster for many (not all I guess?) algorithms and applications.
Wow. Big claims. It must be silly hat day on the Haskell lists. We're trying hard to be friendly; perhaps you don't realise that your inflammatory remarks are out of place here?

Now, looking at just this assertion, for which you provide no references: let's see, only imperative and GC'd eh?

Ruby?
  http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=ghc&lang2=ruby
JavaScript?
  http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=ghc&lang2=javascript
Python?
  http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=ghc&lang2=python

Hmm. Not looking so good so far for the imperative, GC'd languages.

Java?
  http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=ghc&lang2=java
C#?
  http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=ghc&lang2=csharp

Doesn't look too good for your assertion :( Maybe there really isn't something fundamental about being `imperative' or being garbage collected that matters performance-wise. Rather, having a good optimising native code compiler is more to the point?

------------------------------------------------------------------------

So, what do we do about this? Your unusual advocacy is interesting, but out of place on the Haskell mailing list. Given the community is growing rather rapidly, I'd like to encourage you to contribute more quality material to the list, and to tone down the sniping. Recall that your comments go out to around 2000 people directly, and further via Gmane. So making silly insults simply has the effect of alienating the people whose help you might seek in the future.

To help more clearly think about the impact of noise on the mailing list, and how to actively seek to improve (or maintain) quality, I like to refer to this useful article:

  http://headrush.typepad.com/creating_passionate_users/2006/12/how_to_build_a...

Give back to those who give you their time, rather than insulting them with silly statements. If we can make this step, there may well be unexpected benefits, in terms of collaboration and participation, that you otherwise miss out on.

-- Don (Trying to encourage friendly online communities)

Unfortunately, I suspect you'll snip out 90% of this mail, and reply with some non sequitur. Please prove me wrong.

On Friday 10 August 2007 07:46:36 Donald Bruce Stewart wrote:
Doesn't look too good for your assertion :(
Poor benchmark design forces the authors of the shootout to subjectively reject or cripple submissions. In fact, counting primes and printing pi are among the worst possible benchmark tasks imaginable.

Regardless, on the more objective tests (spectral-norm, fasta, k-nucleotide), Haskell is slower in all cases than all of the following languages: C, C++, D, Pascal, Clean, OCaml, Java, CAL, Scala, MLton and C# (Mono). I don't know what you're counting as an imperative language but I am sure you can find some in that list.

The ray tracer is a much more objective measure because it is a practically irreducible task. Haskell remains something like 3x slower than OCaml, Scheme and C++:

  http://www.ffconsultancy.com/languages/ray_tracer/results.html

You might also like to finish the Minim interpreter or compare the performance of some other suitably small interpreters, perhaps running some larger programs. A Haskell implementation of the "n"th nearest neighbours example from my book would be interesting. The program is small, computationally intensive and of practical interest. The symbolic simplifier would be a good benchmark if it were run on non-trivial input. A term rewriter to evaluate some simple Mathematica programs would be interesting.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e

Hello Donald, Friday, August 10, 2007, 10:46:36 AM, you wrote:
Hmm. Not looking so good so for for the imperative, GC'd languages.
Java?
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=ghc&lang2=java C#?
Donald, i have written (and not once) that most shootout entries don't check the speed of compiled code. for example, in one test TCL was the fastest language, because in most cases they measure the speed of shipped libs or RTS.

so Hugh is right. making Haskell as fast as Java/C# will *significantly* increase its userbase. in particular, i'm now interested in development of large systems using Haskell, and the main problem in doing this is its low performance.

so please don't consider this as words against Haskell; better look at your own highly-optimized code in FPS/shootout and decide whether it's written in high-level FP style?

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

On Fri, Aug 10, 2007 at 02:28:09PM +0800, Hugh Perkins wrote:
On 8/10/07, Stefan O'Rear
wrote: Haskell's purpose: To be a generally cool language Haskell's competition: C++, SML, ... hundreds of thousands more and I make no assertion of a representative sample ...
Wrt imperative languages, it probably makes more sense to compare Haskell with imperative languages that do have a GC and for which it's near impossible to accidentally corrupt the stack/heap. You'll find by the way that the imperative GC'd, stack/heap protected languages run *significantly* faster for many (not all I guess?) algorithms and applications.
I don't have any numbers, but I've got a strong suspicion that this is almost entirely an issue of "programmer culture"; IOW, if you wrote a O'Caml to Haskell compiler, and fed idiomatic O'Caml through that and then GHC, the resulting binaries would be about as fast as if you used ocamlopt (INRIA's native code O'Caml compiler); and conversely, Haskell code translated naïvely into O'Caml would be no faster than before. (I do have the case of my Unlambda compiler, which was somewhat faster with ghc -O2 than with ocamlopt, but between CPS-conversion and the complete lack of data types in Unlambda, the resulting code was sufficiently unidiomatic in either language as to render my numbers mostly useless).
This will change with threading of course, but still if you've got a 1024-core Niagara 2012 machine, and the Haskell algorithm runs 65536 times as slowly as a single-core imperative GC'd language program, you're not going to see a significant speed-up ;-)
Just wait 12 years, and if the price of processors follows Moore's extrapolation and the Haskell keeps its parallelism, Haskell will win :) Stefan

Stefan O'Rear wrote:
Just wait 12 years, and if the price of processors follows Moore's extrapolation and the Haskell keeps its parallelism, Haskell will win :)
Nice idea. Unfortunately, writing code in Haskell does not [yet] magically cause it to become "parallel". It's just that writing [pure] code in Haskell is inherently thread-safe. ;-) I'm very excited about things like fusion and data-parallel Haskell and so forth that seem to be "out there on the horizon", but in the "right now" I'm not sure how easy it is to write parallel Haskell code. (So far I haven't even tried. Unless you count standard "server that spawns processes when clients connect" type stuff...)
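For what it's worth, manual parallelism is already available via Control.Parallel's par; a minimal sketch (the workload is invented, and it has to be compiled with -threaded and run with +RTS -N2 to actually use two cores):

  import Control.Parallel (par)
  import Data.List (foldl')

  -- Spark the evaluation of one half while this thread computes the other.
  parSum :: [Int] -> [Int] -> Int
  parSum xs ys = a `par` (b `seq` (a + b))
    where
      a = foldl' (+) 0 xs
      b = foldl' (+) 0 ys

  main :: IO ()
  main = print (parSum [1 .. 1000000] [2 .. 2000000])

Whether the spark is actually picked up by a second core is left to the runtime, which is part of why "just writing it in Haskell" doesn't automatically parallelise anything.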

Stefan O'Rear wrote:
I like pretty pictures.
...and have lots of spare time, apparently. ;-) [I actually meant to write (Bool,Bool), but anyway...]
Whereas my Quad object is going to be a pointer to one of 4 values... so it looks like Quads save space. (And they're more strict.) OTOH, I'm not sure what the time penalty is like...
Probably none. The STG-machine was designed to make user-defined algebraic types very fast.
My program needs to make decisions based on a pair of boolean values. Encoding both values as a single algebraic data type means I have to keep "taking it apart" so I can work with it. I'm not sure how much time this wastes...
It would be nice if there were some general mechanism for turning a bunch of Bool flags into a single machine word. E.g., if I did a
data Foo = Foo {flagX, flagY, flagZ :: !Bool}
and it ends up that a Foo value is just a single machine word, and GHC picks which bit each flag is... I guess if you want that at present you'd have to code it all by hand. Hmm, I think this might work out better than my current Quad thing... I could do something like
type Quad = Word8
foo q = let x = if testBit 0 q ... y = if testBit 1 q ...
That should be quite fast... (?)
Probably. I wound up doing something similar with vty, to considerable gain. (I did however use .&. instead of testBit - probably makes no difference, but I'm reminded of the (^2) being much slower than join(*) case...)
Well, perhaps I could define a pair of constants representing the bit masks? (OTOH, won't GHC optimise "testBit <constant>" into something faster anyway?)
Heh. I'll have to pester you more often. :-P
:)
Like that time yesterday: I compiled my program and got a weird message from GHC about "ignored trigraphs" or something... What the heck is a trigraph? (I compiled the program again later, and it compiled just fine. Weird...)
PS. Somewhere they should write a page entitled "Optimisations that GHC does (and doesn't) do"...
Good idea! Maybe it could be fit into the GHC Performance Resource somehow? (http://www.haskell.org/haskellwiki/Performance/GHC)
OK. But it'll probably contain a lot of guessing to start with... ;-)

On Fri, Aug 10, 2007 at 07:26:28AM +0100, Andrew Coppin wrote:
Stefan O'Rear wrote:
I like pretty pictures.
...and have lots of spare time, apparently. ;-)
Indeed. :)
Probably none. The STG-machine was designed to make user-defined algebraic types very fast.
My program needs to make decisions based on a pair of boolean values. Encoding both values as a single algebraic data type means I have to keep "taking it apart" so I can work with it. I'm not sure how much time this wastes...
Good point...
Probably. I wound up doing something similar with vty, to considerable gain. (I did however use .&. instead of testBit - probably makes no difference, but I'm reminded of the (^2) being much slower than join(*) case...)
Well, perhaps I could define a pair of constants representing the bit masks? (OTOH, won't GHC optimise "testBit <constant>" into something faster anyway?)
Probably not; GHC has few rules for dealing with partial evaluation on numeric arguments. Asking GHC itself:

stefan@stefans:/tmp$ ghc -c -ddump-simpl -O2 X.hs

==================== Tidy Core ====================
X.moo [NEVER Nothing] :: forall a_a82.
                         GHC.Base.Int -> a_a82 -> a_a82 -> a_a82
[GlobalId]
[Arity 3 NoCafRefs Str: DmdType U(L)LL]
X.moo =
  \ (@ a_a88) (ix_a84 :: GHC.Base.Int) (ift_a85 :: a_a88) (iff_a86 :: a_a88) ->
    case ix_a84 of wild_acF { GHC.Base.I# x#_acH ->
    case Data.Bits.$w$s$dmbit 7 of ww1_acN { __DEFAULT ->
    case GHC.Prim.word2Int#
           (GHC.Prim.and# (GHC.Prim.int2Word# x#_acH)
                          (GHC.Prim.int2Word# ww1_acN))
    of wild1_acO { __DEFAULT -> ift_a85; 0 -> iff_a86 }
    } }

==================== Tidy Core Rules ====================

stefan@stefans:/tmp$ cat X.hs
module X where

import Data.Bits

{-# NOINLINE moo #-}
-- ghc doesn't optimize functions that are deemed small enough for inlining;
-- this is a good thing (since when we inline we know more about the context
-- and can do a better job if we wait until then), but interferes with small
-- experiments like this
moo :: Int -> a -> a -> a
moo ix ift iff = if testBit ix 7 then ift else iff
stefan@stefans:/tmp$

The important bit is the (Data.Bits.$w$s$dmbit 7). Since that function doesn't do IO (no realworld arguments), it could in theory be evaluated at compile time (and judging from context it almost surely evaluates to 128#), but it hasn't been.
Like that time yesterday: I compiled my program and got a weird message from GHC about "ignored trigraphs" or something... What the heck is a trigraph?
Everyone's favorite obscure feature of the ANSI C99 preprocessor. Probably you had something like "this is odd???" in your source code, and were using -cpp. http://www.vmunix.com/~gabor/c/draft.html#5.2.1.1
(I compiled the program again later, and it compiled just fine. Weird...)
Good idea! Maybe it could be fit into the GHC Performance Resource somehow? (http://www.haskell.org/haskellwiki/Performance/GHC)
OK. But it'll probably contain a lot of guessing to start with... ;-)
Wiki pages can be fixed. Private misunderstandings can't, at least not anywhere near as easily. Stefan

Stefan O'Rear wrote:
On Fri, Aug 10, 2007 at 07:26:28AM +0100, Andrew Coppin wrote:
My program needs to make decisions based on a pair of boolean values. Encoding both values as a single algebraic data type means I have to keep "taking it apart" so I can work with it. I'm not sure how much time this wastes...
Good point...
I suppose the only sure way to find out is to test. (IIRC, doesn't GHC have a tendency to "take apart" tuples anyway?)
Probably. I wound up doing something similar with vty, to considerable gain. (I did however use .&. instead of testBit - probably makes no difference, but I'm reminded of the (^2) being much slower than join(*) case...)
Well, perhaps I could define a pair of constants representing the bit masks? (OTOH, won't GHC optimise "testBit <constant>" into something faster anyway?)
Probably not; GHC has few rules for dealing with partial evaluation on numeric arguments. Asking GHC itself:
stefan@stefans:/tmp$ ghc -c -ddump-simpl -O2 X.hs
==================== Tidy Core ====================
X.moo [NEVER Nothing] :: forall a_a82.
                         GHC.Base.Int -> a_a82 -> a_a82 -> a_a82
[GlobalId]
[Arity 3 NoCafRefs Str: DmdType U(L)LL]
X.moo =
  \ (@ a_a88) (ix_a84 :: GHC.Base.Int) (ift_a85 :: a_a88) (iff_a86 :: a_a88) ->
    case ix_a84 of wild_acF { GHC.Base.I# x#_acH ->
    case Data.Bits.$w$s$dmbit 7 of ww1_acN { __DEFAULT ->
    case GHC.Prim.word2Int#
           (GHC.Prim.and# (GHC.Prim.int2Word# x#_acH)
                          (GHC.Prim.int2Word# ww1_acN))
    of wild1_acO { __DEFAULT -> ift_a85; 0 -> iff_a86 }
    } }

==================== Tidy Core Rules ====================

stefan@stefans:/tmp$ cat X.hs
module X where

import Data.Bits

{-# NOINLINE moo #-}
-- ghc doesn't optimize functions that are deemed small enough for inlining;
-- this is a good thing (since when we inline we know more about the context
-- and can do a better job if we wait until then), but interferes with small
-- experiments like this
moo :: Int -> a -> a -> a
moo ix ift iff = if testBit ix 7 then ift else iff
stefan@stefans:/tmp$
The important bit is the (Data.Bits.$w$s$dmbit 7). Since that function doesn't do IO (no realworld arguments), it could in theory be evaluated at compile time (and judging from context it almost surely evaluates to 128#), but it hasn't been.
Mmm. See, now, I have *no idea* what GHC is saying. But I would have expected that if I do something like

  x = if testBit 3 q ...

then the definition of testBit would get inlined, and then hopefully the optimiser would do something. But then, IANAGD. (I am not a GHC developer.)
Like that time yesterday: I compiled my program and got a weird message from GHC about "ignored trigraphs" or something... What the heck is a trigraph?
Everyone's favorite obscure feature of the ANSI C99 preprocessor. Probably you had something like "this is odd???" in your source code, and were using -cpp.
Er... wow. OK, well I have no idea what happened there... (I'm not using -cpp. I don't even know what it is.) I had presumed GHC was upset because it got killed on the previous run... (I was running something else and it locked up the PC.)
Good idea! Maybe it could be fit into the GHC Performance Resource somehow? (http://www.haskell.org/haskellwiki/Performance/GHC)
OK. But it'll probably contain a lot of guessing to start with... ;-)
Wiki pages can be fixed. Private misunderstandings can't, at least not anywhere near as easily.
Point taken... ;-)

On Fri, Aug 10, 2007 at 06:12:03PM +0100, Andrew Coppin wrote:
[big blob of simplifier output]
Mmm. See, now, I have *no idea* what GHC is saying. But I would have expected that if I do something like
x = if testBit 3 q ...
then the definition of testBit would get inlined, and then hopfully the optimiser would do something. But then, IANAGD. (I am not a GHC developer.)
Sure, it gets inlined, and you wind up with something like:

  x = case 3 .&. (1 `shiftL` q) of
        0 -> ...
        _ -> ...

or, if you used the (correct and unintuitive) argument order to testBit:

  x = case q .&. (1 `shiftL` 3) of
        0 -> ...
        _ -> ...

We *want* (1 `shiftL` 3) to be reduced to 8 at compile time, but that doesn't seem to be happening.

(And I'm not a GHC developer either. I should probably start at some point...)

Stefan

Stefan O'Rear wrote:
or, if you used the (correct and unintuitive) argument order to testBit:
GAH! >_< Do you have ANY IDEA how many times I've got that wrong so far?? All I can say is thank God that Haskell is a statically-typed language! The type checker has saved my life here more times than I can count... [I seem to recall tripping over the order that the various array functions are expecting too...]
x = case q .&. (1 `shiftL` 3) of
      0 -> ...
      _ -> ...
We *want* (1 `shiftL` 3) to be reduced to 8 at compile time, but that doesn't seem to be happening.
(And I'm not a GHC developer either. I should probably start at some point...)
Heh. There seems to be a lot of people floating around here who are. And whereas people often have to say "I am not a lawyer", round here it seems to be frequently necessary to say "...but I'm not a GHC developer". ;-)

Andrew Coppin wrote:
Like that time yesterday: I compiled my program and got a weird message from GHC about "ignored trigraphs" or something... What the heck is a trigraph?
Everyone's favorite obscure feature of the ANSI C99 preprocessor. Probably you had something like "this is odd???" in your source code, and were using -cpp.
Er... wow. OK, well I have no idea what happened there... (I'm not using -cpp. I don't even know what it is.) I had presumed GHC was upset because it got killed on the previous run... (I was running something else and it locked up the PC.)
Since you are after increasing your program's performance, maybe you are using -fvia-c, or chose an optimization level high enough that GHC decides for itself to go via C. The message is most probably from the C compiler. You could try passing the C compiler an appropriate flag (see the gcc manual) via a GHC command line option (which I am too lazy to look up).

Cheers
Ben

Stefan O'Rear wrote:
Good idea! Maybe it could be fit into the GHC Performance Resource somehow? (http://www.haskell.org/haskellwiki/Performance/GHC)
OK. But it'll probably contain a lot of guessing to start with... ;-)
Wiki pages can be fixed. Private misunderstandings can't, at least not anywhere near as easily.
A page now exists: http://www.haskell.org/haskellwiki/GHC_optimisations Everybody feel free to point and laugh. (Or, more helpfully, expand and correct!)

On 8/10/07, Andrew Coppin
My program needs to make decisions based on a pair of boolean values. Encoding both values as a single algebraic data type means I have to keep "taking it apart" so I can work with it. I'm not sure how much time this wastes...
Incidentally, there is an argument that many (perhaps most) uses of Bool should instead be custom datatypes. That is, instead of:

  type FooBar = (Bool, Bool)

one should instead do something like

  data Foo = Foo | AntiFoo
  data Bar = Baz | Bo

  type FooBar = (Foo, Bar)

which makes it clearer what's going on and harder to confuse the two booleans. Of course, now you have to replace

  \(foo, bar) -> if foo then ... else ...

with

  \(foo, bar) -> if foo == Foo then ... else ...

or

  \(foo, bar) -> case foo of { Foo -> ...; AntiFoo -> ... }

----

Actually, that raises an interesting question. Is there a performance difference between "if foo == Foo ..." and "case foo of ..."? I think JHC's case-hoisting should be able to transform the former into the latter, but does it?

On Sat, Aug 11, 2007 at 12:06:23AM -0400, David Menendez wrote:
On 8/10/07, Andrew Coppin
wrote: My program needs to make decisions based on a pair of boolean values. Encoding both values as a single algebraic data type means I have to keep "taking it apart" so I can work with it. I'm not sure how much time this wastes...
Incidentally, there is an argument that many (perhaps most) use of Bool should instead be custom datatypes. That is, instead of:
type FooBar = (Bool, Bool)
one should instead do something like
data Foo = Foo | AntiFoo
data Bar = Baz | Bo

type FooBar = (Foo, Bar)
which makes it clearer what's going on and harder to confuse the two booleans.
Of course, now you have to replace
\(foo, bar) -> if foo then ... else ...

with

\(foo, bar) -> if foo == Foo then ... else ...

or

\(foo, bar) -> case foo of { Foo -> ...; AntiFoo -> ... }

Actually, that raises an interesting question. Is there a performance difference between "if foo == Foo ..." and "case foo of ..."? I think JHC's case-hoisting should be able to transform the former into the latter, but does it?
You don't need to go all the way to JHC[1] for this; GHC's case-of-case transformation is perfectly adequate, as GHC itself will tell you:

stefan@stefans:/tmp$ ghc -c -ddump-simpl -O2 X.hs
...
X.moo :: X.Ay -> GHC.Base.Int
...
X.moo =
  \ (x_a6D :: X.Ay) ->
    case x_a6D of wild_Xq { X.Be -> X.lit; X.Ce -> X.lvl }

stefan@stefans:/tmp$ cat X.hs
module X where

data Ay = Be | Ce deriving(Eq)

moo x = if x == Be then 2 else (3::Int)

stefan@stefans:/tmp$ ghc -c -ddump-simpl-stats -O2 X.hs

==================== FloatOut stats: ====================
1 Lets floated to top level; 0 Lets floated elsewhere; from 4 Lambda groups

==================== FloatOut stats: ====================
0 Lets floated to top level; 0 Lets floated elsewhere; from 3 Lambda groups

==================== Grand total simplifier statistics ====================
Total ticks:     46

9  PreInlineUnconditionally
11 PostInlineUnconditionally
6  UnfoldingDone
1  RuleFired
     1 ==#->case
9  BetaReduction
2  CaseOfCase        <-------
7  KnownBranch
1  CaseMerge
11 SimplifierDone

stefan@stefans:/tmp$

Stefan

[1] GHC takes 2 hours to run a full two-stage bootstrap complete with the entire standard library. Jhc takes[2] at least 4 hours (I didn't let it finish) to compile just the Prelude.

[2] 700MB working set, 384MiB primary store. This makes a difference.

G'day all.
Quoting Stefan O'Rear
In general, GHC doesn't do "unboxing". Instead it has a simpler and more general approach, [...]
I'm not convinced that the phrase "more general" is appropriate here. :-)
As far as actual heap usage goes, GHC creates single static values for all 0-argument constructors; so all Bool WHNFs are one of two addresses, one for True and one for False.
And, of course, if it's a strict argument, then the values stored are ALWAYS one of two possibilities. So as a matter of curiosity, would there be any advantage at all for "unboxing" enumeration types? (Apart from, I suppose, the possibility of using fewer than 32/64 bits to store a flag.) Cheers, Andrew Bromage

On Thu, Aug 09, 2007 at 11:09:36PM -0400, ajb@spamcop.net wrote:
Quoting Stefan O'Rear
: In general, GHC doesn't do "unboxing". Instead it has a simpler and more general approach, [...]
I'm not convinced that the phrase "more general" is appropriate here. :-)
Not sure where that came from; my filters are usually better than that :)
As far as actual heap usage goes, GHC creates single static values for all 0-argument constructors; so all Bool WHNFs are one of two addresses, one for True and one for False.
And, of course, if it's a strict argument, then the values stored are ALWAYS one of two possibilities. So as a matter of curiosity, would there be any advantage at all for "unboxing" enumeration types? (Apart from, I suppose, the possibility of using fewer than 32/64 bits to store a flag.)
That was actually described in the first paper on first-class unboxed types. The paper described a general mechanism for declaring user-defined unboxed types and procedures for unboxing any ADT. No idea if it was ever implemented, though. Stefan

| And, of course, if it's a strict argument, then the values stored are
| ALWAYS one of two possibilities. So as a matter of curiosity, would
| there be any advantage at all for "unboxing" enumeration types? (Apart
| from, I suppose, the possibility of using fewer than 32/64 bits to store
| a flag.)

Possibly some, but less now because of pointer tagging (see the paper on my home page: http://research.microsoft.com/~simonpj/papers/ptr-tag/index.htm). We've never gotten around to unboxing enumeration types because we couldn't convince ourselves that the win was big enough. My nose tells me there is probably a small win, but whether it's worth the additional complexity I'm not sure.

S

On Thu, Aug 09, 2007 at 06:37:32PM +0100, Andrew Coppin wrote:
Which of these is likely to go faster?

  type Quad = (Bool,Bool)
  ...
  data Quad = BL | BR | TL | TR
  ...

I'm hoping that the latter one will be more strict / use less space. But I don't truly know...
The second one will be significantly better, for a couple of reasons. A simple counting of the values that they can take on will show not only this but that they are not even isomorphic.

(Bool,Bool) can be one of:

  _|_
  (True,True)
  (True,False)
  (False,True)
  (False,False)
  (_|_,True)
  (_|_,False)
  (_|_,_|_)
  (True,_|_)
  (False,_|_)

that is a total of 10 different cases; each time a bottom might appear, a thunk evaluation (or indirection) is involved.

Now, take the second case:

  data Quad = BL | BR | TL | TR

the possible values are

  _|_
  BL
  BR
  TL
  TR

a whole half of the other representation.

Under jhc (and probably ghc at some point in the future) there is another very strong advantage to the second one: since it is an enumerated type, internally it can be represented by a simple unboxed byte that takes on a value of 0, 1, 2, or 3, which is a very enabling optimization, especially in the 'if' case in your code.

        John

--
John Meacham - ⑆repetae.net⑆john⑈
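A sketch of doing that 0..3 byte encoding by hand today, via a derived Enum instance; this is just a manual stand-in for what the compiler would do internally, and the helper names are made up:

  import Data.Word (Word8)

  data Quad = BL | BR | TL | TR
    deriving (Eq, Ord, Enum, Bounded, Show)

  -- Pack to / unpack from a byte holding 0, 1, 2 or 3.
  -- byteToQuad assumes its argument really is in that range.
  quadToByte :: Quad -> Word8
  quadToByte = fromIntegral . fromEnum

  byteToQuad :: Word8 -> Quad
  byteToQuad = toEnum . fromIntegral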

Hello John, Friday, August 10, 2007, 5:15:56 AM, you wrote:
data Quad = BL | BR | TL | TR
under jhc (and probably ghc at some point in the future) there is another very strong advantage to the second one, since it is an enumerated type, internally it can be represented by a simple unboxed byte that takes on a value of 0,1,2,or 3, which is a very enabling optimization, especially in the 'if' case in your code.
it was implemented and merged to ghc HEAD ~1 month ago -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On 8/10/07, John Meacham
On Thu, Aug 09, 2007 at 06:37:32PM +0100, Andrew Coppin wrote:
Which of these is likely to go faster?

  type Quad = (Bool,Bool)
  ...
  data Quad = BL | BR | TL | TR
  ...

I'm hoping that the latter one will be more strict / use less space. But I don't truly know...
The second one will be signifigantly better for a couple reasons. A simple counting of values that they can take on will show not only this but that they are not isomorphic even,
(Bool,Bool) can be one of
  _|_
  (True,True)
  (True,False)
  (False,True)
  (False,False)
  (_|_,True)
  (_|_,False)
  (_|_,_|_)
  (True,_|_)
  (False,_|_)
that is a total of 10 different cases, each time a bottom might appear, a thunk evaluation (or indirection) is involved.
now, take the second case
data Quad = BL | BR | TL | TR
the possible values are
_|_ BL BR TL TR
a whole half of the other representation.
Well, there are ways to improve the situation. If you want to remove all the bottoms in your type you can define Quad as:

  type Quad = Data.Strict.Tuple.Pair Bool Bool

I'm not sure how much speed this will gain in practice and whether it will beat the four-constructor data type though. If anyone does some measurements it'd be interesting to know.

Cheers,

Josef

PS. Data.Strict.Tuple lives in the strict package which can be found here:
http://www.cse.unsw.edu.au/~rl/code/strict.html
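A sketch of using it, assuming the strict package's Data.Strict.Tuple, which exports a strict pair type Pair with constructor (:!:):

  import Data.Strict.Tuple (Pair((:!:)))

  -- Both components are forced as soon as the pair itself is,
  -- so partially-defined values like (_|_, True) go away.
  type Quad = Pair Bool Bool

  foo :: Quad -> Int
  foo (r :!: t) = (if r then 1 else 0) + (if t then 2 else 0)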
participants (14):
  ajb@spamcop.net
  Andrew Coppin
  Benjamin Franksen
  Bulat Ziganshin
  David Menendez
  dons@cse.unsw.edu.au
  Hugh Perkins
  John Meacham
  Jon Harrop
  Josef Svenningsson
  Sebastian Sylvan
  Simon Peyton-Jones
  Stefan O'Rear
  Thomas Conway