Int vs Word performance?

Looking at prelude/PrelRules.hs has reminded me of an old conundrum: if I switch from Int to Word, should I expect any performance differences?

A while ago, I needed lots of fairly small positive numbers, together with a small number of flags for each, so I thought I'd switch from Int to Word and map the flags to bits. But the performance dropped so drastically that I went back to Int, slightly complicating the bitmaps. I didn't expect that, and I can't see any builtin rules for Int that would have no Word counterpart.

Here is a trivial example with a drastic difference between T = Int and T = Word (~2.5x here):

    main = print $ foldl' (+) 0 [1..100000000::T]

What am I missing here?

Also, in the real code I ended up seeing things like this in the -ddump-simpl output for the bit-fiddling code:

    GHC.Prim.word2Int# (GHC.Prim.and# (GHC.Prim.int2Word# wild13_XbE) (GHC.Prim.int2Word# y#_a4EZ))

Is that likely to cost me a lot, or are these conversions cheap?

Claus
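For reference, a self-contained version of the benchmark above (a sketch, assuming GHC with -O2; foldl' comes from Data.List, and the type alias is switched between Int and Word to compare):

```haskell
-- Self-contained version of the benchmark from the message above.
-- Change the alias to Int to compare the two runs.
import Data.List (foldl')
import Data.Word (Word)

type T = Word   -- or: type T = Int

main :: IO ()
main = print $ foldl' (+) 0 [1 .. 100000000 :: T]
```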

claus.reinke:
Looking at prelude/PrelRules.hs has reminded me of an old conundrum: if I switch from Int to Word, should I expect any performance differences?
A while ago, I needed lots of fairly small positive numbers, together with a small number of flags for each, so I thought I'd switch from Int to Word, and map the flags to bits. But the performance dropped so drastically that I went back to Int, slightly complicating the bitmaps. I didn't expect that, and I can't see any builtin rules for Int that would have no Word counterpart.
Here is a trivial example with drastic difference between T = Int and T = Word (~2.5x here):
main = print $ foldl' (+) 0 [1..100000000::T]
What am I missing here?
Also, in the real code I ended up seeing things like this in the -ddump-simpl output for the bit-fiddling code:
GHC.Prim.word2Int# (GHC.Prim.and# (GHC.Prim.int2Word# wild13_XbE) (GHC.Prim.int2Word# y#_a4EZ))
Is that likely to cost me a lot or are these conversions cheap?
Those guys are no-ops, and in general you should never see a performance difference. If you do, it is a bug! There are some known cases where rules are missing, however:

* Conversions to Int from Double for ceiling, floor, round are missing rules: http://hackage.haskell.org/trac/ghc/ticket/2271
* gcd only has a specialised implementation for Int: http://hackage.haskell.org/trac/ghc/ticket/2270

Some others I'm aware of: product/sum/maximum/minimum on lists have specialisations for some atomic types (Int, Integer) but not all (this needs a ticket too). I'm not aware of any remaining "theoretically noop" conversions that aren't in fact implemented as noops now. If you find them, please open a ticket.

-- Don
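For context, here is the kind of source-level bit-fiddling that produces those int2Word#/word2Int# conversions in the simplifier output (a minimal sketch with a hypothetical flag layout, not the original code):

```haskell
-- Bitwise operations on Int go through Word# internally: with -ddump-simpl,
-- x .&. mask shows up as word2Int# (and# (int2Word# x) (int2Word# mask)),
-- which are all free at the machine level.
import Data.Bits ((.&.), (.|.), shiftL, shiftR)

-- Pack a small value and one flag bit into a single Int (hypothetical layout).
pack :: Int -> Bool -> Int
pack v flag = (v `shiftL` 1) .|. (if flag then 1 else 0)

flagOf :: Int -> Bool
flagOf n = n .&. 1 /= 0

valueOf :: Int -> Int
valueOf n = n `shiftR` 1

main :: IO ()
main = do
  let n = pack 21 True
  print (valueOf n, flagOf n)  -- (21,True)
```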

Here is a trivial example with drastic difference between T = Int and T = Word (~2.5x here):
main = print $ foldl' (+) 0 [1..100000000::T] .. GHC.Prim.word2Int# (GHC.Prim.and# (GHC.Prim.int2Word# wild13_XbE) (GHC.Prim.int2Word# y#_a4EZ))
Is that likely to cost me a lot or are these conversions cheap?
Those guys are no-ops, and in general you should never see a performance difference. If you do, it is a bug! There are some known cases where rules are missing however:
Thanks, that is one thing less to worry about.

Btw, is there a "guide to reading Core" somewhere, with emphasis on performance aspects (what to look for when optimizing time or space usage, what to ignore, how to make it more readable, etc.)? Until I stumbled over CORE annotations, I found it near impossible even to find the pieces of interest for non-trivial programs; things like -dsuppress-uniques help a little with diffs, some things look big but are noops, etc. That kind of helpful pragmatic knowledge (why does it look as if source variable names aren't always preserved; why does it use random uniques instead of de Bruijn-style disambiguation, which wouldn't interfere with diffs and would have static semantic content; why do the outputs look different for core2core vs dump-simpl, ..).
Some others I'm aware of are product/sum/maximum/minimum on lists have specialisations for some atomic types (Int, Integer) but not all (needs a ticket for this too).
A quick grep shows almost no specialization at all for Word, or for
IntXX/WordXX (see below). Still, none of that seems to explain the
example repeated at the top of this message.
Claus
$ find libraries/ -name _darcs -prune -o -name *hs | xargs grep SPECIAL | grep '\

Claus Reinke wrote:
Here is a trivial example with drastic difference between T = Int and T = Word (~2.5x here):
main = print $ foldl' (+) 0 [1..100000000::T] ..
A quick grep shows almost no specialization at all for Word, or for IntXX/WordXX (see below). Still, none of that seems to explain the example repeated at the top of this message.
The Enum instance for Int uses specialised implementations of enumFromTo and friends, whereas the Word version uses the generic integralEnumFromTo. Cheers, Simon
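One way to sidestep the generic integralEnumFromTo until the library gains the specialisation is to write the loop by hand (a sketch; with -O2 this compiles to a tight Word# loop, much like the Int version):

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sketch of a workaround for the unspecialised Enum instance at Word:
-- a hand-written accumulator loop avoids enumFromTo entirely.
import Data.Word (Word)

sumTo :: Word -> Word
sumTo hi = go 0 1
  where
    go !acc i
      | i > hi    = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 100000000)
```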

marlowsd:
Claus Reinke wrote:
Here is a trivial example with drastic difference between T = Int and T = Word (~2.5x here):
main = print $ foldl' (+) 0 [1..100000000::T] ..
A quick grep shows almost no specialization at all for Word, or for IntXX/WordXX (see below). Still, none of that seems to explain the example repeated at the top of this message.
The Enum instance for Int uses specialised implementations of enumFromTo and friends, whereas the Word version uses the generic integralEnumFromTo.
Another good reason to use uvector:

    import Data.Array.Vector
    import Data.Word

    main = print $ foldlU (+) (0::T) (enumFromToU 1 (100000000::T))

    type T = Word

With T = Word:

    $wfold :: Word# -> Word# -> Word#
    $wfold =
      \ (ww_s1cg :: Word#) (ww1_s1ck :: Word#) ->
        case gtWord# ww1_s1ck __word 100000000 of wild_a19p {
          False -> $wfold (plusWord# ww_s1cg ww1_s1ck) (plusWord# ww1_s1ck __word 1);
          True  -> ww_s1cg
        }

yields:

    Main_zdwfold_info:
    .Lc1e1:
            cmpq $100000000,%rdi
            ja .Lc1e4
            leaq 1(%rdi),%rax
            addq %rdi,%rsi
            movq %rax,%rdi
            jmp Main_zdwfold_info

While at type T = Int we get:

    $wfold :: Int# -> Int# -> Int#
    $wfold =
      \ (ww_s144 :: Int#) (ww1_s148 :: Int#) ->
        case ># ww1_s148 100000000 of wild_a11q {
          False -> $wfold (+# ww_s144 ww1_s148) (+# ww1_s148 1);
          True  -> ww_s144
        }

and *identical assembly*:

    Main_zdwfold_info:
    .Lc15E:
            cmpq $100000000,%rdi
            jg .Lc15H
            leaq 1(%rdi),%rax
            addq %rdi,%rsi
            movq %rax,%rdi
            jmp Main_zdwfold_info

-- Don

| Until I stumbled over CORE annotations, I found it near impossible even
| to find the pieces of interest for non-trivial programs, things like
| -dsuppress-uniques help a little with diffs, some things look big but
| are noops, etc. - that kind of helpful pragmatic knowledge (why does
| it look as if source variable names aren't always preserved; why does
| it use random uniques instead of de Bruijn-style disambiguation, which
| wouldn't interfere with diffs and would have static semantic content;
| why do the outputs look different for core2core vs dump-simpl, ..).

Many of these things might be fixable if someone thought about the specification carefully. The current Core pretty-printer was initially designed only for GHC internals hackers, rather than Joe User. A first step might be for a posse of Joe Users to specify what they want, as precisely as possible. Then this same posse might even write a Core pretty-printer to achieve it. I am happy to advise. Then we could substitute new for old.

| A quick grep shows almost no specialization at all for Word, or for
| IntXX/WordXX (see below). Still, none of that seems to explain the
| example repeated at the top of this message.

We'd be delighted to apply suitable library patches.

Simon

| | A quick grep shows almost no specialization at all for Word, or for
| | IntXX/WordXX (see below). Still, none of that seems to explain the
| | example repeated at the top of this message.
|
| We'd be delighted to apply suitable library patches.

PS: in the case that no one gets around to creating such a patch, creating a ticket that documents the problem and points to the needed specialisations would be a start.

| | A quick grep shows almost no specialization at all for Word, or for
| | IntXX/WordXX (see below). Still, none of that seems to explain the
| | example repeated at the top of this message.
|
| We'd be delighted to apply suitable library patches.
PS: in the case that no one gets around to creating such a patch, creating a ticket that documents the problem and points to the needed specialisations would be a start
Since more than just SPECIALI[SZ]E pragmas seem involved, I've created a ticket to summarize this thread's findings: http://hackage.haskell.org/trac/ghc/ticket/3055

Claus

claus.reinke:
Here is a trivial example with drastic difference between T = Int and T = Word (~2.5x here):
main = print $ foldl' (+) 0 [1..100000000::T] .. GHC.Prim.word2Int# (GHC.Prim.and# (GHC.Prim.int2Word# wild13_XbE) (GHC.Prim.int2Word# y#_a4EZ))
Is that likely to cost me a lot or are these conversions cheap?
Those guys are no-ops, and in general you should never see a performance difference. If you do, it is a bug! There are some known cases where rules are missing however:
Thanks, that is one thing less to worry about. Btw, is there a "guide to reading Core" somewhere, with emphasis on performance aspects (what to look for when optimizing time or space usage, what to ignore, how to make it more readable, etc)?
Until I stumbled over CORE annotations, I found it near impossible even to find the pieces of interest for non-trivial programs, things like -dsuppress-uniques help a little with diffs, some things look big but are noops, etc. - that kind of helpful pragmatic knowledge (why does it look as if source variable names aren't always preserved; why does it use random uniques instead of de Bruijn-style disambiguation, which wouldn't interfere with diffs and would have static semantic content; why do the outputs look different for core2core vs dump-simpl, ..).
I use the ghc-core tool: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/ghc-core

claus.reinke:
Here is a trivial example with drastic difference between T = Int and T = Word (~2.5x here):
main = print $ foldl' (+) 0 [1..100000000::T] .. GHC.Prim.word2Int# (GHC.Prim.and# (GHC.Prim.int2Word# wild13_XbE) (GHC.Prim.int2Word# y#_a4EZ))
Is that likely to cost me a lot or are these conversions cheap?
Those guys are no-ops, and in general you should never see a performance difference. If you do, it is a bug! There are some known cases where rules are missing however:
Thanks, that is one thing less to worry about. Btw, is there a "guide to reading Core" somewhere, with emphasis on performance aspects (what to look for when optimizing time or space usage, what to ignore, how to make it more readable, etc)?
Until I stumbled over CORE annotations, I found it near impossible even to find the pieces of interest for non-trivial programs, things like -dsuppress-uniques help a little with diffs, some things look big but are noops, etc. - that kind of helpful pragmatic knowledge (why does it look as if source variable names aren't always preserved; why does it use random uniques instead of de Bruijn-style disambiguation, which wouldn't interfere with diffs and would have static semantic content; why do the outputs look different for core2core vs dump-simpl, ..).
Some others I'm aware of are product/sum/maximum/minimum on lists have specialisations for some atomic types (Int, Integer) but not all (needs a ticket for this too).
A quick grep shows almost no specialization at all for Word, or for IntXX/WordXX (see below). Still, none of that seems to explain the example repeated at the top of this message.
We do need to decide whether we want to add specializations for all atomic types in general, and if so, then let's do that intentionally. Does anyone see a reason not to do it in the libraries, via rules?
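In the meantime, user code can get the same effect locally with a SPECIALISE pragma (a sketch; mySum is a hypothetical stand-in for a generic library fold, not a library patch):

```haskell
-- Sketch: specialising a generic numeric fold at Word and Int in user code.
-- GHC then dispatches to the monomorphic copies instead of going through
-- the Num dictionary.
import Data.List (foldl')
import Data.Word (Word)

mySum :: Num a => [a] -> a
mySum = foldl' (+) 0
{-# SPECIALISE mySum :: [Word] -> Word #-}
{-# SPECIALISE mySum :: [Int]  -> Int  #-}

main :: IO ()
main = print (mySum [1 .. 100 :: Word])
```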

"Claus Reinke"
A while ago, I needed lots of fairly small positive numbers, together with a small number of flags for each, so I thought I'd switch from Int to Word, and map the flags to bits.
Since there are few guarantees about the size of a Word (or Int), surely it would be better to choose a definitely sized basic type, e.g. Word8 or Word16? I vaguely recall that ghc used to generate better code for definitely sized WordN than the generic unguaranteed-size Word. Regards, Malcolm

A while ago, I needed lots of fairly small positive numbers, together with a small number of flags for each, so I thought I'd switch from Int to Word, and map the flags to bits.
Since there are few guarantees about the size of a Word (or Int), surely it would be better to choose a definitely sized basic type, e.g. Word8 or Word16?
Good point in principle, and I would indeed prefer a specific size. Unfortunately, I found just the opposite of this
I vaguely recall that ghc used to generate better code for definitely sized WordN than the generic unguaranteed-size Word.
to be true (although I don't recall whether I checked with IntN or WordN, and I don't have a small example for this issue): even just replacing Int with Int32 on a system where that should be the same was liable to reduce performance. Given Don's point about SPECIALI[ZS]E, and the lack of specialisations for IntN/WordN, that might explain it?

Claus

PS. Perhaps on newer 64-bit machines, explicitly selecting 32 bits can offer savings? I'd certainly expect selecting 64 bits on a 32-bit machine to lead to slowdowns.

Claus Reinke wrote:
PS. perhaps on newer 64 bit machines, explicitly selecting 32 bits can offer savings? I'd certainly expect selecting 64 bits on a 32 bit machine to lead to slowdowns.
Unlikely, I'd have thought. We implement all the explicitly sized integral types by zero-extending or sign-extending to the size of an Int/Word. That's why there are no primitive types or operations for Word8#, Word16#, etc.

Cheers,
Simon
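The narrowing is still observable at the type level, of course: arithmetic on the sized types wraps at their declared width, even though GHC computes at native register width underneath (a small illustration):

```haskell
-- Word8 results are narrowed back to 8 bits after each native-width operation,
-- so the modular semantics of the sized type is preserved.
import Data.Word (Word8)

main :: IO ()
main = do
  print (255 + 1 :: Word8)                     -- wraps to 0
  print (fromIntegral (300 :: Int) :: Word8)   -- 300 mod 256 = 44
```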
participants (5)
- Claus Reinke
- Don Stewart
- Malcolm Wallace
- Simon Marlow
- Simon Peyton-Jones