
I had a quick look at the code for
loop :: Int64 -> Int64 -> Int64
loop i r = if i == 0 then r else loop (i-1) (r+1)
It's quite bad. It's full of C calls.
It would be much better to do what gcc does and treat Int64 as a
primitive type, and just insert C calls for the tricky operations,
like division.
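For anyone who wants to look at the generated code themselves, here is a minimal sketch of a self-contained module around that loop. The module layout, the file name Loop.hs and the iteration count are my own assumptions for illustration, not taken from the original program.

```haskell
module Main (main) where

import Data.Int (Int64)

-- The loop quoted above: count i down to zero,
-- adding one to the accumulator r on every step.
loop :: Int64 -> Int64 -> Int64
loop i r = if i == 0 then r else loop (i - 1) (r + 1)

-- The iteration count is an arbitrary choice for this sketch.
main :: IO ()
main = print (loop 1000000 0)
```

Compiling with something like "ghc -O2 -ddump-simpl Loop.hs" prints the optimised Core, which is where the C calls for the 64-bit arithmetic should show up on a 32-bit build.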
On Thu, Jan 29, 2009 at 3:17 AM, Duncan Coutts wrote:
On Wed, 2009-01-28 at 20:42 -0500, Ross Mellgren wrote:
Very possibly -- I'm on a Mac, so there's no prebuilt 64-bit binary. I'm not good enough at reading Core to be sure, but I can see that it's calling out to external C functions to do the 64-bit math.
Right, that'll make it really slow but does not explain the allocations.
It could be that it's crossing over from machine register size to managed heap object and so without additional help on 32-bit it wants to allocate thunks.
If it's using Int64 then there's no transition; that only happens with Integer (which is always heap allocated anyway).
The sum parameter in the inner loop is an accumulating parameter that is not inspected until the final value is returned. In the case of the simple direct Int64 implementation, the strictness analyser does notice that it really is strict, so it can be evaluated as we go along. I bet that's the source of the problem: for the indirect Int64 implementation used on 32-bit machines, the strictness analyser does not discover the same property. That would explain the allocations.
It's worth investigating the indirect Int64 implementation to see if this could be improved.
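If the strictness analyser is indeed missing the accumulator, one common workaround is to make the strictness explicit with a bang pattern. The following is only a sketch under that assumption: the original program is not shown in this thread, so sumTo and go are hypothetical names, and whether this actually removes the allocations on a 32-bit build has not been checked here.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Int (Int64)

-- Hypothetical stand-in for the inner loop under discussion:
-- sums the numbers 1..n using an accumulating parameter.
sumTo :: Int64 -> Int64
sumTo n = go 0 1
  where
    go :: Int64 -> Int64 -> Int64
    -- The bang on acc forces the accumulator at every step,
    -- so no chain of unevaluated additions can build up even
    -- if the strictness analyser fails to spot that it is strict.
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 100)
```

The bang pattern does not change the result, only when the accumulator is evaluated, so it is a cheap experiment to run before digging into the indirect Int64 implementation itself.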
Does your core indicate that it's making a bunch of external __ccalls?
No, it's all unboxed Int# types and primitive # operations. Lovely. In particular the inner loop is all unboxed types with no allocations.
Duncan