Haskell translation and transformation

Dear Café, I've tried to make my code more compact and faster in runtime and I tried to use primitive types, e.g. Int#. To omit all influences, I ended with Ackermann function: acker :: Int# -> Int# -> Int# acker 0# n = n +# 1# acker m 0# = acker (m -# 1#) 1# acker m n = acker (m -# 1#) (acker m (n -# 1#)) I was quite surprised, that the result was the same as for function without primitive types when it was compiled with -XStrict option. Both memory and time consumption was the same (small difference under 0.5%). Of course, -O2 option was used. :-) I compared with the same C implementation: int acker(int m, int n) { if (m==0) return n+1; if (n==0) return acker(m-1, 1); return acker(m-1, acker(m,n-1)); } with -O2 option used for gcc, the C code is 7 times faster, for no -O options provided both for ghc and gcc, the C code is also 7 times faster. Thus, no difference. My question is: does the ghc use primitive types automatically when possible? Otherwise, I cannot explain the same times... Or, to my big surprise, using primitives does not save memory and computation time, really? And the other question is about reasoning during translation and code generation, what is the reason the code is so slow? I would guess that forcing primitive types and strict evaluation would produce a code with comparable time to C code... The difference seems to be like the one between compiled code to executable and to low level virtual machine code, which is interpreted then. Versions I used: ghc: 8.10.3 gcc: 10.2.0 Dušan P.S. I definitely don't want to make someone upset with the content, I'm simply wondering... D.

My question is: does the ghc use primitive types automatically when possible? Otherwise, I cannot explain the same times... Or, to my big surprise, using primitives does not save memory and computation time, really?
GHC mainly uses inlining to transform high level code into code that uses primitive types. Given a piece of code like: acker :: Int -> Int -> Int acker 0 n = n + 1 acker m 0 = acker (m - 1) 1 acker m n = acker (m - 1) (acker m (n - 1)) GHC will inline (+) and (-). The definition of these functions is: (I# x) + (I# y) = I# (x +# y) (I# x) - (I# y) = I# (x -# y) So it transforms to something like: acker 0 (I# n) = I# (n +# 1#) acker (I# m) 0 = acker (I# (m -# 1#)) 1 acker (I# m) (I# n) = acker (I# (m -# 1#)) (acker (I# m) (I# (n -# 1#))) And what I think is called the worker wrapper transformation can transform it into something like: acker (I# m) (I# n) = acker# m n acker# 0# n = n +# 1# acker# m 0# = acker# (m -# 1#) 1# acker# m n = acker# (m -# 1#) (acker# m (n -# 1#)) Which completely eliminates the boxing and unboxing (the I#s) in the tight loop.
And the other question is about reasoning during translation and code generation, what is the reason the code is so slow? I would guess that forcing primitive types and strict evaluation would produce a code with comparable time to C code... The difference seems to be like the one between compiled code to executable and to low level virtual machine code, which is interpreted then.
I think it is mainly that in small tight loops like your ackermann function, low level optimizations become much more important. On godbolt you can easily compare the produced assembly: Haskell https://godbolt.org/z/nejqWsq9z (acker is Main_$wacker_info) C https://godbolt.org/z/WKW7vcvfb (GCC 9.3 is easier to read than 10.2) The main hot code path blocks like: c4pc_info: movq 8(%rbp),%rax decq %rax addq $16,%rbp movq %rbx,%rsi movq %rax,%r14 And movq $c4pc_info,-16(%rbp) decq %rsi movq %r14,%rax movq %rax,-8(%rbp) addq $-16,%rbp jmp Main_$wacker_info Just seem very awkward when compared to to the GCC assembly. But I'm not knowledgeable enough about GHC's internals to know why this awkward code is generated. Cheers, Jaro
participants (2)
-
Dušan Kolář
-
Jaro Reinders