performance issues in simple arithmetic code

28 Apr 2011

      Hi,

While trying to figure out why some of my code is very slow I have found 
that it is something related to division.
Digging a bit deeper I found an example which shows some unexpected 
magic and a lack of the expected one.
Before raising any tickets in trac I would like to consult with you 
regarding what I see. Maybe I am misunderstanding the way GHC is 
supposed to work.

-------------------

    module Test where

    import Data.Int
    import GHC.Exts
    import GHC.Prim

    foo :: Int -> Int
    foo a =
       let
         b = a `quot` 1111
         c = b `quot` 1113
         d = c `quot` 1117
       in d

    bar :: Int -> Int
    bar !a' =
       let
         !(I# a) = a'
         !(b) = quotInt# a 1111#
         !(c) = quotInt# b 1113#
         !(d) = quotInt# c 1117#
       in I# d

-------------------

Here 'foo' is a function written in a common way and 'bar' is 
essentially identical one, written in a low-level style.
* My understanding is that these functions are equivalent in terms of 
what they do. The only difference is in the code being generated.

Unexpected magic is in the Core dump:
-------------------

    Test.$wfoo =
       \ (ww_sxw :: GHC.Prim.Int#) ->
         case ww_sxw of wild1_ax0 {
           __DEFAULT ->
             case GHC.Prim.quotInt# wild1_ax0 1111 of wild2_Xxc {
               __DEFAULT ->
                 case GHC.Prim.quotInt# wild2_Xxc 1113 of wild3_Xxt {
                   __DEFAULT -> GHC.Prim.quotInt# wild3_Xxt 1117;
                   (-9223372036854775808) -> (-8257271295304186)
                 };
               (-9223372036854775808) -> (-7418931981405)
             };
           (-9223372036854775808) -> (-6677706553)
         }

    Test.bar =
       \ (a'_ah5 :: GHC.Types.Int) ->
         case a'_ah5 of _ { GHC.Types.I# ipv_ste ->
         GHC.Types.I#
           (GHC.Prim.quotInt#
              (GHC.Prim.quotInt# (GHC.Prim.quotInt# ipv_ste 1111) 1113)
    1117)
         }

-------------------
Question 1: what is the meaning of those magic numbers 
-9223372036854775808, -6677706553, -7418931981405, -8257271295304186?
Question 2: under which circumstances those strange branches of 
execution will be used and what those results would mean?
Question 3: why is the Core for 'foo' so different to 'bar'?

The lack of expected magic is in the assembler code:
-------------------

         addq $16,%r12
         cmpq 144(%r13),%r12
         ja .Lcz1
         movl $1117,%ecx
         movl $1113,%r10d
         movl $1111,%r11d
         movq 7(%rbx),%rax
         cqto
         idivq %r11
         cqto
         idivq %r10
         cqto
         idivq %rcx
         movq $ghczmprim_GHCziTypes_Izh_con_info,-8(%r12)
         movq %rax,0(%r12)
         leaq -7(%r12),%rbx
         addq $8,%rbp
         jmp *0(%rbp)

-------------------
Question: can't it use cheap multiplication and shift instead of 
expensive division here? I know that such optimisation is implemented at 
least to some extent for C--. I suppose it also won't do anything smart 
for expressions like a*4 or a/4 for the same reason.

With kind regards,
Denys Rtveliashvili