let x = x in x (GHC.Prim)

Hello Haskellers, I'm trying to write a faster popCount function for x86 systems. I tried cloning the ghc-prim package and repurposing it for my own needs, but it isn't working as hoped. In particular, popCnt64# was implemented in GHC.Prim as: popCnt64# = let x = x in x Which shouldn't terminate. Yet when I call it, it magically finds the C implementation in hs_popcnt64 and returns the correct value. My cloned project doesn't behave that way. Instead it doesn't terminate as I would expect. Anyone know what's happening here, if there is a way to make this work or tell me if I'm going about this completely the wrong way? Cheers, -John

Hi, You can also test your primop with a foreign primop. See here for an example of assembly code (calling cpuid): https://github.com/hsyl20/ViperVM/blob/master/src/lib/ViperVM/Arch/X86_64/cp... And here for the Haskell part with the foreign primop that calls the assembly code: https://github.com/hsyl20/ViperVM/blob/master/src/lib/ViperVM/Arch/X86_64/Cp... Cheers, Sylvain On 23/03/2016 02:07, rahulmutt@gmail.com wrote:
Hi John,
ghc-prim is just a stub package generated for the purpose of documentation. All primops are defined in a low level language called Cmm in GHC. If you want to make it even faster, you'll need to learn Cmm and update the definition in GHC. If you want to a specialized implementation for x86 systems, you may need to modify the NCG (Native Code Generator) which requires a knowledge of assembly language.
Hope that helps! Rahul Muttineni
Sent from my BlackBerry 10 smartphone. *From: *John Ky *Sent: *Wednesday 23 March 2016 4:40 AM *To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Reply To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Subject: *[Haskell-beginners] let x = x in x (GHC.Prim)
Hello Haskellers,
I'm trying to write a faster popCount function for x86 systems.
I tried cloning the ghc-prim package and repurposing it for my own needs, but it isn't working as hoped.
In particular, popCnt64# was implemented in GHC.Prim as:
popCnt64# = let x = x in x
Which shouldn't terminate. Yet when I call it, it magically finds the C implementation in hs_popcnt64 and returns the correct value.
My cloned project doesn't behave that way. Instead it doesn't terminate as I would expect.
Anyone know what's happening here, if there is a way to make this work or tell me if I'm going about this completely the wrong way?
Cheers,
-John
_______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners

Hi John,
popCnt64# = let x = x in x
It represents "bottom, or undefined, or infinite loop" [1].
(Almost primitives can't be represented with Haskell language.)
In your case, It's good to use FFI call [2] like as Sylvain's nice code.
If you dig GHC compiler's primitives, followings [3][4] may be useful.
[1] https://www.fpcomplete.com/blog/2015/02/primitive-haskell
[2]
https://downloads.haskell.org/~ghc/master/users-guide/ffi-chap.html#primitiv...
[3] https://ghc.haskell.org/trac/ghc/wiki/Commentary/PrimOps
[4] http://www.well-typed.com/blog/2014/06/understanding-the-realworld/
Cheers,
Takenobu
2016-03-23 10:10 GMT+09:00 Sylvain Henry
Hi,
You can also test your primop with a foreign primop.
See here for an example of assembly code (calling cpuid):
https://github.com/hsyl20/ViperVM/blob/master/src/lib/ViperVM/Arch/X86_64/cp...
And here for the Haskell part with the foreign primop that calls the assembly code:
https://github.com/hsyl20/ViperVM/blob/master/src/lib/ViperVM/Arch/X86_64/Cp...
Cheers, Sylvain
On 23/03/2016 02:07, rahulmutt@gmail.com wrote:
Hi John,
ghc-prim is just a stub package generated for the purpose of documentation. All primops are defined in a low level language called Cmm in GHC. If you want to make it even faster, you'll need to learn Cmm and update the definition in GHC. If you want to a specialized implementation for x86 systems, you may need to modify the NCG (Native Code Generator) which requires a knowledge of assembly language.
Hope that helps! Rahul Muttineni
Sent from my BlackBerry 10 smartphone. *From: *John Ky *Sent: *Wednesday 23 March 2016 4:40 AM *To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Reply To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Subject: *[Haskell-beginners] let x = x in x (GHC.Prim)
Hello Haskellers,
I'm trying to write a faster popCount function for x86 systems.
I tried cloning the ghc-prim package and repurposing it for my own needs, but it isn't working as hoped.
In particular, popCnt64# was implemented in GHC.Prim as:
popCnt64# = let x = x in x
Which shouldn't terminate. Yet when I call it, it magically finds the C implementation in hs_popcnt64 and returns the correct value.
My cloned project doesn't behave that way. Instead it doesn't terminate as I would expect.
Anyone know what's happening here, if there is a way to make this work or tell me if I'm going about this completely the wrong way?
Cheers,
-John
_______________________________________________ Beginners mailing listBeginners@haskell.orghttp://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
_______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners

Hi Rahul,
I see mention of the popcount instruction in nativeGen/X86/CodeGen.hs. In
particular it looks like it only activates if sse4_2 is activated? Maybe
all I need to do is activate this somehow?
Cheers,
-John
On Wed, 23 Mar 2016 at 12:07
Hi John,
ghc-prim is just a stub package generated for the purpose of documentation. All primops are defined in a low level language called Cmm in GHC. If you want to make it even faster, you'll need to learn Cmm and update the definition in GHC. If you want to a specialized implementation for x86 systems, you may need to modify the NCG (Native Code Generator) which requires a knowledge of assembly language.
Hope that helps! Rahul Muttineni
Sent from my BlackBerry 10 smartphone. *From: *John Ky *Sent: *Wednesday 23 March 2016 4:40 AM *To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Reply To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Subject: *[Haskell-beginners] let x = x in x (GHC.Prim)
Hello Haskellers,
I'm trying to write a faster popCount function for x86 systems.
I tried cloning the ghc-prim package and repurposing it for my own needs, but it isn't working as hoped.
In particular, popCnt64# was implemented in GHC.Prim as:
popCnt64# = let x = x in x
Which shouldn't terminate. Yet when I call it, it magically finds the C implementation in hs_popcnt64 and returns the correct value.
My cloned project doesn't behave that way. Instead it doesn't terminate as I would expect.
Anyone know what's happening here, if there is a way to make this work or tell me if I'm going about this completely the wrong way?
Cheers,
-John
_______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners

Hi Rahul,
The reason I thought that maybe GHC didn't call the CPU instruction was due
to my benchmarking, which showed that the built-in popCount was slower than
the pure Haskell Broadword implementation:
main = defaultMain
[ env setupEnv (\bv -> bgroup "Rank"
[ bench "PopCnt1 Broadword - Once" (nf (map (\n -> getCount
(PC1BW.popCount1 (DVS.take n bv)))) [0, 1000..100000])
, bench "PopCnt1 GHC - Once" (nf (map (\n -> getCount
(PC1GHC.popCount1 (DVS.take n bv)))) [0, 1000..100000])
] )
]
Benchmarking results follow:
benchmarking Rank/PopCnt1 Broadword - Once
time 18.49 ms (17.89 ms .. 19.08 ms)
0.994 R² (0.989 R² .. 0.998 R²)
mean 19.62 ms (19.22 ms .. 20.04 ms)
std dev 1.026 ms (846.2 μs .. 1.315 ms)
variance introduced by outliers: 21% (moderately inflated)
benchmarking Rank/PopCnt1 GHC - Once
time 36.80 ms (36.25 ms .. 37.57 ms)
0.998 R² (0.996 R² .. 1.000 R²)
mean 37.14 ms (36.72 ms .. 37.97 ms)
std dev 1.100 ms (650.6 μs .. 1.680 ms)
This makes the built-in nearly two times slower than the pure Haskell.
That's how I got into the ghc-prim code.
This is the broadword implementation I was using:
popCount1 x0 = ((x3 * 0x0101010101010101) .>. 56)
where
x1 = x0 - ((x0 .&. 0xaaaaaaaaaaaaaaaa) .>. 1)
x2 = (x1 .&. 0x3333333333333333) + ((x1 .>. 2) .&. 0x3333333333333333)
x3 = (x2 + (x2 .>. 4)) .&. 0x0f0f0f0f0f0f0f0f
Maybe all I need to do is compile with some flags or switch the backend or
something? Any ideas?
Cheers,
-John
On Thu, 24 Mar 2016 at 08:28 John Ky
Hi Rahul,
I see mention of the popcount instruction in nativeGen/X86/CodeGen.hs. In particular it looks like it only activates if sse4_2 is activated? Maybe all I need to do is activate this somehow?
Cheers,
-John
On Wed, 23 Mar 2016 at 12:07
wrote: Hi John,
ghc-prim is just a stub package generated for the purpose of documentation. All primops are defined in a low level language called Cmm in GHC. If you want to make it even faster, you'll need to learn Cmm and update the definition in GHC. If you want to a specialized implementation for x86 systems, you may need to modify the NCG (Native Code Generator) which requires a knowledge of assembly language.
Hope that helps! Rahul Muttineni
Sent from my BlackBerry 10 smartphone. *From: *John Ky *Sent: *Wednesday 23 March 2016 4:40 AM *To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Reply To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Subject: *[Haskell-beginners] let x = x in x (GHC.Prim)
Hello Haskellers,
I'm trying to write a faster popCount function for x86 systems.
I tried cloning the ghc-prim package and repurposing it for my own needs, but it isn't working as hoped.
In particular, popCnt64# was implemented in GHC.Prim as:
popCnt64# = let x = x in x
Which shouldn't terminate. Yet when I call it, it magically finds the C implementation in hs_popcnt64 and returns the correct value.
My cloned project doesn't behave that way. Instead it doesn't terminate as I would expect.
Anyone know what's happening here, if there is a way to make this work or tell me if I'm going about this completely the wrong way?
Cheers,
-John
_______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners

OMG. Yes!
I just needed to add the -msse4.2 flag to GHC:
benchmarking Rank/PopCnt1 Broadword - Once
time 18.45 ms (18.25 ms .. 18.67 ms)
0.999 R² (0.997 R² .. 0.999 R²)
mean 18.19 ms (17.99 ms .. 18.38 ms)
std dev 508.6 μs (398.6 μs .. 623.8 μs)
benchmarking Rank/PopCnt1 GHC - Once
time 11.82 ms (11.65 ms .. 11.97 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 11.85 ms (11.74 ms .. 11.96 ms)
std dev 283.5 μs (229.6 μs .. 362.8 μs)
Thanks so much for your help!
Cheers,
-John
On Thu, 24 Mar 2016 at 08:36 John Ky
Hi Rahul,
The reason I thought that maybe GHC didn't call the CPU instruction was due to my benchmarking, which showed that the built-in popCount was slower than the pure Haskell Broadword implementation:
main = defaultMain [ env setupEnv (\bv -> bgroup "Rank" [ bench "PopCnt1 Broadword - Once" (nf (map (\n -> getCount (PC1BW.popCount1 (DVS.take n bv)))) [0, 1000..100000]) , bench "PopCnt1 GHC - Once" (nf (map (\n -> getCount (PC1GHC.popCount1 (DVS.take n bv)))) [0, 1000..100000]) ] ) ]
Benchmarking results follow:
benchmarking Rank/PopCnt1 Broadword - Once time 18.49 ms (17.89 ms .. 19.08 ms) 0.994 R² (0.989 R² .. 0.998 R²) mean 19.62 ms (19.22 ms .. 20.04 ms) std dev 1.026 ms (846.2 μs .. 1.315 ms) variance introduced by outliers: 21% (moderately inflated)
benchmarking Rank/PopCnt1 GHC - Once time 36.80 ms (36.25 ms .. 37.57 ms) 0.998 R² (0.996 R² .. 1.000 R²) mean 37.14 ms (36.72 ms .. 37.97 ms) std dev 1.100 ms (650.6 μs .. 1.680 ms)
This makes the built-in nearly two times slower than the pure Haskell. That's how I got into the ghc-prim code.
This is the broadword implementation I was using:
popCount1 x0 = ((x3 * 0x0101010101010101) .>. 56) where x1 = x0 - ((x0 .&. 0xaaaaaaaaaaaaaaaa) .>. 1) x2 = (x1 .&. 0x3333333333333333) + ((x1 .>. 2) .&. 0x3333333333333333) x3 = (x2 + (x2 .>. 4)) .&. 0x0f0f0f0f0f0f0f0f
Maybe all I need to do is compile with some flags or switch the backend or something? Any ideas?
Cheers,
-John
On Thu, 24 Mar 2016 at 08:28 John Ky
wrote: Hi Rahul,
I see mention of the popcount instruction in nativeGen/X86/CodeGen.hs. In particular it looks like it only activates if sse4_2 is activated? Maybe all I need to do is activate this somehow?
Cheers,
-John
On Wed, 23 Mar 2016 at 12:07
wrote: Hi John,
ghc-prim is just a stub package generated for the purpose of documentation. All primops are defined in a low level language called Cmm in GHC. If you want to make it even faster, you'll need to learn Cmm and update the definition in GHC. If you want to a specialized implementation for x86 systems, you may need to modify the NCG (Native Code Generator) which requires a knowledge of assembly language.
Hope that helps! Rahul Muttineni
Sent from my BlackBerry 10 smartphone. *From: *John Ky *Sent: *Wednesday 23 March 2016 4:40 AM *To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Reply To: *The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell *Subject: *[Haskell-beginners] let x = x in x (GHC.Prim)
Hello Haskellers,
I'm trying to write a faster popCount function for x86 systems.
I tried cloning the ghc-prim package and repurposing it for my own needs, but it isn't working as hoped.
In particular, popCnt64# was implemented in GHC.Prim as:
popCnt64# = let x = x in x
Which shouldn't terminate. Yet when I call it, it magically finds the C implementation in hs_popcnt64 and returns the correct value.
My cloned project doesn't behave that way. Instead it doesn't terminate as I would expect.
Anyone know what's happening here, if there is a way to make this work or tell me if I'm going about this completely the wrong way?
Cheers,
-John
_______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
participants (4)
-
John Ky
-
rahulmutt@gmail.com
-
Sylvain Henry
-
Takenobu Tani