
Thank you for this answer.
On Sun, Jun 19, 2016 at 1:08 AM, Amos Robinson
Hi, You might want to add inline pragmas to "inc" and so on. STRef is boxed, so it will allocate. The 'URef' type here might be useful: http://hackage.haskell.org/package/mutable-containers-0.3.2.1/docs/Data-Muta... It is just a wrapper around unboxed mutable vector, as Carter suggests.
I added INLINE pragma on all my functions and it does not change anything unfortunately. I tried the package mutable-containers and the Api is really great, but I'm surprised by the results : - PRef (Unboxed bytearray of size 1, the closest thing to an unboxed stack allocation) : 86ms - URef (Unboxed vector of size 1) : 130ms - SRef (Storable vector of size 1) : 43ms (This is an improvement !) - BRef (Boxed ref) : 137ms - STRef : 54ms (this was my baseline) This is based on the creation of the Ref using a "mutable" function such as : mutable v = asF <$> newRef v (with asF == asPRef, asURef, ...) However, if I'm providing a type signature for mutable, I got totally different results : mutable :: forall (f :: * -> *) a. (Prim a, PrimMonad f) => a -> f (MyRef (PrimState f) a) mutable v = asF <$> newRef v (with type MyRef = PRef, URef, ...) - PRef: 39ms (huge improvement, and winner as expected) - URef: 45ms - SRef: 43ms - BRef: 84ms - STRef: 54ms I don't understand the speed difference which is only impacted by the availability of a signature for mutable. Now much of the allocation due to STRef have disappears. There is still some which annoys me. Most of the allocations comes from that line : https://github.com/guibou/inversionsCount/blob/syntax/inversions.hs#L35 Which is nothing more that a boolean expression between different ST action which are reading from unboxed Vector or PRef. So I was expecting without allocation. Perhaps the fact that it uses "high level" combinators such as "<$>", "<*>" and "=<<" is responsible ?
The sad story is that in my experience, if you really want decent performance you will have to dump the Core and inspect it by hand. You'd then add inline and bang patterns based on the Core. I do not recommend it as a relaxing weekend exercise. I usually use an invocation something like:
ghc -ddump-prep -dppr-case-as-let -dsuppress-all -fforce-recomp inversions.hs > inversions.hscore
Thank you ;) For now I rather prefer reading intel x86 assembler than GHC core, but today is a rainy day, so I'll use it to start my understanding of the GHC core ;) -- G.