Re: [Haskell-cafe] ST Vector / STRef -- Performances and allocation

19 Jun 2016

      Thank you for this answer.

On Sun, Jun 19, 2016 at 1:08 AM, Amos Robinson  wrote:
...
Hi,
You might want to add inline pragmas to "inc" and so on.
STRef is boxed, so it will allocate. The 'URef' type here might be useful:
http://hackage.haskell.org/package/mutable-containers-0.3.2.1/docs/Data-Muta...
It is just a wrapper around unboxed mutable vector, as Carter suggests.
I added INLINE pragma on all my functions and it does not change
anything unfortunately.

I tried the package mutable-containers and the Api is really great,
but I'm surprised by the results :

- PRef (Unboxed bytearray of size 1, the closest thing to an unboxed
stack allocation) : 86ms
- URef (Unboxed vector of size 1) : 130ms
- SRef (Storable vector of size 1) : 43ms (This is an improvement !)
- BRef (Boxed ref) : 137ms
- STRef : 54ms (this was my baseline)

This is based on the creation of the Ref using a "mutable" function such as :

mutable v = asF <$> newRef v

(with asF == asPRef, asURef, ...)

However, if I'm providing a type signature for mutable, I got totally
different results :

mutable :: forall (f :: * -> *) a.
                 (Prim a, PrimMonad f) =>
                 a -> f (MyRef (PrimState f) a)
mutable v = asF <$> newRef v

(with type MyRef = PRef, URef, ...)

- PRef: 39ms (huge improvement, and winner as expected)
- URef: 45ms
- SRef: 43ms
- BRef: 84ms
- STRef: 54ms

I don't understand the speed difference which is only impacted by the
availability of a signature for mutable.

Now much of the allocation due to STRef have disappears. There is
still some which annoys me.

Most of the allocations comes from that line :

https://github.com/guibou/inversionsCount/blob/syntax/inversions.hs#L35

Which is nothing more that a boolean expression between different ST
action which are reading from unboxed Vector or PRef. So I was
expecting without allocation. Perhaps the fact that it uses "high
level" combinators such as "<$>", "<*>" and "=<<" is responsible ?
...
The sad story is that in my experience, if you really want decent
performance you will have to dump the Core and inspect it by hand. You'd
then add inline and bang patterns based on the Core. I do not recommend it
as a relaxing weekend exercise. I usually use an invocation something like:
...
ghc -ddump-prep -dppr-case-as-let -dsuppress-all -fforce-recomp
inversions.hs > inversions.hscore
Thank you ;) For now I rather prefer reading intel x86 assembler than
GHC core, but today is a rainy day, so I'll use it to start my
understanding of the GHC core ;)

-- 
G.

Re: [Haskell-cafe] ST Vector / STRef -- Performances and allocation

Guillaume Bouchard