
Hello. I read a reddit post [0] which ends in a debate about haskell performances. Someone gives a C implementation which runs in 15ms on some dataset. I wanted to learn about mutable vector and ST and got something in haskell which runs in 50ms. Code is here [1]. [0] https://www.reddit.com/r/haskell/comments/4ojsn0/counting_inversions_haskell... [1] https://github.com/guibou/inversionsCount/blob/syntax/inversions.hs When I'm profiling the code, I'm surprised to see more than 50% of the %alloc in function which, from my point of view, must not allocate. For example, this function : inc :: forall s. STRef s Int -> ST s () inc v = modifySTRef' v (+1) Is responsible for 10% of the allocation in this program. I was hoping that the "unboxing magic" of GHC may have replaced my Int somewhere by an unboxed one, but I guess the allocation means that the STRef is somehow pointing to a heap allocated Int. Do you know a way to remove theses allocations? As a second question, if you see any other way to improve this implementation, I'll be happy to learn something new ;) Finally, if you read my code, you'll see that I implemented a small DSL around ST / STRef / MVector to make my main function more readable. How do people write readable code in real life ? Especially, I was annoyed when I tried to convert this piece of C code : if (x1 < mid && (x2 == sz || tmp[x1] <= p[x2])) { a; } else { b; } In my haskell code, x1, x2 are STRef, and tmp and p are MVector. This was highly painful to write in haskell and I had to write something such as: x1 <- readSTRef x1' x2 <- readSTRef x2' cond <- if x1 < mid then if x2 == sz then return True else do tmp_x1 <- Vector.read tmp x1 p_x2 <_ Vector.read p x2 return (tmp_x1 <= p_x2) else return False if cond then a else b This is painful, complex and error prone, so is there another solution ? Thank you.