
Hi Daniel,
Thank you for your analysis. I dropped -prof from the compile and
explicitly specified the index type as Int - both yielded huge
improvements. These are great tips for me to keep in mind as I
continue to get deeper into Haskell.
Thanks,
Chris
On Wed, Feb 24, 2010 at 7:26 PM, Daniel Fischer
Am Donnerstag 25 Februar 2010 03:54:17 schrieb Daniel Fischer:
Am Donnerstag 25 Februar 2010 03:04:02 schrieb Chris Pettitt:
Hello Haskell-Beginners,
I'm having a fair amount of difficulty understanding the ST monad. In my basic experimentation I'm finding that I'm getting more heap allocated with STUArray than with a regular Array, which is not what I would expect.
And ordinarily you don't. The problem is that the STUArray suffers very badly from profiling. Compiling without -prof, the STUArray code allocates about half as much as the UArray code and takes ~42% of the time. With profiling, STUArray allocates ~40% more and takes ~50% longer than UArray.
Oh, and: In the UArray code, you specified the index type as Int, in the STUArray code, you didn't specify it, so it was taken to be Integer, which slows down performance significantly, changing sumInPlace1 to
sumInPlace1 :: [Int] -> Int sumInPlace1 xs = (! 0) . runSTUArray $ do a <- newArray (0 :: Int, 0) 0 forM_ xs $ \x -> do x' <- (readArray a 0) writeArray a 0 (x' + x) return a
makes the STUArray code allocate ~6% of what the UArray code allocates and run in ~8% of the time, because now we get a pretty nice loop involving only unboxed Ints. If we then replace readArray and writeArray with unsafeRead and unsafeWrite, we see that the most time is spent on bounds checking, because now the STUArray loop becomes really compact and runs a good three times faster, so it's now over forty times faster than the UArray (for which using unsafeAt instead of (!) doesn't make a noticeable difference).
Two things: - profiling interacts badly with some optimisations, so what the profiling output says may deviate significantly from what your -O2 compiled production code actually does - some datatypes suffer more from profiling than others, so what the profiling output says for different choices of datatype may deviate significantly from how your -O2 compiled production code behaves
Normally, -prof doesn't change the behaviour very much, but sometimes it does.
One additional point of confusion for me: when I run either function with +RTS -hc and use hp2ps I get an empty graph. I've seen these tools work quite well before, so I suspect I'm doing something wrong now.
Too little actual heap usage and too short running time. Change the upper limit to 10 million and you should get a graph with actual bands.
Thanks, Chris
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners