
I sat down tonight and did tons of good learning (which was my goal). Yes,
the variable names in the unrolling is a little "ugly" but it helps to read
the C++ version for context. There are two for loops (advN is each inner one
unrolled). the other function names match the C++ version. It was my goal
to implement an unrolled version of that.
Fortunately, my performance is excellent now. It is only 4x slower than the
C++ version and 2x slower than the Haskell one listed (which uses pointer
trickery). I am sure there could be more done but I am at my limit of
comprehension. But if I may guess, I would say that any speed issues now are
related to a lack of in place updating for variables and structures.
I'd also like to thank everyone for their help so far. I have attached my
latest version.
--ryan
On Nov 27, 2007 7:14 PM, Sterling Clover
The first step would be profiling -- i.e. compiling with -prof -auto- all to tag each function as a cost center, then running with +RTS -p to generate a cost profile. The problem here is you've got massive amounts of unrolling done already, so it's sort of hard to figure out what's doing what, and the names you've given the unrolled functions are... less than helpful. (first rule of optimization: optimize later.) The use of tuples shouldn't be a problem per se in terms of performance, but it really hurts readability to lack clear type signatures and types. You'd probably be better off constructing a vector data type as does the current Haskell entry -- and by forcing it to be strict and unboxed (you have nearly no strictness annotations I note -- and recall that $! only evaluates its argument to weak head normal form, which means that you're just checking if the top-level constructor is _|_) you'll probably get better performance to boot. In any case, declaring type aliases for the various units you're using would also help readability quite a bit.
--S
On Nov 27, 2007, at 5:41 PM, Ryan Dickie wrote:
I thought it would be a nice exercise (and a good learning experience) to try and solve the nbody problem from the debian language shootout. Unfortunately, my code sucks. There is a massive space leak and performance is even worse. On the bright side, my implementation is purely functional. My twist: I manually unrolled a few loops from the C++ version.
I believe that most of my performance problems stem from my abuse of tuple. The bodies are passed as a tuple of planets, a planet is a tuple of (position, velocity, mass) and the vectors position and velocity are also tuples of type double. My lame justification for that is to make it nice and handy to pass data around.
Any tips would be greatly appreciated.
--ryan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe