
I've been dabbling with implementing the "n-body" benchmark over at the "Great Computer Language Shootout"...

http://shootout.alioth.debian.org/benchmark.php?test=nbody

...and I've stumbled onto a problem with space leaks. The code below works great for low numbers of iterations, but crashes and burns with large numbers. You can temporarily fix it up by specifying larger stack and heap sizes (with the +RTS -K & -H options), but this only gets you so far. I haven't been able to glean much information from the profiling reports. You can see the reports I generated (like the biography report) over at http://sleepingsquirrel.org/nbody. I've tried several different variations on the "advance" function, as well as trying "seq" in numerous spots, but I haven't stumbled onto the right combination yet. Any pointers?

Thanks,

Greg Buchholz
-- n-body simulation
-- compile: ghc -O2 -o nbody nbody.hs
-- run: nbody 100000

import System(getArgs)
import Numeric

data Vec = V !Double !Double !Double deriving Show

instance Num Vec where
    (V a b c) + (V x y z) = V (a+x) (b+y) (c+z)
    (V a b c) - (V x y z) = V (a-x) (b-y) (c-z)
    fromInteger 0         = V 0.0 0.0 0.0          -- for sum function

instance Eq Vec where
    (V a b c) == (V x y z) = (a==x) && (b==y) && (c==z)

dot   (V a b c) (V x y z) = a*x + b*y + c*z
scale (V a b c) n         = V (n*a) (n*b) (n*c)
mag   (V x y z)           = sqrt (x*x + y*y + z*z)

data Planet = Planet Vec Vec Double deriving Show  -- Position Velocity Mass

dist (Planet p1 _ _) (Planet p2 _ _) = mag $ p1 - p2
mass (Planet _ _ m) = m
vel  (Planet _ v _) = v
pos  (Planet p _ _) = p

main = do [arg] <- getArgs
          let iter   = read arg
          let n      = 5::Int
          let bodies = offset_momentum n [sun, jupiter, saturn, neptune, uranus]
          let begin  = energy n bodies
          putStrLn $ showFFloat (Just 9) begin ""
          let final  = iterate (advance n 0.01) bodies
          let end    = energy n (final !! iter)
          putStrLn $ showFFloat (Just 9) end ""

days_per_year = 365.24
solar_mass    = 4.0 * pi * pi

advance :: Int -> Double -> [Planet] -> [Planet]
advance n dt (p:ps) = take n ((Planet (pos p + delta_x) (new_v) (mass p)) : advance n dt (ps++[p]))
    where delta_v = sum (map (\q -> (pos p - pos q) `scale` ((mass q)*dt/(dist p q)^3)) ps)
          new_v   = (vel p) - delta_v
          delta_x = new_v `scale` dt

energy :: Int -> [Planet] -> Double
energy n ps = kinetic - potential
    where kinetic   = 0.5 * (sum (map (\q->(mass q)*((vel q) `dot` (vel q))) ps))
          potential = sum [(mass (ps!!i))*(mass (ps!!j))/(dist (ps!!i) (ps!!j))
                              | i<-[0..n-1], j<-[i+1..n-1]]

offset_momentum n ((Planet p v m):ps) = (Planet p new_v m):ps
    where new_v = (sum (map (\n->(vel n) `scale` (mass n)) ps)) `scale` ((-1.0)/solar_mass)

{-
rotations 0 _          = []
rotations n xss@(x:xs) = flipped : rotations (n-1) flipped
    where flipped = xs++[x]

advance :: Int -> Double -> [Planet] -> [Planet]
advance n dt pss = map (adv dt) (rotations n pss)

adv dt (p:ps) = Planet (pos p + delta_x) (new_v) (mass p)
    where delta_v = sum (map (\q -> (pos p - pos q) `scale` ((mass q)*dt/(dist p q)^3)) ps)
          new_v   = (vel p) - delta_v
          delta_x = new_v `scale` dt
--}

jupiter = (Planet (V 4.84143144246472090e+00 (-1.16032004402742839e+00) (-1.03622044471123109e-01))
                  (V (  1.66007664274403694e-03 * days_per_year)
                     (  7.69901118419740425e-03 * days_per_year)
                     ((-6.90460016972063023e-05) * days_per_year))
                  (9.54791938424326609e-04 * solar_mass))

saturn  = (Planet (V 8.34336671824457987e+00 4.12479856412430479e+00 (-4.03523417114321381e-01))
                  (V (-2.76742510726862411e-03 * days_per_year)
                     ( 4.99852801234917238e-03 * days_per_year)
                     ( 2.30417297573763929e-05 * days_per_year))
                  (2.85885980666130812e-04 * solar_mass))

uranus  = (Planet (V 1.28943695621391310e+01 (-1.51111514016986312e+01) (-2.23307578892655734e-01))
                  (V ( 2.96460137564761618e-03 * days_per_year)
                     ( 2.37847173959480950e-03 * days_per_year)
                     (-2.96589568540237556e-05 * days_per_year))
                  (4.36624404335156298e-05 * solar_mass))

neptune = (Planet (V 1.53796971148509165e+01 (-2.59193146099879641e+01) 1.79258772950371181e-01)
                  (V ( 2.68067772490389322e-03 * days_per_year)
                     ( 1.62824170038242295e-03 * days_per_year)
                     (-9.51592254519715870e-05 * days_per_year))
                  (5.15138902046611451e-05 * solar_mass))

sun = (Planet (V 0.0 0.0 0.0) (V 0.0 0.0 0.0) solar_mass)

On 5/5/05, Greg Buchholz
I've been dabbling with implementing the "n-body" benchmark over at the "Great Computer Language Shootout"...
http://shootout.alioth.debian.org/benchmark.php?test=nbody
...and I've stumbled onto a problem with space leaks. The code below works great for low numbers of iterations, but crashes and burns with large numbers. You can temporarily fix it up by specifying larger stack and heap sizes (with the +RTS -K & -H options), but this only gets you so far. I haven't been able to glean much information from the profiling reports. You can see the reports I generated (like the biography report) over at http://sleepingsquirrel.org/nbody. I've tried several different variations on the "advance" function, as well as trying "seq" in numerous spots, but I haven't stumbled onto the right combination yet. Any pointers?
I think the thing that really kills you is the way you shuffle around the list in the advance function. Your commented out rotations function has the same problem. Here is my attempt to solve the problem a little more efficiently:

\begin{code}
update :: (a -> [a] -> a) -> ([a] -> [a]) -> [a] -> [a]
update f newlist []     = []
update f newlist (a:as) = a' : update f (newlist . (a':)) as
    where a' = f a (newlist as)

advance dt ps = update newplanet id ps
    where newplanet p ps = Planet (pos p + delta_x) new_v (mass p)
              where delta_v = sum (map (\q -> (pos p - pos q) `scale` ((mass q)*dt/(dist p q)^3)) ps)
                    new_v   = (vel p) - delta_v
                    delta_x = new_v `scale` dt
\end{code}

(update is a really bad name for the above function. It's just the first name that came to mind. You can probably come up with something better.)

Also, putting strictness annotations in the Planet data type will yield some speedup.

HTH,

/Josef
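For concreteness, the strictness annotations Josef mentions would amount to banging the fields of Planet, as in the sketch below (this is just the obvious reading of his suggestion, not code from his mail; the rest of the program is unchanged):

    data Planet = Planet !Vec !Vec !Double deriving Show  -- Position Velocity Mass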

Josef Svenningsson wrote:
I think the thing that really kills you is the way you shuffle around the list in the advance function. Your commented out rotations function has the same problem. Here is my attempt to solve the problem a little more efficiently:
We're heading in the right direction anyway. I can now compute 1 million iterations in about 2 minutes (with +RTS -H750M -K100M). Well, almost, since it now doesn't compute the right answer, so something must be amiss in the shuffling section. Now if we can get it to use a little less than 1G of memory, we'll be in good shape.

Thanks,

Greg Buchholz

Have you tried heap profiling?

-- Lennart

Greg Buchholz wrote:
Josef Svenningsson wrote:
I think the thing that really kills you is the way you shuffle around the list in the advance function. Your commented out rotations function has the same problem. Here is my attempt to solve the problem a little more efficiently:
We're heading in the right direction anyway. I can now compute 1 million iterations in about 2 minutes (with +RTS -H750M -K100M). Well, almost, since it now doesn't compute the right answer, so something must be amiss in the shuffling section. Now if we can get it to use a little less than 1G of memory, we'll be in good shape.
Thanks,
Greg Buchholz

Lennart Augustsson wrote:
Have you tried heap profiling?
Not on the updated version from Josef Svenningsson, but you can look at the heap profiles for the previous version at... http://sleepingsquirrel.org/nbody ...the *.ps files are the ones you want. Greg Buchholz

On Friday, 6 May 2005 02:24, Greg Buchholz wrote:
Josef Svenningsson wrote:
I think the thing that really kills you is the way you shuffle around the list in the advance function. Your commented out rotations function has the same problem. Here is my attempt to solve the problem a little more efficiently:
We're heading in the right direction anyway. I can now compute 1 million iterations in about 2 minutes (with +RTS -H750M -K100M). Well, almost, since it now doesn't compute the right answer, so something must be amiss in the shuffling section. Now if we can get it to use a little less than 1G of memory, we'll be in good shape.
Thanks,
Greg Buchholz
The problem is that a prime crept in in Josef's code, so the already-updated versions of the planets are used to calculate the new positions and velocities. It should be

    update f newlist (a:as) = a' : update f (newlist . (a:)) as
        where a' = f a (newlist as)

instead of

    update f newlist (a:as) = a' : update f (newlist . (a':)) as
        where a' = f a (newlist as)

Besides, offset_momentum does not use the parameter n at all, and I think that all the indexing in the computation of the potential energy is rather inefficient. I rewrote it as

    myEnergy :: [Planet] -> Double
    myEnergy pps@(p:ps) = kinetic - (pot pps)
        where kinetic     = 0.5 * (sum $ map (\q -> dot (vel q) (vel q) * mass q) pps)
              pot         = sum . pots
              pots []     = []
              pots (p:ps) = sum [m * (mass q) / (dist p q) | q <- ps] : pots ps
                  where m = mass p

It didn't make it any faster, though. I have not space-profiled it yet.

Cheers,

Daniel

Daniel Fischer wrote:
The problem is that a prime crept in in Josef's code, so the already-updated versions of the planets are used to calculate the new positions and velocities. It should be
Ah, yes, thanks for the correction.
Besides, offset_momentum does not use the parameter n at all and I think that all the indexing in the computation of the potential energy is rather inefficient.
I wasn't too worried about the energy function, since it only gets called twice, whereas "advance" gets called potentially millions of times (and must be the source of the space leak). You can see one profile I ran (for n=1000 instead of 1000000) at...

http://sleepingsquirrel.org/nbody/nbody.prof

Thanks,

Greg Buchholz

Greg Buchholz
I wasn't too worried about the energy function, since it only gets called twice, whereas "advance" gets called potentially millions of times (and must be the source of the space leak).
After running a few examples myself, it seems clear that the peak of the profile is very early in the run, then space falls linearly to the end of the computation. A "construction" profile reveals that there is no single type of heap node responsible, rather a mixture of list conses and unevaluated closures. Also, the initial peak seems to grow at least linearly with the number of iterations requested.

I conclude that the program is initially (and quickly) building a vast data structure filled with closures, and subsequently reducing this "latent" computation down to the single result value. The biographical profile confirms that the vast majority of the heap is 'lag': it is created too early before being used. Given this observation, the initial generator of the peak would be the iterations of 'advance', and the consumer of the peak would be 'energy'.

Your target therefore is to structure the computation's use of data in a more linear fashion - to produce some small portion of the data structure, then partially reduce it, before continuing to lazily produce some more of the data structure. So, here is a suggested improvement - it does more work than the original, and takes more time, but it totally eliminates the space leak.

    main = do ....
              let final = forceIterate iter (energy n) (advance 0.01) bodies
              putStrLn $ showFFloat (Just 9) final ""

    forceIterate :: Int -> (a->b) -> (a->a) -> a -> b
    forceIterate 0 reduce f x = reduce x
    forceIterate n reduce f x = let r = f x
                                in reduce r `seq` forceIterate (n-1) reduce f r

Perhaps you can be thus inspired to discover a more minimal way of forcing the necessary parts of the intermediate computation than calling 'energy' itself.

Regards,

Malcolm
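One possible shape for that more minimal forcing step, sketched under the assumption that the Planet and Vec types from Greg's program are in scope (iterateForced and forcePlanets are hypothetical names, not from Malcolm's mail). Since Vec already has strict fields, forcing the three fields of every Planet is enough to collapse the pending vector arithmetic without computing the full energy:

    -- Force every planet in the list; forcing each field to WHNF also
    -- evaluates the underlying Doubles, because Vec's fields are strict.
    forcePlanets :: [Planet] -> ()
    forcePlanets []                    = ()
    forcePlanets (Planet p v m : rest) = p `seq` v `seq` m `seq` forcePlanets rest

    -- A variant of forceIterate that forces each intermediate state with a
    -- cheap "force" function but still returns the final state, so 'energy'
    -- need only be applied once at the very end.
    iterateForced :: Int -> (a -> ()) -> (a -> a) -> a -> a
    iterateForced 0 _     _ x = x
    iterateForced n force f x = let r = f x
                                in force r `seq` iterateForced (n-1) force f r

main could then end with something like energy n (iterateForced iter forcePlanets (advance 0.01) bodies), keeping the single energy computation at the end.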

On Friday, 6 May 2005 16:58, Greg Buchholz wrote:
Daniel Fischer wrote:
The problem is that a prime crept in in Josef's code, so the already-updated versions of the planets are used to calculate the new positions and velocities. It should be
Ah, yes, thanks for the correction.
Besides, offset_momentum does not use the parameter n at all and I think that all the indexing in the computation of the potential energy is rather inefficient.
I wasn't too worried about the energy function, since it only gets called twice, whereas "advance" gets called potentially millions of
Of course, and for only five bodies...
times (and must be the source of the space leak). You can see one profile I ran (for n=1000 instead of 1000000) at...
I did profiling for n = 20000 and n = 100000; if I read the profiles correctly, Josef's code reduces memory usage by a factor of 2 and 3 respectively, which ain't bad. Maybe one could reduce space usage further by not creating the list of iterations but instead making it a tail-recursive function? I'll try.
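For what it's worth, the tail-recursive version Daniel has in mind might look like the sketch below (assuming Josef's advance, which takes only dt; advanceN is a hypothetical name). On its own it merely avoids building the list of iterations; the planets themselves still need to be forced at each step (as in Malcolm's forceIterate above) or kept in a strict structure for the leak to disappear.

    -- Replace 'iterate (advance 0.01) bodies !! iter' with a counted loop.
    advanceN :: Int -> Double -> [Planet] -> [Planet]
    advanceN 0 _  ps = ps
    advanceN k dt ps = advanceN (k-1) dt (advance dt ps)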
http://sleepingsquirrel.org/nbody/nbody.prof
Thanks,
Greg Buchholz
One question: the energy of the system should be constant - that little physics I know. What I don't know is whether the change in energy in this simulation is within usual, reasonable bounds or not. If not, is it due to a too long time interval or should one use a more sophisticated algorithm? Cheers, Daniel

Daniel Fischer wrote:
One question: the energy of the system should be constant - that little physics I know. What I don't know is whether the change in energy in this simulation is within usual, reasonable bounds or not. If not, is it due to a too long time interval or should one use a more sophisticated algorithm?
The change in energy is from round-off error caused by floating point arithmetic. For this particular test, the shootout requires everyone to use the same algorithm. So the error is expected. Greg Buchholz

On Monday, 9 May 2005 05:38, Greg Buchholz wrote:
Daniel Fischer wrote:
One question: the energy of the system should be constant - that little physics I know. What I don't know is whether the change in energy in this simulation is within usual, reasonable bounds or not. If not, is it due to a too long time interval or should one use a more sophisticated algorithm?
The change in energy is from round-off error caused by floating point arithmetic. For this particular test, the shootout requires everyone to use the same algorithm. So the error is expected.
Greg Buchholz
Well, I think most of the change is due to the algorithm, which gives a rather crude approximation to the actual solution of the ODE. This is backed by trying different values of dt (actually tested with Java, because that uses hardly any memory at all and is much much faster). For the tests, dt * k = 20; dt is the argument:

    dafis@linux:~/Documents/haskell/dmitri/Greg/Bodies> java PlanSys 0.1
    -0.1690751638285245
    -0.16877178579767274
    dafis@linux:~/Documents/haskell/dmitri/Greg/Bodies> java PlanSys 0.05
    -0.1690751638285245
    -0.16899653420881677
    dafis@linux:~/Documents/haskell/dmitri/Greg/Bodies> java PlanSys 0.025
    -0.1690751638285245
    -0.1690547211965566
    ...
    dafis@linux:~/Documents/haskell/dmitri/Greg/Bodies> java PlanSys 0.00001
    -0.1690751638285245
    -0.16907516338916015
    dafis@linux:~/Documents/haskell/dmitri/Greg/Bodies> java PlanSys 0.000005
    -0.1690751638285245
    -0.16907516360969982
    dafis@linux:~/Documents/haskell/dmitri/Greg/Bodies> time java PlanSys 0.0000025
    -0.1690751638285245
    -0.16907516371932937

    real    1m42.464s
    user    1m39.900s
    sys     0m2.540s

To me it looks like finer granularity on the time scale is not yet consumed by rounding errors (and since we never divide by small numbers, I wouldn't expect any dramatic effects for a while). If the algorithm - including dt - is prescribed, fine, but I wonder what sort of deviation physicists would consider acceptable. For dt = 0.01, k = 2000 we have a relative error of about 2*10^(-5); is that within accepted bounds or not? (Any physicists hang about here?)

Cheers,

Daniel
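For reference, the relative error quoted here is just the difference between the two energies the program prints, divided by the initial energy. A tiny helper (hypothetical, not part of the benchmark) makes the comparison explicit:

    -- Relative change between the initial energy e0 and the final energy e1.
    relErr :: Double -> Double -> Double
    relErr e0 e1 = abs (e1 - e0) / abs e0

    -- e.g. for the dt = 0.025 run above:
    --   relErr (-0.1690751638285245) (-0.1690547211965566)  ~  1.2e-4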

Daniel Fischer wrote:
If the algorithm - including dt - is prescribed, fine, but I wonder what sort of deviation physicists would consider acceptable. For dt = 0.01, k = 2000 we have a relative error of about 2*10^(-5), is that within accepted bounds or not? (Any physicists hang about here?)
It is becoming explicitly off-topic... I didn't follow this thread; I had just one glimpse at the equations used, I had the impression that I saw the standard Euler extrapolation instead of something more stable, like, e.g., Verlet, and I switched off, having other stuff to do.

Now, mind you, *no physicist* will tell you a priori whether this or that relative error is acceptable or not. 0.01 percent looks nice, but if you simulate a system in order to see whether it is stable or not, then it won't suffice. For example: is the Solar system eternal? On the other hand, if your system is inherently chaotic, ergodic, and if you are not interested in KAM tori or other islands of stability, then the system is much more tolerant; one chaos is essentially equivalent to another chaos, the details of the trajectory are irrelevant. Then, small errors are not critical at all.

==

What some people do:

1. They compute the initial energy.
2. They solve the differential equations using some *good* methods.
3. After some steps they stop for a moment, they scratch their heads, and they recompute the energy. If it changed a bit, then they RENORMALIZE the velocity vectors in such a way that the energy *remains* constant unconditionally.

Jerzy Karczmarczuk
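A rough sketch of what step 3 could look like for this program (an assumption on my part, not code from the thread), reusing the Planet/Vec helpers and Data.List.tails: the kinetic and potential parts of the existing 'energy' function are computed separately, and the velocities are rescaled so the total energy matches the initial energy e0 again. It assumes e0 + pot > 0, which holds as long as the kinetic energy stays positive.

    import Data.List (tails)

    renormalize :: Double -> [Planet] -> [Planet]
    renormalize e0 ps = map rescale ps
      where kin = 0.5 * sum [ mass q * (vel q `dot` vel q) | q <- ps ]
            pot = sum [ mass p * mass q / dist p q | (p:qs) <- tails ps, q <- qs ]
            s   = sqrt ((e0 + pot) / kin)   -- chosen so that s^2*kin - pot == e0
            rescale (Planet x v m) = Planet x (v `scale` s) m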

On 5/6/05, Daniel Fischer
On Friday, 6 May 2005 02:24, Greg Buchholz wrote:
We're heading in the right direction anyway. I can now compute 1 million iterations in about 2 minutes (with +RTS -H750M -K100M). Well, almost, since it now doesn't compute the right answer, so something must be amiss in the shuffling section. Now if we can get it to use a little less than 1G of memory, we'll be in good shape.
Thanks,
Greg Buchholz
The problem is that a prime crept in in Josef's code, so the already-updated versions of the planets are used to calculate the new positions and velocities. It should be

    update f newlist (a:as) = a' : update f (newlist . (a:)) as
        where a' = f a (newlist as)

instead of

    update f newlist (a:as) = a' : update f (newlist . (a':)) as
        where a' = f a (newlist as)
The prime didn't creep in, I put it there on purpose. Which just shows that I didn't understand the algorithm properly :). But your correction makes the algorithm a little easier to express.

But first to the space problem. Greg, you express the sequential computation of updating the set of planets as an iteration over a list. This is a sweet way of expressing it, but it doesn't work very well when an element of the list contains pointers to the previous element. This will force all the elements to be in memory at the same time. What causes this is too little strictness; all computations which depend on the previous element haven't been forced and thus still point to it. In this specific program the problem is the list of planets. Even if we have computed ourselves a new list of planets, the planets themselves haven't been computed yet and will contain pointers to the previous list of planets. Devastating! Hence we will need a strict list to do this, or some other strict data structure.

So, here's my code. It's not beautiful but it solves the space problem. I hope I got it right this time :)

\begin{code}
data StrictList a = Cons !a !(StrictList a) | Nil

slToList (Cons a sl) = a : slToList sl
slToList Nil         = []

listToSl (a:as) = Cons a (listToSl as)
listToSl []     = Nil

mapSL f (Cons a sl) = Cons (f a) (mapSL f sl)
mapSL f Nil         = Nil

partitionSL :: StrictList a -> StrictList (a,StrictList a)
partitionSL sl = go id sl
  where go prev Nil         = Nil
        go prev (Cons a sl) = Cons (a,prev sl) (go (prev . (Cons a)) sl)

advance dt sl = mapSL newplanet (partitionSL sl)
  where newplanet (p,sl) = Planet (pos p + delta_x) new_v (mass p)
          where ps      = slToList sl
                delta_v = sum (map (\q -> (pos p - pos q) `scale` ((mass q)*dt/(dist p q)^3)) ps)
                new_v   = (vel p) - delta_v
                delta_x = new_v `scale` dt

offset_momentum ((Planet p v m):ps) = Cons (Planet p new_v m) (listToSl ps)
  where new_v = (sum (map (\p->(vel p) `scale` (mass p)) ps)) `scale` ((-1.0)/solar_mass)

energy :: Int -> StrictList Planet -> Double
energy n ls = kinetic - potential
  where ps        = slToList ls
        kinetic   = 0.5 * (sum $ map (\q-> (mass q)*((vel q)`dot`(vel q))) ps)
        potential = sum [(mass (ps!!i))*(mass (ps!!j))/(dist (ps!!i) (ps!!j))
                            | i<-[0..n-1], j<-[i+1..n-1]]
\end{code}

Cheers,

/Josef

Josef Svenningsson wrote:
Greg, you express the sequential computation of updating the set of planets as an iteration over a list. This is a sweet way of expressing it, but it doesn't work very well when an element of the list contains pointers to the previous element. This will force all the elements to be in memory at the same time. What causes this is too little strictness; all computations which depend on the previous element haven't been forced and thus still point to it. In this specific program the problem is the list of planets. Even if we have computed ourselves a new list of planets, the planets themselves haven't been computed yet and will contain pointers to the previous list of planets. Devastating! Hence we will need a strict list to do this, or some other strict data structure.
So, here's my code. It's not beautiful but it solves the space problem. I hope I got it right this time :)
Alright, looks good. I've incorporated this latest change. You can see the code... http://xrl.us/fzn4 Greg Buchholz

Josef Svenningsson wrote:
Devastating! Hence we will need a strict list to do this or some other strict data structure.
Seems like the relation between the planets is better described as a graph, instead of a list. Maybe there's something in Data.Graph that'll help us make a more straightforward program? http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data.Graph.html Greg Buchholz
participants (8)

- Daniel Fischer
- Greg Buchholz
- Greg Buchholz
- haskell@sleepingsquirrel.org
- Jerzy Karczmarczuk
- Josef Svenningsson
- Lennart Augustsson
- Malcolm Wallace