
Sounds like a neat program. I'm on a laptop right now but I'll check it out later. The reason I am mailing is because you can use mencoder to convert a stream of image files into a video file.
http://www.mplayerhq.hu/DOCS/HTML/en/menc-feat-enc-images.html
--ryan
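A hedged example of the sort that page shows, adapted for PPM frames (the w/h/fps values are illustrative, and whether mencoder's mf:// reader accepts PPM directly may depend on the build; converting the frames to PNG first is the fallback):

mencoder mf://*.ppm -mf w=200:h=200:fps=25:type=ppm -ovc lavc -lavcopts vcodec=mpeg4 -oac copy -o chaos.avi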
On 5/5/07, Andrew Coppin wrote:
Greetings.
I have something which you might find mildly interesting. (Please don't attempt the following unless you have some serious CPU power available, and several hundred MB of hard drive space free.)
darcs get http://www.orphi.me.uk/darcs/Chaos
cd Chaos
ghc -O2 --make System1
./System1
On my super-hyper-monster machine, the program takes an entire 15 minutes to run to completion. When it's done, you should have 500 images sitting in front of you. (They're in PPM format - hence the several hundred MB of disk space!) The images are the frames that make up an animation; if you can find a way to "play" this animation, you'll be treated to a truly psychedelic light show! (If not then you'll just have to admire them one at a time. The first few dozen frames are quite boring by the way...)
If you want to, you can change the image size. For example, "./System1 800" will render at 800x800 pixels instead of the default 200x200. (Be prepared for *big* slowdowns!)
*What is it?*
Well, it's a physical simulation of a "chaos pendulum". That is, a magnetic pendulum suspended over a set of magnets. The pendulum would just swing back and forth, but the magnets perturb its path in complex and unpredictable ways.
However, rather than simulate just 1 pendulum, the program simulates 40,000 of them, all at once! For each pixel, a pendulum is initialised with a velocity of zero and an initial position corresponding to the pixel coordinates. As the pendulums swing, each pixel is coloured according to the proximity of the corresponding pendulum to the three magnets.
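In sketch form, the per-pendulum update this describes is a simple Euler step (as Jerzy confirms further down the thread); Vec, accel and dt below are illustrative stand-ins, not the actual Chaos definitions:

-- position/velocity update for one pendulum: drift with the current
-- velocity, then kick with the net force of spring plus magnets
type Vec = (Double, Double)

add :: Vec -> Vec -> Vec
add (x0, y0) (x1, y1) = (x0 + x1, y0 + y1)

scale :: Double -> Vec -> Vec
scale k (x, y) = (k * x, k * y)

eulerStep :: (Vec -> Vec) -> Double -> (Vec, Vec) -> (Vec, Vec)
eulerStep accel dt (p, v) = (p `add` scale dt v, v `add` scale dt (accel p))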
*Help requested...*
Can anybody tell me how to make the program go faster?
I already replaced all the lists with IOUArrays, which resulted in big, big speedups (and a large decrease in memory usage). But I don't know how to make it go any faster. I find it worrying that the process of converting pendulum positions to colours appears to take significantly longer than the much more complex task of performing the numerical integration to discover the new pendulum positions. Indeed, using GHC's profiling tools indicates that the most time is spent executing the function "quant8". This function is defined as:
quant8 :: Double -> Word8
quant8 = floor . (0xFF *)
I can't begin to *imagine* how *this* can be the most compute-intensive part of the program when I've got all sorts of heavy metal maths going on with the numerical integration and so forth...! Anyway, if anybody can tell me how to make it run faster, I'd be most appreciative!
Also, is there an easy way to make the program use *both* of the CPUs in my PC? (Given that the program maps two functions over two big IOUArrays...)
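One possible answer, sketched under assumptions (stepPixel is a stand-in for the real per-pixel work, not a Chaos function; compile with -threaded and run with +RTS -N2): split the index range in half and run each half in its own thread.

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- process indices [0..n-1] using two threads; the MVar is the join point
inParallelHalves :: (Int -> IO ()) -> Int -> IO ()
inParallelHalves stepPixel n = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    mapM_ stepPixel [0 .. n `div` 2 - 1]
    putMVar done ()
  mapM_ stepPixel [n `div` 2 .. n - 1]
  takeMVar done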
Finally, if anybody has any random comments about the [lack of] quality in my source code, feel free...

Ryan Dickie wrote:
Sounds like a neat program. I'm on a laptop right now but I'll check it out later. The reason I am mailing is because you can use mencoder to convert a stream of image files into a video file.

Indeed, it is pretty neat. I'd post an image, but I'm not sure whether the other people on this list would appreciate a binary attachment. I'm hoping to make a DVD of various simulations - but that's kind of difficult when rendering full-size animations takes many hours! >_< Hence the request for optimisation help... ;-)

Mencoder works on Linux, IrfanView + VirtualDub does it nicely on Windoze, I'm sure MacOS has something that can stitch PPM images together too. Use whatever you have on your platform. :-D

On Sat, May 05, 2007 at 09:17:50PM +0100, Andrew Coppin wrote:
Ryan Dickie wrote:
Sounds like a neat program. I'm on a laptop right now but I'll check it out later. The reason I am mailing is because you can use mencoder to convert a stream of image files into a video file.
Indeed, it is pretty neat. I'd post an image, but I'm not sure whether the other people on this list would appreciate a binary attachment. I'm
AFAIK, nobody cares about binaryness per se. It's merely the fact that images tend to be rather large... Is it <=50kb? (typical maximum size of a 1-line patch that has been bloated by darcs' ultra low density context format)
hoping to make a DVD of various simulations - but that's kind of difficult when rendering full-size animations takes many hours! >_< Hence the request for optimisation help... ;-)
Mencoder works on Linux, IrfanView + VirtualDub does it nicely on Windoze, I'm sure MacOS has something that can stitch PPM images together too. Use whatever you have on your platform. :-D
I've had success with ffmpeg years ago (Linux).
Stefan
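For reference, a hedged one-liner of the kind meant here, assuming the frames are numbered frame001.ppm, frame002.ppm, ... (the file pattern and frame rate are illustrative, not taken from the Chaos code):

ffmpeg -r 25 -i frame%03d.ppm chaos.avi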

Andrew Coppin shares with us his Chaos program. I confirm that on my HP laptop it was faster than 15 minutes, but I won't speculate how to optimize it. I appreciated the elegance of overloading, the usage of Num classes, etc., which makes it more readable, although somewhat slower. Actually it took slightly less than 10 minutes for 300x300. I wonder whether making some constants global (such as dt) would change anything.

For those who don't have the patience to execute the program, I converted the PPMs to an XviD-coded AVI file. Thanks, Andrew.
http://users.info.unicaen.fr/~karczma/Work/Chaos0.avi

What I didn't appreciate was the use of the simple extrapolating Euler method, which for oscillating systems is known to be unstable, so the results of the simulation may be far from reality. Well, one chaos is worth another, and the sin is not as mortal as in the case of truly periodic systems, but it may be the reason it is difficult to see the classical fractal structure of the attraction domains in the generated images. Try using leapfrog (Verlet)... It will require keeping not only p and v for each frame, but also the acceleration, but the corrections are minor. Of course it will be slower, but then, why not increase dt?

Jerzy Karczmarczuk

jerzy.karczmarczuk@info.unicaen.fr wrote:
I appreciated the elegance of overloading, the usage of Num classes, etc., which makes it more readable, although somewhat slower.
The source code has explicit monomorphic types all over it; I would expect GHC to be able to optimise out any method calls. (OTOH, I'm not a GHC expert... Simon? Don?)
For those who don't have the patience to execute the program, I converted the PPMs to an XviD-coded AVI file. Thanks, Andrew. http://users.info.unicaen.fr/~karczma/Work/Chaos0.avi
Thanks for that. I did try encoding the video as MPEG 1, but it was still far too large. (20 MB.) Would have taken me several months to upload...
What I didn't appreciate was the use of the simple extrapolating Euler method, which for oscillating systems is known to be unstable, so the results of the simulation may be far from reality. Well, one chaos is worth another, and the sin is not as mortal as in the case of truly periodic systems, but it may be the reason it is difficult to see the classical fractal structure of the attraction domains in the generated images.
Fact #1: I don't *know* of any other numerical integration algorithm. (I've heard of RK4, but it's too complicated for me to understand.)
Fact #2: I have tried running the simulation with several different, non-commensurate time step values, and it always seems to produce the same output, so I'm reasonably confident there are no integration errors.

Andrew Coppin wrote: [snip]
Fact #1: I don't *know* of any other numerical integration algorithm. (I've heard of RK4, but it's too complicated for me to understand.)
Higher-order Runge-Kutta algorithms typically have high integration accuracy, but may require multiple energy/force evaluations per integration step. Also, they are generally not symplectic [1, 2], which is believed to be rather undesirable if you're looking for long-term stability in your simulation. Störmer/Verlet/leapfrog is symplectic, and requires only one force evaluation per time step. It is therefore _the_ default choice for numerical integration of the equations of motion. It is really quite simple [3, 4]; see the sketch below.

If you're _really_ looking for speed, you'll want to look at r-RESPA, which is a generalization of leapfrog that allows for multiple-timestep integration [5]. If the force evaluation is indeed the limiting step in your computation (which I'm not sure is the case), you'll certainly want to look into that. It basically boils down to splitting your interaction function into a (rugged) short-range part, which is zero beyond some distance and is evaluated every timestep, and a (smooth) long-range part, which is evaluated only every so many steps.
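A minimal sketch of one velocity-Verlet (leapfrog) step in Haskell, under assumptions: Vec, accel and the helpers are stand-ins, not the Chaos code. In a real loop the force at the new position is reused for the next step, so it still costs one force evaluation per step:

data Vec = Vec !Double !Double

add :: Vec -> Vec -> Vec
add (Vec x0 y0) (Vec x1 y1) = Vec (x0 + x1) (y0 + y1)

scale :: Double -> Vec -> Vec
scale k (Vec x y) = Vec (k * x) (k * y)

-- p' = p + v*dt + a*dt^2/2 ; v' = v + (a + a')*dt/2
verletStep :: (Vec -> Vec) -> Double -> (Vec, Vec) -> (Vec, Vec)
verletStep accel dt (p, v) = (p', v')
  where
    a  = accel p
    p' = p `add` scale dt v `add` scale (0.5 * dt * dt) a
    v' = v `add` scale (0.5 * dt) (a `add` accel p')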
Fact #2: I have tried running the simulation with several different, non-commensurate time step values, and it always seems to produce the same output, so I'm reasonably confident there are no integration errors.
Getting a good feeling for the maximum allowed timestep size is usually the first step to getting performance out of your program. As a rule of thumb, you'll want more than 10 steps per "period" of your fastest degree of freedom. You can get a feeling for that by finding the maximum second derivative of the potential energy function (which is going to be close to one of your magnets, I suppose. Speaking of which -- does the energy diverge near the magnets? I think it does.)

To validate the above, you'll want to look at energy conservation -- compute the total (potential and kinetic) energy of your system at the beginning and the end of your simulation (for a given simulation length in terms of time, not in terms of timesteps), and compare it to the starting energy. You'll find that there's a critical step size below which your energy stays constant, and above which it starts to diverge rather quickly. You might want to take that step size (and halve it if you're cautious); a sketch of the bookkeeping follows the references.

Kind regards,
v.

[1] http://srv.chim.unifi.it/orac/MAN/node5.html
[2] http://mitpress.mit.edu/SICM/book-Z-H-57.html#%_chap_5
[3] http://einstein.drexel.edu/courses/CompPhys/Integrators/leapfrog/
[4] http://shootout.alioth.debian.org/gp4/benchmark.php?test=nbody&lang=all
[5] http://cat.inist.fr/?aModele=afficheN&cpsidt=15255925
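A hedged sketch of that energy bookkeeping (potential stands in for the real spring-plus-magnets potential in the Chaos code, and the mass is taken as a parameter):

type Vec = (Double, Double)

-- kinetic + potential energy of one pendulum of mass m,
-- given its (position, velocity) state
totalEnergy :: (Vec -> Double) -> Double -> (Vec, Vec) -> Double
totalEnergy potential m (p, (vx, vy)) =
  0.5 * m * (vx * vx + vy * vy) + potential p

-- drift between the first and last frame; large drift means dt is too big
energyDrift :: (Vec -> Double) -> Double -> (Vec, Vec) -> (Vec, Vec) -> Double
energyDrift potential m s0 s1 =
  totalEnergy potential m s1 - totalEnergy potential m s0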

| The source code has explicit monomorphic types all over it; I would
| expect GHC to be able to optimise out any method calls. (OTOH, I'm not a
| GHC expert... Simon? Don?)

Full monomorphisation is a whole-program optimisation and GHC isn't (yet) a whole-program compiler. Overloaded functions are indeed specialised for the types at which they are called *in the module they are defined*.

Simon

Simon Peyton-Jones wrote:
| The source code has explicit monomorphic types all over it; I would
| expect GHC to be able to optimise out any method calls. (OTOH, I'm not a
| GHC expert... Simon? Don?)
Full monomorphisation is a whole-program optimisation and GHC isn't (yet) a whole-program compiler. Overloaded functions are indeed specialised for the types at which they are called *in the module they are defined*.
Simon
What I'm trying to say is, I would have expected, say,

resultant :: [Vector2] -> Vector2
resultant = sum

to run faster than

resultant = sum

In the latter case, we have resultant :: (Num n) => [n] -> n. I'm no expert, but I would expect that this requires a method lookup on each call, whereas with the more restrictive type I would expect the compiler to be able to just link directly to the correct method implementation. Am I mistaken?

| What I'm trying to say is, I would have expected, say,
|
| resultant :: [Vector2] -> Vector2
| resultant = sum
|
| to run faster than
|
| resultant = sum
|
| In the latter case, we have resultant :: (Num n) => [n] -> n. I'm no
| expert, but I would expect that this requires a method lookup on each
| call, whereas with the more restrictive type I would expect the compiler
| to be able to just link directly to the correct method implementation.
| Am I mistaken?

Probably not. Remember that sum is *itself* overloaded, so if resultant calls sum there is not much it can do except pass on the appropriate dictionary to sum. Better would be to inline (and specialise) sum at this call site. But at the moment GHC doesn't specialise functions defined in *other* modules. That's another thing that could be improved (at the cost of risking constructing multiple copies of the same specialisation of 'sum').

Simon
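A hedged illustration of the manual workaround this suggests: write the monomorphic sum yourself, so no dictionary is passed at runtime (Vector2 here is a stand-in for the type in the Chaos code):

import Data.List (foldl')

data Vector2 = Vector2 !Double !Double

-- a hand-specialised sum: the Num dictionary is gone, and the strict
-- left fold avoids building a chain of thunks as a bonus
sumV :: [Vector2] -> Vector2
sumV = foldl' addV (Vector2 0 0)
  where
    addV (Vector2 x0 y0) (Vector2 x1 y1) = Vector2 (x0 + x1) (y0 + y1)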

| In the latter case, we have resultant :: (Num n) => [n] -> n. I'm no
| expert, but I would expect that this requires a method lookup on each
| call, whereas with the more restrictive type I would expect the compiler
| to be able to just link directly to the correct method implementation.
| Am I mistaken?
Probably not. Remember that sum is *itself* overloaded, so if resultant calls sum there is not much it can do except pass on the appropriate dictionary to sum.
Better would be to inline (and specialise) sum at this call site. But at the moment GHC doesn't specialise functions defined in *other* modules. That's another thing that could be improved (at the cost of risking constructing multiple copies of the same specialisation of 'sum').
Right. I see what you're saying. Presumably changing it to resultant = foldl' (+) wouldn't help much either?

Seems to me there's always a tradeoff to be made between CPU time, RAM usage, and code size - if not other factors too. :-S But Haskell seems to be fairly unique in that (it looks like) it's possible to transform the code quite aggressively without too much difficulty. You wouldn't ever want to write source code for six different versions of sum, but if a compiler can do it automatically that's another matter. ;-)

I guess I was assuming that a function like sum is "simple enough" to get inlined at the call site - and hence possibly optimised at that site. Apparently not. By the way, is there some way of discovering what GHC has "really" done with your code, rather than what you "think" it's done? (Short of getting out a disassembler, that is...)

| Presumably changing it to resultant = foldl' (+) wouldn't help much either?

I think not.

| Seems to me there's always a tradeoff to be made between CPU time, RAM
| usage, and code size - if not other factors too. :-S But Haskell seems
| to be fairly unique in that (it looks like) it's possible to transform
| the code quite aggressively without too much difficulty. You wouldn't
| ever want to write source code for six different versions of sum, but if
| a compiler can do it automatically that's another matter. ;-)
|
| I guess I was assuming that a function like sum is "simple enough" to
| get inlined at the call site - and hence possibly optimised at that
| site. Apparently not.

There is no reason in principle why not. It's just that GHC doesn't do it at the moment.

| By the way, is there some way of discovering what
| GHC has "really" done with your code, rather than what you "think" it's
| done?

Yes: -ddump-simpl.

Simon
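For anyone trying this at home, a typical invocation might look like the following (redirecting the sizeable output to a file; the file name is just an example):

ghc -O2 --make System1 -ddump-simpl > System1.simpl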

Simon Peyton-Jones wrote:
| I guess I was assuming that a function like sum is "simple enough" to
| get inlined at the call site - and hence possibly optimised at that
| site. Apparently not.
There is no reason in principle why not. It's just that GHC doesn't do it at the moment.
On one hand, it looks like something that will often be a win. On the other hand, how do you prevent a combinatorial explosion of specialised functions that hardly ever get called? Hmm... maybe this is why I'm not a compiler designer? ;-)
| By the way, is there some way of discovering what
| GHC has "really" done with your code, rather than what you "think" it's
| done?
Yes: -ddump-simpl.
Quoting the GHC manual: "HACKER TERRITORY. HACKER TERRITORY. (You were warned.)" Ah, that made me smile... Boy, does this thing generate a lot of output! Ah well, I was warned. ;-) I suppose to really make any sense of this lot, I would need to have a very deep understanding of the low-level internal workings of GHC. (I have read a few papers about GHC, but most of them were far too technical for me to comprehend.)

Well anyway, I see why adding -O2 makes it run so much faster. When I add that flag, the output goes from

(+) :: Vector3 -> Vector3 -> Vector3

to something like

(+) :: GHC.Prim.Double# -> GHC.Prim.Double# -> GHC.Prim.Double#
    -> GHC.Prim.Double# -> GHC.Prim.Double# -> GHC.Prim.Double#
    -> (# GHC.Float.Double, GHC.Float.Double, GHC.Float.Double #)

and the body went from being an incomprehensible tangle of deeply nested case expressions to being a recognisable expression involving GHC.Prim.+##. (Anybody know what the difference between GHC.Prim.Double# and GHC.Float.Double is?) This in spite of the fact that the definition is actually

(+) = vzip (+)
vzip f (Vector3 x0 y0 z0) (Vector3 x1 y1 z1) = Vector3 (f x0 x1) (f y0 y1) (f z0 z1)

So GHC has "unrolled" several levels of indirection here, and replaced an algebraic datatype with a bunch of primitive arguments. In other words, GHC has done what I wanted it to do. :-D

There is an absolutely *huge* chunk of code for the derived instance of Read Colour. (Why did I add that again? Hmm, I think I just included the standard library version of my code. Ah well, it's not hurting anything.) Also moderately puzzled by things like "Str: DmdType U(U(L)U(L)U(L))U(U(L)U(L)U(L))m". Surely that's hacker territory if ever I saw it. ;-)

Ah well, I doubt I'm going to come up with any new ideas for how to make my code go faster, but it's mildly entertaining wading through over 200 KB of textual output trying to guess what it means.

On May 7, 2007, at 8:50 , Andrew Coppin wrote:
recognisable expression involving GHC.Prim.+##. (Anybody know what the difference between GHC.Prim.Double# and GHC.Float.Double is?) This in spite of the fact that the definition is actually
Double# is an unboxed (raw) double; Double is boxed, meaning indirect (so it can hold _|_ as a "value").

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

| Ah well, I doubt I'm going to come up with any new ideas for how to make
| my code go faster, but it's mildly entertaining wading through over 200
| KB of textual output trying to guess what it means.

Ha ha. You did say that you wanted to know what GHC *really* does :-)

Seriously, there is mileage in someone working on the core-language pretty-printer to make it less intimidating. It was never really designed for wide consumption, and could be a lot better. Suppressing more qualified names, for example.

Simon

Simon Peyton-Jones wrote:
| Ah well, I doubt I'm going to come up with any new ideas for how to make
| my code go faster, but it's mildly entertaining wading through over 200
| KB of textual output trying to guess what it means.
Ha ha. You did say that you wanted to know what GHC *really* does :-)
I know. I'm not "really" complaining. ;-)
Seriously, there is mileage in someone working on the core-language pretty-printer to make it less intimidating. It was never really designed for wide consumption, and could be a lot better. Suppressing more qualified names, for example.
I wonder... is there a high-level document anywhere which explains what GHC actually does with your code when you compile it?

I mean, presumably the first thing it does is check whether the source is parsable, syntactically correct, type checks, etc. As I understand it, it then translates the program into Core, does some Core-to-Core transformations (I have no idea what transformations), and finally feeds that into a code generator - either assembly or C. That's about all I know. I mean, I'm *presuming* that the final compiled form is some kind of graph-reduction machine... but I don't really know. (On the other hand, presumably the GHC developers spend more time, like, *developing* GHC rather than writing about how it works...)

| I wonder... is there a high-level document anywhere which explains what
| GHC actually does with your code when you compile it?

A lot, actually: http://hackage.haskell.org/trac/ghc/wiki/Commentary

Simon Peyton-Jones wrote:
| I wonder... is there a high-level document anywhere which explains what
| GHC actually does with your code when you compile it?
A lot, actually: http://hackage.haskell.org/trac/ghc/wiki/Commentary
Ooo... candy! (By the way... I'm loving the whole concept of "The Evil Mangler". If I ever get to write production code, I'm totally going to have a function somewhere called The Evil Mangler! Mind you, based on what it does, it certainly sounds pretty evil...)

On 07/05/07, Andrew Coppin wrote:
(Anybody know what the difference between GHC.Prim.Double# and GHC.Float.Double is?)
It's the difference between unboxed and boxed types. A boxed type's representation is in fact a pointer to the unboxed type (I think), so that a Double would be internally represented as a pointer to a Double#. As a consequence, a Double can be _|_, because this can be represented by a null pointer. No such luck with unboxed types. So working with unboxed types is quicker and consumes less memory, but don't use them at any kind of high level, because the lack of a _|_ will bite you sooner or later.

--
-David House, dmhouse@gmail.com
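In practice you rarely write Double# by hand; a hedged sketch of the usual way to get unboxed fields, assuming a Vector3 like the one in the Chaos code:

-- strict fields marked UNPACK are stored as raw Double# inside the
-- constructor (GHC also does this for all strict fields under
-- -funbox-strict-fields)
data Vector3 = Vector3 {-# UNPACK #-} !Double
                       {-# UNPACK #-} !Double
                       {-# UNPACK #-} !Double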

On Mon, 2007-05-07 at 21:50 +0100, David House wrote:
On 07/05/07, Andrew Coppin wrote:
(Anybody know what the difference between GHC.Prim.Double# and GHC.Float.Double is?)
It's the difference between unboxed and boxed types. A boxed type's representation is in fact a pointer to the unboxed type (I think), so that a Double would be internally represented as a pointer to a Double#. As a consequence, a Double can be _|_, because this can be represented by a null pointer.
Rather, it can be represented by a code thunk that, when called, raises an exception or does not terminate. (It's never a 0-pointer.)

Duncan

On 07/05/07, David House wrote:
represented by a null pointer. No such luck with unboxed types. So working with unboxed types is quicker and consumes less memory, but don't use them at any kind of high level, because the lack of a _|_ will bite you sooner or later.
I hesitate to say that... it might come back and bite you in the _|_? I'll get me coat... D.

Sorry, my mail client fooled me, too, so here it is again.

Andrew Coppin:
What I'm trying to say is, I would have expected, say,
resultant :: [Vector2] -> Vector2
resultant = sum
to run faster than
resultant = sum
In the latter case, we have resultant :: (Num n) => [n] -> n.
The latter is only true if compiled with -fno-monomorphism-restriction. Am I right? (Sorry if this is a dumb question, I'm still trying to get my head around the Haskell type system.)

Malte
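A hedged illustration of the point in question: with the monomorphism restriction on (the default), a signatureless binding like this is not generalised, and defaulting kicks in.

-- with the MR, this binding is monomorphic; the Num constraint is
-- resolved by defaulting, so GHC infers [Integer] -> Integer
resultantMR = sum

-- an explicit signature (or -fno-monomorphism-restriction) keeps it
-- polymorphic
resultantPoly :: Num n => [n] -> n
resultantPoly = sum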

Andrew Coppin wrote:
Greetings.
I have something which you might find mildly interesting. (Please don't attempt the following unless you have some serious CPU power available, and several hundred MB of hard drive space free.)
darcs get http://www.orphi.me.uk/darcs/Chaos
cd Chaos
ghc -O2 --make System1
./System1

On my super-hyper-monster machine, the program takes an entire 15 minutes to run to completion. When it's done, you should have 500 images sitting in front of you. (They're in PPM format - hence the several hundred MB of disk space!) The images are the frames that make up an animation; if you can find a way to "play" this animation, you'll be treated to a truly psychedelic light show! (If not then you'll just have to admire them one at a time. The first few dozen frames are quite boring by the way...)

If you want to, you can change the image size. For example, "./System1 800" will render at 800x800 pixels instead of the default 200x200. (Be prepared for /big/ slowdowns!)
*What is it?*
Well, it's a physical simulation of a "chaos pendulum". That is, a magnetic pendulum suspended over a set of magnets. The pendulum would just swing back and forth, but the magnets perturb its path in complex and unpredictable ways.
However, rather than simulate just 1 pendulum, the program simulates 40,000 of them, all at once! For each pixel, a pendulum is initialised with a velocity of zero and an initial position corresponding to the pixel coordinates. As the pendulums swing, each pixel is coloured according to the proximity of the corresponding pendulum to the three magnets.
*Help requested...*
Can anybody tell me how to make the program go faster?
I already replaced all the lists with IOUArrays, which resulted in big, big speedups (and a large decrease in memory usage). But I don't know how to make it go any faster. I find it worrying that the process of converting pendulum positions to colours appears to take significantly longer than the much more complex task of performing the numerical integration to discover the new pendulum positions. Indeed, using GHC's profiling tools indicates that the most time is spent executing the function "quant8". This function is defined as:
quant8 :: Double -> Word8
quant8 = floor . (0xFF *)

I can't begin to /imagine/ how /this/ can be the most compute-intensive part of the program when I've got all sorts of heavy metal maths going on with the numerical integration and so forth...! Anyway, if anybody can tell me how to make it run faster, I'd be most appreciative!
Also, is there an easy way to make the program use /both/ of the CPUs in my PC? (Given that the program maps two functions over two big IOUArrays...)
Finally, if anybody has any random comments about the [lack of] quality in my source code, feel free...
Try adding strictness annotations to all the components of all your data structures (i.e. put a ! before the type). Not all of them need it, but I doubt any need to be lazy either. Probably the reason quant8 seems to be taking so much time is that it is where a lot of stuff finally gets forced. Certainly, for things that are "primitive" like Colour and Vector you want the components to be strict, in general.

I did this for the program and ran System1 100 and it took maybe a couple of minutes; it seemed to be going at a decent clip. 200x200 should take 4 times longer, I assume, and I still don't see that taking 15 minutes. This is on a laptop running on a Mobile AMD Sempron 3500+.

Also, you have many many superfluous parentheses and use a different naming convention from representative Haskell code (namely camelCase).
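Concretely, a hedged sketch of what that looks like (the field layout is assumed from context, not copied from the Chaos source):

-- with strict fields, forcing a Colour or Vector3 also forces its
-- components, so costs show up where they are incurred instead of
-- accumulating as thunks that quant8 later pays for
data Colour  = Colour  !Double !Double !Double
data Vector3 = Vector3 !Double !Double !Double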

On Sat, May 05, 2007 at 03:33:03PM -0500, Derek Elkins wrote:
Try adding strictness annotations to all the components of all your data structures (i.e. put a ! before the type). Not all of them need it, but I doubt any need to be lazy either. Probably the reason quant8 seems to be taking so much time is that it is where a lot of stuff finally gets forced. Certainly, for things that are "primitive" like Colour and Vector you want the components to be strict, in general.
(In theory at least) That would not be an issue at all - the GHC profiler uses lexical, *not dynamic*, call stacks.
I did this for the program and ran System1 100 and it took maybe a couple of minutes, it seemed to be going at a decent clip. 200x200 should take 4 times longer, I assume, and I still don't see that taking 15 minutes. This is on a laptop running on a Mobile AMD Sempron 3500+. Also, you have many many superfluous parentheses and use a different naming convention from representative Haskell code (namely camelCase).
Stefan

Derek Elkins wrote:
Try adding strictness annotations to all the components of all your data structures (i.e. put a ! before the type). Not all of them need it, but I doubt any need to be lazy either. Probably the reason quant8 seems to be taking so much time is that it is where a lot of stuff finally gets forced. Certainly, for things that are "primitive" like Colour and Vector you want the components to be strict, in general.

Yes, originally the profile was showing quant8 taking something absurd like 80% of the CPU time. When I changed the framebuffer to an IOUArray, the time spent in quant8 dropped *drastically*. (Because now the framebuffer is strict, and that's forcing the evaluation sooner.) I could certainly try making vectors, colours and arrays strict and see if that does something... (Thinking about it, the colour computation has a square root in it, and I bet that doesn't get forced until it hits quant8... Square root is an expensive operation on current hardware, isn't it?)
Also, you have many many superfluous parentheses and use a different naming convention from representative Haskell code (namely camelCase).

This is a pet hate of mine. NamesLikeThis are fine. names_like_this are fine too. But for the love of God, namesLikeThis just looks stupid and annoying! So I generally use camel case for stuff which has to start uppercase, and underscores for stuff that has to start lowercase. It's a system, and it works. Unfortunately it's not the standard convention in Haskell. (And I doubt I will convince anybody to change it...)

Try adding strictness annotations to all the components of all your data structures (i.e. put a ! before the type). Not all of them need it, but I doubt any need to be lazy either. Probably the reason quant8 seems to be taking so much time is that it is where a lot of stuff finally gets forced. Certainly, for things that are "primitive" like Colour and Vector you want the components to be strict, in general.
I just did that. Gives a few percent speed increase. (Turns out on my machine System1 with default options actually takes 5 minutes, not 15. And with the extra strictness, it completes about 40 seconds faster. So not a vast speedup - but worth having!)

Also tried playing with GHC options. I found the following:
- -fexcess-precision: No measurable effect.
- -funbox-strict-fields: Roughly 40 seconds faster again.
- -fno-state-hack: Makes the program somewhat *slower*.
- -funfolding-update-in-place: No measurable effect.

Hmm, I suppose if I get *really* desperate, I could always try compiling with GHC 6.6.1 instead of 6.6... ;-)
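For reference, the winning combination above corresponds to an invocation along these lines (a sketch; the module name is from the build command at the top of the thread):

ghc -O2 -funbox-strict-fields --make System1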
participants (12)
- Andrew Coppin
- Brandon S. Allbery KF8NH
- David House
- Derek Elkins
- Dougal Stanton
- Duncan Coutts
- jerzy.karczmarczuk@info.unicaen.fr
- Malte Milatz
- Ryan Dickie
- Simon Peyton-Jones
- Stefan O'Rear
- Vincent Kraeutler