announce: Glome.hs-0.3 (Haskell raytracer)

A new version of my raytracer is out. It now supports cones, cylinders, disks, boxes, and planes as base primitives (previously it only supported triangles and spheres), as well as transformations of arbitrary objects (rotate, scale, translate) and the CSG operations difference and intersection. Perlin noise and Blinn highlights have been added, as well.

http://syn.cs.pdx.edu/~jsnow/glome/

Glome can parse NFF-format scene files (see http://tog.acm.org/resources/SPD/), but many features are only accessible via raw Haskell, since NFF doesn't support very many kinds of primitives. I included a TestScene.hs file that demonstrates how to create a scene with various kinds of geometry (including a crude attempt at a recursively-defined oak tree) in Haskell. There isn't any documentation yet, but many of the primitives have constructors that resemble their equivalents in POV-Ray, so anyone familiar with POV-Ray's syntax should be able to figure out what's going on.

-jim

jsnow:
A new version of my raytracer is out. It now supports cones, cylinders, disks, boxes, and planes as base primitives (previously it only supported triangles and spheres), as well as transformations of arbitrary objects (rotate, scale, translate) and the CSG operations difference and intersection. Perlin noise and Blinn highlights have been added, as well.
http://syn.cs.pdx.edu/~jsnow/glome/
Glome can parse NFF-format scene files (see http://tog.acm.org/resources/SPD/), but many features are only accessible via raw Haskell, since NFF doesn't support very many kinds of primitives. I included a TestScene.hs file that demonstrates how to create a scene with various kinds of geometry (including a crude attempt at a recursively-defined oak tree) in haskell. There isn't any documentation yet, but many of the primitives have constructors that resemble their equivalents in povray, so anyone familiar with povray's syntax should be able to figure out what's going on.
Very impressive. Did you consider cabalising the Haskell code, so it can be easily distributed from hackage.haskell.org?

I note on the website you say:

"no threading (shared-memory concurrency is not supported by ocaml, in haskell it's buggy)"

Could you elaborate on this? Shared memory concurrency is a sweet spot in Haskell, and heavily utilised, so I think we'd all like to know more details...

-- Don

On Fri, Apr 18, 2008 at 7:43 PM, Don Stewart wrote:
jsnow:
A new version of my raytracer is out. It now supports cones, cylinders, disks, boxes, and planes as base primitives (previously it only supported triangles and spheres), as well as transformations of arbitrary objects (rotate, scale, translate) and the CSG operations difference and intersection. Perlin noise and Blinn highlights have been added, as well.
http://syn.cs.pdx.edu/~jsnow/glome/
Glome can parse NFF-format scene files (see http://tog.acm.org/resources/SPD/), but many features are only accessible via raw Haskell, since NFF doesn't support very many kinds of primitives. I included a TestScene.hs file that demonstrates how to create a scene with various kinds of geometry (including a crude attempt at a recursively-defined oak tree) in haskell. There isn't any documentation yet, but many of the primitives have constructors that resemble their equivalents in povray, so anyone familiar with povray's syntax should be able to figure out what's going on.
Very impressive. Did you consider cabalising the Haskell code, so it can be easily distributed from hackage.haskell.org?
I note on the website you say:
"no threading (shared-memory concurrency is not supported by ocaml, in haskell it's buggy)"
Could you elaborate on this? Shared memory concurrency is a sweet spot in Haskell, and heavily utilised, so I think we'd all like to know more details..
Not sure what you need shared memory concurrency for in this case as it seems to be a straightforward parallelism problem (i.e. the different threads would be different pixels, there is no sharing needed). -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862
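(Just to illustrate that point, here is a minimal per-row parallel rendering loop using Control.Parallel.Strategies. tracePixel stands in for whatever the renderer uses to shade one pixel, and the exact Strategy name depends on the version of the parallel package (rnf in older releases, rdeepseq in newer ones); this is a sketch, not Glome's actual code.)

import Control.Parallel.Strategies (parMap, rdeepseq)

type Colour = (Double, Double, Double)

-- Each row is sparked off in parallel; rdeepseq forces a row completely,
-- so the actual tracing work happens inside the spark.
renderRows :: Int -> Int -> (Int -> Int -> Colour) -> [[Colour]]
renderRows width height tracePixel =
  parMap rdeepseq
         (\y -> [ tracePixel x y | x <- [0 .. width - 1] ])
         [0 .. height - 1]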

Don Stewart wrote:
jsnow:
A new version of my raytracer is out. ...
Very impressive. Did you consider cabalising the Haskell code, so it can be easily distributed from hackage.haskell.org?
I note on the website you say:
"no threading (shared-memory concurrency is not supported by ocaml, in haskell it's buggy)"
Could you elaborate on this? Shared memory concurrency is a sweet spot in Haskell, and heavily utilised, so I think we'd all like to know more details..
-- Don
The concurrency bug has to do with excessive memory use, and was discussed earlier here on the mailing list (search for Glome). http://hackage.haskell.org/trac/ghc/ticket/2185

The other problem I had with concurrency is that I was getting about a 50% speedup instead of the 99% or so that I'd expect on two cores. I figured I'm probably doing something wrong.

I don't have any objection to using cabal, I just haven't gone to the trouble to figure it out yet. Maybe in the next release.

Sebastian Sylvan wrote:
Not sure what you need shared memory concurrency for in this case as it seems to be a straightforward parallelism problem (i.e. the different threads would be different pixels, there is no sharing needed).
The scene is shared between threads. (Complex scenes can be quite large.) I'm assuming this is handled as a read-only shared memory region or something similar, as one might expect, and is not actually copied. -jim

Hello Jim, Saturday, April 19, 2008, 12:10:23 AM, you wrote:
The other problem I had with concurrency is that I was getting about a 50% speedup instead of the 99% or so that I'd expect on two cores. I
2 cores don't guarantee a 2x speedup. Some programs are limited by memory access speed and you still have just one memory :)

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Sat, Apr 19, 2008 at 12:19:19AM +0400, Bulat Ziganshin wrote:
Saturday, April 19, 2008, 12:10:23 AM, you wrote:
The other problem I had with concurrency is that I was getting about a 50% speedup instead of the 99% or so that I'd expect on two cores. I
2 cores don't guarantee a 2x speedup. Some programs are limited by memory access speed and you still have just one memory :)
In fact, this is relatively easily tested (albeit crudely): just run two copies of your single-threaded program at the same time. If they take longer than when run one at a time, you can guess that you're memory-limited, and you won't get such good performance from threading your code. But this is only a crude hint, since memory performance is strongly dependent on cache behavior, and running one threaded job may either do better or worse than two single-threaded jobs. If you've got two separate CPUs with two separate caches, the simultaneous single-threaded jobs should beat the threaded job (meaning take less than twice as long), since each job should have full access to one cache. If you've got two cores sharing a single cache, the behavior may be the opposite: the threaded job uses less total memory than the two single-threaded jobs, so more of the data may stay in cache.

For reference, on a friend's dual quad-core Intel system (i.e. 8 cores total), if he runs 8 simultaneous (identical) memory-intensive jobs he only gets about five times the throughput of a single job, meaning that each core is running at something like 60% of its CPU capacity due to memory contention. It's possible that your system is comparably limited, although I'd be surprised; somehow it seems unlikely that your ray tracer is stressing the cache all that much.

-- David Roundy Department of Physics Oregon State University

David Roundy wrote:
On Sat, Apr 19, 2008 at 12:19:19AM +0400, Bulat Ziganshin wrote:
Saturday, April 19, 2008, 12:10:23 AM, you wrote:
The other problem I had with concurrency is that I was getting about a 50% speedup instead of the 99% or so that I'd expect on two cores. I
2 cores don't guarantee a 2x speedup. Some programs are limited by memory access speed and you still have just one memory :)
In fact, this is relatively easily tested (albeit crudely): just run two copies of your single-threaded program at the same time. If they take longer than when run one at a time, you can guess that you're memory-limited, and you won't get such good performance from threading your code. But this is only a crude hint, since memory performance is strongly dependent on cache behavior, and running one threaded job may either do better or worse than two single-threaded jobs. If you've got two separate CPUs with two separate caches, the simultaneous single-threaded jobs should beat the threaded job (meaning take less than twice as long), since each job should have full access to one cache. If you've got two cores sharing a single cache, the behavior may be the opposite: the threaded job uses less total memory than the two single-threaded jobs, so more of the data may stay in cache.
For reference, on a friend's dual quad-core Intel system (i.e. 8 cores total), if he runs 8 simultaneous (identical) memory-intensive jobs he only gets about five times the throughput of a single job, meaning that each core is running at something like 60% of its CPU capacity due to memory contention. It's possible that your system is comparably limited, although I'd be surprised; somehow it seems unlikely that your ray tracer is stressing the cache all that much.
On a particular scene with one instance of the single-threaded renderer running, it takes about 19 seconds to render an image. With two instances running, they each take about 23 seconds. This is on an Athlon-64 3800+ dual core, with 512kB of L2 cache per core. So, it seems my memory really is slowing things down noticeably. -jim

On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
On a particular scene with one instance of the single-threaded renderer running, it takes about 19 seconds to render an image. With two instances running, they each take about 23 seconds. This is on an Athlon-64 3800+ dual core, with 512kB of L2 cache per core. So, it seems my memory really is slowing things down noticeably.
This doesn't mean there's no hope, it just means that you'll need to be extra-clever if you're to get a speedup that is close to optimal. The key to overcoming memory bandwidth issues is to think about cache use and figure out how to improve it. For instance, O(N^3) matrix multiplication can be done in close to O(N^2) time provided it's memory-limited, by blocking memory accesses so that you access less memory at once.

In the case of ray-tracing I've little idea where or how you could improve memory access patterns, but this is what you should be thinking about (if you want to optimize this code). Of course, improving overall scaling is best (e.g. avoiding using lists if you need random access). Next I'd ask if there are more efficient or compact data structures that you could be using. If your program uses less memory, a greater fraction of that memory will fit into cache. Switching to stricter data structures and turning on -funbox-strict-fields (or whatever it's called) may help localize your memory access. Even better if you could manage to use unboxed arrays, so that your memory accesses really would be localized (assuming that you actually do have localized memory-access patterns).

Of course, also ask yourself how much memory your program is using in total. If it's not much more than 512kB, for instance, we may have misdiagnosed your problem.

-- David Roundy Department of Physics Oregon State University
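(As an aside, a minimal illustration of the strict-fields suggestion above; the Vec type here is made up for illustration, not Glome's.)

{-# OPTIONS_GHC -funbox-strict-fields #-}

-- With strict (!) fields and -funbox-strict-fields, GHC stores the three
-- Doubles directly inside the constructor, so a Vec is one contiguous heap
-- object rather than three pointers to separately boxed Doubles.
data Vec = Vec !Double !Double !Double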

David Roundy wrote:
On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
On a particular scene with one instance of the single-threaded renderer running, it takes about 19 seconds to render an image. With two instances running, they each take about 23 seconds. This is on an Athlon-64 3800+ dual core, with 512kB of L2 cache per core. So, it seems my memory really is slowing things down noticeably.
This doesn't mean there's no hope, it just means that you'll need to be extra-clever if you're to get a speedup that is close to optimal. The key to overcoming memory bandwidth issues is to think about cache use and figure out how to improve it. For instance, O(N^3) matrix multiplication can be done in close to O(N^2) time provided it's memory-limited, by blocking memory accesses so that you access less memory at once.
In the case of ray-tracing I've little idea where or how you could improve memory access patterns, but this is what you should be thinking about (if you want to optimize this code). Of course, improving overall scaling is best (e.g. avoiding using lists if you need random access). Next I'd ask if there are more efficient or compact data structures that you could be using. If your program uses less memory, a greater fraction of that memory will fit into cache. Switching to stricter data structures and turning on -funbox-strict-fields (or whatever it's called) may help localize your memory access. Even better if you could manage to use unboxed arrays, so that your memory accesses really would be localized (assuming that you actually do have localized memory-access patterns).
Of course, also ask yourself how much memory your program is using in total. If it's not much more than 512kB, for instance, we may have misdiagnosed your problem.
Interestingly, switching between "Float" and "Double" doesn't make any noticeable difference in speed (though I see more rendering artifacts with Float). Transformation matrices are memory hogs, at 24 floats each (a 4x4 matrix and its inverse with the bottom rows omitted (they're always 0 0 0 1)). This may be one reason why many real-time ray tracers just stick with triangles; a triangle can be transformed by transforming its verticies, and then you can throw the matrix away. There are a lot of tricks for making ray tracers more memory-coherent. You can trace packets of rays instead of single rays against whatever acceleration structure you may be using. Kd-tree nodes can be compacted to fit in a single cacheline if you arrange the tree in memory in a particular way that allows you to omit some of the pointers. (I use BIH trees, but the same ideas probably apply.) A lot of these sorts of tricks make the resulting code more complex and/or uglier. Useful references: "What Every Programmer Needs to Know About Memory" http://lwn.net/Articles/250967/ Siggraph presentation on optimizing ray tracers (warning: ppt) http://www.openrt.de/Siggraph05/UpdatedCourseNotes/Stoll_Realtime.ppt Bryan O'Sullivan wrote:
Jim Snow wrote:
The concurrency bug has to do with excessive memory use, and was discussed earlier here on the mailing list (search for Glome). http://hackage.haskell.org/trac/ghc/ticket/2185
Interesting. I looked at your test case. I can reproduce your problem when I build with the threaded runtime and run with a single core, but not if I use +RTS -N2. Did you overlook the possibility that you may not have told GHC how many cores to use?
I just tested it again. Memory usage behaves differently depending on how many cores I tell it to run on, but it always used the least memory when I compiled without threading support. With -N1, memory usage grows faster than with -N2, but memory use is smaller and doesn't grow larger with each re-render (except by a very small amount) if I don't use parmap.
Also, your code is sprinkled with many more strictness annotations than it needs.
That doesn't surprise me. I haven't really figured out why some things are faster strict or non-strict, or where it doesn't matter and the annotations are redundant. -jim

jsnow:
David Roundy wrote:
On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
On a particular scene with one instance of the single-threaded renderer running, it takes about 19 seconds to render an image. With two instances running, they each take about 23 seconds. This is on an Athlon-64 3800+ dual core, with 512kB of L2 cache per core. So, it seems my memory really is slowing things down noticeably.
This doesn't mean there's no hope, it just means that you'll need to be extra-clever if you're to get a speedup that is close to optimal. The key to overcoming memory bandwidth issues is to think about cache use and figure out how to improve it. For instance, O(N^3) matrix multiplication can be done in close to O(N^2) time provided it's memory-limited, by blocking memory accesses so that you access less memory at once.
In the case of ray-tracing I've little idea where or how you could improve memory access patterns, but this is what you should be thinking about (if you want to optimize this code). Of course, improving overall scaling is best (e.g. avoiding using lists if you need random access). Next I'd ask if there are more efficient or compact data structures that you could be using. If your program uses less memory, a greater fraction of that memory will fit into cache. Switching to stricter data structures and turning on -funbox-strict-fields (or whatever it's called) may help localize your memory access. Even better if you could manage to use unboxed arrays, so that your memory accesses really would be localized (assuming that you actually do have localized memory-access patterns).
Of course, also ask yourself how much memory your program is using in total. If it's not much more than 512kB, for instance, we may have misdiagnosed your problem.
Interestingly, switching between "Float" and "Double" doesn't make any noticeable difference in speed (though I see more rendering artifacts with Float). Transformation matrices are memory hogs, at 24 floats each (a 4x4 matrix and its inverse with the bottom rows omitted (they're always 0 0 0 1)). This may be one reason why many real-time ray tracers just stick with triangles; a triangle can be transformed by transforming its verticies, and then you can throw the matrix away.
The only differences I'd expect to see here would be with -fvia-C -fexcess-precision -O2 -optc-O2, which might trigger some SSE stuff from the C compiler.

Jim Snow wrote:
Useful references: "What Every Programmer Needs to Know About Memory" http://lwn.net/Articles/250967/
Thank you for monopolising my entire afternoon. This is probably the single most interesting thing I've read in ages! :-D

Thinking about it, maybe this explains my sorting benchmark - sorting Word32 values probably involves more shifting data from place to place than actual computation, so it's probably limited by the bandwidth of the FSB rather than the CPU's ability to perform number crunching. Running it in multiple threads is probably just thrashing the L2 cache rather than actually helping...

After studying all this material, I do find myself feeling slightly concerned. The article shows how in C / C++ / assembly you can arrange your data and order your computations to make maximum use of the various caches and avoid certain bottlenecks in the system. And the *vast* performance difference it can yield. But what happens if you happen to be writing your program in Smalltalk or Java or... Haskell. Up here, you're so far from the hardware that it would seem that you have *no hope* of using the CPU caches effectively. Think about it - data scattered all over the heap in an essentially random pattern, getting moved around periodically by the GC. [Oh, the GC! That sounds like it must nail the hell out of the cache. And even if it doesn't, it more or less negates its usefulness because all the "hot" data is now at a different address.] Just think about trying to process a Haskell list - all those multiple levels of pointer indirection! The CPU has no hope of prefetching that stuff... Damn, it seems as if there's simply no way of ever making Haskell fast. :-(

I suppose the idea is that Haskell is supposed to help you work at a higher level of abstraction, so you can concentrate on building better *algorithms* which require less work in the first place. Surely using an algorithm in a suitably superior complexity class could more than make up for the performance loss from lack of cache coherence. But people who work in C and C++ know how to build efficient algorithms too, so... hmm, we seem to be in trouble here.

On Mon, Apr 21, 2008 at 11:54 AM, Andrew Coppin wrote:
I suppose the idea is that Haskell is supposed to help you work at a higher level of abstraction, so you can concentrate on building better *algorithms* which require less work in the first place. Surely using an algorithm in a suitably superior complexity class could more than make up for the performance loss from lack of cache coherence. But people who work in C and C++ know how to build efficient algorithms too, so... hmm, we seem to be in trouble here.
I don't think things are so glum. There are certainly things we can do to control where data is located (e.g. using unboxed arrays and keeping track of strictness, and strict data types combined with -funboxstrictidontrecalltheexactpragma). In truth, C++ programs also tend to have severe issues with memory use. Garbage collection is horrible on the cache, but it's also done relatively infrequently (at least, full GCs are).

Writing seriously fast Haskell code is indeed challenging, but it's a fun challenge. And writing bug-free code is also a fun challenge, and one at which we have a huge advantage over most other languages. And in Haskell we are less often forced to choose between a horribly ugly implementation and a horribly inefficient implementation.

David

David Roundy wrote:
On Mon, Apr 21, 2008 at 11:54 AM, Andrew Coppin wrote:
I suppose the idea is that Haskell is supposed to help you work at a higher level of abstraction, so you can concentrate on building better *algorithms* which require less work in the first place. Surely using an algorithm in a suitably superior complexity class could more than make up for the performance loss from lack of cache coherence. But people who work in C and C++ know how to build efficient algorithms too, so... hmm, we seem to be in trouble here.
I don't think things are so glum. There are certainly things we can do to control where data is located (e.g. using unboxed arrays and keeping track of strictness, and strict data types combined with -funboxstrictidontrecalltheexactpragma). In truth, C++ programs also tend to have severe issues with memory use. Garbage collection is horrible on the cache, but it's also done relatively infrequently (at least, full GCs are).
Writing seriously fast Haskell code is indeed challenging, but it's a fun challenge. And writing bug-free code is also a fun challenge, and one at which we have a huge advantage over most other languages. And in Haskell we are less often forced to choose between a horribly ugly implementation and a horribly inefficient implementation.
Well now, if you're interested in bug-free code, Haskell looks to me like pretty much the supreme ultimate language. Can you spell "safe"? I think we've got it pretty much nailed on that score. About the most dangerous thing a typical Haskell program can do is "head []" or looping forever.

I do worry about performance, however. [I tend to write compute-bounded programs.] Sure, you can use arrays, but for whatever reason the API isn't nearly as nice as lists. (E.g., there is no zip function for arrays.) ByteString shows that arrays can be made fast - but ByteString is only for bytes. It strikes me that an idiomatic Haskell program - which does things like "length . filter p" and such - is going to utterly confound cache prefetch logic with endless layers of indirection.

The basic execution model of Haskell is graph reduction. As I understand it [it's been a while since I read about this], GHC uses the spineless tagless G-machine. This fundamentally uses pointers to functions for every aspect of its implementation - graph reduction, deadlock detection, GC, concurrency, etc. Every step of traversing a structure on the heap potentially involves forcing a thunk, so at every step you're dereferencing function pointers. It must utterly confound the instruction and data caches - 0% coherence! Using [unboxed] arrays makes the situation a little more bearable for the data cache, but the poor instruction cache still has to handle control flow that jumps somewhere every few dozen instructions...

On Mon, Apr 21, 2008 at 07:54:24PM +0100, Andrew Coppin wrote:
Useful references: "What Every Programmer Needs to Know About Memory" http://lwn.net/Articles/250967/ After studying all this material, I do find myself feeling slightly concerned. The article shows how in C / C++ / assembly you can arrange your data and order your computations to make maximum use of the various caches and avoid certain bottlenecks in the system. And the *vast* performance difference it can yield. But what happens if you happen to be writing your
Jim Snow wrote: program in Smalltalk or Java or... Haskell. Up here, you're so far from the hardware that it would seem that you have *no hope* of using the CPU caches effectively. Think about it - data scattered all over the heap in an essentially random pattern, getting moved around periodically by the GC. [Oh, the GC! That sounds like it must nail the hell out of the cache. And even if it doesn't, it more or less negates its usefulness because all the "hot" data is now at a different address.] Just think about trying to process a Haskell list - all those multiple levels of pointer indirection! The CPU has no hope of prefetching that stuff... Damn, it seems as if there's simply no way of ever making Haskell fast. :-(
You're assuming the hardware is constant. "Modern" CPUs were designed back when everyone was using C++; this is changing now, and I for one am still optimistic that CPU designs will follow suit. Stefan

On Fri, 2008-04-18 at 14:34 -0700, David Roundy wrote:
On Fri, Apr 18, 2008 at 02:09:28PM -0700, Jim Snow wrote:
On a particular scene with one instance of the single-threaded renderer running, it takes about 19 seconds to render an image. With two instances running, they each take about 23 seconds. This is on an Athlon-64 3800+ dual core, with 512kB of L2 cache per core. So, it seems my memory really is slowing things down noticeably.
This doesn't mean there's no hope, it just means that you'll need to be extra-clever if you're to get a speedup that is close to optimal. The key to overcoming memory bandwidth issues is to think about cache use and figure out how to improve it. For instance, O(N^3) matrix multiplication can be done in close to O(N^2) time provided it's memory-limited, by blocking memory accesses so that you access less memory at once.
In the case of ray-tracing I've little idea where or how you could improve memory access patterns, but this is what you should be thinking about (if you want to optimize this code). Of course, improving overall scaling is best (e.g. avoiding using lists if you need random access). Next I'd ask if there are more efficient or compact data structures that you could be using. If your program uses less memory, a greater fraction of that memory will fit into cache. Switching to stricter data structures and turning on -funbox-strict-fields (or whatever it's called) may help localize your memory access. Even better if you could manage to use unboxed arrays, so that your memory accesses really would be localized (assuming that you actually do have localized memory-access patterns).
Of course, also ask yourself how much memory your program is using in total. If it's not much more than 512kB, for instance, we may have misdiagnosed your problem.
Ingo Wald's work on interactive coherent raytracing springs to mind. http://www.sci.utah.edu/~wald/Publications/ "Interactive Rendering with Coherent Raytracing" http://graphics.cs.uni-sb.de/~wald/Publications/EG2001_IRCRT/InteractiveRend... is a decent, if dated, introduction. He has much newer stuff as well.

Derek Elkins wrote:
Ingo Wald's work on interactive coherent raytracing springs to mind. http://www.sci.utah.edu/~wald/Publications/
"Interactive Rendering with Coherent Raytracing" http://graphics.cs.uni-sb.de/~wald/Publications/EG2001_IRCRT/InteractiveRend... is a decent, if dated, introduction. He clearly has much more newer stuff as well.
Those are good links. It's good to see that the groups of people who know about Haskell and people who know about ray tracing do, in fact, overlap.

Background information for everyone else: Wald's work is related to OpenRT, which is an OpenGL-like API for interactive ray tracing (despite the name, it is not, in fact, open source). OpenRT makes for stiff competition. Arauna (http://igad.nhtv.nl/~bikker/) is very impressive, as well.

On the other end of the spectrum, there's POV-Ray, which isn't known for its speed, but it is very flexible in the kinds of things it can render and can generate some fairly realistic images. Unlike most real-time ray tracers, which only render triangles, POV-Ray can render native representations of spheres, cones, toruses, heightfields, julia fractals, and a few dozen other kinds of primitives. Pbrt (http://www.pbrt.org/) is another renderer, more modern than POV-Ray, that focuses more on output quality than speed.

Unfortunately, all the notable ray tracers that I'm aware of are written in C or C++ (often with a little hand-coded SSE), and as you might imagine I find this to be a sad state of affairs. Not that those are bad languages for this kind of work, but they shouldn't be the only option.

I've ended up writing something more like POV-Ray than OpenRT, and that's fine with me. I'd rather have something that's slower but more expressive than fast but inflexible. (The former is also perhaps more easily attainable, particularly in Haskell.) This isn't to say that I'm not interested in making it fast as well. There are plenty of ways to make my raytracer faster: Kd-trees built using a surface area heuristic performed better than naively-built BIH when I implemented them in my ocaml raytracer (though they take longer to construct). If I can figure out how quaternions work, I could probably use them instead of transformation matrices to store cones in order to cut down on memory overhead (4 floats versus 24, if I understand correctly). Ray packets, as described in Wald's paper linked above, might help as well.

Simon Marlow wrote:
There's definitely a bug here, regardless of whether this example demonstrates it. Use of par shouldn't introduce a space leak, and currently it can.
(fortunately I'm fixing it as we speak...)
Cheers, Simon Oh, good.
-jim
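(On the quaternion idea above: rotating a vector by a unit quaternion needs only the four quaternion components. A rough sketch, with made-up Vec and Quat types rather than Glome's:)

data Vec  = Vec !Double !Double !Double
data Quat = Quat !Double !Double !Double !Double  -- w, x, y, z; assumed unit length

vadd :: Vec -> Vec -> Vec
vadd (Vec ax ay az) (Vec bx by bz) = Vec (ax + bx) (ay + by) (az + bz)

vscale :: Double -> Vec -> Vec
vscale s (Vec x y z) = Vec (s * x) (s * y) (s * z)

vcross :: Vec -> Vec -> Vec
vcross (Vec ax ay az) (Vec bx by bz) =
  Vec (ay * bz - az * by) (az * bx - ax * bz) (ax * by - ay * bx)

-- v' = v + w*t + u x t, where u = (x,y,z) is the vector part and t = 2*(u x v)
qrotate :: Quat -> Vec -> Vec
qrotate (Quat w x y z) v =
  let u = Vec x y z
      t = vscale 2 (vcross u v)
  in v `vadd` vscale w t `vadd` vcross u t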

Jim Snow wrote:
Those are good links. It's good to see that the groups of people who know about Haskell and people who know about ray tracing do, in fact, overlap.
Background information for everyone else: Wald's work is related to OpenRT, which is an OpenGL-like api for interactive ray tracing (despite the name, it is not, in fact, open source). OpenRT makes for stiff competition. Arauna (http://igad.nhtv.nl/~bikker/) is very impressive, as well. On the other end of the spectrum, there's POV-Ray, which isn't known for its speed, but it is very flexible in the kinds of things it can render and can generate some fairly realistic images. Unlike most real-time ray tracers, which only render triangles, POV-Ray can render native representations of spheres, cones, toruses, heightfields, julia fractals, and a few dozen other kinds of primitives. Pbrt (http://www.pbrt.org/) is another renderer, more modern than POV-Ray, that focuses more on output quality than speed.
Wow. The POV-Ray guys are talking about Haskell [or rather, my personal addiction to it] and the Haskell guys are talking about POV-Ray... on the same day... How unlikely is that? ;-)

I've been using POV-Ray for a long time. I like it for several reasons:

1. It's the only program I've ever seen [on a PC] that does ray tracing. [I'm sure there must be others...]
2. It's the only program I've seen that can render *real* curves, not fake trickery with triangle meshes.
3. It can render *arbitrary* mathematical surfaces. Want to render a 3D slice of the 4D cubic Mandelbrot set? No problem!
4. It has a novel "scene description language", which does far more than describe scenes. It's Turing-complete, and you can build physics engines with it. [It's painfully slow though!]

Basically, as a maths addict, POV-Ray appeals to me. However, there are problems... The scene description language is basically a macro expansion language, and is chronically poor in data types and control structures. (The worst thing about Haskell is that IT MAKES YOU HATE NORMAL LANGUAGES!) Preserving complex scene state from frame to frame in an animation is notoriously tedious to code. Certain features are accessed in unintuitive ways to avoid breaking backwards compatibility. Overall, you tend to spend a lot of time saying which effects to simulate. [Not all of them matter for a given scene, so some can be left out - thus saving on speed!] I dislike such "clutter". For no objective reason.

The POV-Ray team is currently working on the first multi-threaded version. [After years of saying it would never happen.] It's taking a while. (That's partly because the developers take product quality very seriously.) It should be interesting when it's done, but it's still taking a while.

Personally, I'd quite like to write my own ray tracer to address some of these limitations. But every time I try, I end up hitting design issues [Haskell works very differently to Java] or performance issues [which I don't even know how to begin debugging]. Not to mention that POV-Ray uses sophisticated techniques like volume bounding that I know nothing about. (There's nothing like using an inherently superior algorithm to make your code orders of magnitude faster...)
Simon Marlow wrote:
There's definitely a bug here, regardless of whether this example demonstrates it. Use of par shouldn't introduce a space leak, and currently it can.
(fortunately I'm fixing it as we speak...)
Cheers, Simon Oh, good.
Indeed - yay for Simon!

On Thursday 24 April 2008 20:29:50 Andrew Coppin wrote:
2. It's the only program I've seen that can render *real* curves, not fake trickery with triangle meshes.
What you called "fake trickery with triangle meshes" is the core of all modern computer graphics. Focussing on ray tracing instead of that is rather missing the point, IMHO. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?e

Jon Harrop wrote:
On Thursday 24 April 2008 20:29:50 Andrew Coppin wrote:
2. It's the only program I've seen that can render *real* curves, not fake trickery with triangle meshes.
What you called "fake trickery with triangle meshes" is the core of all modern computer graphics. Focussing on ray tracing instead of that is rather missing the point, IMHO.
Well, that rather depends on what your "point" is, doesn't it? Personally, I don't see the point in rendering a couple of million mathematically flat surfaces, and then trying to blur the edges to give the false illusion of a curved surface - when you could just, you know, render a real curved surface. Sure, surface normal tricks can make lighting and reflections look almost correct, but they won't fix profile views, shadows and self-shadows, intersections, or any of the other artifacts caused by using triangles. But anyway, this is wandering off-topic. Suffice it to say, for my humble little hobby graphics, I intend to stick to ray tracing. If you disagree, feel free to use something else for your stuff. :-)

david48 wrote:
Personally, I don't see the point in rendering a couple of million mathematically flat surfaces,
What about speed ?
"If it doesn't have to be correct, it can be arbitrarily fast" :-) -k -- If I haven't seen further, it is by standing in the footprints of giants

On Sat, Apr 26, 2008 at 10:21 AM, david48 wrote:
On Sat, Apr 26, 2008 at 9:33 AM, Andrew Coppin wrote:
Personally, I don't see the point in rendering a couple of million mathematically flat surfaces,
What about speed ?
If you tessellate the curve so that there is zero error, then it's probably going to be faster to ray trace the actual curve, since you'll essentially have loads of very small (1 pixel or thereabouts) triangles... -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862

david48 wrote:
On Sat, Apr 26, 2008 at 9:33 AM, Andrew Coppin wrote:
Personally, I don't see the point in rendering a couple of million mathematically flat surfaces,
What about speed ?
Speed is not always the most important thing. ;-) ["The best things come to those who wait" and all that...]

On Sat, Apr 26, 2008 at 9:33 AM, Andrew Coppin wrote:
Personally, I don't see the point in rendering a couple of million mathematically flat surfaces,
David48 wrote:
What about speed ?
That is a good point. Ray tracing may give you a better image approximation than rasterizing triangles, but unless you're tracing an infinite number of rays, it is still just an approximation. Some shapes (like spheres) can be rendered more quickly in a ray tracer than triangles, and they use less memory as well, so if you really want a sphere, there's no reason to approximate it with triangles. For other objects, the trade-offs are more complex.
...
Unfortunately, the SDL is so complex it's well-nigh impossible to support in other third-party applications.
I don't think it would be "impossible" to support in 3rd party apps. A lot of work, certainly! Especially the way the parser does parsing and macro expansion in the same pass, so a macro body need not contain a syntactically-complete code fragment. I think if somebody sat down and wrote something that would do all the macro expansion, variable substitution and so forth to leave just few geometry descriptions, *those* would be quite easy to parse and manipulate for 3rd parties.
I guess a pov importer wouldn't necessarily be all that difficult, or an exporter, for that matter. (Supporting every single primitive type, texture, and rendering feature would be a daunting challenge, but just supporting the basics might be relatively simple.) The problem comes when you want to import a nice short hand-edited script, and then do a few things to it and export it again: suddenly the file turns into a multi-megabyte monstrosity with all the loops unrolled and all the recursive macros expanded and you lose the ability to hand-edit the scene. Depending on what you're doing, this might not be a problem.
Personally, I'd quite like to write my own ray tracer to address some of these limitations. But every time I try, I end up hitting design issues [Haskell works very differently to Java] or performance issues [which I don't even know how to begin debugging]. Not to mention that POV-Ray uses sophisticated techniques like volume bounding that I know nothing about. (There's nothing like using an inherently superior algorithm to make your code orders of magnitude faster...)
I haven't really run into any issues where Haskell didn't let me do what I want, except for the performance counters thing mentioned way back at the beginning of this thread (and which I've more or less given up on for now, since the acceleration structure seems to be working okay and I have other things to work on).
Well, for example, Haskell doesn't have heterogeneous lists - which are trivial in any OOP language. That's quite awkward to get around. Also, both spheres and cylinders have a "radius" property, but then you end up with name clashes. Again, a non-issue in OOP languages. [I gather there's some new GHC language extension to partially solve this one - but only when types are statically known. In the general case, you'd have to write a "radius class" and define instances... yuck!]
It's funny you mention that, those are actually problems I ran into, but (having learned my lesson the hard way in Ocaml), I decided not to try and force the language to do things my way, but instead try to do things in a way that Haskell makes easy. For instance, I started out by making each primitive a separate type (Sphere, Triangle, etc...), and then made a type class that defines a ray intersection method (and a few other odds and ends), but then I realized that I can't do something as simple as store a list of primitives (or if there is in fact a way, I'm not aware of it). Instead, I made a single large sum type with all my primitives:

data Solid = Sphere {center :: Vec, radius, invradius :: Flt}
           | Triangle {v1, v2, v3 :: Vec}
           | TriangleNorm {v1, v2, v3, n1, n2, n3 :: Vec}
           | Disc Vec Vec Flt       -- position, normal, r*r
           | Cylinder Flt Flt Flt   -- radius height1 height2
           | Cone Flt Flt Flt Flt   -- r clip1 clip2 height
           | Plane Vec Flt          -- normal, offset from origin
           | Box Bbox
           | Group [Solid]
           | Intersection [Solid]
           | Bound Solid Solid
           | Difference Solid Solid
           | Bih {bihbb :: Bbox, bihroot :: BihNode}
           | Instance Solid Xfm
           | Tex Solid Texture
           | Nothing
 deriving Show

etc... (strictness annotations and a few broken primitives omitted for brevity)

so that now I can easily make a list of primitives. (In fact, that's what a Group is.) I also gave up on using named fields, since coming up with a separate name for each one gets ugly: instead of radius, you have sphere_radius, cylinder_radius, cone_radius, disc_radius, etc...
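(As a tiny, purely illustrative example of building a scene from the constructors above -- assuming Vec is a three-component vector constructor; see TestScene.hs for the real conventions:)

scene :: Solid
scene = Group
  [ Sphere (Vec 0 1 0) 1 1   -- centre, radius, 1/radius
  , Plane (Vec 0 1 0) 0      -- normal, offset from origin
  ]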
Cones and spheres and boxes and the like can be made into some fairly elaborate scenes, but you can't really make, say, a realistic human figure out of those sorts of primitives.
No, but with things like isosurfaces and parametric surfaces, the world is your oyster. And you can do a surprising amount of things with blobs. I understand that all those curved surfaces which are currently popular in consumer electronics are done with splines, not triangles...
That world is a very slow oyster. I've used blobs and isosurfaces in POV-Ray, and they're wonderfully expressive but also incredibly slow. (The water in this scene, for instance, is an isosurface, and the boat hull is a blob and some CSG: http://syn.cs.pdx.edu/~jsnow/castle-crop.jpg) There are also other problems: the isosurface solvers are sometimes wrong, leading to ugly artifacts (and a lot of manual tweaking to make them go away), and applying a texture to an isosurface or blob isn't necessarily straightforward.

Another issue is that isosurfaces are great for filling the scene with procedurally-generated complexity, but sometimes you need to do actual physics computations in order to make the object do the right thing. For instance, in the image I linked above, the water looks pretty good, but the waves should really be reflecting off of the castle when they hit it, and there's no way I know of to do that without actually doing some sort of finite-element fluid simulation. And since that's already an approximation, and it's already taking up a lot of memory, you might as well just represent it as a triangle mesh (or whatever happens to be the easiest representation to render efficiently).

There's really a trade-off here between quality, time, and space. Any blob or isosurface can be approximated with triangles, and if the triangles are smaller than a pixel (or sometimes even if they're quite a bit bigger) the difference isn't going to be noticeable. However, using huge numbers of triangles consumes a ridiculous amount of memory, so you often have to use a coarser representation. (Here's an interesting paper about doing this sort of thing in a clever way: http://www.llnl.gov/graphics/ROAM/roam.pdf)

Isosurfaces and blobs (http://povray.org/documentation/view/3.6.1/71/, http://povray.org/documentation/view/3.6.1/73/) are two features I really like about POV-Ray, but they're very CPU intensive.

In other news, I have the beginnings of a tutorial up on the Haskell wiki that describes how to write your own scene with Glome: http://www.haskell.org/haskellwiki/Glome_tutorial

-jim

Jim Snow wrote:
I guess a pov importer wouldn't necessarily be all that difficult, or an exporter, for that matter. (Supporting every single primitive type, texture, and rendering feature would be a daunting challenge, but just supporting the basics might be relatively simple.) The problem comes when you want to import a nice short hand-edited script, and then do a few things to it and export it again: suddenly the file turns into a multi-megabyte monstrosity with all the loops unrolled and all the recursive macros expanded and you lose the ability to hand-edit the scene. Depending on what you're doing, this might not be a problem.
I see what you mean... I guess you'd want to separate the machine-generated parts from the hand-written parts by careful use of #includes...
Well, for example, Haskell doesn't have heterogeneous lists - which are trivial in any OOP language. That's quite awkward to get around. Also, both spheres and cylinders have a "radius" property, but then you end up with name clashes. Again, a non-issue in OOP languages. [I gather there's some new GHC language extension to partially solve this one - but only when types are statically known. In the general case, you'd have to write a "radius class" and define instances... yuck!]
It's funny you mention that, those are actually problems I ran into, but (having learned my lesson the hard way in Ocaml), I decided not to try and force the language to do things my way, but instead try to do things in a way that Haskell makes easy. For instance, I started out by making each primitive a separate type (Sphere, Triangle, etc...), and then made a type class that defines a ray intersection method (and a few other odds and ends), but then I realized that I can't do something as simple as store a list of primitives (or if there is in fact a way, I'm not aware of it). Instead, I made a single large sum type with all my primitives:
so that now I can easily make a list of primitives. (In fact, that's what a Group is.) I also gave up on using named fields, since coming up with a separate name for each one gets ugly: instead of radius, you have sphere_radius, cylinder_radius, cone_radius disc_radius, etc...
All of which works, but now it's a pain to add new primitives. And *all* supported primitives must be defined in a single monolithic module. And you can't reuse spheres as bounding volumes without either reimplementing them or losing type safety.
That world is a very slow oyster. I've used blobs and isosurfaces in POV-Ray, and they're wonderfully expressive but also incredibly slow.
This explains why my PC contains the fastest CPU that I could afford, and why a while back I actually considered purchasing an 8-core server box that was on special offer for £600. (I'm now really kicking myself that I didn't buy that! 8 Xeon cores... it would have been so fast!) But yeah, I guess it depends on what your patience level is. I like the way a couple of lines of SDL can construct a blob object so intricate that it would be impossible to construct a triangle mesh like that with any known tool... but that's just me.
There are also other problems: the isosurface solvers are sometimes wrong, leading to ugly artifacts
Only if you set the max_gradient too low. And that's not surprising, given how the algorithm works... [Indeed, what's surprising is that they built a solver that works for arbitrary functions!]
and applying a texture to an isosurface or blob isn't necessarily straightforward.
That is a pity. On the other hand, with function-based pigments, you can do some pretty impressive stuff. [For example, using the same function as the isosurface to distribute a texture map.]
Another issue is that isosurfaces are great for filling the scene with procedurally-generated complexity, but sometimes you need to do actual physics computations in order to make the object do the right thing. For instance, in the image I linked above, the water looks pretty good, but the waves should really be reflecting off of the castle when they hit it, and there's no way I know of to do that without actually doing some sort of finite-element fluid simulation. And since that's already an approximation, and it's already taking up a lot of memory, you might as well just represent it as a triangle mesh (or whatever happens to be the easiest representation to render efficiently).
The way I did that is as follows:

1. Create a function that represents the waves.
2. Sample that function at multiple points along the edges of whatever is protruding through the water.
3. Add another wave source at that point with an appropriate phase offset as per the samples you just took.
4. The actual isosurface is the sum of all the waves thus mentioned.

Apparently the samples just need to be "close together" relative to the wavelength of the waves and it looks right. So if you have fairly large waves, you don't need many samples [and many wave functions to sum!] The results looked fairly good to me. I have yet to see anybody doing water simulation using a triangle mesh. Invariably people just make each particle in the system into a blob component, and let POV-Ray smooth the surface.
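(A minimal sketch of that recipe as a plain height function; the WaveSource type and the single sine ripple are made up for illustration, not taken from the actual scene:)

-- A wave source: where a ripple originates and its phase offset there.
data WaveSource = WaveSource { wsCentre :: (Double, Double)
                             , wsPhase  :: Double }

-- Height of the surface at (x, z) is the sum of ripples radiating from
-- every source; the isosurface is then y - waveHeight ... x z = 0.
waveHeight :: Double -> Double -> [WaveSource] -> Double -> Double -> Double
waveHeight amplitude wavelength sources x z = sum (map ripple sources)
  where
    ripple (WaveSource (cx, cz) phase) =
      let r = sqrt ((x - cx) ^ 2 + (z - cz) ^ 2)
      in amplitude * sin (2 * pi * r / wavelength - phase)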
There's really a trade-off here between quality, time, and space.
Indeed. If I was trying to render a feature-length movie Pixar style, I'd probably be making different choices. But I'm not. I'm building still images, and the occasional short animation. For me, quality is the main concern. ;-)
Any blob or isosurface can be approximated with triangles, and if the triangles are smaller than a pixel (or sometimes even if they're quite a bit bigger) the difference isn't going to be noticeable.
False.

1. As I understand it, tessellating arbitrary surfaces is still an unsolved problem.
2. If you have curved reflective or refractive surfaces, these can magnify elements of the geometry, so even if every triangle is smaller than a pixel, you can still see the difference. (Ditto for shadow considerations.)

Well, for example, Haskell doesn't have heterogeneous lists - which are trivial in any OOP language. That's quite awkward to get around. Also, both spheres and cylinders have a "radius" property, but then you end up with name clashes. Again, a non-issue in OOP languages. [I gather there's some new GHC language extension to partially solve this one - but only when types are statically known. In the general case, you'd have to write a "radius class" and define instances... yuck!]
It's funny you mention that, those are actually problems I ran into, but (having learned my lesson the hard way in Ocaml), I decided not to try and force the language to do things my way, but instead try to do things in a way that Haskell makes easy. For instance, I started out by making each primitive a separate type (Sphere, Triangle, etc...), and then made a type class that defines a ray intersection method (and a few other odds and ends), but then I realized that I can't do something as simple as store a list of primitives (or if there is in fact a way, I'm not aware of it). Instead, I made a single large sum type with all my primitives: so that now I can easily make a list of primitives. (In fact, that's what a Group is.) I also gave up on using named fields, since coming up with a separate name for each one gets ugly: instead of radius, you have sphere_radius, cylinder_radius, cone_radius disc_radius, etc...
Andrew Coppin wrote:
All of which works, but now it's a pain to add new primitives. And *all* supported primitives must be defined in a single monolithic module. And you can't reuse spheres as bounding volumes without either reimplementing them or losing type safety.

The part about spheres and bounding spheres is actually not so much of a problem: I implemented a general "Bound" primitive that can bound any primitive with any other. In general, you would use a simple object like a sphere or box as the left argument and a complex object as the right argument.

data Solid = ...
           | Bound Solid Solid
           ...

and then to intersect the "Bound" object, you first do a shadow test on the left object before testing the right object:

rayint :: Solid -> Ray -> Flt -> Texture -> Rayint
rayint (Bound sa sb) r d t =
  let (Ray orig _) = r
  in if inside sa orig || shadow sa r d
     then rayint sb r d t
     else RayMiss

Some kinds of primitives aren't likely to work well for bounding since they don't have a well-defined inside and outside (like triangles and discs), but I'd rather provide maximum flexibility and assume that users know how to use it sensibly. http://www.haskell.org/haskellwiki/Glome_tutorial#Bounding_Objects

As for the maintenance issues, that is still a problem. It would be nice to split all the individual primitives into their own modules.

Sebastian Sylvan wrote:
On 4/27/08, Jim Snow wrote:
For instance, I started out by making each primitive a separate type (Sphere, Triangle, etc...), and then made a type class that defines a ray intersection method (and a few other odds and ends), but then I realized that I can't do something as simple as store a list of primitives (or if there is in fact a way, I'm not aware of it).
You can, by using a "wrapper" type which wraps up any instance of the Intersect class:
data Intersectable = forall a. Intersect a => MkIntersectable a
For convenience you probably want to instantiate this wrapper in the class itself:
instance Intersect Intersectable where
  rayIntersection (MkIntersectable x) ray = rayIntersection x ray
  boundingVolume (MkIntersectable x) = boundingVolume x
  -- etc...
Now you can stick Intersectables in lists etc.
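(For instance, with someSphere and someTriangle standing for any values whose types have Intersect instances:)

scene :: [Intersectable]
scene = [MkIntersectable someSphere, MkIntersectable someTriangle]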
I think that sounds like what I ought to be doing. -jim

Andrew Coppin wrote:
Wow. The POV-Ray guys are talking about Haskell [or rather, my personal addiction to it] and the Haskell guys are talking about POV-Ray... on the same day... How unlikely is that? ;-)
That's odd; I had a personal addiction to POV-Ray for a few years, and just now have started using Haskell.
I've been using POV-Ray for a long time. I like it for several reasons:
1. It's the only program I've ever seen [on a PC] that does ray tracing. [I'm sure there must be others...] 2. It's the only program I've seen that can render *real* curves, not fake trickery with triangle meshes. 3. It can render *arbitrary* mathematical surfaces. Want to render a 3D slice of the 4D cubic Mandelbrot set? No problem! 4. It has a novel "scene description language", which does far more than describe scenes. It's Turing-complete, and you can build physics engines with it. [It's painfully slow though!]
The Scene Description Language (SDL) is the best and worst thing about POV-Ray. It's very intuitive and user-friendly, so a person can reasonably write a complex scene in pure code without using an external GUI editor. Unfortunately, the SDL is so complex it's well-nigh impossible to support in other third-party applications. It's also slow.

I don't know if this is still the case, but the standard way of doing an animation was to reference a "clock" variable in your scene source code that went from 0 to 1; for instance, a command to make a swing swing back and forth might look like this:

rotate 15*sin((clock/2)*seconds*(2*pi)-((2/3)*pi))*x

"seconds" here is a variable set to the number of seconds in the animation, and "x" is the X axis unit vector <1,0,0>. The (2/3)*pi thing is to make it swing out of phase with the other swings. (This rather obfuscatory example is taken from an actual ancient povray source file; you can see a rendering here: http://syn.cs.pdx.edu/~jsnow/playground.png) The scene then has to be re-parsed for every frame. For complex scenes, the scene parsing could take longer than the actual render.

There are many other PC programs that do ray tracing, but POV-Ray is the only one I've had any experience with.
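(For comparison, the same swing angle written as an ordinary Haskell function of the clock -- a sketch of how a Haskell scene description avoids per-frame re-parsing; the names are made up:)

-- clock runs from 0 to 1 over the animation, seconds is the animation's
-- length; the result is the swing's rotation angle in degrees about X.
swingAngle :: Double -> Double -> Double
swingAngle seconds clock =
  15 * sin ((clock / 2) * seconds * (2 * pi) - (2 / 3) * pi)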
The POV-Ray team is currently working on the first multi-threaded version. [After years of saying it would never happen.] It's taking a while. (That's partly because the developers take product quality very seriously.) It should be interesting when it's done, but it's still taking a while. Personally, I'd quite like to write my own ray tracer to address some of these limitations. But every time I try, I end up hitting design issues [Haskell works very differently to Java] or performance issues [which I don't even know how to begin debugging]. Not to mention that POV-Ray uses sophisticated techniques like volume bounding that I know nothing about. (There's nothing like using an inherently superior algorithm to make your code orders of magnitude faster...)
I haven't really run into any issues where Haskell didn't let me do what I want, except for the performance counters thing mentioned way back at the beginning of this thread (and which I've more or less given up on for now, since the acceleration structure seems to be working okay and I have other things to work on). I would certainly welcome any patches to Glome if you want to contribute in some way rather than write something from scratch.

A good acceleration structure definitely makes everything go a lot faster. It's the difference between rendering a scene in a few seconds or ten minutes. BIH is what I'm using. It's relatively new. Here's a paper about it: http://ainc.de/Research/BIH.pdf The actual constructor is based loosely on this pseudocode: http://ompf.org/forum/viewtopic.php?p=1411#p1411

Evan Laforge wrote:
Not knowing anything about raytracing, one of the things I found interesting about the paper was that he claimed that the speedups were from using coherent ray packets, that the shader model was orthogonal, and that enough time is spent raycasting that the shader code doesn't make much difference. The implication was that you could graft a packet-style raycasting engine onto even POV-Ray and give it a nice speed boost... though you might have to lose the nifty "real" shapes to tessellated approximations.
Is this accurate? Or is it true, but the reality is more complicated?
You don't need to drop support for the whole multitude of primitives, though the ones that are packet-friendly will render faster. Many modern ray tracers spend most of their time traversing the acceleration structure rather than doing intersection tests with actual geometry, so all you really need to do to get most of the benefit of packet tracing is to make the acceleration structure support packets. (For the actual geometry, you can fall back on mono-rays.)

I tried 2x2 packets out last night and got about a 10-15% speed improvement on the level-3 SPD sphereflake. (Shadow and reflection rays are still mono rays; if I converted them over, the improvement would presumably be more significant.)

The simplifying assumption you make when you're traversing the acceleration structure with packets is that if one ray in the packet hits a bounding volume, you assume that they all do. That adds a little overhead if the rays are coherent (all starting near a common origin and pointing in almost the same direction), or a lot of overhead if they aren't coherent. If used appropriately, the performance gains outweigh the overhead by a considerable margin.
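(A minimal sketch of that simplifying assumption, using schematic types rather than Glome's real ones: the packet descends into a subtree if any of its rays hits the subtree's bounding volume, and per-ray intersection tests only happen at the leaves.)

-- Schematic sketch, not Glome's code: gather the primitives that a
-- packet of rays could possibly hit.  A subtree is entered if *any*
-- ray in the packet hits its bounding volume; individual (mono) ray
-- tests against the actual geometry happen afterwards, per ray.
data Tree bound prim
  = Leaf [prim]
  | Node bound (Tree bound prim) (Tree bound prim)

candidates :: (ray -> bound -> Bool)  -- single ray vs. bounding volume
           -> [ray]                   -- the packet, e.g. a 2x2 bundle
           -> Tree bound prim
           -> [prim]
candidates hits packet = go
  where
    go (Leaf ps) = ps
    go (Node b l r)
      | any (`hits` b) packet = go l ++ go r  -- one hit => all descend
      | otherwise             = []            -- whole packet skips subtree

Jon Harrop wrote: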
On Thursday 24 April 2008 20:29:50 Andrew Coppin wrote:
2. It's the only program I've seen that can render *real* curves, not fake trickery with triangle meshes.
What you called "fake trickery with triangle meshes" is the core of all modern computer graphics. Focussing on ray tracing instead of that is rather missing the point, IMHO.
I agree that triangles are very important, and I support them in Glome. Cones and spheres and boxes and the like can be made into some fairly elaborate scenes, but you can't really make, say, a realistic human figure out of those sorts of primitives. Also, real data sets like terrain or 3d-scanned geometry are easy to represent as triangles. On the other hand, sometimes all you want is a box or a shiny sphere, and it's silly (not to mention a waste of memory) to represent either as a bunch of triangles. So it's appropriate to support both approaches and do whatever makes the most sense for the application.

Ray tracing and triangles isn't an either-or thing. Some ray tracers only support triangles. There is a speed penalty for supporting other kinds of primitives, because then the ray tracer has to branch to figure out which ray-intersection test to run on each primitive, but I think the flexibility is worth the cost. -jim
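(To make the branching cost concrete, here's a small self-contained sketch - schematic types, not Glome's API - where the intersection routine has to inspect the constructor before it knows which test to run; only the sphere case is filled in.)

type Vec = (Double, Double, Double)
data Ray = Ray Vec Vec            -- origin, direction (assumed normalised)

data Prim
  = Sphere Vec Double             -- centre, radius
  | Box Vec Vec                   -- two opposite corners
  | Tri Vec Vec Vec               -- three vertices

sub :: Vec -> Vec -> Vec
sub (a,b,c) (x,y,z) = (a - x, b - y, c - z)

dot :: Vec -> Vec -> Double
dot (a,b,c) (x,y,z) = a*x + b*y + c*z

-- The case analysis below is the per-primitive branch mentioned above.
-- Only the sphere test is implemented in this sketch.
intersect :: Ray -> Prim -> Maybe Double
intersect (Ray o d) (Sphere c r) =
  let oc   = o `sub` c
      b    = 2 * (d `dot` oc)
      cc   = oc `dot` oc - r * r
      disc = b * b - 4 * cc            -- a = 1 for a normalised direction
      t    = (-b - sqrt disc) / 2
  in if disc < 0 || t <= 0 then Nothing else Just t
intersect _ (Box _ _)   = Nothing      -- elided in this sketch
intersect _ (Tri _ _ _) = Nothing      -- elided in this sketch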

Jim Snow wrote:
The Scene Description Language (SDL) is the best and worst thing about POV-Ray.
Indeed.
It's very intuitive and user-friendly, so a person can reasonably write a complex scene in pure code without using an external GUI editor.
Not to mention that there's a smooth and continuous path from "I rendered a reflective sphere on a chequered plane" to "I built a complex CSG object" to "I wrote my first animation" to "I just built a particle physics system / fractal terrain generator / inverse kinematics animator / whatever".
Unfortunately, the SDL is so complex it's well-nigh impossible to support in other third-party applications.
I don't think it would be "impossible" to support in 3rd party apps. A lot of work, certainly! Especially the way the parser does parsing and macro expansion in the same pass, so a macro body need not contain a syntactically-complete code fragment. I think if somebody sat down and wrote something that would do all the macro expansion, variable substitution and so forth, leaving just the plain geometry descriptions, *those* would be quite easy to parse and manipulate for 3rd parties.
It's also slow. The scene has to be re-parsed for every frame. For complex scenes, the scene parsing could take longer than the actual render.
For complex scenes, or scenes involving complex algorithms like fractal generation or something, parsing can indeed take far longer than rendering. The SDL is basically a scripting language. It's kinda flexible in its own way, but totally useless in speed terms. It also lacks all the features of a "real" programming language - abstraction, encapsulation, rich data types, powerful control structures... actually, all the things that Haskell excels at.

I did start a project a while back to try and make it possible to describe POV-Ray scenes in Haskell, thus leveraging the full power of the language. But that never came to much...

[The "clock" variable is one way to animate, yes. Another is to use the "frame_number" variable. A common idiom is to generate an initial state at frame #1, and on each frame, read the state from disk, update it, write the updated version back to disk, and then render the frame. Obviously this destroys the ability to render frames in arbitrary order or distribute them across a render farm, but that's rather unavoidable. Note that the loading and saving must be hand-coded, from scratch, every time you want to do this. Tedious...]
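(For what it's worth, the scenes-in-Haskell idea can be sketched in a few lines. This is purely hypothetical and not the project mentioned above: describe the scene as ordinary Haskell values and pretty-print them to SDL, so abstraction and control flow live in Haskell.)

-- Purely hypothetical sketch: Haskell values pretty-printed to POV-Ray
-- SDL, so loops, abstraction and data structures come from Haskell.
type Vec = (Double, Double, Double)

data Object
  = Sphere Vec Double
  | Union [Object]

vec :: Vec -> String
vec (x, y, z) = "<" ++ show x ++ ", " ++ show y ++ ", " ++ show z ++ ">"

toSDL :: Object -> String
toSDL (Sphere c r) = "sphere { " ++ vec c ++ ", " ++ show r ++ " }"
toSDL (Union os)   = "union {\n" ++ unlines (map (("  " ++) . toSDL) os) ++ "}"

-- e.g. a row of spheres, tedious to write out by hand in SDL:
row :: String
row = toSDL (Union [ Sphere (fromIntegral i, 0, 0) 0.4 | i <- [0 .. 9 :: Int] ])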
Personally, I'd quite like to write my own ray tracer to address some of these limitations. But every time I try, I end up hitting design issues [Haskell works very differently to Java] or performance issues [which I don't even know how to begin debugging]. Not to mention that POV-Ray uses sophisticated techniques like volume bounding that I know nothing about. (There's nothing like using an inherently superior algorithm to make your code orders of magnitude faster...)
I haven't really run into any issues where Haskell didn't let me do what I want, except for the performance counters thing mentioned way back at the beginning of this thread (and which I've more or less given up on for now, since the acceleration structure seems to be working okay and I have other things to work on).
Well, for example, Haskell doesn't have heterogeneous lists - which are trivial in any OOP language. That's quite awkward to get around. Also, both spheres and cylinders have a "radius" property, but then you end up with name clashes. Again, a non-issue in OOP languages. [I gather there's some new GHC language extension to partially solve this one - but only when types are statically known. In the general case, you'd have to write a "radius class" and define instances... yuck!]
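(For completeness, the two usual workarounds look roughly like this - nothing here is Glome's API; it's just the sum-type approach versus the "radius class" with an existential wrapper being complained about above.)

{-# LANGUAGE ExistentialQuantification #-}
-- Workaround 1: close the world with a sum type (no name clash if the
-- record fields are prefixed).
data Shape
  = SphereS   { sphereRadius :: Double }
  | CylinderS { cylRadius :: Double, cylHeight :: Double }

-- Workaround 2: the "radius class" plus an existential wrapper, which
-- gives an open-world heterogeneous list at the cost of boilerplate.
class HasRadius a where
  radius :: a -> Double

data SomeShape = forall s. HasRadius s => SomeShape s

newtype Ball = Ball Double
newtype Tube = Tube (Double, Double)   -- radius, height

instance HasRadius Ball where radius (Ball r) = r
instance HasRadius Tube where radius (Tube (r, _)) = r

shapes :: [SomeShape]                  -- the heterogeneous list
shapes = [SomeShape (Ball 1.0), SomeShape (Tube (0.5, 3.0))]

radii :: [Double]
radii = [ radius s | SomeShape s <- shapes ]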
A good acceleration structure definitely makes everything go a lot faster. It's the difference between rendering a scene in a few seconds or ten minutes.
BIH is what I'm using. It's relatively new. Here's a paper about it: http://ainc.de/Research/BIH.pdf
The actual constructor is based loosely on this pseudocode: http://ompf.org/forum/viewtopic.php?p=1411#p1411
I'll take a look. Certainly all the ray tracers I've written just test every ray against every object - O(n) complexity? Hmm, that's not good[tm]...
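(That naive loop is essentially the following - schematic types again; the acceleration structure's whole job is to avoid calling the test on most of the objects.)

import Data.Maybe (mapMaybe)

-- The naive O(n)-per-ray approach: test the ray against every object
-- and keep the nearest hit.  Schematic types, not Glome's.
nearestHit :: (ray -> obj -> Maybe Double) -> ray -> [obj] -> Maybe Double
nearestHit test ray objs =
  case mapMaybe (test ray) objs of
    []        -> Nothing
    distances -> Just (minimum distances)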
You don't need to drop support for the whole multitude of primitives, though the ones that are packet-friendly will render faster. Many modern ray tracers spend most of their time traversing the acceleration structure rather than doing intersection tests with actual geometry, so all you really need to do to get most of the benefit of packet tracing is to make the acceleration structure support packets. (For the actual geometry, you can fall back on mono-rays.)

I tried 2x2 packets out last night and got about a 10-15% speed improvement on the level-3 SPD sphereflake. (Shadow and reflection rays are still mono rays; if I converted them over, the improvement would presumably be more significant.)

The simplifying assumption you make when you're traversing the acceleration structure with packets is that if one ray in the packet hits a bounding volume, you assume that they all do. That adds a little overhead if the rays are coherent (all starting near a common origin and pointing in almost the same direction), or a lot of overhead if they aren't coherent. If used appropriately, the performance gains outweigh the overhead by a considerable margin.
Something else to add to my list of "things I should look into"...
Cones and spheres and boxes and the like can be made into some fairly elaborate scenes, but you can't really make, say, a realistic human figure out of those sorts of primitives.
No, but with things like isosurfaces and parametric surfaces, the world is your oyster. And you can do a surprising amount of things with blobs. I understand that all those curved surfaces which are currently popular in consumer electronics are done with splines, not triangles...
Also, real data sets like terrain or 3d-scanned geometry are easy to represent as triangles.
Well that's true enough...
It's appropriate to support both approaches, so you can do whatever makes the most sense for the application.
Indeed. And since I never use triangles for anything, I've never bothered working out how to support them. ;-)

I've ended up writing something more like POV-Ray than OpenRT, and that's fine with me. I'd rather have something that's slower but more expressive than fast but inflexible. (The former is also perhaps more easily attainable, particularly in Haskell.)

On Fri, Apr 18, 2008 at 9:10 PM, Jim Snow
The scene is shared between threads. (Complex scenes can be quite large.) I'm assuming this is handled as a read-only shared memory region or something similar, as one might expect, and is not actually copied.
Ah, when people say "shared memory concurrency", they usually mean shared *mutable* memory concurrency, which this isn't. -- Sebastian Sylvan
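(A minimal sketch of that pattern using Control.Parallel.Strategies - the scene and render names here are stand-ins, not Glome's API: the scene is an ordinary immutable value closed over by every spark, so nothing is copied or mutated.)

import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

type Colour = (Double, Double, Double)

-- 'renderRow scene y' would trace one scanline; the scene value itself
-- is shared (read-only) by all sparks.  Stand-in names, not Glome's API.
renderImage :: scene -> (scene -> Int -> [Colour]) -> Int -> [[Colour]]
renderImage scene renderRow height =
  map (renderRow scene) [0 .. height - 1]
    `using` parListChunk 16 rdeepseq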

Jim Snow wrote:
The concurrency bug has to do with excessive memory use, and was discussed earlier here on the mailing list (search for Glome). http://hackage.haskell.org/trac/ghc/ticket/2185
Interesting. I looked at your test case. I can reproduce your problem when I build with the threaded runtime and run with a single core, but not if I use +RTS -N2. Did you overlook the possibility that you may not have told GHC how many cores to use? Also, your code is sprinkled with many more strictness annotations than it needs.

Bryan O'Sullivan wrote:
Jim Snow wrote:
The concurrency bug has to do with excessive memory use, and was discussed earlier here on the mailing list (search for Glome). http://hackage.haskell.org/trac/ghc/ticket/2185
Interesting. I looked at your test case. I can reproduce your problem when I build with the threaded runtime and run with a single core, but not if I use +RTS -N2. Did you overlook the possibility that you may not have told GHC how many cores to use?
There's definitely a bug here, regardless of whether this example demonstrates it. Use of par shouldn't introduce a space leak, and currently it can. (Fortunately, I'm fixing it as we speak...) Cheers, Simon
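(For reference, the kind of par usage under discussion is just the standard spark-one-branch pattern; this sketch is generic and not tied to Glome or to the ticket's test case.)

import Control.Parallel (par, pseq)

-- Standard pattern: spark the evaluation of one branch, evaluate the
-- other on the current thread, then combine.  Once the runtime bug is
-- fixed, this shouldn't retain the sparked value any longer than a
-- sequential version would.
parPair :: (a -> b) -> a -> a -> (b, b)
parPair f x y =
  let fx = f x
      fy = f y
  in fx `par` (fy `pseq` (fx, fy))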

Don Stewart wrote:
jsnow:
A new version of my raytracer is out...
Very impressive. Did you consider cabalising the Haskell code, so it can be easily distributed from hackage.haskell.org?
...
-- Don
A new version is up on hackage now: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/glome-hs In addition to the cabal conversion, it has a few bug fixes, performance improvements, and various code cleanups relative to version 0.3. If anyone has any problems with it, let me know. -jim
participants (15)
- Andrew Coppin
- Bryan O'Sullivan
- Bulat Ziganshin
- David Roundy
- David Roundy
- david48
- Derek Elkins
- Don Stewart
- Evan Laforge
- Jim Snow
- Jon Harrop
- Ketil Malde
- Sebastian Sylvan
- Simon Marlow
- Stefan O'Rear