
John,
> After all, on the average call where an object of that size is already free, it is a single array lookup; we have:
> (a) fetch pointer (one read)
> (b) fetch next (one read)
> (c) store next as current (one write)
This is true for memory access; it is not true for memory allocation. I do not know how malloc allocates memory on Windows, but on general POSIX systems the kernel uses a linked list and a lot of other management machinery to reduce fragmentation, such as KMEMSTAT. Malloc may also block, which is something you have more control over in your own garbage-collected heap. A really good explanation of the problem of rapidly allocating and deallocating temporary blocks of memory under 35kb is here: http://ridiculousfish.com/blog/archives/2006/05/16/36/

In any case, Simon Marlow had previously mentioned that alloc (from GHC's heap) is faster than malloc. He is almost certainly correct, although I hope the difference will not be that great and the only thing I have to worry about is ForeignPtr. We shall see whether malloc'd memory makes a difference in the benchmarks.
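Just so we are talking about the same thing, here is a rough sketch of the free-list pop you describe in (a)-(c), written with Foreign.Ptr; the names, and the assumption that the first word of a free block holds the link to the next one, are mine purely for illustration:

    import Foreign.Ptr
    import Foreign.Storable

    -- Pop one block from a free list whose head pointer lives at 'headPtr'.
    -- Each free block is assumed to store the link to the next free block
    -- in its first word.
    popFreeList :: Ptr (Ptr a) -> IO (Ptr a)
    popFreeList headPtr = do
      block <- peek headPtr          -- (a) fetch pointer (one read)
      next  <- peek (castPtr block)  -- (b) fetch next (one read)
      poke headPtr next              -- (c) store next as current (one write)
      return block

Three memory operations, as you say; the question is what malloc does around them.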
> A purely functional system -- one which does NOT convert self tail calls into jumps and reuse storage -- can perhaps be faster, since each thread can have its own local arena to allocate from (without need of any write barrier) ... however it isn't clear to me what the trade-off is between cache misses and parallelism.
That is interesting, but I do not understand whether your mention of self tail calls turned into jumps was meant at a low or a high level. From the context it seems as if you are talking about a high-level implementation: each function running in a separate thread. GHC's RTS does use many separate threads (the RTS is threaded by default in the latest version, 6.6). As for turning self tail calls into jumps at the low level, GHC does do this through C-- (GHC's implementation of C-- is called Cmm). I believe that is both faster and more memory-efficient than a high-level threaded system.

Philosophically speaking, even if Simon Peyton-Jones developed Cmm to solve the problem of efficient functional computation, Cmm has turned Haskell research from research on a computer language into research on a system of computation. (Maybe that is what he meant when, some time ago, he wrote to John Meacham and said that they (the GHC researchers) considered compiling via C a dead end.) A language can be anything: all it requires is a means of being translated into machine code. Pseudo-intelligent and advanced compiler systems such as gcc, JHC for Haskell, or OCaml for the Caml dialect of ML may translate programming languages into machine code, but the underlying computation remains largely sequential. The curious thing about GHC Haskell is that, through the prism of Cmm, which enforces such things as immutable variables and recursion right at the machine level, Haskell is less a language translated into sequential machine code and more a description of a computational model. If you still think I am wrong about this, consider the possibility that Haskell with Cmm is a modern research project in the same spirit as the one that motivated Lisp: a different model of computation.

Best regards,
Peter
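P.S. To be concrete about what I mean by a self tail call turned into a jump, here is a small example (the function is mine, just for illustration):

    -- 'go' calls itself in tail position, so GHC (going through Cmm) can
    -- compile the recursive call as a jump back to the start of 'go',
    -- reusing the same stack frame rather than pushing a new one.
    sumTo :: Int -> Int
    sumTo n = go 0 1
      where
        go acc i
          | i > n     = acc
          | otherwise = go (acc + i) (i + 1)  -- self tail call becomes a jump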