
Bulat Ziganshin wrote:
* C++ calling stack should be used as much as possible
* parameters are passed in C++ way: "factorial(int n, int r)"
SM> This is unworkable, I'm afraid. Go and read Simon's paper on the STG:
SM> http://citeseer.ist.psu.edu/peytonjones92implementing.html
SM> there are two main reasons we don't use the C stack: garbage collection
SM> and tail calls. What do you plan to do about GC?
The minimalistic solution I see: just use two stacks. Unboxed values are passed using the C conventions, so the C compiler will be able to optimize use of this stack; boxed values are passed using the current Sp[] machinery, and therefore GC will not be changed.
How do you propose to do thread switching? Incidentally, GHC had two stacks a long time ago (before 4.00); I rewrote the runtime and code generator to replace them with one stack. It was about 6 months of work, if I remember correctly. I think you misused the words "just" and "minimalistic" :-)

I know gcc does (some) tail-call optimisation these days. You don't get any *guarantee* that a call will be implemented as a tail call, though, and the conditions are quite complex. The gcc folks treat it as an optimisation, rather than a fundamental part of the operational semantics, which is what you need for Haskell.
SM> This is called semi-tagging, it was tried a long time ago. Certainly
SM> worth trying again, but I very much doubt the gains would be anything
SM> like you claim, and it increases code size.
We will replace 2 unpredictable jumps with one that will not even be executed if the value is already in WHNF. In any case, I'm very displeased by the slowness of lazy evaluation in GHC: for example, my binary serialization package writes 60 MB/s working with unboxed arrays but only 20 MB/s working with lists.
You need to *try* these things and measure the difference. Quite often I'm surprised by the actual results I get from an optimisation that looks worthwhile on paper. Today's processors with 3-level caches are complicated beasts. Semi-tagging increases code size, so unless you can make some significant gains to make up for it, you've lost already.

Let me give you another example: GHC puts info tables next to the entry code for a closure. On paper this looks like it might be a bad idea: it pollutes the instruction cache with data that is rarely accessed during execution. It's hard to arrange (this is one thing the mangler does), so we'd like to avoid it if possible. However, doing this eliminates an indirection when entering a closure, reduces the size of info tables by one word, and reduces code size. These effects together outweigh the code-cache-pollution effect - last time I measured it, the difference was ~10%. (Turn off TABLES_NEXT_TO_CODE and try it yourself.)
SM> There are other schemes, including this one that we particularly like:
SM> use a spare bit in a pointer to indicate "points to an evaluated value".
SM> Since there are two spare bits, you can go further and use the 4
SM> values to mean:
SM> 0: possibly unevaluated
SM> 1: evaluated, tag 0
SM> 2: evaluated, tag 1
SM> 3: evaluated, tag 2
SM> I believe Robert Ennals tried this as part of his spec eval
SM> implementation, but IIRC he didn't get separate measurements for it (or
SM> it didn't improve things much). Note that this increases code size too.
To be exact, we would then have only 2^30-1 valid pointers - all other values are ready to use! :)
True, but I'm assuming you often want to store a pointer in addition to the tag (e.g. for a cons cell). For enumerations, you're quite right that the whole word can be used for the tag, as long as it is distinguished from a pointer.

Cheers,
Simon