Doubts in bytecode

Hello,

I have two unrelated questions, which I'll ask one at a time.

Q1) I have this function:

    myFoo :: (Int -> Int -> Int) -> Int -> Int -> Int
    myFoo f x y = f (x+y) (x-y)

It compiles to the following bytecode (I've pasted only the bytecode part; please ignore other details):

    bytes2word(NEEDHEAP_I32,PUSH_HEAP,HEAP_CVAL_I3,HEAP_ARG_ARG)
    , bytes2word(2,3,PUSH_HEAP,HEAP_CVAL_I4)
    , bytes2word(HEAP_ARG_ARG,2,3,PUSH_P1)
    , bytes2word(0,PUSH_P1,2,PUSH_ZAP_ARG_I1)
    , bytes2word(ZAP_ARG_I2,ZAP_ARG_I3,ZAP_STACK_P1,4)
    , bytes2word(ZAP_STACK_P1,3,EVAL,NEEDHEAP_I32)
    , bytes2word(APPLY,2,RETURN_EVAL,ENDCODE)

As far as I understand, this is how things work:

1. Construct the graph for (x + y) and push its root onto the stack.

       PUSH_HEAP, HEAP_CVAL_I3, HEAP_ARG_ARG, 2, 3

2. Construct the graph for (x - y) and push its root onto the stack.

       PUSH_HEAP, HEAP_CVAL_I4, HEAP_ARG_ARG, 2, 3

3. Now push the first argument onto the heap, make the top of the stack point to it, evaluate it, and then apply its result to the top two elements on the stack. (I understand that arguments have to be ZAPped before we start evaluating them, so I know what ZAP is doing here.)

       PUSH_ZAP_ARG_I1, ZAP_ARG_I2, ZAP_ARG_I3, ZAP_STACK_P1, 4,
       ZAP_STACK_P1, 3, EVAL, NEEDHEAP_I32, APPLY, 2

In between all this, the stack entries that point to the two graphs (i.e. x+y and x-y) are replicated on the stack, and then one copy of each graph is zapped, as can be seen from the code snippet below.

    PUSH_P1, 0, PUSH_P1, 2, ZAP_STACK_P1, 4, ZAP_STACK_P1, 3

This has caught me off guard. I just couldn't figure out why this is being done, i.e. why copy the contents of the stack and then zap one of the copies?

Also, what are the uses of ZAP nodes apart from black hole detection? Do ZAP nodes help in garbage collection too?

Now the second doubt.

Q2) As Malcolm explained in detail, this is the purpose of the CONSTR macro:

    CONSTR(c,s,ws)
        Construct a tag (i.e. a header for a data node) where there is
        a mixture of pointers and basic values amongst the data items

I compiled various examples, but so far I haven't seen a single example where CONSTR is used for a mixture of pointers and basic values. It has always been for basic values. I'll illustrate with an example.

    data MyData a = One
                  | Two a
                  | Three a (MyData a) (MyData a)
                  | Four a a a a a

    intToData :: Int -> MyData Int
    intToData n = Three 12 (Two 13) One

The constant table for the function "intToData" contains these entries:

    , CONSTR(2,3,0)
    , CONSTR(0,0,0)
    , CONSTR(1,1,0)

Clearly, they correspond to the constructors "Three", "One" and "Two" respectively. I expected to see something like CONSTR(2,3,2) for the "Three" constructor, because its second and third fields are supposed to be pointers. But this is not the case. Could you please give an example where CONSTR is actually used to construct a data node with a mixture of pointers and basic values?

Thanks for your patience in going through this. Any help regarding this will be highly appreciated.

Regards,
------------------------------------
Arunkumar S Jadhav,
Masters Student, KReSIT,
IIT-Bombay, India
Ph: +91-22-25764967
http://www.it.iitb.ac.in/~arunk
------------------------------------
I exist because I work.

Arunkumar S Jadhav
In between all this, the stack entries that point to the two graphs (i.e. x+y and x-y) are replicated on the stack, and then one copy of each graph is zapped.
Yes, it is curious. I think the main reason is to swap the order of the values, so that the application of f is correct, i.e.

    f (x+y) (x-y)

rather than

    f (x-y) (x+y)

But I am also puzzled why the original copies of (x+y) and (x-y) remain on the stack, and why those stack entries are zapped.
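The swap can be traced with a small model of the stack. This is only a sketch: the meanings given to PUSH_P1 n ("push a copy of the entry n slots below the top") and ZAP_STACK_P1 n ("zap the entry n slots below the top") are assumptions chosen to fit the explanation above, not taken from the nhc98 sources, and the names Slot, pushCopy and zapAt are hypothetical.

```haskell
module Main where

-- A stack slot is either live or has been zapped (hypothetical model).
data Slot = Live String | Zapped String
  deriving (Eq, Show)

-- The head of the list is the top of the stack.
type Stack = [Slot]

-- Push a freshly built graph (or an argument) onto the stack.
push :: String -> Stack -> Stack
push name st = Live name : st

-- ASSUMED meaning of PUSH_P1 n: copy the entry n slots below the top.
pushCopy :: Int -> Stack -> Stack
pushCopy n st = (st !! n) : st

-- ASSUMED meaning of ZAP_STACK_P1 n: zap the entry n slots below the top.
zapAt :: Int -> Stack -> Stack
zapAt n st = [ if i == n then zap s else s | (i, s) <- zip [0 ..] st ]
  where
    zap (Live x) = Zapped x
    zap z        = z

-- The instruction sequence from the question, applied right to left.
finalStack :: Stack
finalStack =
  zapAt 3 . zapAt 4      -- ZAP_STACK_P1 4, ZAP_STACK_P1 3
    . push "f"           -- PUSH_ZAP_ARG_I1
    . pushCopy 2         -- PUSH_P1 2
    . pushCopy 0         -- PUSH_P1 0
    . push "x-y"         -- second PUSH_HEAP
    . push "x+y"         -- first PUSH_HEAP
    $ []

main :: IO ()
main = mapM_ print finalStack
```

Under these assumptions the top three entries end up as f, x+y, x-y, which is the order APPLY 2 needs for f (x+y) (x-y), while the two original entries lower down are the ones that get zapped.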
Also, what are the uses of ZAP nodes apart from black hole detection? Do ZAP nodes help in garbage collection too?
In theory the GC could recover all the space in a zapped heap node apart from the first pointer (which will eventually be overwritten with an indirection to the final result). However, the nhc98 collector does not currently do this, so I believe at the moment the ZAP bit is only used for black hole detection.
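For readers unfamiliar with black hole detection, here is a minimal sketch of the idea on a toy heap. It is not nhc98's implementation: the Node, Expr and Heap types and the eval function are hypothetical, and a real runtime marks a bit in the node header rather than replacing the node.

```haskell
module Main where

import qualified Data.Map as Map

type Addr = Int

-- A tiny expression language: a literal, or "one more than node a".
data Expr = Lit Int | Succ Addr

-- A heap node is unevaluated, under evaluation (zapped), or a value.
data Node = Thunk Expr | Zapped | Value Int

type Heap = Map.Map Addr Node

-- Evaluate the node at an address. The node is zapped before its body
-- runs, so demanding it again during its own evaluation is detected.
eval :: Addr -> Heap -> (Either String Int, Heap)
eval a heap = case Map.lookup a heap of
  Nothing        -> (Left "dangling address", heap)
  Just (Value v) -> (Right v, heap)
  Just Zapped    -> (Left "black hole", heap)
  Just (Thunk e) ->
    let zapped = Map.insert a Zapped heap
    in case e of
         Lit n  -> (Right n, Map.insert a (Value n) zapped)
         Succ b -> case eval b zapped of
           (Right v, heap')  -> (Right (v + 1), Map.insert a (Value (v + 1)) heap')
           (Left err, heap') -> (Left err, heap')

main :: IO ()
main = do
  let ok   = Map.fromList [(0, Thunk (Lit 41)), (1, Thunk (Succ 0))]
      loop = Map.fromList [(0, Thunk (Succ 0))]  -- node 0 depends on itself
  print (fst (eval 1 ok))
  print (fst (eval 0 loop))
```

The first heap evaluates normally; the second contains a node that depends on its own value, so the evaluator meets the zapped node and reports a black hole instead of looping forever.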
Q2) As Malcolm explained in detail this is the purpose of CONSTR macro
CONSTR(c,s,ws) Construct a tag (i.e. a header for a data node) where there is a mixture of pointers and basic values amongst the data items
It seems I was almost right in this description, but mixed up the pointers/non-pointers.

    s  = size = total number of data items in the node
    ws = number of data items which are pointers to other nodes
    The number of non-pointers is therefore (s - ws).

should read:

    s  = size = total number of data items in the node
    ws = number of basic data items (non-pointers)
    The number of pointers is therefore (s - ws).
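The corrected reading can be checked against the three constant-table entries from the question. This is a sketch only: the Constr record and its field names are hypothetical, and treating c as the constructor's index is an assumption inferred from the example (One = 0, Two = 1, Three = 2).

```haskell
module Main where

-- Hypothetical record mirroring the three arguments of CONSTR(c,s,ws).
data Constr = Constr
  { conNumber :: Int  -- c:  assumed to be the constructor's index
  , size      :: Int  -- s:  total number of data items in the node
  , basics    :: Int  -- ws: number of basic (non-pointer) data items
  } deriving (Eq, Show)

-- Under the corrected description, the pointer count is s - ws.
pointers :: Constr -> Int
pointers k = size k - basics k

-- The entries from the constant table of intToData.
three, one, two :: Constr
three = Constr 2 3 0  -- CONSTR(2,3,0)
one   = Constr 0 0 0  -- CONSTR(0,0,0)
two   = Constr 1 1 0  -- CONSTR(1,1,0)

main :: IO ()
main = mapM_ (print . pointers) [three, one, two]
```

With ws = 0 in every entry, all three fields of Three and the single field of Two come out as pointers.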
I compiled various examples but till now I haven't seen a single example where CONSTR is used for a mixture of pointers and basic values. It has always been for basic values.
In fact, every example has only /pointers/, with no basic data values. This is because basic data values in a polymorphic lazy language are nearly always represented as a heap pointer to the value ("boxed"), which is stored separately. The only case in which the basic value can be "in-lined" in a data structure is when it is explicitly "unboxed" by the programmer (or implicitly "unboxed" by an optimising compiler).

In the GHC compiler, for instance, unboxed values are marked in the source code with a # symbol, like this example on the GHC mailing list today:

    forn :: a -> Int# -> IO ()
    forn a n | n >=# 10000# = return ()
             | otherwise    = fory a 0# >> forn a (n +# 1#)

You can see that not only are the literal numeric values unboxed, but their type is different, and operations on unboxed values are also marked with a #, because their code must be different from the standard boxed versions.

nhc98 has some rudimentary support for unboxed values, which is why the CONSTR macro allows you to specify how many fields of the data structure are unboxed. However, I believe this compiler support was never completed by the original author, because the parser does not accept the # marks. There is one hand-written file in the runtime system that actually uses unboxed values - src/runtime/Builtin/cPack.c - but I don't think the functions defined there are imported into nhc98's libraries, so it is essentially dead code at the moment.

Regards,
Malcolm
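As an illustration of what a "mixture" node looks like at the source level, here is a small sketch for GHC, not nhc98. It assumes a reasonably recent GHC with the MagicHash extension; the Mixed type and the bump and count functions are hypothetical names made up for this example.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, (+#))

-- One unboxed field (a raw machine integer stored in the node itself)
-- and one ordinary boxed field (a pointer to a list).
data Mixed = Mixed Int# [Int]

-- Arithmetic on the unboxed field uses the #-marked primitive.
bump :: Mixed -> Mixed
bump (Mixed n xs) = Mixed (n +# 1#) xs

-- Re-box the raw integer so it can be printed like a normal Int.
count :: Mixed -> Int
count (Mixed n _) = I# n

main :: IO ()
main = print (count (bump (Mixed 41# [1, 2, 3])))
```

In the corrected CONSTR terms, a node for Mixed would have s = 2 and ws = 1, i.e. one basic value and one pointer. That mapping is an analogy only, since GHC's heap layout is not described by the nhc98 macro.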
participants (2): Arunkumar S Jadhav, Malcolm Wallace