
On Thursday 14 September 2006 14:40, you wrote:
Neil Mitchell wrote:
This is how my debugger works: Rather than logging information to a file as the program runs (like HOOD) or generating the redex trail as a separate data structure in memory (which I believe Hat does; please correct me if I'm wrong), we use the YHC/NHC heap itself as the Evaluation Dependence Tree. We change the virtual machine so that when a function returns, just before the root of the tree representing the redex is overwritten by an indirection node to the root of the result tree, we keep a backup copy of the word that is overwritten. Also, the garbage collector is disabled so that indirection nodes are not eliminated. In addition, the top of stack in the caller (immediately after the return) will be a pointer to the redex node rather than the result. The attached slide gives an example of some of this; please look at it.
Interesting, one thing you might find useful is that Yhc makes it easy to add extra information to heap nodes. So for example you could have something like this (view in fixed width font):

+-----+-------+------+------+
| (+) | empty | 3ptr | 5ptr |
+-----+-------+------+------+

when the value is computed it becomes

+----------+-----+------+------+
| IND to 8 | (+) | 3ptr | 5ptr |
+----------+-----+------+------+

That way you wouldn't have to save a separate 'overwritten' list (though it does make every heap cell one word larger).
When the VM during evaluation (or the debugger) examines such a node (a node representing a function application that has finished evaluation), it follows the indirection and everything works as expected -- once a function is evaluated, its result is reused rather than the function being re-evaluated. But the debugger can look at the saved state and examine the redex -- the body of the function that was executed.
Yup the above scheme would also work that way :-)
We also need a breakpoint instruction to execute step-by-step (or rather function-by-function) -- in effect, controlled execution by the user.
This should be straightforward to implement.
Am I correct in assuming that if we want to leave indirection nodes as is, there is no work for the garbage collector to do? What about memory requirements once the GC is disabled? I believe the general opinion is that memory can be a problem, but on computers with, say, 512 MB of memory, is it still a problem?
Haskell tends to work by allocating lots of closures describing the computation to be performed and then reducing those closures to a value. The way it tends to work is that the closures are large and the value is small. So once the value has been computed the garbage collector throws away all the information on how to compute the value (since it's not needed) and just keeps the value. That value might then be an argument to another closure, which reduces to a value, and so on.

The problem is that if you want to determine whether the computation was correct you need an accurate record of what computation was performed, so you can't just throw away the 'how to compute it' information. Thus you're quite right, you would need to disable the GC.

The issue you're likely to come up against is that Haskell burns memory like nothing. Here's a table of some measurements on my AMD64 of some of the conformance tests we use (all based on real programs submitted by Haskell users):

Test          Run time   Total Memory   Maximum Live
Anna          9.8s       703Mb          1.25Mb
Fluid         1.0s       64Mb           176Kb
Wheelsieve2   14.0s      904Mb          13Mb
Gamteb        11.3s      904Mb          2.9Mb

'Total Memory' is the total amount of memory allocated during the program run and 'Maximum Live' is the largest quantity of memory that was actually copied over during a GC. As you can see, Haskell programs tend to allocate a *lot* of memory and then throw most of it away again. So if you keep everything you can exhaust even gigabytes of memory very quickly with a computationally heavy program. Of course, if you only want to debug very short running programs you should be okay.
I'm not sure how Freja works; I think it keeps a copy of the EDT (as it's created) in memory separate from the program's heap, if you understand what I mean. Am I correct? In any case it runs only on SPARC, and so does not exactly suit my purposes.
Freja does things 'piecemeal', which is to say it runs for a while, gathers a bit of the trace in memory, and then stops. Then, if the user navigating the trace wanders outside the data Freja has collected, Freja re-runs the program (from the beginning) and collects the next bit of trace in memory.
I'm largely referring to the assumptions I made in the description above -- the Eval, Return & ReturnEval instructions, the indirection nodes, etc. I think these won't be changing, right? I don't need a Haskell implementation that can compile all programs, just enough non-trivial ones to demonstrate my work. Things like garbage collection algorithms, type checking, etc. can change without any problem for me. I think you now understand what I mean when I ask about Yhc being stable.
I don't think there's anything likely to change in Yhc anytime soon that is going to cause you a problem :-) Hope that helps :-) Tom