
Hello,
I'm building a debugger for Haskell that works by modifying the runtime, and I need to choose between yhc and nhc. Everything I've seen points towards yhc, but I wonder how stable it is. I know it's not stable enough for production use as a compiler, but is it stable enough as a platform for a debugger? For my purposes, it does not have to work with every program. I find that yhc works fine with almost all of the half-dozen small programs (<1 page) I threw at it, but chokes on many of the Hugs demos.
So would you suggest I work with yhc rather than nhc? Any comments you can give me are greatly appreciated. Thanks.
-- Kartik Vaddadi.
Home: www.cse.iitb.ac.in/~kart
Blogs: kartik.infogami.com, kartik-log.blogspot.com, kartik-rlog.blogspot.com
Alternate mail ID: kartik.vad@gmail.com
"50% Reservation, 100% Politics" - Protest the Indian government's decision to increase reservation in private educational institutions (yfemumbai.blogspot.com)

Hi Kartik,
If you want to modify the runtime of either Yhc or nhc, then Yhc is
really the sensible choice! The nhc runtime is mashed into the program
and compiled as one chunk of (not very nice looking) C. The Yhc
runtime is entirely separate and does bytecode interpretation at
runtime - making it much, much easier to play with.
Can you describe what you mean by stable? The ability to compile
Haskell programs? Code changes in the runtime? Code changes in the
compiler?
Also, can you give us any information on your debugging work? Yhc
already supports some Hat-related debugging hooks in the bytecode, so
it would be interesting to see how your approach differs.
Thanks
Neil
On 9/13/06, Kartik Vaddadi wrote:
_______________________________________________
Yhc mailing list
Yhc@haskell.org
http://www.haskell.org/mailman/listinfo/yhc

Neil Mitchell wrote:
If you want to modify the runtime of one of Yhc and nhc, then Yhc is really the sensible choice! The nhc runtime is mashed into the program and compiled as one chunk of (not very nice looking) C. The Yhc runtime is entirely separate doing runtime bytecode interpretation - making it much much easier to play with.
Thank you. That (and Tom's comments) really cleared up the issue for me. I'm almost certain Yhc is the right choice.
Also, can you give us any information on your debugging stuff, since Yhc already supports some debugging stuff related to Hat in the bytecode - it would be interesting to see how your stuff differs.
This is how my debugger works: rather than logging information to a file as the program runs (like HOOD) or generating the redex trail as a separate data structure in memory (which I believe Hat does; please correct me if I'm wrong), we use the Yhc/nhc heap itself as the Evaluation Dependence Tree (EDT). We change the virtual machine so that when a function returns, just before the root of the tree representing the redex is overwritten by an indirection node pointing to the root of the result tree, we keep a backup copy of the word that is overwritten. Also, the garbage collector is disabled so that indirection nodes are not eliminated. In addition, the top of stack in the caller (immediately after the return) will be a pointer to the redex node rather than the result. The attached slide gives an example of some of this; please take a look.
When the VM during evaluation (or the debugger) examines such a node (a node representing a function application that finished evaluation), it follows the indirection and everything works as expected: once a function is evaluated, its result is used (rather than the function being re-evaluated). But the debugger can look at the saved state and examine the redex, the body of the function that was executed.
We also need a breakpoint instruction to execute step-by-step (or rather function-by-function), in effect giving the user controlled execution.
Am I correct in assuming that if we want to leave indirection nodes as they are, there is no work for the garbage collector to do? What about memory requirements once the GC is disabled? I believe the general opinion is that memory can be a problem, but on computers with, say, 512 MB of memory, is it still a problem?
I'm not sure how Freja works; I think it keeps a copy of the EDT (as it's created) in memory separate from the program's heap, if you understand what I mean. Am I correct? In any case it runs only on SPARC, and so does not exactly suit my purposes.
Can you describe what you mean as stable? Ability to compile haskell programs? Code change in the runtime? Code change in the compiler?
I'm largely referring to the assumptions I made in the description above: the Eval, Return and ReturnEval instructions, the indirection nodes, etc. I take it these won't be changing, right? I don't need a Haskell implementation that can compile all programs, just enough non-trivial ones to demonstrate my work. Things like garbage collection algorithms, type checking, etc. can change without any problem for me. I hope you now understand what I mean when I ask about Yhc being stable.
Thanks a lot for your time. I eagerly look forward to your comments.
-- Kartik Vaddadi.

Hi,
Very interesting. I believe that the original version of Hat (way back
in the past) did this, but the move to a file trace was needed because
of a lack of memory. I guess now that memory is much cheaper and
larger, this is possibly useful once more.
As for your issue of stability, now that you have defined your
project, I'm much more confident in saying Yhc is definitely the
vehicle of choice. There are lots of projects that play with the
bytecode - converting to .NET, interpreting in Python/Java, converting
to a register machine, libraries for manipulation in Haskell etc.
As for garbage collection, just turning it off will probably get you
quite far. You might need to do some work to tie the original code
back to the interpreted stuff (desugaring complicates some things),
but some of this work has been done by Tom for the Hat stuff.
Thanks
Neil
On 9/14/06, Kartik Vaddadi wrote:

On Thursday 14 September 2006 14:40, you wrote:
This is how my debugger works: Rather than logging information to a file as the program runs (like HOOD) or generating the redex trail as a separate data structure in memory (which I believe Hat does; please correct me if I'm wrong), we use the YHC/NHC heap itself as the Evaluation Dependence Tree. We change the virtual machine so that when a function returns, just before the root of the tree representing the redex is overwritten by an indirection node to the root of the result tree, we keep a backup copy of the word that is overwritten. Also, the garbage collector is disabled so that indirection nodes are not eliminated. In addition, the top of stack in the caller (immediately after the return) will be a pointer to the redex node rather than the result. The attached slide gives an example of some of this; please look at it.
Interesting. One thing you might find useful is that Yhc makes it easy to add extra information to heap nodes. So for example you could have something like this (view in a fixed-width font):

+-----------+----------+-----------+----------+
| (+)       | empty    | 3ptr      | 5ptr     |
+-----------+----------+-----------+----------+

when the value is computed it becomes

+-----------+----------+-----------+----------+
| IND to 8  | (+)      | 3ptr      | 5ptr     |
+-----------+----------+-----------+----------+

That way you wouldn't have to save a separate 'overwritten' list (though it does make every heap cell one word larger).
When the VM during evaluation (or the debugger) examines such a node (a node representing a function application that finished evaluation), it follows the indirection and everything works as expected -- once a function is evaluated, its result is used (rather than the function being re-evaluated). But the debugger can look at the saved state and examine the redex -- the body of the function that was executed.
Yup the above scheme would also work that way :-)
We also need a breakpoint instruction to execute step-by-step (or rather function-by-function) -- in effect, controlled execution by the user.
This should be straightforward to implement.
Am I correct in assuming that if we want to leave indirection nodes as they are, there is no work for the garbage collector to do? What about memory requirements once the GC is disabled? I believe the general opinion is that memory can be a problem, but on computers with, say, 512 MB of memory, is it still a problem?
Haskell tends to work by allocating lots of closures describing the computation to be performed and then reducing those closures to a value. Typically the closures are large and the value is small. So once the value has been computed, the garbage collector throws away all the information on how to compute the value (since it's not needed) and just keeps the value. That value might then be an argument to another closure, which reduces to a value, and so on.
The problem is that if you want to determine whether the computation was correct, you need an accurate record of what computation was performed, so you can't just throw away the 'how to compute it' information. Thus you're quite right: you would need to disable the GC.
The issue you're likely to come up against is that Haskell burns memory like nothing else. Here's a table of some measurements on my AMD64 of some of the conformance tests we use (all based on real programs submitted by Haskell users):

Test          Run time   Total Memory   Maximum Live
Anna          9.8s       703Mb          1.25Mb
Fluid         1.0s       64Mb           176Kb
Wheelsieve2   14.0s      904Mb          13Mb
Gamteb        11.3s      904Mb          2.9Mb

'Total Memory' is the total amount of memory allocated during the program run, and 'Maximum Live' is the largest quantity of memory that was actually copied over during a GC. As you can see, Haskell programs tend to allocate a *lot* of memory and then throw most of it away again. So if you keep everything, you can exhaust even gigabytes of memory very quickly with a computationally heavy program. Of course, if you only want to debug very short-running programs you should be okay.
I'm not sure how Freja works; I think it keeps a copy of the EDT (as it's created) in memory separate from the program's heap, if you understand what I mean. Am I correct? In any case it runs only on SPARC, and so does not exactly suit my purposes.
Freja does things 'piecemeal', which is to say it runs for a while and gathers a bit of the trace in memory, then stops. Then, if the user navigating the trace wanders outside the data Freja has collected, Freja re-runs the program (from the beginning) and collects the next bit of trace in memory.
I'm largely referring to the assumptions I made in the description above -- the Eval, Return & ReturnEval instructions, the indirection nodes, etc. I think these won't be changing, right? I don't need a Haskell implementation that can compile all programs, just enough non-trivial ones to demonstrate my work. Things like garbage collection algorithms, type checking, etc can change without any problem for me. I think you now understand what I mean when I ask about Yhc being stable.
I don't think there's anything likely to change in Yhc anytime soon that is going to cause you a problem :-)
Hope that helps :-)
Tom

Tom Shackell wrote:
Yhc makes it easy to add extra information to heap nodes
Interesting, I didn't know this.
if you keep everything you can exhaust even gigabytes of memory really very quickly with a computationally heavy program.
I didn't realize that. Thanks.
Freja does things 'piecemeal' ...
I was trying to ask a slightly different question. Once Freja decides that it wants to collect trace information on a certain portion of the EDT, does the trace exist separate from the heap memory used by the program, or is (part of) the heap itself used as the EDT, as in my scheme? Would I be correct in saying that Freja (and Hat, HOOD and Buddha) all duplicate information -- once in the normal heap of the program being debugged, and once again in the trace structures of the debugger?
-- Kartik Vaddadi.

On Friday 15 September 2006 08:22, Kartik Vaddadi wrote:
I was trying to ask a slightly different question. Once Freja decides that it wants to collect trace information on a certain portion of the EDT, does the trace exist separate from the heap memory used by the program, or is (part of) the heap itself used as the EDT, as in my scheme? Would I be correct in saying that Freja (and Hat, HOOD and Buddha) all duplicate information -- once in the normal heap of the program being debugged, and once again in the trace structures of the debugger?
Hat, HOOD and Buddha all duplicate the information. Though 'duplicate' is somewhat inaccurate, because of course all three allow garbage collection, meaning the vast majority of the data in the heap is collected (and hence no longer duplicated). I can't remember whether Freja does or not; my suggestion is to read Nilsson's thesis if you haven't done so already:
http://citeseer.ist.psu.edu/nilsson98declarative.html
Cheers :-)
Tom

Hi Kartik,
I think I'd be confident in saying that Yhc should now be able to run anything that nhc could. And if it can't, then we'd be interested in seeing the program so we can fix it. Of course, things which aren't Haskell 98 won't work (sadly we need someone to rewrite our type system to go beyond Haskell 98 ...).
I would also say that the Yhc runtime is both a lot cleaner and a lot better documented than the nhc one. However, this is of course my own and very biased opinion, since I wrote the runtime :-) Yhc is also being much more actively developed than nhc, which means it should be easier to get help with problems.
Your project sounds interesting; what is it that you'd like to do with Yhc, more exactly? Haskell is a notoriously hard language to write a useful debugger for (due to laziness). Also, a large part of my research is writing a tracing/debugging tool for Haskell using Yhc, so I'd be interested to hear your plans :-)
Anyway, hope that's informative :-)
Cheers
Tom
On Wednesday 13 September 2006 12:16, Kartik Vaddadi wrote:
participants (3)
- Kartik Vaddadi
- Neil Mitchell
- Tom Shackell