Inspecting function arguments in GHCi

Hello,

I'm interested in inspecting the strictness of functions at runtime and the depth of thunks "in the wild." For this reason I'm modifying GHC 8.10.2, essentially to add additional information to breakpoints. I'd like to reuse the logic behind GHCi's :print command (pprintClosureCommand, obtainTermFromId, ...) for which I suppose I need Id's. Those however don't exist for destructuring patterns, such as those in the following equations:

    last [x] = x
    last (_:xs) = last xs

So I'm wondering where would be a good place in the pipeline to transform patterns like these into at-patterns, to give them Id's. However, the breakpoint logic only looks at the free variables of the right-hand sides and not transitively, which means that e.g. in the following example neither ':print arg1' nor ':print as' works when the interpreter hits a breakpoint in the top level expression on the RHS:

    qsort arg1@(a:as) = qsort left ++ [a] ++ qsort right
      where (left, right) = (filter (<=a) as, filter (>a) as)

Thus I'd also like to know how to extend the free var logic for Tickish that eventually leads to CgBreakInfo and :print's ability to inspect these bindings at runtime. My goal would be to determine to what extent was a thunk evaluated during function application.

Any advice would be greatly appreciated!

Regards,
Andrew Kvapil
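For readers following along, the at-pattern rewrite asked about in this message can be written out by hand. The sketch below is only an illustration of the source-level shape, not something GHC does automatically: the names lastPlain and lastNamed are hypothetical, the empty-list equations are additions so that the functions load without incomplete patterns, and qsort gets a base case that the message leaves out.

```haskell
-- Original shape: only the parts of the argument are named.
lastPlain :: [a] -> a
lastPlain [x]    = x
lastPlain (_:xs) = lastPlain xs
lastPlain []     = error "lastPlain: empty list"  -- added for the sketch

-- Hand-written at-pattern variant: arg1 names the whole argument,
-- so there is a binder that a debugger could in principle look up.
lastNamed :: [a] -> a
lastNamed arg1@[x]    = x
lastNamed arg1@(_:xs) = lastNamed xs
lastNamed arg1@[]     = error "lastNamed: empty list"  -- added for the sketch

-- The qsort example from the message, with a base case added.
qsort :: Ord a => [a] -> [a]
qsort []          = []
qsort arg1@(a:as) = qsort left ++ [a] ++ qsort right
  where (left, right) = (filter (<=a) as, filter (>a) as)
```

Both variants of last compute the same result; the at-pattern only introduces an extra name. GHC may warn that arg1 is unused, which is exactly the situation the message describes: the binder exists but the right-hand side never mentions it.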

Andrew
We have very poor documentation of the inner workings of the entire breakpoint
and debugging mechanism. And very few (zero?) people who truly understand it.
You could do a great service by starting a Note or a wiki page or something
that lays out the moving parts. You may not feel that you are well equipped to
do so, but you'd almost certainly improve matters!
Has anyone else been working in this space? Matthew P perhaps?
| So I'm wondering where would be a good place in the pipeline to
| transform patterns like these into at-patterns, to give them Id's.
I'm not sure what you have in mind. Could you say more about (a) what you'd like
the user experience to be, and (b) how you are considering implementing it.
| However, the breakpoint logic only looks at the free variables of the
| right-hand sides and not transitively, which means that e.g. in the
| following example neither ':print arg1' nor ':print as' works when the
| interpreter hits a breakpoint in the top level expression on the RHS:
Perhaps you are suggesting that each breakpoint should capture bindings for
*all in-scope variables* rather than *all free variables of the sub-expression*.
If so, that sounds pretty feasible. It might risk keeping variables alive
that would otherwise have been garbage-collected, but maybe that's a price
worth paying.
Simon
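The difference between the options discussed here (free variables of the sub-expression, free variables followed transitively through local bindings, and all in-scope variables) can be made concrete with a toy model. This is a minimal sketch under stated assumptions: Expr is a made-up expression type rather than any GHC data structure, the names rhsBody, whereBinds, inScope, direct and transitive are hypothetical, and [a] is modelled as plain a for brevity.

```haskell
import Data.List (sort, union)

-- A toy expression type; it is not GHC's Core or HsSyn.
data Expr
  = Var String
  | App Expr Expr

-- Free variables of an expression, as a duplicate-free list.
freeVars :: Expr -> [String]
freeVars (Var v)   = [v]
freeVars (App f x) = freeVars f `union` freeVars x

-- Apply a function expression to several arguments.
apps :: Expr -> [Expr] -> Expr
apps = foldl App

-- Models: qsort left ++ [a] ++ qsort right
rhsBody :: Expr
rhsBody = apps (Var "++")
  [ App (Var "qsort") (Var "left")
  , apps (Var "++") [Var "a", App (Var "qsort") (Var "right")]
  ]

-- Models: where (left, right) = (filter (<=a) as, filter (>a) as)
whereBinds :: [(String, Expr)]
whereBinds =
  [ ("left",  apps (Var "filter") [App (Var "<=") (Var "a"), Var "as"])
  , ("right", apps (Var "filter") [App (Var ">")  (Var "a"), Var "as"])
  ]

-- Local binders in scope at the top-level expression of the RHS.
inScope :: [String]
inScope = ["arg1", "a", "as", "left", "right"]

-- Keep only local binders, sorted for easy comparison.
onlyLocals :: [String] -> [String]
onlyLocals = sort . filter (`elem` inScope)

-- Locals mentioned directly in the RHS body.
direct :: [String]
direct = onlyLocals (freeVars rhsBody)

-- Locals reachable by also following the where-bindings to a fixed point.
transitive :: [String]
transitive = onlyLocals (close (freeVars rhsBody))
  where
    close vs
      | vs' == vs = vs
      | otherwise = close vs'
      where
        vs' = foldl union vs [freeVars rhs | (v, rhs) <- whereBinds, v `elem` vs]
```

In this model direct contains a, left and right; transitive additionally picks up as; and arg1 is only reachable by capturing everything in inScope, which matches the behaviour reported for ':print arg1' and ':print as'.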

Dear Simon,

Thank you for the swift response. I did notice the lack of comprehensive documentation on the details of the debugger. I keep notes of my findings as I explore the codebase and expect to publish them once I get them into a more presentable form. This work is a part of my bachelor's thesis, so a fairly detailed explanation should come out of it.

On 25/01/2021 13:22, Simon Peyton Jones wrote:
| I'm not sure what you have in mind. Could you say more about (a) what you'd like
| the user experience to be, and (b) how you are considering implementing it.

To be clear, I don't intend to affect the user experience of breakpoints themselves. What I'm after is the information about thunks collected at runtime without user intervention; I added what is essentially a recreation of breakpoints called "tracepoints" for this purpose. The user would simply invoke a traced program and information about the strictness of various function applications would be logged to a file. The result of the execution would be ultimately useful for statistical analysis of the use of laziness in Haskell, although at this point that is a distant and uncertain goal.

At the moment, the implementation of tracepoints suspends the GHCi evaluation thread and immediately resumes it (by literally queueing ":continue" in the UI). To get any useful information and start logging it at runtime, I need to capture more information than just the free variables of an expression at bytecode generation time (or sooner, since from what I gather this is in part also a responsibility of the Tickish logic).

Perhaps the approach I've taken thus far is nonsensical or significantly inferior to some well-known alternative I'm unaware of. If that seems to be the case, I will happily listen to advice suggesting a change of course. This is the first time I'm hacking on GHC and I have to say I find the scale and complexity of the codebase somewhat daunting, to say the least.

| Perhaps you are suggesting that each breakpoint should capture bindings for
| *all in-scope variables* rather than *all free variables of the sub-expression*.
| If so, that sounds pretty feasible. It might risk keeping variables alive
| that would otherwise have been garbage-collected, but maybe that's a price
| worth paying.

Yes, not (necessarily) for breakpoints, but for tracepoints I think that's closer to the desired behaviour. Perhaps there's even a simpler approach that lets tracepoints only capture the "top-level" function arguments, i.e. not the bindings that come from destructuring, as in the shared qsort example. The top-level function arguments are what I had in mind when suggesting the conversion to at-patterns.

Regards,
Andrew
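The kind of information a tracepoint is meant to log, namely how much of an argument a function application actually forces, can be approximated in ordinary Haskell without touching GHCi. The following is a rough sketch and not the tracepoint mechanism described in this message: logged, instrument and forcedBy are hypothetical names, and the approach relies on unsafePerformIO to record a label whenever a wrapped value is forced.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Wrap a value so that forcing the wrapper records a label first.
-- The record is made once, because the resulting thunk is updated
-- with its value after the first evaluation.
logged :: IORef [String] -> String -> a -> a
logged ref label x = unsafePerformIO $ do
  modifyIORef ref (label :)
  pure x
{-# NOINLINE logged #-}

-- Rebuild a list so that every cons cell, every element and the final
-- nil each record a label when they are forced.
instrument :: IORef [String] -> [Int] -> [Int]
instrument ref = go (0 :: Int)
  where
    go i []     = logged ref ("nil@" ++ show i) []
    go i (y:ys) = logged ref ("cons@" ++ show i)
                    (logged ref ("elem@" ++ show i) y : go (i + 1) ys)

-- Run a function on an instrumented list, force its result to weak head
-- normal form, and return the labels in the order they were forced.
forcedBy :: ([Int] -> b) -> [Int] -> IO [String]
forcedBy f xs = do
  ref <- newIORef []
  _ <- evaluate (f (instrument ref xs))
  reverse <$> readIORef ref
```

For example, forcedBy length [10, 20, 30] records the three cons cells and the final nil but no elements, while forcedBy head [10, 20, 30] records only the first cons cell and its element. A log of this shape is one possible reading of "to what extent was a thunk evaluated during function application".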

Andrew Kvapil writes:
| Hello,
|
| I'm interested in inspecting the strictness of functions at runtime and the depth of thunks "in the wild."

Hi Andrew,

Interesting. When I first read your introduction my first thought was to walk the "normal" heap using ghc-heap or, perhaps, the relatively new ghc-debug library [1] rather than introduce this feature in GHCi. It's hard to know whether a non-GHCi-based approach is viable without knowing more about what you are doing, but it potentially brings the benefit of generality: your analysis is not limited to programs which can be run under GHCi.

Of course, it also brings some challenges: you would need to find a way to associate info table symbols with whatever information you need from the Core program, and in the presence of simplification tying your results back to the source program may not be possible at all.

Cheers,

- Ben

[1] https://gitlab.haskell.org/ghc/ghc-debug

| Thus I'd also like to know how to extend the free var logic for Tickish that eventually leads to CgBreakInfo and :print's ability to inspect these bindings at runtime. My goal would be to determine to what extent was a thunk evaluated during function application.

Note that Luite's recent work on refactoring the bytecode generator to produce code from STG is quite relevant here. In particular, you will likely want to look at !4589 [1], which does the work of refactoring Tickish to follow the Trees That Grow pattern. You would likely want to do the same to capture your free variable information.

Cheers,

- Ben

[1] https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4589
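For readers unfamiliar with the Trees That Grow pattern mentioned above, the idea is that a data type is indexed by a pass and each pass chooses, through a type family, what extra data a constructor carries. The sketch below is a toy and makes no claim about how !4589 or GHC's Tickish is actually defined: Tick, XBreakpoint, Parsed, Codegen, annotate and capturedVars are all hypothetical names, and variables are plain strings rather than Ids.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import Data.List (union)

-- Hypothetical passes, used only as type-level indices.
data Parsed
data Codegen

-- Extension point: each pass decides what a breakpoint tick carries.
type family XBreakpoint pass :: Type
type instance XBreakpoint Parsed  = ()        -- nothing extra yet
type instance XBreakpoint Codegen = [String]  -- extra in-scope variables

-- A toy stand-in for a tick annotation; this is NOT GHC's Tickish.
data Tick pass
  = Breakpoint (XBreakpoint pass) Int [String]  -- extension, id, free variables
  | SourceNote String

-- Move a tick to the later pass, filling in the extension field.
annotate :: [String] -> Tick Parsed -> Tick Codegen
annotate inScope (Breakpoint () n fvs) = Breakpoint inScope n fvs
annotate _       (SourceNote s)        = SourceNote s

-- Everything a breakpoint would expose: its free variables plus the
-- extra variables stored in the extension field.
capturedVars :: Tick Codegen -> [String]
capturedVars (Breakpoint extra _ fvs) = fvs `union` extra
capturedVars (SourceNote _)           = []
```

The appeal of this layout for the use case in the thread is that extra per-breakpoint information can be added for one pass without changing the constructor for the others.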

Hello Ben,

Thanks for your suggestions. The decision to adapt GHCi came out of a discussion with my supervisor and his colleagues. At this point the entire set of desired capabilities of the work is still unknown; however, we do consider the GHCi-compatible programs to represent a large enough set for future analysis, and the ease of mapping breakpoints back to source code is a significant benefit.

I do plan on using the ghc-heap-view (I assume that's what you meant by ghc-heap, or is there another library I don't know about?) logic in the project, although I'm currently more focused on implementing the proper hook mechanism. I expect that events of deeply nested thunks being forced will be quite important. The possibility of tracking control flow via breakpoints/tracepoints also seems appealing. I'm not aware of any existing solutions which would allow dynamic tracing, although it's very well possible I didn't look hard enough.

Regarding ghc-debug, I'm not sure what kinds of trade-offs it offers compared to the approach I'm currently taking. It looks like it's a fairly newborn project; do you think it's mature enough for the proposed use cases? I couldn't find docs online, although I did come across [0] which Discourse [1] says is related. I've yet to watch the introduction video. Support for unboxed tuples and other features not supported by GHCi would of course be nice, although performance is not a concern. Keeping the relationship between source code spans and heap objects in the infotables is an intriguing idea.

On 26/01/2021 16:28, Ben Gamari wrote:
| Note that Luite's recent work on refactoring the bytecode generator to
| produce code from STG is quite relevant here. In particular, you will
| likely want to look at !4589 [1], which does the work of refactoring
| Tickish to follow the Trees That Grow pattern. You would likely want to
| do the same to capture your free variable information.

Excellent, I was not aware of this. Thank you!

Regards,
Andrew

[0]: https://well-typed.com/blog/2021/01/first-look-at-hi-profiling-mode/
[1]: https://discourse.haskell.org/t/an-introduction-to-ghc-debug-precise-memory-...

Hi Andrew,
I updated the README for ghc-debug last week but forgot to send this mail!
https://gitlab.haskell.org/ghc/ghc-debug
Cheers,
Matt

participants (4)
- Andrew Kvapil
- Ben Gamari
- Matthew Pickering
- Simon Peyton Jones