
Hi!

We are now able to generate DWARF debug info, by passing -g to GHC. This will allow for better debugging (e.g. using GDB) and profiling (e.g. using Linux perf events). To make this feature more user accessible we need to ship debug info for the core libraries (and perhaps the RTS). The reason we need to ship debug info is that it's difficult, or impossible in the case of base, for the user to rebuild these libraries.

The question is, how do we do this well? I don't think our "way" solution works very well. It causes us to recompile too much and GHC doesn't know which "ways" have been built or not.

I believe other compilers, e.g. GCC, ship debug symbols in separate files (https://packages.debian.org/sid/libc-dbg) that e.g. GDB can then look up.

-- Johan

On Fri, Jan 2, 2015 at 6:18 PM, Johan Tibell wrote:
> I believe other compilers, e.g. GCC, ship debug symbols in separate files (https://packages.debian.org/sid/libc-dbg) that e.g. GDB can then look up.

Lookaside debugging information is (a) a Linux-ism, although possibly also included in mingw, but not OS X or the *BSDs; (b) on RPM-based systems at least, is split out of objects into separate files, and thence into debug packages, by the standard RPM support macros before the standard strip step (I expect debuild does something similar on Debian-ish systems).

--
brandon s allbery kf8nh sine nomine associates
allbery.b@gmail.com ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
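As a rough illustration of the lookup mentioned above: GDB's documentation describes two ways of locating a separate debug file, by build ID and by a debug link name. The sketch below lists the candidate paths for each. The function name candidate_debug_paths and the example inputs are made up for illustration, and /usr/lib/debug is assumed as the global debug directory; other debuggers and other platforms may well behave differently.

```python
from pathlib import PurePosixPath


def candidate_debug_paths(binary, debuglink, build_id=None,
                          debug_dir="/usr/lib/debug"):
    """List the paths a debugger would try, in order (hypothetical helper).

    binary    -- absolute path of the stripped executable or library
    debuglink -- file name stored in its .gnu_debuglink section
    build_id  -- hex string from its build-ID note, if it has one
    """
    binary = PurePosixPath(binary)
    root = PurePosixPath(debug_dir)
    candidates = []
    if build_id:
        # Build-ID method: <debug_dir>/.build-id/<first 2 hex chars>/<rest>.debug
        candidates.append(
            root / ".build-id" / build_id[:2] / (build_id[2:] + ".debug"))
    # Debug-link method: next to the binary, then in a .debug subdirectory,
    # then under the global debug directory, mirroring the binary's directory.
    candidates.append(binary.parent / debuglink)
    candidates.append(binary.parent / ".debug" / debuglink)
    candidates.append(root / binary.parent.relative_to("/") / debuglink)
    return [str(p) for p in candidates]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for path in candidate_debug_paths("/usr/bin/ls", "ls.debug", "abcdef1234"):
        print(path)
```

The binary path has to be absolute, since the last candidate mirrors its directory under the debug directory.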

Brandon,
If we just built GHC with debug symbols enabled, everything should just
work from a packaging perspective?

On Fri, Jan 2, 2015 at 7:54 PM, Johan Tibell wrote:
> If we just built GHC with debug symbols enabled, everything should just work from a packaging perspective?

On most RPM systems, at least (I get debuginfo packages for local RPM builds, with nothing special in the spec files). Someone else would have to comment on Debian's build system, although I expect that it is similarly automated.

-- brandon s allbery

The Debian package seems to simply put un-stripped libraries into a special path (/usr/lib/debug/...). This should be relatively straightforward to implement. Note though that from a look at the RPM infrastructure, they have a tool in there (dwarfread) which actually parses through DWARF information and updates paths, so there is possibly more going on here.

On the other hand, supporting -gsplit-dwarf seems to be a different mechanism, called Fission [1]. I haven't looked too much at the implementation yet, but to me it looks like it means generating copies of debug sections (such as .debug_line.dwo) which will then be extracted using "objcopy --extract-dwo". This might take a bit more work to implement, both in the DWARF generation code and in the infrastructure.

Interestingly enough, doing this kind of splitting will actually buy us next to nothing - with Fission both .debug_line and .debug_frame would remain in the binary unchanged, so all we'd export would be some fairly inconsequential data from .debug_info. In contrast to other programming languages, we just don't have that much debug information in the first place. Well, at least not yet.

Greetings, Peter

[1] https://gcc.gnu.org/wiki/DebugFission
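For reference, the generic GNU binutils recipe for moving debug info into a separate file looks roughly like the sketch below. This is an assumption about how one might do the split by hand, not a description of what the Debian or RPM tooling actually runs. The helper names and the library file name are hypothetical, and the commands are only printed unless dry_run is turned off.

```python
import subprocess


def split_debug_commands(library, debug_file):
    """Return the binutils commands that split debug info out of `library`."""
    return [
        # Copy only the debug sections into a separate file.
        ["objcopy", "--only-keep-debug", library, debug_file],
        # Remove the debug sections from the library itself.
        ["strip", "--strip-debug", library],
        # Record the debug file's name (and a CRC) in a .gnu_debuglink section.
        ["objcopy", "--add-gnu-debuglink=" + debug_file, library],
    ]


def split_debug(library, debug_file, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    commands = split_debug_commands(library, debug_file)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    split_debug("libHSbase.so", "libHSbase.so.debug")
```

The resulting .debug file could then be installed under a path such as /usr/lib/debug/..., which is where the Debian package mentioned above keeps its files.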
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs

How much debug info (as a percentage) do we currently generate? Could we
just keep it in there in the release?

Okay, I ran a little experiment - here's the size of the debug sections that Fission would keep (for the base library):

  .debug_abbrev:           8932 - 0.06%
  .debug_line:           374134 - 2.6%
  .debug_frame:          671200 - 4.5%

Not that much. On the other hand, .debug_info is a significant contributor:

  .debug_info(full):    4527391 - 30%

Here's what this contains: All procs get a corresponding DWARF entry, and we declare all Cmm blocks as "lexical blocks". The latter isn't actually required right now - to my knowledge, GDB simply ignores it, while LLDB shows it as "inlined" routines. In either case, it just shows yet more GHC-generated names, so it's really only useful for profiling tools that know Cmm block names.

So here's what we get if we strip out block information:

  .debug_info(!block):  1688410 - 11%

This eliminates a good chunk of information, and might therefore be a good idea for "-g1" at minimum. If we want this as default for 7.10, this would make the total overhead about 18%. Acceptable? I can supply a patch if needed.

Just for comparison - for Fission we'd strip proc records as well, which would cause even more extreme savings:

  .debug_info(!proc):     36081 - 0.2%

At this point the overhead would be just about 7% - but without doing Fission properly this would most certainly affect debuggers.

Greetings, Peter

On 03/01/2015 21:22, Johan Tibell wrote:
> How much debug info (as a percentage) do we currently generate? Could we just keep it in there in the release?
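The totals quoted above can be re-derived by adding the three sections Fission would keep to each .debug_info variant. The sketch below only re-adds the figures reported in the message; the names SECTIONS, INFO_VARIANTS and total_overhead are made up for illustration, and because the reported percentages are already rounded, the sums are approximate.

```python
# Section sizes and percentages exactly as reported in the message above
# (for the base library).
SECTIONS = {
    ".debug_abbrev": (8932, 0.06),
    ".debug_line": (374134, 2.6),
    ".debug_frame": (671200, 4.5),
}

# The three .debug_info variants that were measured.
INFO_VARIANTS = {
    "full": (4527391, 30.0),
    "!block": (1688410, 11.0),
    "!proc": (36081, 0.2),
}


def total_overhead(variant):
    """Return (size, percent) of all debug sections for a .debug_info variant."""
    info_size, info_percent = INFO_VARIANTS[variant]
    size = info_size + sum(s for s, _ in SECTIONS.values())
    percent = info_percent + sum(p for _, p in SECTIONS.values())
    return size, round(percent, 2)


if __name__ == "__main__":
    for variant in INFO_VARIANTS:
        size, percent = total_overhead(variant)
        print(f".debug_info({variant}): {size} in total, about {percent}%")
```

With block information stripped the percentages add up to 18.16, and with proc records stripped as well to 7.36, which matches the "about 18%" and "just about 7%" figures; the full variant comes to 37.16.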

What about keeping exactly what -g1 keeps for gcc (i.e. functions, external
variables, and line number tables)?

(sorry for late answer)

Yes, that's pretty much what this would boil down to. The patch is trivial:

https://github.com/scpmw/ghc/commit/29acc#diff-1

I think this is a good idea anyway. We can always re-introduce the data for higher -g<n> levels.

Greetings, Peter

On 05/01/2015 00:59, Johan Tibell wrote:
> What about keeping exactly what -g1 keeps for gcc (i.e. functions, external variables, and line number tables)?

We should merge this fix to the 7.10 branch.

I've been building the RTS with debug symbols for our internal GHC build at FB, because it makes investigating problems a lot easier. I should probably upstream this patch.

Shipping libraries with debug symbols should be fine, as long as they can be stripped - Peter, does stripping remove everything that -g creates?

Cheers, Simon

Could we get this for 7.10 so our debug info story is more "well-rounded"?

Yes - strip will catch everything.

Greetings, Peter

On 09/01/2015 17:11, Simon Marlow wrote:
> Shipping libraries with debug symbols should be fine, as long as they can be stripped - Peter, does stripping remove everything that -g creates?
participants (4)
- Brandon Allbery
- Johan Tibell
- Peter Wortmann
- Simon Marlow