[GHC] #9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv)

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------------------+------------------------------- Reporter: jberthold | Owner: simonmar Type: bug | Status: new Priority: normal | Milestone: 7.8.3 Component: Runtime System | Version: 7.8.2 Keywords: | Operating System: Architecture: Unknown/Multiple | Unknown/Multiple Difficulty: Easy (less than 1 hour) | Type of failure: Other Blocked By: | Test Case: Related Tickets: | Blocking: -------------------------------------------+------------------------------- A patch in GHC-7.7 modified the "thread stop status" encoding, so the current ghc-events (version 0.4.2.0 from 2012) produces wrong stop reasons when reading an eventlog written by GHC-7.8.2. The GHC patch is http://git.haskell.org/ghc.git/commit/70e20631742e516c6a11c3c112fbd5b4a08c15... The MVarRead block reason was added in the middle, shifting all following block reason codes one up. The error can be observed when a thread blocks on a black hole during a program run. Attached is a small test program which produces this situation, and two event logs for the same program run with the same arguments and RTS parameters. The program's two threads evaluate the same data, the parent thread typically blocks on a blackhole created by a forked child thread. Note the sequence around user event "child": The log written by GHC-7.6.3 correctly shows blocking on a black hole, while the log written by GHC-7.8.2 claims the main thread blocks "on I/O read". -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: simonmar Type: bug | Status: new Priority: normal | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Changes (by jberthold): * cc: jberthold (added) Comment: Like the GHC-6.x tid field length problem, any solution to the problem requires baking in knowledge of GHC version differences into ghc-events (might break other software that writes event log format). A fix in ghc-events is possible (the fork ghc-events-parallel has been fixed like this as a temporary solution): EVENT_USER_MARKER was added after the 7.6 branch, so the ghc-events parser can use it as an indication to use different thread stop state decoders. When fixing it like this, however, older tools will always yield the error. OTOH, when reverting to the original encoding in GHC (fix inside GHC), one needs to reliably detect GHC-7.8.2 and distinguish it from subsequent versions. Again by adding a new event type in master and GHC-7.8.3. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: simonmar Type: bug | Status: new Priority: normal | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Changes (by ezyang): * cc: ezyang (added) Comment: Oof, sorry about breaking backwards compatibility in such an annoying way. I am sympathetic, but it is not clear to me what the correct path from here is. Here are some possibilities: 1. Accept that `includes/Constants.h` really is GHC internal and subject to change however we like (`f30d527344db528618f64a25250a3be557d9f287` is an example of another commit which changed numbering, so my commit is definitely in keeping with tradition), and setup `ghc-events` to match on the relevant GHC version and change its behavior appropriately. I guess we'd need to tag eventlog with the relevant GHC version (I hope we're doing that already...) To take things a step further, since ghc-events would be GHC specific, we might fold it into the GHC build tree, so it tracks changes with the rest of GHC (this might not be so appropriate as you might process eventlogs produced by one version of GHC with another version.) 2. Establish the constants as a proper ABI which we guarantee the stability of, and add comments and test-cases to prevent people from renumbering them to maintain cleanliness. We can take this further and consider the state of affairs in 7.8.2 to be a bug, and renumber back to the original version. (I believe this is your proposal.) `ghc-event` only needs to work around this particular version. (I think version numbers should be added to the eventlog format anyway.) Note: some of the fields we use these constants for are fixed size, so if we never reuse numbers we may run out of space, so we may eventually need to reuse some of them. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: simonmar Type: bug | Status: new Priority: normal | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by jberthold): Replying to [comment:2 ezyang]:
Oof, sorry about breaking backwards compatibility in such an annoying way.
I think we should have some "warning signs" in the code (includes/rts/Constants.h), otherwise this will happen again sooner or later. The shared knowledge of ghc-events and GHC runtime system is in includes/rts/EventLogFormat, which is obviously not up to date right now. includes/rts/Constants.h should point there. About the options:
1. Accept that `includes/Constants.h` really is GHC internal
Decidedly not the intention (for the parts mentioned in EventLogFormat.h). The other commit you found is interesting... My best guess is tha "Relocated" was never actually written to an event log (I might be wrong, though...)
2. Establish the constants as a proper ABI which we guarantee the stability of, and add comments and test-cases to prevent people from renumbering them to maintain cleanliness.We can take this further and consider the state of affairs in 7.8.2 to be a bug,
This is what Simon thought of, judging from what he replied to my mail earlier. I completely agree with you, there have to be comments in the code to avoid this in the future. Then either 2a) ghc-events stays as it is and will never parse ghc-7.8.2 eventlogs 100% correctly 2b) GHC and ghc-events are adapted to recognise ghc-7.8.2 traces (and not later versions) from their header, by adding an event or extending one with a new field, in GHC-7.8.3. Or there might be 3. Consider older ghc-events versions outdated (something we wanted to avoid), and only adapt ghc-events. This is what I did for the fork ghc- events-parallel (but it leaves the route to 2B open). And: Yes, an event indicating the RTS version is written by (newer versions of) GHC. However, we cannot rely on it being present, and it is fiddly/fragile to read the beginning of an eventlog and then switch parsers when (and iff) it is found. I tried that, but then went for the other fix. The information must be derived from the header (event information), not from events that may or may not be written. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: simonmar Type: bug | Status: new Priority: normal | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by ezyang): Here's another perspective: really this is just a bug in this function implementation: {{{ INLINE_HEADER void traceEventStopThread(Capability *cap STG_UNUSED,· StgTSO *tso STG_UNUSED,· StgThreadReturnCode status STG_UNUSED, StgWord32 info STG_UNUSED) { traceSchedEvent2(cap, EVENT_STOP_THREAD, tso, status, info); dtraceStopThread((EventCapNo)cap->no, (EventThreadID)tso->id, (EventThreadStatus)status, (EventThreadID)info); } }}} We shouldn't cast from `StgThreadReturnCode` to `EventThreadStatus`, really we should have some translation function from one to the other (coded in terms of the constants, so updating `Constants.h` works automatically). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: simonmar Type: bug | Status: new Priority: normal | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by simonmar): I agree with Edward. The fact that the thread statuses are written verbatim into the event log is the real bug here, because it ties the thread status values to the event log format, whereas they ought to be an internal implementation detail. We only did it this way in the first place because we are lazy (that's the royal "we"). So I propose we fix it in 7.8.3. That is, revert to the original numbering in the event log. But do it with a big switch statement in `traceEventStopThread`, rather than undoing the patch that reordered the numbers. Then we won't have this problem again. Whether ghc-events is patched to work with 7.8.2 or not is a separate matter. It would be good to do this, but it doesn't change the fact that existing ghc-events (and ThreadScope) will give strange results with 7.8.2. If ghc-events is fixed, I'd much rather it was done by looking at the version number rather than inferring it from something else. Is that really too hard? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: simonmar Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Changes (by simonpj): * priority: normal => high Comment: Edward says he'll look at this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: simonmar Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by ezyang): Since we dropped the ball on this and 7.8.3 is really soon, the current proposal is to just revert to the original numbering, but not add the big switch statement. This should be a small noncontroversial patch, and then we can fix it properly later. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: new Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Changes (by ezyang): * owner: simonmar => ezyang * milestone: 7.8.4 => 7.8.3 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events,
threadscope, edentv)
-------------------------------+-------------------------------------------
Reporter: jberthold | Owner: ezyang
Type: bug | Status: new
Priority: high | Milestone: 7.8.3
Component: Runtime | Version: 7.8.2
System | Keywords:
Resolution: | Architecture: Unknown/Multiple
Operating System: | Difficulty: Easy (less than 1 hour)
Unknown/Multiple | Blocked By:
Type of failure: Other | Related Tickets:
Test Case: |
Blocking: |
-------------------------------+-------------------------------------------
Comment (by Edward Z. Yang

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: merge Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Changes (by ezyang): * status: new => merge -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: merge Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by jberthold): I would prefer if the patch included a change to some event (eventlog header), so that the 7.8.2 version can be recognized by reading the header. This is the best way to fix ghc-events. The event EVENT_USER_MARKER was new in 7.8, and my fix to ghc-events- parallel was based on this. If the old numbering is restored now, I would like to have a way to distinguish version 7.8.3 from 7.8.2, which is easily done by changing events (could be adding an event which is not used, or extending an event with a new field). If any such change is pending, I suggest to apply it now, or revert the numbering later. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: merge Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by simonmar): @jberthold: why not check the RTS version? It's in the RTS_IDENTIFIER event. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: merge Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by jberthold): Replying to [comment:13 simonmar]:
@jberthold: why not check the RTS version? It's in the RTS_IDENTIFIER event.
That would be possible, assuming that the event is indeed present in every trace (preferably near the beginning). The implementation has to "unwrap" all event blocks which group the event, though, which appears cumbersome. The more serious problem I see is parsing traces without this event: in order to restart with the proper block reason decoder, the parser needs to keep a reference to the input event stream (making it a list, not a stream), leading to memory issues for larger traces (and they do get large...) Identifying the version from header information is much cleaner, that is what the header is for. (OK, I admit that adding or extending an event for that purpose is a hack). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: merge Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by simonmar): You're right, the `RTS_IDENTIFIER` event could be anywhere in the stream, possibly right at the end. We should put something in the header for this. Still, 7.8.2 was buggy and it's not a disaster if we don't support it properly. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: merge Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by carter): Agreed with Simon, especially since event log support was pretty busted with 7.8.2 anyways (eg: user events didn't seem to show up correctly!) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: merge Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Comment (by juhpetersen): If we don't care too much about 7.8.2, is there anything more to do here for 7.8.3? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events,
threadscope, edentv)
-------------------------------+-------------------------------------------
Reporter: jberthold | Owner: ezyang
Type: bug | Status: merge
Priority: high | Milestone: 7.8.3
Component: Runtime | Version: 7.8.2
System | Keywords:
Resolution: | Architecture: Unknown/Multiple
Operating System: | Difficulty: Easy (less than 1 hour)
Unknown/Multiple | Blocked By:
Type of failure: Other | Related Tickets:
Test Case: |
Blocking: |
-------------------------------+-------------------------------------------
Comment (by Austin Seipp

#9003: GHC eventlog: thread stop status codes modified (breaking ghc-events, threadscope, edentv) -------------------------------+------------------------------------------- Reporter: jberthold | Owner: ezyang Type: bug | Status: closed Priority: high | Milestone: 7.8.3 Component: Runtime | Version: 7.8.2 System | Keywords: Resolution: fixed | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 hour) Unknown/Multiple | Blocked By: Type of failure: Other | Related Tickets: Test Case: | Blocking: | -------------------------------+------------------------------------------- Changes (by thoughtpolice): * status: merge => closed * resolution: => fixed Comment: @jberthold - this is done. 7.8.3 and HEAD now include a `EVENT_HACK_BUG_T9003` event that you should be able to use. Closing. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9003#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC