[GHC] #8611: nofib’s cacheprof’s allocations nondeterministic

#8611: nofib’s cacheprof’s allocations nondeterministic
-----------------------------------------+---------------------------------
Reporter: nomeata | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: NoFib benchmark suite | Version: 7.6.3
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple |
Difficulty: Unknown | Type of failure: None/Unknown
Blocked By: | Test Case:
Related Tickets: | Blocking:
-----------------------------------------+---------------------------------
This seems to be neither expected nor desired, and hence worth
investigating:
{{{
./cacheprof +RTS -t
<

Comment (by nomeata):

I could not find the cause immediately, so I’ll report my partial findings here.

* It is not the IO part. Inlining the input as one big string, and skipping the output, does not make the symptom disappear.
* Ticky-ticky profiling yields identical results for the cacheprof modules; the only differences are in the global counters (`HEAP_CHK_ctr` and `STK_CHK_ctr`, to be precise).

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:1
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

Comment (by nomeata):

The cause does not seem to be tied to a particular part of the code. Luckily, `doFile` has a very plain pipeline. The symptoms disappear when I stop after processing `with_ccs` and appear if I go on until `with_synth_2`. But if I read `with_ccs` from a file instead of calculating it, and only do the `synth_2` call, the result also becomes deterministic. Tricky.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:2
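The experiment described above feeds `synth_2` a stored `with_ccs` value instead of recomputing it. One way to do this is a Show/Read checkpoint. The sketch below is minimal and rests on stated assumptions: `Insn`, `stageA`, `stageB` and the file argument are hypothetical stand-ins, not cacheprof's real types or stages.

```haskell
import System.Environment (getArgs)

-- Hypothetical intermediate representation standing in for whatever
-- cacheprof's pipeline carries between stages (not cacheprof's real type).
data Insn = Insn String Int
  deriving (Show, Read, Eq)

-- Hypothetical "upstream" stage (think: everything up to with_ccs).
stageA :: String -> [Insn]
stageA = zipWith (flip Insn) [0 ..] . lines

-- Hypothetical "downstream" stage (think: the synth_2 step).
stageB :: [Insn] -> [String]
stageB = map (\(Insn s n) -> show n ++ ": " ++ s)

-- Serialise the intermediate value in a form that 'read' can parse back.
saveCheckpoint :: FilePath -> [Insn] -> IO ()
saveCheckpoint path = writeFile path . show

-- Deserialise it; 'read' consumes the whole file contents when demanded.
loadCheckpoint :: FilePath -> IO [Insn]
loadCheckpoint path = read <$> readFile path

main :: IO ()
main = do
  args <- getArgs
  case args of
    -- Run the upstream stage on stdin and store its result.
    ["save", path] -> getContents >>= saveCheckpoint path . stageA
    -- Skip the upstream stage entirely: start from the stored value.
    ["resume", path] -> loadCheckpoint path >>= mapM_ putStrLn . stageB
    _ -> putStrLn "usage: checkpoint (save|resume) FILE"
```

Run `save` once, then run `resume` repeatedly under `+RTS -t`. The upstream computation then never executes in the process being measured, which is the point of the experiment.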

Comment (by Joachim Breitner

Changes (by osa1):

* cc: osa1 (added)
* version: 7.6.3 => 8.5

Comment:

This has been tainting nofib results for years; should we maybe prioritize and fix this?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:4

Comment (by nomeata):

It just needs someone determined enough to find the cause…

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:5

Changes (by sgraf):

* cc: sgraf (added)

Comment:

Although serialising and deserialising the result of `with_ccs` gets rid of the non-determinism in total allocations AFAICT (using Read/Show is pretty slow, so I only did ~20 iterations), the number of residency samples (and hence the average residency) still seems to vary by one (23 vs. 24). Not sure if it's important.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:6
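The residency figures mentioned here come from the samples the RTS takes at major collections. As far as I know, the average residency printed by `+RTS -t`/`-s` is the cumulative live bytes divided by the number of major GCs. If so, one extra sample (23 vs. 24) shifts the average even when allocations agree. The minimal sketch below uses `GHC.Stats` (GHC 8.2 or later) to print these figures from inside a program. It assumes the binary is built with `-rtsopts` (or `-with-rtsopts=-T`) and run with `+RTS -T`, and its workload is a placeholder, not cacheprof.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Average residency as (assumed) the RTS summary computes it: cumulative
-- live bytes divided by the number of residency samples (major GCs).
averageResidency :: Word64 -> Word64 -> Maybe Word64
averageResidency _ 0 = Nothing
averageResidency cumulative samples = Just (cumulative `div` samples)

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "statistics unavailable: run with +RTS -T"
    else do
      -- Placeholder workload; substitute the code under investigation.
      _ <- evaluate (foldl' (+) 0 [1 .. 5000000 :: Int])
      s <- getRTSStats
      let samples = fromIntegral (major_gcs s) :: Word64
      putStrLn ("bytes allocated:   " ++ show (allocated_bytes s))
      putStrLn ("bytes copied:      " ++ show (copied_bytes s))
      putStrLn ("max residency:     " ++ show (max_live_bytes s))
      putStrLn ("residency samples: " ++ show samples)
      putStrLn ("avg residency:     "
                  ++ maybe "n/a" show (averageResidency (cumulative_live_bytes s) samples))
```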

Comment (by sgraf):

If I seq `with_ccs` like this:
{{{
foldr seq () with_ccs `seq` return final
}}}
where `final = final_cleanup with_synth_2`, the non-determinism no longer seems reproducible. Still, the number of samples (and consequently the average residency) varies between 10 and 11.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:7
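The `foldr seq () xs` idiom used above forces the list's spine and each element to weak head normal form, and nothing deeper. The small self-contained sketch below shows exactly how far it reaches. `forceSpineAndElems` and `report` are hypothetical helper names. `Control.Exception.evaluate` sequences the forcing in `IO`, much as the `seq` before `return final` does.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Forces the list's spine and every element to weak head normal form,
-- which is what `foldr seq ()` does in the comment above; nothing deeper.
forceSpineAndElems :: [a] -> ()
forceSpineAndElems = foldr seq ()

-- Evaluate a value inside IO and report whether doing so hit an 'error'.
report :: String -> () -> IO ()
report label x = do
  r <- try (evaluate x)
  putStrLn $ label ++ ": " ++ case r of
    Left e -> "forced an error (" ++ takeWhile (/= '\n') (show (e :: ErrorCall)) ++ ")"
    Right () -> "no error"

main :: IO ()
main = do
  -- The second element is itself a bottom, so WHNF-forcing it throws.
  report "bottom element" (forceSpineAndElems [1, error "element", 3 :: Int])
  -- The bottom is hidden inside a Just; WHNF stops at the constructor.
  report "bottom under Just" (forceSpineAndElems [Just (error "nested" :: Int)])
```

`Control.DeepSeq.force` from the `deepseq` package would force the elements completely instead. Forcing more also changes which thunks exist at each collection, so the lighter idiom perturbs the measured heap less.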

Comment (by sgraf):

If I force `data_areas` in `synth_2` like this:
{{{
foldr seq () data_areas `seq` synthd_assy ++ data_areas
}}}
the non-determinism is no longer reproducible. If I instead force the whole list, like this:
{{{
let res = synthd_assy ++ data_areas
in foldr seq () res `seq` res
}}}
it is still non-deterministic. It seems this isn't really possible to localise?! I'll leave this for another day.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:8
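The two snippets above differ only in how much of the result they force before returning it. The self-contained sketch below reproduces both shapes, assuming list-typed arguments. `synthdAssy` and `dataAreas` are hypothetical stand-ins for `synthd_assy` and `data_areas`, and `probe` is an illustrative helper, not part of cacheprof.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- First snippet's shape: force only the suffix (spine and elements to
-- WHNF), then return the lazily appended result.
forceSuffixOnly :: [a] -> [a] -> [a]
forceSuffixOnly synthdAssy dataAreas =
  foldr seq () dataAreas `seq` synthdAssy ++ dataAreas

-- Second snippet's shape: build the result, force all of it, return it.
forceWhole :: [a] -> [a] -> [a]
forceWhole synthdAssy dataAreas =
  let res = synthdAssy ++ dataAreas
  in foldr seq () res `seq` res

-- Evaluate a result to WHNF (which triggers the variant's forcing) and
-- report whether that reached a bottom element.
probe :: String -> String -> IO ()
probe label s = do
  r <- try (evaluate s) :: IO (Either ErrorCall String)
  putStrLn (label ++ ": " ++ either (const "hit a bottom element") (const "ok") r)

main :: IO ()
main = do
  -- A bottom in the prefix: only the whole-list variant touches it.
  probe "suffix only, bottom in prefix" (forceSuffixOnly ['a', undefined] "x")
  probe "whole list, bottom in prefix" (forceWhole ['a', undefined] "x")
```

The sketch only shows which values each variant forces. It does not explain why one of them hides the non-determinism in cacheprof.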

Comment (by sgraf):

Some notes from my Windows machine:

- In the default build configuration with `-O2`, allocations are completely stable. Maximum residency is unstable, though.
- The instability in maximum residency goes away if I add surgical `seq`s similar to the ones above.
- With `-O0`, I even get unstable allocations on Windows, with the samples spread wider than on Linux the other day.
- With `-O0 -prof -fprof-auto` there is still a rare, minimal instability (a delta of 40 bytes).
- Judging from the `+RTS -S` output, the culprit seems to be an erroneous live set. When comparing two reports from different runs, there was always a point at which the live bytes differed, and only much later would the total bytes allocated differ. I had a sample where the difference was 1488 bytes in (minor) collection no. 11. These allocations seem to be kept live until the end of the program, and more live data is added while the program runs.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:9
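The comparison in the last bullet can be automated. The minimal sketch below is not the tool used in the ticket. It reads two captured `+RTS -S` traces and reports the first collection whose allocated, copied or live byte counts differ. The `-S` output goes to stderr unless a file name is appended to the flag. Based on the column headings GHC prints, the sketch assumes that the first three columns of each per-collection line are those byte counts. It skips header, summary and other lines that do not start with three integers, and it ignores timing columns because they vary on every run.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)
import System.Environment (getArgs)
import Text.Read (readMaybe)

-- (allocated, copied, live) bytes for one collection.
type GcLine = (Integer, Integer, Integer)

-- Assumes the first three columns of a per-GC `+RTS -S` line are the
-- allocated, copied and live byte counts; anything else yields Nothing.
parseGcLine :: String -> Maybe GcLine
parseGcLine l = case words l of
  (a : c : v : _) -> (,,) <$> readMaybe a <*> readMaybe c <*> readMaybe v
  _ -> Nothing

gcLines :: String -> [GcLine]
gcLines = mapMaybe parseGcLine . lines

-- 1-based index of the first collection whose byte counts differ, with
-- both values; Nothing if the common prefix of collections agrees.
firstDivergence :: [GcLine] -> [GcLine] -> Maybe (Int, GcLine, GcLine)
firstDivergence xs ys =
  listToMaybe [(i, x, y) | (i, x, y) <- zip3 [1 ..] xs ys, x /= y]

main :: IO ()
main = do
  args <- getArgs
  case args of
    [fileA, fileB] -> do
      a <- gcLines <$> readFile fileA
      b <- gcLines <$> readFile fileB
      case firstDivergence a b of
        Nothing -> putStrLn "no divergence in the common prefix of collections"
        Just (i, x, y) ->
          putStrLn ("first divergence at collection " ++ show i ++ ": "
                      ++ show x ++ " vs " ++ show y)
    _ -> putStrLn "usage: gcdiff TRACE_A TRACE_B"
```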

Changes (by sgraf):

* related: => #4450

Comment:

Just a few notes while it's still fresh in my head. I always compared two traces via `rr`.

- Based on my observation that bytes copied was the first metric to diverge, I did some printf debugging in the RTS.
- I noticed that the first divergence comes from copying additional black holes. After evacuating the black hole in trace B, it resumes 'normal' operation, i.e. it does the same thing trace A would do instead of evacuating the black hole.
- The black holes are evacuated when the stack is scavenged. They're pointed to by update frames, all of which originally pointed to `stg_sel_0_upd` selector thunks.
- I also noticed that the `mut_list_size` output of `+RTS -Dg` varies between the two traces, and it's never because of any mutable primitives (e.g., arrays, MVars, etc.).

I'm not sure what pushes (or fails to push) the update frames. Maybe it has to do with `ThreadPaused.c:stackSqueeze()`? Maybe this is related to #4450?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:10

Changes (by sgraf):

* cc: simonmar (added)

Comment:

Indeed, never making the call to `ThreadPaused.c:stackSqueeze()` gets rid of the non-determinism. So does `+RTS -V0` (which disables the master tick timer that triggers context switches, if I understand correctly), as proposed in #4450, and so does `+RTS -Z` (which deactivates stack squeezing completely). The latter seems like an easy fix for this particular benchmark, but this will probably come up again in the future. Maybe we can teach the RTS not to do stack squeezing during context switches?

CCing simonmar for input, as I'm not particularly familiar with RTS internals.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:11
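When benchmarking with `-Z` or `-V0`, it helps to confirm from inside the program that the flags actually took effect. The minimal sketch below uses `GHC.RTS.Flags` and rests on three assumptions. First, `-Z` shows up as `squeezeUpdFrames = False` in the GC flags. Second, `-V0` shows up as `tickInterval = 0` in the miscellaneous flags; the value is printed raw, without asserting its unit. Third, the binary is linked with `-rtsopts`, so that `+RTS -Z -V0 -RTS` is accepted; the flags could instead be baked in with `-with-rtsopts`.

```haskell
import Data.Word (Word64)
import GHC.RTS.Flags (GCFlags (squeezeUpdFrames), MiscFlags (tickInterval), getGCFlags, getMiscFlags)

-- Human-readable summary of the two settings discussed in the comment.
describe :: Bool -> Word64 -> [String]
describe squeeze tick =
  [ "stack squeezing: " ++ (if squeeze then "on" else "off")
  , "tick timer: " ++ (if tick == 0 then "off" else "on, tickInterval = " ++ show tick)
  ]

main :: IO ()
main = do
  gc <- getGCFlags
  misc <- getMiscFlags
  -- tickInterval has type RtsTime, a synonym for Word64.
  mapM_ putStrLn (describe (squeezeUpdFrames gc) (tickInterval misc))
```

Run as `./flags +RTS -Z -V0 -RTS`, it should report both settings as off. Without the flags, it shows the defaults of the RTS it was linked against.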

Changes (by sgraf):

* status: new => patch
* differential: => Phab:D5460

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:12

Comment (by sgraf):

Replying to [comment:11 sgraf]:
> The latter seems like an easy fix for this particular benchmark, but
> this will probably come up again in the future. Maybe we can teach the
> RTS not do stack squeezing during context switches?
> CCing simonmar for input, as I'm not particularly familiar with RTS
> internals.

Continuing discussion in #16065.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:13

Comment (by nomeata):

Whoohoo, thanks for drilling down here!

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:14

Changes (by sgraf):

* differential: Phab:D5460 => Phab:D5470
* related: #4450 => #4450 #16065

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8611#comment:15

Comment (by Sebastian Graf