[GHC] #8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown)

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X
slowdown)
------------------------------------+---------------------------------------
Reporter: rrnewton | Owner:
Type: bug | Status: new
Priority: normal | Component: Runtime System
Version: 7.6.3 | Keywords:
Os: Unknown/Multiple | Architecture: Unknown/Multiple
Failure: Runtime performance bug | Blockedby:
Blocking: | Related:
------------------------------------+---------------------------------------
This is a strange one.
I'm in the process of writing an efficient routine for reading large
files of decimal/ASCII numbers, in order to make reasonable Haskell
versions of the PBBS benchmarks.
The following reproducer reads a file and parses the numbers inside
it. Simply calling "getNumCapabilities" BEFORE reading the file makes
it leap from taking 4.5 seconds to 25-30 seconds:
https://github.com/iu-
parfunc/lvars/blob/cfb5110eb172b9a9dd10e9863b2f87420d1dadf6/haskell-
prototype/Util/PBBS.hs#L30
The file is a standalone reproducer and there's a Makefile in that
directory that will build and run it in the bugged and unbugged modes.
When you run "make" you should see output as in the following Gist:
https://gist.github.com/rrnewton/5901965
......
time ./unbugged.exe
Using parReadNats + readFile
Time to read file sequentially: 0.311108s
SKIPPING read of num capabilities...
Now this is getting ridiculous...
Result: 1 segments of output

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) ---------------------------------+------------------------------------------ Reporter: rrnewton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: Runtime performance bug Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Changes (by simonpj): * difficulty: => Unknown Old description:
This is a strange one.
I'm in the process of writing an efficient routine for reading large files of decimal/ASCII numbers, in order to make reasonable Haskell versions of the PBBS benchmarks.
The following reproducer reads a file and parses the numbers inside it. Simply calling "getNumCapabilities" BEFORE reading the file makes it leap from taking 4.5 seconds to 25-30 seconds:
https://github.com/iu- parfunc/lvars/blob/cfb5110eb172b9a9dd10e9863b2f87420d1dadf6/haskell- prototype/Util/PBBS.hs#L30
The file is a standalone reproducer and there's a Makefile in that directory that will build and run it in the bugged and unbugged modes. When you run "make" you should see output as in the following Gist:
https://gist.github.com/rrnewton/5901965
...... time ./unbugged.exe Using parReadNats + readFile Time to read file sequentially: 0.311108s SKIPPING read of num capabilities... Now this is getting ridiculous... Result: 1 segments of output
4.59 real 3.89 user 0.66 sys time ./bugged.exe Using parReadNats + readFile Time to read file sequentially: 0.342555s Read num capabilities as 1 Now this is getting ridiculous... Result: 1 segments of output 38.79 real 37.57 user 1.18 sys Currently I'm running it on a 550Mb file containing 69 million numbers. Any file of whitespace-separated positive numbers will do, but if you want the exact file I'm using you can download it here:
http://www.cs.indiana.edu/~rrnewton/temp/3Dgrid_J_10000000
Or you can download the PBBS benchmark suite and run the generator to produce that file (http://www.cs.cmu.edu/~pbbs/). Anyway, it should have nothing to do with the file.
One thing that is interesting is that calling certain OTHER system functions appears to create the same performance effect as "getNumCapabilities". Namely, getEnvironment and setNumCapabilities cause the problem, but getEnv and writeFile don't.
I've attached the Core produced in the bugged and unbugged case to the Gist as well:
https://gist.github.com/rrnewton/5901965#file-pbbs_bugged-dump-simpl https://gist.github.com/rrnewton/5901965#file-pbbs_unbugged-dump-simpl
Finally, the above numbers are from my mac laptop, GHC 7.6.2. But the problem is exactly the same on the Linux/GHC-7.6.3 machine I tried, and I'm in the process of building a fresh GHC head to check it there...
New description:
This is a strange one.
I'm in the process of writing an efficient routine for reading large
files of decimal/ASCII numbers, in order to make reasonable Haskell
versions of the PBBS benchmarks.
The following reproducer reads a file and parses the numbers inside
it. Simply calling "getNumCapabilities" BEFORE reading the file makes
it leap from taking 4.5 seconds to 25-30 seconds:
https://github.com/iu-
parfunc/lvars/blob/cfb5110eb172b9a9dd10e9863b2f87420d1dadf6/haskell-
prototype/Util/PBBS.hs#L30
The file is a standalone reproducer and there's a Makefile in that
directory that will build and run it in the bugged and unbugged modes.
When you run "make" you should see output as in the following Gist:
{{{
https://gist.github.com/rrnewton/5901965
......
time ./unbugged.exe
Using parReadNats + readFile
Time to read file sequentially: 0.311108s
SKIPPING read of num capabilities...
Now this is getting ridiculous...
Result: 1 segments of output

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) ---------------------------------+------------------------------------------ Reporter: rrnewton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: Runtime performance bug Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by rrnewton): UPDATE: I found at least one other way to trigger the bug without calling those system functions. This makes me think its some kind of compiler bug (or mis-optimization) rather than a runtime bug. I would love to know if anyone else can reproduce this problem... -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) ---------------------------------+------------------------------------------ Reporter: rrnewton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: Runtime performance bug Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by rrnewton): I checked with GHC 7.4.2 (Mac and Linux) and see an identical behavior -- same bug. (I'm trying to run with 7.7.20130628 but having trouble installing the vector package at this moment...) -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) ---------------------------------+------------------------------------------ Reporter: rrnewton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: Runtime performance bug Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by rrnewton): Replying to [comment:3 rrnewton]:
(I'm trying to run with 7.7.20130628 but having trouble installing the vector package at this moment...)
By the way, I'm continuing to update the reproducer -- to get it working with 7.7 and to shrink it further -- on the "ghcbug" branch of this repository: https://github.com/iu-parfunc/lvars/tree/ghcbug -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) ---------------------------------+------------------------------------------ Reporter: rrnewton | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Keywords: | Os: Unknown/Multiple Architecture: Unknown/Multiple | Failure: Runtime performance bug Difficulty: Unknown | Testcase: Blockedby: | Blocking: Related: | ---------------------------------+------------------------------------------ Comment(by ezyang): What happens when you inline getNumCapabilities? It's a very simple function: {{{ getNumCapabilities :: IO Int getNumCapabilities = do n <- peek enabled_capabilities return (fromIntegral n) foreign import ccall "&enabled_capabilities" enabled_capabilities :: Ptr CInt }}} -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) --------------------------------------+------------------------------------- Reporter: rrnewton | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Resolution: wontfix | Keywords: Os: Unknown/Multiple | Architecture: Unknown/Multiple Failure: Runtime performance bug | Difficulty: Unknown Testcase: | Blockedby: Blocking: | Related: --------------------------------------+------------------------------------- Changes (by rrnewton): * status: new => closed * resolution: => wontfix Comment: Ah... my mistake. What was so apparently baffling is actually pretty reasonable. In this case adding more code to the function simply changes inlining decisions, which in turn determines whether dictionary elimination can be done. The 6X performance difference is just because of that. Perhaps such a function should be monomorphic to be safe... since the difference is so big. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) --------------------------------------+------------------------------------- Reporter: rrnewton | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Resolution: wontfix | Keywords: Os: Unknown/Multiple | Architecture: Unknown/Multiple Failure: Runtime performance bug | Difficulty: Unknown Testcase: | Blockedby: Blocking: | Related: --------------------------------------+------------------------------------- Comment(by ezyang): The INLINE pragma should help here. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) --------------------------------------+------------------------------------- Reporter: rrnewton | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Resolution: wontfix | Keywords: Os: Unknown/Multiple | Architecture: Unknown/Multiple Failure: Runtime performance bug | Difficulty: Unknown Testcase: | Blockedby: Blocking: | Related: --------------------------------------+------------------------------------- Comment(by simonpj): I think you are saying that ''inlining a single function, not in an inner loop, has a 6x effect on runtime''. That seems very odd to me. The only examples I've seen with this kind of behaviour is are when a tight loop calls an overloaded arithmetic function; inlining it make it turn into a few machine instructions. But `getNumCapabilities` doesn't seem to be that kind of beast. So I'm still puzzled S -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown) --------------------------------------+------------------------------------- Reporter: rrnewton | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Runtime System | Version: 7.6.3 Resolution: wontfix | Keywords: Os: Unknown/Multiple | Architecture: Unknown/Multiple Failure: Runtime performance bug | Difficulty: Unknown Testcase: | Blockedby: Blocking: | Related: --------------------------------------+------------------------------------- Comment(by ezyang): He isn't referring to getNumCapabilities getting inlined; he's referring to the parReadNats function, which is polymorphic over Num instances. -- Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC