Re: [GHC] #8027: Adding one call to getNumCapabilities triggers performance nose dive (6X slowdown)

3 Jul 2013

      #8027: Adding one call to getNumCapabilities triggers performance nose dive (6X
slowdown)
---------------------------------+------------------------------------------
    Reporter:  rrnewton          |       Owner:                         
        Type:  bug               |      Status:  new                    
    Priority:  normal            |   Milestone:                         
   Component:  Runtime System    |     Version:  7.6.3                  
    Keywords:                    |          Os:  Unknown/Multiple       
Architecture:  Unknown/Multiple  |     Failure:  Runtime performance bug
  Difficulty:  Unknown           |    Testcase:                         
   Blockedby:                    |    Blocking:                         
     Related:                    |  
---------------------------------+------------------------------------------
Changes (by simonpj):

  * difficulty:  => Unknown

Old description:
...
This is a strange one.
I'm in the process of writing an efficient routine for reading large
files of decimal/ASCII numbers, in order to make reasonable Haskell
versions of the PBBS benchmarks.
The following reproducer reads a file and parses the numbers inside
it.  Simply calling "getNumCapabilities" BEFORE reading the file makes
it leap from taking 4.5 seconds to 25-30 seconds:
https://github.com/iu-
parfunc/lvars/blob/cfb5110eb172b9a9dd10e9863b2f87420d1dadf6/haskell-
prototype/Util/PBBS.hs#L30
The file is a standalone reproducer and there's a Makefile in that
directory that will build and run it in the bugged and unbugged modes.
When you run "make" you should see output as in the following Gist:
https://gist.github.com/rrnewton/5901965
......
     time ./unbugged.exe
     Using parReadNats + readFile
     Time to read file sequentially: 0.311108s
     SKIPPING read of num capabilities...
     Now this is getting ridiculous...
     Result: 1 segments of output

             4.59 real         3.89 user         0.66 sys
     time ./bugged.exe
     Using parReadNats + readFile
     Time to read file sequentially: 0.342555s
     Read num capabilities as 1
     Now this is getting ridiculous...
     Result: 1 segments of output

            38.79 real        37.57 user         1.18 sys
Currently I'm running it on a 550Mb file containing 69 million
numbers.  Any file of whitespace-separated positive numbers will do,
but if you want the exact file I'm using you can download it here:
http://www.cs.indiana.edu/~rrnewton/temp/3Dgrid_J_10000000
Or you can download the PBBS benchmark suite and run the generator to
produce that file (http://www.cs.cmu.edu/~pbbs/).  Anyway, it should
have nothing to do with the file.
One thing that is interesting is that calling certain OTHER system
functions appears to create the same performance effect as
"getNumCapabilities".  Namely, getEnvironment and setNumCapabilities
cause the problem, but getEnv and writeFile don't.
I've attached the Core produced in the bugged and unbugged case to the
Gist as well:
https://gist.github.com/rrnewton/5901965#file-pbbs_bugged-dump-simpl
   https://gist.github.com/rrnewton/5901965#file-pbbs_unbugged-dump-simpl
Finally, the above numbers are from my mac laptop, GHC 7.6.2.  But the
problem
is exactly the same on the Linux/GHC-7.6.3 machine I tried, and I'm in
the process of building a fresh GHC head to check it there...
New description:

 This is a strange one.

 I'm in the process of writing an efficient routine for reading large
 files of decimal/ASCII numbers, in order to make reasonable Haskell
 versions of the PBBS benchmarks.

 The following reproducer reads a file and parses the numbers inside
 it.  Simply calling "getNumCapabilities" BEFORE reading the file makes
 it leap from taking 4.5 seconds to 25-30 seconds:

    https://github.com/iu-
 parfunc/lvars/blob/cfb5110eb172b9a9dd10e9863b2f87420d1dadf6/haskell-
 prototype/Util/PBBS.hs#L30

 The file is a standalone reproducer and there's a Makefile in that
 directory that will build and run it in the bugged and unbugged modes.
 When you run "make" you should see output as in the following Gist:
 {{{
   https://gist.github.com/rrnewton/5901965

      ......
      time ./unbugged.exe
      Using parReadNats + readFile
      Time to read file sequentially: 0.311108s
      SKIPPING read of num capabilities...
      Now this is getting ridiculous...
      Result: 1 segments of output

              4.59 real         3.89 user         0.66 sys
      time ./bugged.exe
      Using parReadNats + readFile
      Time to read file sequentially: 0.342555s
      Read num capabilities as 1
      Now this is getting ridiculous...
      Result: 1 segments of output

             38.79 real        37.57 user         1.18 sys
 }}}
 Currently I'm running it on a 550Mb file containing 69 million
 numbers.  Any file of whitespace-separated positive numbers will do,
 but if you want the exact file I'm using you can download it here:

   http://www.cs.indiana.edu/~rrnewton/temp/3Dgrid_J_10000000

 Or you can download the PBBS benchmark suite and run the generator to
 produce that file (http://www.cs.cmu.edu/~pbbs/).  Anyway, it should
 have nothing to do with the file.

 One thing that is interesting is that calling certain OTHER system
 functions appears to create the same performance effect as
 "getNumCapabilities".  Namely, getEnvironment and setNumCapabilities
 cause the problem, but getEnv and writeFile don't.

 I've attached the Core produced in the bugged and unbugged case to the
 Gist as well:

    https://gist.github.com/rrnewton/5901965#file-pbbs_bugged-dump-simpl
    https://gist.github.com/rrnewton/5901965#file-pbbs_unbugged-dump-simpl

 Finally, the above numbers are from my mac laptop, GHC 7.6.2.  But the
 problem
 is exactly the same on the Linux/GHC-7.6.3 machine I tried, and I'm in
 the process of building a fresh GHC head to check it there...

--

-- 
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/8027#comment:1
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler