[GHC] #15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Keywords: | Operating System: Linux accelerate,memory,compile | Architecture: x86_64 | Type of failure: Compile-time (amd64) | performance bug Test Case: accelerate | Blocked By: 1.2.0 | Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Every time I try to run {{{ cabal install accelerate }}} the compiler take up nearly all 4GB of memory. Stracing it reveals that it seems to be spending nearly all of the time it runs checking a timer file descriptor, and occasionally checking its memory usage without actually doing anything about the problem. This makes installing accelerate nearly impossible. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by j.waldmann): Have you tried {{{ strace cabal install accelerate -v -v -v -j1 --ghc-options=-v2 }}} (or similar) it definitely shows that ghc is working. Compilation time still is abysmal. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by tmcdonell): * cc: tmcdonell (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by noah): Okay, I've tried that and now it seems to be able to compile accelerate. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Indeed spending 40 seconds in the simplifier does sound bad. Can you attach the full output from compiling with `ghc -v3`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by noah): * Attachment "stdout.txt" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by noah): * Attachment "stderr.txt.gz" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by noah): Okay, Uploaded the files. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by tdammers): * owner: (none) => tdammers -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Looks like the simplifier blows up core size (compiling `Data.Array.Accelerate.Analysis.Hash`): {{{ Result size of Desugar (after optimization) = {terms: 3,048, ... }}} This is what we get before the simplifier kicks in - perfectly normal. And then the first round of simplification happens: {{{ Result size of Specialise = {terms: 6,666, types: 10,365, coercions: 1,218, joins: 0/5} -- OverSatApps = False}) = {terms: 10,863, types: 14,960, coercions: 1,218, joins: 0/5} -- Result size of Simplifier iteration=1 = {terms: 137,878, -- Result size of Simplifier iteration=2 = {terms: 66,407, types: 72,375, coercions: 25,148, joins: 0/991} Result size of Simplifier = {terms: 66,295, types: 72,319, coercions: 25,148, joins: 0/991} }}} ...increasing core size by a factor of 20 (peaking at 40). And the next round is even worse: {{{ Result size of Simplifier iteration=1 = {terms: 465,354, -- Result size of Simplifier iteration=2 = {terms: 259,229, -- Result size of Simplifier iteration=3 = {terms: 469,826, -- Result size of Simplifier iteration=4 = {terms: 345,027, -- Result size of Simplifier = {terms: 345,027, }}} Our perfectly reasonable 3000-something terms now blow up to over 300k. Now on to figuring out *why* it blows up. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): So here's an interesting candidate: {{{ -- RHS size: {terms: 27, types: 44, coercions: 4, joins: 0/0} encodeConst encodeConst = \ @ t_areG ds_dtLe ds_dtLf -> case ds_dtLe of { TypeRunit co_areI -> case ds_dtLf `cast` Co:2 of { () -> mempty $fMonoidBuilder }; TypeRscalar t_aq5y -> encodeScalarConst t_aq5y ds_dtLf; TypeRpair @ a1_areO @ b_areP co_areQ ta_aq5A tb_aq5B -> case ds_dtLf `cast` Co:2 of { (a_aq5C, b_aq5D) -> <> $fSemigroupBuilder (encodeConst ta_aq5A a_aq5C) (encodeConst tb_aq5B b_aq5D) } } end Rec } }}} Perfectly benign, but after simplification, we get: {{{ -- RHS size: {terms: 81,945, types: 43,626, coercions: 10,191, joins: 0/2,111} encodeConst1_r4XSg encodeConst1_r4XSg ... }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Unfortunately, some dependencies no longer build on GHC HEAD, so I'll try and boil it down to a self-contained reproduction case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Here's a reduced version: {{{#!haskell {-# LANGUAGE GADTs #-} {-# LANGUAGE PatternGuards #-} {-# LANGUAGE RankNTypes #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TemplateHaskell #-} {-# OPTIONS_HADDOCK hide #-} -- | -- Module : Repro -- -- based on Data.Array.Accelerate.Analysis.Hash from accelerate -- -- Copyright : [2017] Manuel M T Chakravarty, Gabriele Keller, Trevor L. McDonell -- License : BSD3 module Repro where import Data.ByteString.Builder import Data.ByteString.Builder.Extra import Data.Monoid import Foreign.C.Types import System.IO.Unsafe ( unsafePerformIO ) import System.Mem.StableName ( hashStableName, makeStableName ) import Prelude hiding ( exp ) import Data.Array.Accelerate.Type import Data.Array.Accelerate.Analysis.Hash.TH {-# INLINE encodeSingleConst #-} encodeSingleConst :: SingleType t -> t -> Builder encodeSingleConst (NumSingleType t) = encodeNumConst t encodeSingleConst (NonNumSingleType t) = encodeNonNumConst t {-# INLINE encodeVectorConst #-} encodeVectorConst :: VectorType t -> t -> Builder encodeVectorConst (Vector2Type t) (V2 a b) = intHost $(hashQ "V2") <> encodeSingleConst t a <> encodeSingleConst t b encodeVectorConst (Vector3Type t) (V3 a b c) = intHost $(hashQ "V3") <> encodeSingleConst t a <> encodeSingleConst t b <> encodeSingleConst t c encodeVectorConst (Vector4Type t) (V4 a b c d) = intHost $(hashQ "V4") <> encodeSingleConst t a <> encodeSingleConst t b <> encodeSingleConst t c <> encodeSingleConst t d encodeVectorConst (Vector8Type t) (V8 a b c d e f g h) = intHost $(hashQ "V8") <> encodeSingleConst t a <> encodeSingleConst t b <> encodeSingleConst t c <> encodeSingleConst t d <> encodeSingleConst t e <> encodeSingleConst t f <> encodeSingleConst t g <> encodeSingleConst t h encodeVectorConst (Vector16Type t) (V16 a b c d e f g h i j k l m n o p) = intHost $(hashQ "V16") <> encodeSingleConst t a <> encodeSingleConst t b <> encodeSingleConst t c <> encodeSingleConst t d <> encodeSingleConst t e <> encodeSingleConst t f <> encodeSingleConst t g <> encodeSingleConst t h <> encodeSingleConst t i <> encodeSingleConst t j <> encodeSingleConst t k <> encodeSingleConst t l <> encodeSingleConst t m <> encodeSingleConst t n <> encodeSingleConst t o <> encodeSingleConst t p {-# INLINE encodeNonNumConst #-} encodeNonNumConst :: NonNumType t -> t -> Builder encodeNonNumConst TypeBool{} x = intHost $(hashQ "Bool") <> word8 (fromBool x) encodeNonNumConst TypeChar{} x = intHost $(hashQ "Char") <> charUtf8 x encodeNonNumConst TypeCSChar{} (CSChar x) = intHost $(hashQ "CSChar") <> int8 x encodeNonNumConst TypeCUChar{} (CUChar x) = intHost $(hashQ "CUChar") <> word8 x encodeNonNumConst TypeCChar{} (CChar x) = intHost $(hashQ "CChar") <> $( [e| int8 |] ) x {-# INLINE fromBool #-} fromBool :: Bool -> Word8 fromBool True = 1 fromBool False = 0 {-# INLINE encodeNumConst #-} encodeNumConst :: NumType t -> t -> Builder encodeNumConst (IntegralNumType t) = encodeIntegralConst t encodeNumConst (FloatingNumType t) = encodeFloatingConst t {-# INLINE encodeIntegralConst #-} encodeIntegralConst :: IntegralType t -> t -> Builder encodeIntegralConst TypeInt{} x = intHost $(hashQ "Int") <> intHost x encodeIntegralConst TypeInt8{} x = intHost $(hashQ "Int8") <> int8 x encodeIntegralConst TypeInt16{} x = intHost $(hashQ "Int16") <> int16Host x encodeIntegralConst TypeInt32{} x = intHost $(hashQ "Int32") <> int32Host x encodeIntegralConst TypeInt64{} x = intHost $(hashQ "Int64") <> int64Host x encodeIntegralConst TypeWord{} x = intHost $(hashQ "Word") <> wordHost x encodeIntegralConst TypeWord8{} x = intHost $(hashQ "Word8") <> word8 x encodeIntegralConst TypeWord16{} x = intHost $(hashQ "Word16") <> word16Host x encodeIntegralConst TypeWord32{} x = intHost $(hashQ "Word32") <> word32Host x encodeIntegralConst TypeWord64{} x = intHost $(hashQ "Word64") <> word64Host x encodeIntegralConst TypeCShort{} (CShort x) = intHost $(hashQ "CShort") <> int16Host x encodeIntegralConst TypeCUShort{} (CUShort x) = intHost $(hashQ "CUShort") <> word16Host x encodeIntegralConst TypeCInt{} (CInt x) = intHost $(hashQ "CInt") <> int32Host x encodeIntegralConst TypeCUInt{} (CUInt x) = intHost $(hashQ "CUInt") <> word32Host x encodeIntegralConst TypeCLLong{} (CLLong x) = intHost $(hashQ "CLLong") <> int64Host x encodeIntegralConst TypeCULLong{} (CULLong x) = intHost $(hashQ "CULLong") <> word64Host x encodeIntegralConst TypeCLong{} (CLong x) = intHost $(hashQ "CLong") <> $( [e| int64Host |] ) x encodeIntegralConst TypeCULong{} (CULong x) = intHost $(hashQ "CULong") <> $( [e| word64Host |] ) x {-# INLINE encodeFloatingConst #-} encodeFloatingConst :: FloatingType t -> t -> Builder encodeFloatingConst TypeHalf{} (Half (CUShort x)) = intHost $(hashQ "Half") <> word16Host x encodeFloatingConst TypeFloat{} x = intHost $(hashQ "Float") <> floatHost x encodeFloatingConst TypeDouble{} x = intHost $(hashQ "Double") <> doubleHost x encodeFloatingConst TypeCFloat{} (CFloat x) = intHost $(hashQ "CFloat") <> floatHost x encodeFloatingConst TypeCDouble{} (CDouble x) = intHost $(hashQ "CDouble") <> doubleHost x }}} In order to compile this, only 3 dependencies need to be downloaded: `base-orphans`, `half`, and `hashable`. This makes it easy to build this without Cabal, plug in different compilers, and compile files individually. Now, it turns out that the above sample, after everything it depends on has been compiled, takes about 30 seconds to compile (with full optimizations), and Core size blows up to about 140k terms. Changing the pragmas for `encodeSingleConst` and `encodeVectorConst` to `NOINLINE` however brings this down to only 3 seconds; changing only `encodeVectorConst` still gets us 20 seconds. Tentative hypothesis: simplifier somehow chokes on the TH-generated code in `encodeVectorConst`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Interesting -- thanks for the smaller code. This is beginning to smell like #15253. See esp comment:22. `encodeVectorConst` has stuff like {{{ intHost $(hashQ "V16") <> encodeSingleConst t a <> encodeSingleConst t b <> etc }}} where `encodeSingleConst` is a giant `case`. So I think we may be over- doing case-of-case as in #15253. I've been trying to work on #15253 but keep getting distracted. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Unfortunately, the `dump-simpl` output is too large to attach (~3 MiB), but almost all of it is indeed one huge `case` for `encodeVectorConst`, and it does fit the case-of-case pattern. So let me see if I understand this correctly: case-of-case is supposed to unwrap constructs of the shape `case (case a of b -> c) of d -> e` into `case a of b -> (case c of d -> e)`, and normally, the result would then reduce to something simpler than the original case-of-case construct - but in this case, it doesn't. Does that mean what we need is a way to tell when this is going to happen, and in those cases, skip the transform? And something I don't understand yet is how this relates to #15253 (adding support for type-level integers). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by alpmestan): * cc: alpmestan (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Darn. I meant #13253, esp comment:22. Sorry about that. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Ah, yes, that makes sense. A lot, actually. The situation fits the description like a glove - we have the nested cases, we have small expressions at the leaves, and we have the "-O2 triggers exponential blowup" part. However, the patch proposed in https://ghc.haskell.org/trac/ghc/ticket/13253#comment:24 does not make things any better - core size still blows up the same (114k terms), and compilation time is about the same. In fact, no matter what conditions I put in that particular spot, I always get the same result, suggesting that this particular program never hits the `OneOcc` branch at all. I still think case-of-case is the direction to investigate, but this particular patch does not seem to be the solution. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by tdammers): * Attachment "Repro.hs" added. Further simplified reproduction case -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Further simplified reproduction case produces about 25,000 terms as-is, but only 7,000 when marking `encodeVectorConst` and `encodeSingleConst` as `NOINLINE`. I've removed the dependency on TH, some extensions, and all other modules except the `.Type` one. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Here's what the updated example looks like after desugaring: {{{ encodeIntegralConst encodeIntegralConst = \ @ t_aaZ7 ds_dbct x_aaUu -> case ds_dbct of { TypeInt co_aaZJ _ -> <> $fSemigroupBuilder (intHost (I# 200#)) (intHost (x_aaUu `cast` Co:2)); TypeInt8 co_aaZR _ -> <> $fSemigroupBuilder (intHost (I# 208#)) (int8 (x_aaUu `cast` Co:2)); TypeInt16 co_ab00 _ -> <> $fSemigroupBuilder (intHost (I# 216#)) (int16Host (x_aaUu `cast` Co:2)); -- ... } -- RHS size: {terms: 45, types: 57, coercions: 14, joins: 0/0} encodeFloatingConst encodeFloatingConst = \ @ t_aaY8 ds_db5K ds_db5L -> case ds_db5K of { TypeHalf co_aaYn _ -> <> $fSemigroupBuilder (intHost (I# 500#)) (word16Host (((ds_db5L `cast` Co:2) `cast` Co:1) `cast` Co:1)); TypeFloat co_aaYC _ -> <> $fSemigroupBuilder (intHost (I# 510#)) (floatHost (ds_db5L `cast` Co:2)); -- ... } -- RHS size: {terms: 10, types: 12, coercions: 0, joins: 0/0} encodeNumConst encodeNumConst = \ @ t_ab24 ds_dbjh -> case ds_dbjh of { IntegralNumType t_aaUs -> encodeIntegralConst t_aaUs; FloatingNumType t_aaUt -> encodeFloatingConst t_aaUt } -- RHS size: {terms: 5, types: 0, coercions: 0, joins: 0/0} $trModule $trModule = Module (TrNameS "main"#) (TrNameS "Repro"#) -- RHS size: {terms: 11, types: 4, coercions: 0, joins: 0/0} fromBool fromBool = \ ds_dbjn -> case ds_dbjn of { False -> fromInteger $fNumWord8 0; True -> fromInteger $fNumWord8 1 } -- RHS size: {terms: 46, types: 57, coercions: 13, joins: 0/0} encodeNonNumConst encodeNonNumConst = \ @ t_ab2w ds_dbjr x_aaUn -> case ds_dbjr of { TypeBool co_ab2I _ -> <> $fSemigroupBuilder (intHost (I# 0#)) (word8 (fromBool (x_aaUn `cast` Co:2))); TypeChar co_ab2N _ -> <> $fSemigroupBuilder (intHost (I# 100#)) (charUtf8 (x_aaUn `cast` Co:2)); -- ... } -- RHS size: {terms: 10, types: 12, coercions: 0, joins: 0/0} encodeSingleConst encodeSingleConst = \ @ t_ab3l ds_dblJ -> case ds_dblJ of { NumSingleType t_aaUe -> encodeNumConst t_aaUe; NonNumSingleType t_aaUf -> encodeNonNumConst t_aaUf } -- RHS size: {terms: 47, types: 52, coercions: 4, joins: 0/0} encodeVectorConst encodeVectorConst = \ @ t_ab3r ds_dblP ds_dblQ -> case ds_dblP of { __DEFAULT -> patError "Repro.hs:(29,1)-(30,133)|function encodeVectorConst"#; Vector2Type @ a_ab3G co_ab3H t_aaUg -> case ds_dblQ `cast` Co:2 of { V2 a_aaUh b_aaUi -> <> $fSemigroupBuilder (intHost (I# 2#)) (<> $fSemigroupBuilder (encodeSingleConst t_aaUg a_aaUh) (encodeSingleConst t_aaUg b_aaUi)) }; Vector3Type @ a_ab3X co_ab3Y t_aaUj -> case ds_dblQ `cast` Co:2 of { V3 a_aaUk b_aaUl c_aaUm -> <> $fSemigroupBuilder (intHost (I# 3#)) (<> $fSemigroupBuilder (encodeSingleConst t_aaUj a_aaUk) (<> $fSemigroupBuilder (encodeSingleConst t_aaUj b_aaUl) (encodeSingleConst t_aaUj c_aaUm))) } } }}} After inlining, the pattern that we get is something like: {{{ case a of A1 b1 -> case b1 of B1 c -> case c of ... A2 b2 -> case b2 of B2 d -> case d of ... }}} But this not the case-of-case pattern at all! It's just a very large nested `case`, resulting from excessive inlining, and the structure of the whole thing is such that every path through the tree of `case`s is unique at every step, so statically analyzing the `case` branches does not lead to any simplification. In fact, we can achieve the same kind of blowup with the following example code (no dependencies whatsoever): {{{#!haskell module SimpleBlowup where data Foo = Foo1 Bar | Foo2 Bar | Foo3 Bar | Foo4 Bar | Foo5 Bar | Foo6 Bar | Foo7 Bar | Foo8 Bar | Foo9 Bar | Foo10 Bar | Foo11 Bar | Foo12 Bar | Foo13 Bar | Foo14 Bar | Foo15 Bar | Foo16 Bar | Foo17 Bar | Foo18 Bar | Foo19 Bar | Foo20 Bar data Bar = Bar1 Baz | Bar2 Baz | Bar3 Baz | Bar4 Baz data Baz = Baz1 Int | Baz2 Int | Baz3 Int | Baz4 Int | Baz5 Int | Baz6 Int | Baz7 Int | Baz8 Int | Baz9 Int | Baz10 Int | Baz11 Int | Baz12 Int | Baz13 Int | Baz14 Int | Baz15 Int | Baz16 Int | Baz17 Int | Baz18 Int | Baz19 Int | Baz20 Int {-#INLINE encodeFoo #-} encodeFoo :: Foo -> Int encodeFoo (Foo1 bar) = encodeBar bar + 1 encodeFoo (Foo2 bar) = encodeBar bar + 2 encodeFoo (Foo3 bar) = encodeBar bar + 3 encodeFoo (Foo4 bar) = encodeBar bar + 4 encodeFoo (Foo5 bar) = encodeBar bar + 5 encodeFoo (Foo6 bar) = encodeBar bar + 6 encodeFoo (Foo7 bar) = encodeBar bar + 7 encodeFoo (Foo8 bar) = encodeBar bar + 8 encodeFoo (Foo9 bar) = encodeBar bar + 9 encodeFoo (Foo10 bar) = encodeBar bar + 10 encodeFoo (Foo11 bar) = encodeBar bar + 11 encodeFoo (Foo12 bar) = encodeBar bar + 12 encodeFoo (Foo13 bar) = encodeBar bar + 13 encodeFoo (Foo14 bar) = encodeBar bar + 14 encodeFoo (Foo15 bar) = encodeBar bar + 15 encodeFoo (Foo16 bar) = encodeBar bar + 16 encodeFoo (Foo17 bar) = encodeBar bar + 17 encodeFoo (Foo18 bar) = encodeBar bar + 18 encodeFoo (Foo19 bar) = encodeBar bar + 19 encodeFoo (Foo20 bar) = encodeBar bar + 20 {-#INLINE encodeBar #-} encodeBar :: Bar -> Int encodeBar (Bar1 baz) = encodeBaz baz + 1 encodeBar (Bar2 baz) = encodeBaz baz + 2 encodeBar (Bar3 baz) = encodeBaz baz + 3 encodeBar (Bar4 baz) = encodeBaz baz + 4 {-#INLINE encodeBaz #-} encodeBaz :: Baz -> Int encodeBaz (Baz1 i) = encodeInt i + 1 encodeBaz (Baz2 i) = encodeInt i + 2 encodeBaz (Baz3 i) = encodeInt i + 3 encodeBaz (Baz4 i) = encodeInt i + 4 encodeBaz (Baz5 i) = encodeInt i + 5 encodeBaz (Baz6 i) = encodeInt i + 6 encodeBaz (Baz7 i) = encodeInt i + 7 encodeBaz (Baz8 i) = encodeInt i + 8 encodeBaz (Baz9 i) = encodeInt i + 9 encodeBaz (Baz10 i) = encodeInt i + 10 encodeBaz (Baz11 i) = encodeInt i + 11 encodeBaz (Baz12 i) = encodeInt i + 12 encodeBaz (Baz13 i) = encodeInt i + 13 encodeBaz (Baz14 i) = encodeInt i + 14 encodeBaz (Baz15 i) = encodeInt i + 15 encodeBaz (Baz16 i) = encodeInt i + 16 encodeBaz (Baz17 i) = encodeInt i + 17 encodeBaz (Baz18 i) = encodeInt i + 18 encodeBaz (Baz19 i) = encodeInt i + 19 encodeBaz (Baz20 i) = encodeInt i + 20 {-#INLINE encodeInt #-} encodeInt :: Int -> Int encodeInt i = (i * 47 + 31) `mod` 17 }}} This blows up Core size to about 45,000 terms. Changing all the `INLINE`s to `NOINLINE` however, we only get 1,200 terms. So AFAICT, this isn't case-of-case being overly eager, it's just GHC obediently honoring those `INLINE` pragmas, which legit produces a lot of Core, and due to the structure of the things being inlined, the usual crossing-off of obvious non-matches isn't possible. This is probably not something you encounter a lot in the wild, because in order to trigger this in a noticable way, the following conditions must be met: - Several layers of function applications forming a nested pattern-match - ...each of them marked `INLINE` - ...and essentially consisting of a pattern-matching construct with many (tens or more) branches I presume that this would typically involve nested data types with many constructors on multiple levels. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
So AFAICT, this isn't case-of-case being overly eager, it's just GHC obediently honoring those `INLINE` pragmas, which legit produces a lot of Core
Yes you are right about that. * But why did it not happen in earlier versions of GHC, which should have been equally obedient? I'm worried that this kind of blow-up can happen even without INLINE pragmas: {{{ f1 x = if x>0 then 0 else 1 f2 x = if x>0 then f1 (x-1) else f1 (x-2) f3 x = if x>0 then f2 (x-1) else f2 (x-2) f4 x = if x>0 then f3 (x-1) else f3 (x-2) h x = f x }}} Now * `f4` looks small, so we could inline it at its call in `h` * Now we have two calls to `f3`; but `f3` is small so we can inline them both. * Now we have four calls to `f2`; but `f2` is small so we can inline them all. ...and so on. This happens if we inline "bottom up". If instead we did "top-down" we might inline `f1` into `f2`, and `f2` into `f3`... but then `f3` would look big so we would not inline it into `f4`. But we are clearly walking close to the precipice. Who writes such function nests? Well, `accelerate` perhaps (but see the above question). But they ''also'' arise naturally from the join points created from deeply-nested case-of-case transforms, and that's the relevance of the case-of-case stuff. So we have two threads to pursue * Why does `accelerate` have INLINE pragmas on these nested functions? Obedience to those pragmas will certainly cause trouble. * How can we avoid blow-up when (absent INLINE pragmas) such definition nests occur naturally? This is the thing I've been thinking about. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers):
But why did it not happen in earlier versions of GHC, which should have been equally obedient?
Do we have evidence of this? So far, I have only tested `accelerate` against 8.4 and HEAD, and both show the blow-up. AFAICT, the `accelerate` situation is rather specific, I don't see people come up with code like this a lot, not with the explicit inlining anyway. Which leaves those situations where GHC decides to inline on its own; but for those cases, my guess would be that the explanation why this happens now and not before is because we simply optimize more aggressively now. I'll run `accelerate` and the reproduction case against an older GHC just to double check, but I'd expect things to still blow up.
Why does accelerate have INLINE pragmas on these nested functions? Obedience to those pragmas will certainly cause trouble.
Well, what do you know, there's already a ticket: https://github.com/AccelerateHS/accelerate/issues/428. I took the liberty to comment on that, so hopefully we'll get some feedback on the matter from there.
How can we avoid blow-up when (absent INLINE pragmas) such definition nests occur naturally? This is the thing I've been thinking about.
So essentially this boils down to figuring out whether inlining is going to pay off; and the challenge is that just looking at the size before and after inlining isn't going to be enough, because normally we expect the large post-inlining (or post-case-of-case-transform) core to be large but sufficiently shrinkable (by crossing off obvious non-matches). So what we would need is a way to tell whether our code, once blown up, will indeed be shrinkable, either before we blow it up, or right after (but before any further transformations, because those are now potentially expensive). This is probably something that should be addressed via #13253 though; it's not what causes the problem in this case here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Update: it appears that `accelerate` is already slow to compile on GHC 8.0.2, so this doesn't seem to be a new issue - more likely, it's one that we run into more often now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * cc: benl@… (added) Comment: I talked to Ben Lippmeier at ICFP, who said he'd look into the accelerate end of this ticket, and discuss it with Trevor. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by George): Shouldn't the milestone be moved to 8.8.1? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.6.1 => 8.8.1 Comment: Yep! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: #15751 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by tdammers): * related: => #15751 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15488: GHC takes up huge amount of memory when compiling accelerate 1.2.0 -------------------------------------+------------------------------------- Reporter: noah | Owner: tdammers Type: bug | Status: new Priority: high | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: | accelerate,memory,compile Operating System: Linux | Architecture: x86_64 | (amd64) Type of failure: Compile-time | Test Case: accelerate performance bug | 1.2.0 Blocked By: | Blocking: Related Tickets: #15751 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): The `ZipWith` module peformance problems seem to have a different cause; we'll look into that one separately in #15751. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15488#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC