[GHC] #14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Keywords: | Operating System: Unknown/Multiple PatternMatchWarnings | Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Consider the following program: {{{#!hs {-# LANGUAGE GADTs #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TemplateHaskell #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeInType #-} module Bug where data family Sing (z :: k) class SEq k where (%==) :: forall (a :: k) (b :: k). Sing a -> Sing b -> () infix 4 %== data Foo a b c d = A a b c d | B a b c d | C a b c d | D a b c d | E a b c d | F a b c d data instance Sing (z_awDE :: Foo a b c d) where SA :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('A a b c d) SB :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('B a b c d) SC :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('C a b c d) SD :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('D a b c d) SE :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('E a b c d) SF :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('F a b c d) $([d| instance (SEq a, SEq b, SEq c, SEq d) => SEq (Foo a b c d) where (%==) (SA _ _ _ _) (SA _ _ _ _) = () (%==) (SA _ _ _ _) (SB _ _ _ _) = () (%==) (SA _ _ _ _) (SC _ _ _ _) = () (%==) (SA _ _ _ _) (SD _ _ _ _) = () (%==) (SA _ _ _ _) (SE _ _ _ _) = () (%==) (SA _ _ _ _) (SF _ _ _ _) = () (%==) (SB _ _ _ _) (SA _ _ _ _) = () (%==) (SB _ _ _ _) (SB _ _ _ _) = () (%==) (SB _ _ _ _) (SC _ _ _ _) = () (%==) (SB _ _ _ _) (SD _ _ _ _) = () (%==) (SB _ _ _ _) (SE _ _ _ _) = () (%==) (SB _ _ _ _) (SF _ _ _ _) = () (%==) (SC _ _ _ _) (SA _ _ _ _) = () (%==) (SC _ _ _ _) (SB _ _ _ _) = () (%==) (SC _ _ _ _) (SC _ _ _ _) = () (%==) (SC _ _ _ _) (SD _ _ _ _) = () (%==) (SC _ _ _ _) (SE _ _ _ _) = () (%==) (SC _ _ _ _) (SF _ _ _ _) = () (%==) (SD _ _ _ _) (SA _ _ _ _) = () (%==) (SD _ _ _ _) (SB _ _ _ _) = () (%==) (SD _ _ _ _) (SC _ _ _ _) = () (%==) (SD _ _ _ _) (SD _ _ _ _) = () (%==) (SD _ _ _ _) (SE _ _ _ _) = () (%==) (SD _ _ _ _) (SF _ _ _ _) = () (%==) (SE _ _ _ _) (SA _ _ _ _) = () (%==) (SE _ _ _ _) (SB _ _ _ _) = () (%==) (SE _ _ _ _) (SC _ _ _ _) = () (%==) (SE _ _ _ _) (SD _ _ _ _) = () (%==) (SE _ _ _ _) (SE _ _ _ _) = () (%==) (SE _ _ _ _) (SF _ _ _ _) = () (%==) (SF _ _ _ _) (SA _ _ _ _) = () (%==) (SF _ _ _ _) (SB _ _ _ _) = () (%==) (SF _ _ _ _) (SC _ _ _ _) = () (%==) (SF _ _ _ _) (SD _ _ _ _) = () (%==) (SF _ _ _ _) (SE _ _ _ _) = () (%==) (SF _ _ _ _) (SF _ _ _ _) = () |]) }}} It takes significantly longer to compile this program on 8.4 and HEAD: {{{ $ /opt/ghc/8.4.1/bin/ghc --version The Glorious Glasgow Haskell Compilation System, version 8.4.1 $ time /opt/ghc/8.4.1/bin/ghc Bug.hs -fforce-recomp [1 of 1] Compiling Bug ( Bug.hs, Bug.o ) real 0m0.285s user 0m0.236s sys 0m0.036s $ /opt/ghc/head/bin/ghc --version The Glorious Glasgow Haskell Compilation System, version 8.5.20180306 $ time /opt/ghc/head/bin/ghc Bug.hs -fforce-recomp [1 of 1] Compiling Bug ( Bug.hs, Bug.o ) real 0m29.684s user 0m29.656s sys 0m0.060s }}} The reason for this regression is somewhat incidental—it's due to commit ffb2738f86c4e4c3f0eaacf0a95d7326fdd2e383 (`Fix #14838 by marking TH- spliced code as FromSource`). Before that commit, we were supressing pattern-match coverage checking entirely on TH-quoted code. We no longer do this, which means that we coverage-check the TH-quoted instance in that program, which appears to be why it takes so long to compile. This is a serious issue in practice because a good chunk of `singletons`-generated code is of this form, which means that a good amount of code is effectively uncompilable on GHC HEAD now. (See, for instance, this [https://travis- ci.org/goldfirere/singletons/jobs/350483543#L1182 Travis failure] on GHC HEAD.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): Curiously, data family instances seem to play a role in this. If I replace the data family formulation of `Sing` with a normal datatype: {{{#!hs {-# LANGUAGE GADTs #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TemplateHaskell #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeInType #-} module Bug where class SEq k where (%==) :: forall (a :: k) (b :: k). Sing a -> Sing b -> () infix 4 %== data Foo a b c d = A a b c d | B a b c d | C a b c d | D a b c d | E a b c d | F a b c d data Sing (z_awDE :: k) where SA :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('A a b c d) SB :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('B a b c d) SC :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('C a b c d) SD :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('D a b c d) SE :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('E a b c d) SF :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('F a b c d) $([d| instance (SEq a, SEq b, SEq c, SEq d) => SEq (Foo a b c d) where (%==) (SA _ _ _ _) (SA _ _ _ _) = () (%==) (SA _ _ _ _) (SB _ _ _ _) = () (%==) (SA _ _ _ _) (SC _ _ _ _) = () (%==) (SA _ _ _ _) (SD _ _ _ _) = () (%==) (SA _ _ _ _) (SE _ _ _ _) = () (%==) (SA _ _ _ _) (SF _ _ _ _) = () (%==) (SB _ _ _ _) (SA _ _ _ _) = () (%==) (SB _ _ _ _) (SB _ _ _ _) = () (%==) (SB _ _ _ _) (SC _ _ _ _) = () (%==) (SB _ _ _ _) (SD _ _ _ _) = () (%==) (SB _ _ _ _) (SE _ _ _ _) = () (%==) (SB _ _ _ _) (SF _ _ _ _) = () (%==) (SC _ _ _ _) (SA _ _ _ _) = () (%==) (SC _ _ _ _) (SB _ _ _ _) = () (%==) (SC _ _ _ _) (SC _ _ _ _) = () (%==) (SC _ _ _ _) (SD _ _ _ _) = () (%==) (SC _ _ _ _) (SE _ _ _ _) = () (%==) (SC _ _ _ _) (SF _ _ _ _) = () (%==) (SD _ _ _ _) (SA _ _ _ _) = () (%==) (SD _ _ _ _) (SB _ _ _ _) = () (%==) (SD _ _ _ _) (SC _ _ _ _) = () (%==) (SD _ _ _ _) (SD _ _ _ _) = () (%==) (SD _ _ _ _) (SE _ _ _ _) = () (%==) (SD _ _ _ _) (SF _ _ _ _) = () (%==) (SE _ _ _ _) (SA _ _ _ _) = () (%==) (SE _ _ _ _) (SB _ _ _ _) = () (%==) (SE _ _ _ _) (SC _ _ _ _) = () (%==) (SE _ _ _ _) (SD _ _ _ _) = () (%==) (SE _ _ _ _) (SE _ _ _ _) = () (%==) (SE _ _ _ _) (SF _ _ _ _) = () (%==) (SF _ _ _ _) (SA _ _ _ _) = () (%==) (SF _ _ _ _) (SB _ _ _ _) = () (%==) (SF _ _ _ _) (SC _ _ _ _) = () (%==) (SF _ _ _ _) (SD _ _ _ _) = () (%==) (SF _ _ _ _) (SE _ _ _ _) = () (%==) (SF _ _ _ _) (SF _ _ _ _) = () |]) }}} Then the compilation time for GHC HEAD goes back to being the same as in 8.4.1. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): This is bad. Really bad. I looked briefly into the GHC source, and found [http://git.haskell.org/ghc.git/blob/d9d463289fe20316cff12a8f0dbf414db678fa72... this ominous Note]: {{{ Note [Translate CoPats] ~~~~~~~~~~~~~~~~~~~~~~~ The pattern match checker did not know how to handle coerced patterns `CoPat` efficiently, which gave rise to #11276. The original approach translated `CoPat`s: pat |> co ===> x (pat <- (e |> co)) Instead, we now check whether the coercion is a hole or if it is just refl, in which case we can drop it. Unfortunately, data families generate useful coercions so guards are still generated in these cases and checking data families is not really efficient. }}} If that is to be believed, then is coverage-checking data family instances really doomed to be slow? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): To make the data type version of the program as slow to compile as the data family instance version, you can use explicit guards: {{{#!hs {-# LANGUAGE GADTs #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TemplateHaskell #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeInType #-} module Bug where class SEq k where (%==) :: forall (a :: k) (b :: k). Sing a -> Sing b -> () infix 4 %== data Foo a b c d = A a b c d | B a b c d | C a b c d | D a b c d | E a b c d | F a b c d data Sing (z_awDE :: k) where SA :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('A a b c d) SB :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('B a b c d) SC :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('C a b c d) SD :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('D a b c d) SE :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('E a b c d) SF :: Sing a -> Sing b -> Sing c -> Sing d -> Sing ('F a b c d) instance (SEq a, SEq b, SEq c, SEq d) => SEq (Foo a b c d) where (%==) x y | SA {} <- x , SA {} <- y = () | SA {} <- x , SB {} <- y = () | SA {} <- x , SC {} <- y = () | SA {} <- x , SD {} <- y = () | SA {} <- x , SE {} <- y = () | SA {} <- x , SF {} <- y = () | SB {} <- x , SA {} <- y = () | SB {} <- x , SB {} <- y = () | SB {} <- x , SC {} <- y = () | SB {} <- x , SD {} <- y = () | SB {} <- x , SE {} <- y = () | SB {} <- x , SF {} <- y = () | SC {} <- x , SA {} <- y = () | SC {} <- x , SB {} <- y = () | SC {} <- x , SC {} <- y = () | SC {} <- x , SD {} <- y = () | SC {} <- x , SE {} <- y = () | SC {} <- x , SF {} <- y = () | SD {} <- x , SA {} <- y = () | SD {} <- x , SB {} <- y = () | SD {} <- x , SC {} <- y = () | SD {} <- x , SD {} <- y = () | SD {} <- x , SE {} <- y = () | SD {} <- x , SF {} <- y = () | SE {} <- x , SA {} <- y = () | SE {} <- x , SB {} <- y = () | SE {} <- x , SC {} <- y = () | SE {} <- x , SD {} <- y = () | SE {} <- x , SE {} <- y = () | SE {} <- x , SF {} <- y = () | SF {} <- x , SA {} <- y = () | SF {} <- x , SB {} <- y = () | SF {} <- x , SC {} <- y = () | SF {} <- x , SD {} <- y = () | SF {} <- x , SE {} <- y = () | SF {} <- x , SF {} <- y = () }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by RyanGlScott): (Warning: half-baked thoughts follow.) For reasons that I don't fully understand, it seems that coverage-checking guards is not nearly as efficient as coverage-checking raw constructor patterns. In that case, a question arises: why are we desugaring coercion patterns (like what data family constructors give you) to guards? It seems that we could handle coercion patterns in a fairly natural way by extending the algorithm from [https://www.microsoft.com/en-us/research/wp- content/uploads/2016/08/gadtpm-acm.pdf GADTs Meet Their Match] slightly. First, we could extend the pattern syntax (figure 2 from the GADTs Meet Their Match paper) to include coercion patterns directly: {{{ p,x ::= x | K p_1 ... p_n | G | (p |> co) }}} Where `co : τ_1 ~ τ_2` is a coercion. Then, we could extend the coverage checking algorithm in figure 3 to include a case for coercion patterns. For instance: {{{ C ((p |> co) q_1 ... q_n) (Γ ⊢ u_0 u_1 ... u_n ⊳ Δ) = C (p q_1 ... q_n) (Γ, y : τ_1 ⊢ y u_0 u_1 ... u_n ⊳ Δ') where Γ ⊢ p : τ_1 Γ ⊢ u_o : τ_2 Γ ⊢ c : τ_1 ~ τ_2 y#Γ Δ' = Δ ∪ u ≈ (p |> co) }}} And similarly for U and D. That way, we wouldn't need guards at all here—we'd just have an extra case for coercion patterns that "pushes through" the types as necessary. Does this sound plausible? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Does this sound plausible?
Yes, something like that looks plausible, except that I think the last line should be more like {{{ Δ' = Δ ∪ u_0 ≈ (y |> co) }}} (patterns `p` don't appear in expressions). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): It seems to me like comment:4 (which I updated to fix a few mistakes) is only part of the story. In particular, the constraint `Δ'` proposed in that comment is not currently admitted under the constraint syntax given in the paper. In particular, the right hand size of a term equality (e.g. the `≈` production) is expected to be an expression. However, there is no way to lift `(u_0 |> sym co)` into an expression. More generally, it doesn't make sense to me to lift a value abstraction like `u_0` into an expression. Perhaps a new constraint variety is needed? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.6.1 => 8.8.1 Comment: Unfortunately the summary here suggests that the status quo isn't acceptable for release but the future direction is rather unclear. I can see a few possibilities: 1. Teach the coverage checker to bail out (with a warning) if it sees a "large" pattern of this form 2. Allow a TH splice to explicitly request that coverage checking be disabled. This seems like a 3. Add a new `FromTH` `Origin` and a flag allowing the user to choose whether to run the coverage checker on TH splices. 4. Advise users to disable coverage checking in any module affected by the bug 5. Revert ffb2738f86c4e4c3f0eaacf0a95d7326fdd2e383 for 8.6 Frankly, all of these options seem pretty terrible given that this pattern is problematic only due to a (hopefully temporary) limitation of the coverage checker. That being said, we clearly need to do something for 8.6. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by guibou): @bgamari I'm sorry, but your second point in comment:7 seems truncated. That's minor, but I'm interested by your conclusion on that point. Thank you. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14899: Significant compilation time regression between 8.4 and HEAD due to coverage checking -------------------------------------+------------------------------------- Reporter: RyanGlScott | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.5 Resolution: | Keywords: | PatternMatchWarnings, newcomer Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * keywords: PatternMatchWarnings => PatternMatchWarnings, newcomer Comment: Unfortunately there wasn't really a conclusion (hence nothing happened for 8.6.1. Task (6) above would make for a great task for someone with a free weekend or three. It would require working out the theory but the coverage checker paper is fairly accessible. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14899#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC