
#11303: Pattern matching against sets of strings sharing a prefix blows up pattern checker -------------------------------------+------------------------------------- Reporter: bgamari | Owner: gkaracha Type: bug | Status: new Priority: highest | Milestone: 8.0.1 Component: Compiler | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11302 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by gkaracha): Ahhhh, I see. This has been noticed before. Actually, I have added a note in `deSugar/Check.hs` about it: `Note [Literals in PmPat]`. My primary test case for this problem has been function `mkTextEncoding'` from `libraries/base/GHC/IO/Encoding.hs` until now but I guess your example stresses the same problem even more. The source of the problem lies in the way we translate literals: * In the paper, we treat them as guards: literal `l` gets translated to `x (True <- x == l)`. This allows the algorithm to work uniformly on everything but is too expensive, even for `mkTextEncoding'`. Essentially, the covered set contains `x |> {True ~ (x == l)}` and the uncovered contains `x |> {False ~ (x == l)}`. Hence, we explode first (everything is a guard) and then prune (use the term oracle to check the constraints) which is really exponential. * Hence, I changed the implementation and added literals in the pattern language. This means that for literal `l` the covered set contains `l |> {}` and the uncovered set contains `x |> {False ~ (x == l)}` like before. Since literals are patterns, more things can be checked eagerly, like with constructors where we can see immediately that e.g. `True` can not match `False`. Hence, we generate less from the start. The last thing to do (and I think this will address all performance issues concerning literals) is to completely add literals in the pattern language, that is, use a variant of Sestoft's negative patterns: {{{#!hs data PmPat :: PatTy -> * where PmCon :: { ... } -> PmPat t PmVar :: Id -> PmPat t PmLit :: PmLit -> PmPat t PmNLit :: Id -> [PmLit] -> PmPat VA -- add this constructor PmGrd :: { ... } -> PmPat PAT }}} The idea is that `PmNLit x lits` represents in a compact way __all literals `x` that are **not equal** to any of `lits`__. Hence, we generate much less and there is significantly less need for pruning. There is a slight possibility that this change will affect #322 and also I expect the size of the checker to increase 5-10% (LoC) but I have no better solution at the moment. Any ideas on this approach? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11303#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler