RE: Separating typechecking and type error reporting in two passes?

30 Nov 2016

      |      Then a second pass is done over the syntax tree. This pass does
|  keep
|      track of the context. Whenever it finds some error evidence, it
|      reports it.

The syntax tree is a big type. A second pass would be a fairly big deal.  But doable.

You'd need to be able to look at the nature of the error; SDocs won't do. Maybe errors become a data type.  That might be a good idea but it would be another big deal.

I very much doubt that you'll be able to discard the context information from the type checker. Maybe some of it.   I can't say exactly why, it's a gut feel for now.

There is a rich literature on type errors, worth digging into.  Alejandro Serrano and his adviser Juraiaan Hage are thinking about this at the moment.  SherLoc was interesting too; worked for Haskell.

Lennart gave a nice talk at the Haskell Implementors meeting last year about error provenance.  Maybe available online?

Simon

|  -----Original Message-----
|  From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of
|  Joachim Breitner
|  Sent: 29 November 2016 16:20
|  To: ghc-devs@haskell.org
|  Subject: Separating typechecking and type error reporting in two
|  passes?
|  
|  Hi,
|  
|  the following is a rough idea that I came up with while pondering
|  #12864, which is about better (and fewer!) error messages when the
|  user forgot an argument to a function, or swapped their order.
|  
|  
|  The current scheme of type checking and error reporting is the
|  following (please correct me if I am wrong, I have actually done
|  little in this area so far):
|  
|      The typechecker traverses the syntax tree from top to bottom.
|  Along
|      the way, it keeps track of the current context in terms of SDoc
|      fragments of the form “In the second argument of…”. It generates
|      constraints (e.g. the argument type of a function is the type of
|  the
|      actual argument), which it either solves, or propagate outwards as
|      “wanted constraints”, which refer to their context, and refer to a
|      “hole” in the program where, if this constraint gets solved later,
|      some form of evidence gets filled in.
|      At the end, if there are any wanted constraints left, they are
|      reported as type errors, together with the context they are
|      mentioned.
|      But with -fdefer-type-errors, they are not reported, but the
|      evidence holes are filled in with essentially calls to `error`
|  that
|      print the error message at runtime.
|  
|  The problem I see with this approach is that the type checker has to
|  remember interesting bits about context that might possibly eventually
|  be relevant when reporting the error. In particular, it makes it
|  harder  to report multiple related error messages at once.
|  
|  
|  So I am wondering if this approach would be better:
|  
|      The type checker does *not* bother keeping track of context: It
|      traverses the syntax tree, creates constraints, and unsolvable
|      constraints get filled with “error evidence”. Basically as if
|      -fdefer-type-errors is one.
|  
|      Then a second pass is done over the syntax tree. This pass does
|  keep
|      track of the context. Whenever it finds some error evidence, it
|      reports it.
|  
|  
|  I see the following advantages:
|  
|   * The type checker code gets a bit simpler and tidier.
|   * As most compiled programs are type correct [citation needed], the
|     type checker, but not the error reporting, is compiler-performance-
|     critical. This might make type checking a (possibly insignificant)
|     tad faster.
|   * The error reporting pass can “look around” more. For example,
|  before
|     recursing into a pair `(e1,e2)`, it could check for the special
|  case
|     of `(e1 ▷ TyError Int Bool, e1 ▷ TyError Bool Int)` and report this
|     as one error “Tuple with swapped arguments” instead of two errors.
|     The #12864 is a more elaborate application of this.
|   * There could even be an intermediate pass over the syntax tree
|     between typechecking and error reporting that “transform” or
|     “optimizes” the AST by moving type errors around, by coalescing
|     error or other things, all with the aim of giving easier to
|     understand error messages.
|   * (This is getting into the area of wild dreams:)
|     Library authors could somehow express custom error messages
|     for certain patterns of the form
|     If the AST now contains an application of `Library.foo` where
|     the first argument is a type error `TyError Int Bool`, then
|     give this particular domain-specific error message.
|  
|  I don’t see any disadvantages now, but there probably are some, and I
|  would be grateful about feedback from those who actually have worked
|  with this part of GHC before.
|  
|  Thanks,
|  Joachim
|  
|  --
|  Joachim “nomeata” Breitner
|    mail@joachim-breitner.de •
|  https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.j
|  oachim-
|  breitner.de%2F&data=02%7C01%7Csimonpj%40microsoft.com%7C41dcebe5f26043
|  94f94108d41873a1f0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636160
|  332322948063&sdata=hoylWVvBSQjqGmm0ICnr3nDFz80waxDStDOjWDPQb5A%3D&rese
|  rved=0
|    XMPP: nomeata@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
|    Debian Developer: nomeata@debian.org

RE: Separating typechecking and type error reporting in two passes?

Simon Peyton Jones