
Dear Cafe, I'm exploring the design space of programming language libraries with enhanced usability and I'd your help and comments. I'll start with a few short questions, but offer a detailed discussion of the motivations and the problems I'm facing below. So, if you have interest in the subject or feel you can offer some insight, please, do read on. * Does any generic traversal/transformation (uniplate-style) library support GADTs? * What is the best choice, performance- and memory-wise, for a parser combinator library with support for arbitrary look-ahead? Parsec is considered slow by some [1], but is it only in comparison with attoparsec (which, unfortunately, doesn't support arbitrary look-ahead)? Is there any parser library that performs better than Parsec while still supporting arbitrary look-ahead. * Any multi-mode pretty printer libraries? By multi-mode I mean writing code once and being able to generate, say, both "pretty" and "minified" text representations of a program by changing just one parameter. Also, what's the most efficient pretty-printing library nowadays? Blaze? I've been using Haskell for quite a while now, primarily, for programming-language applications: program analysis, transformation and compilation. I'm sure many would agree that PL work is where Haskell shines. In the recent years new language features and libraries --namely, GADTs, Template Haskell, quasi-quotation and generic programming--- have appeared that could make working with languages even easier than before. That's why it's sad to see that none of the PL libraries seem to make good use of these features (however, I might be starting to understand why). So, I'm currently exploring the design space for a library that uses these advanced Haskell features for delivering better usability, and I'm having problems with implementing some of them. I welcome comments on both the motivations, overall design and the more technical aspects. I've omitted a few details because it's a long e-mail as is. If something is not clear or doesn't make sense, please, let me know. I'll start by listing the features that an "ideal" PL library should have, and that I've come to cherish as both a heavy user and a developer of such. The basic features (pretty much every library has them) include a parser (text->AST (abstract syntax tree)) and a pretty-printer (AST->text), as well as a Haskell representation of the AST that is somewhat easy to use. Pretty much every library has that --- although some might debate the ease of use of the AST representations. However, there are other features that, in my opinion, are essential to a PL library. The features are motivated by three requirements: static safety (as few run-time errors as possible), minimal code duplication (DRY) and ease of use and inspection of the code. 1) the pretty-printer should be multi-mode. One should be able to write code once and be able to generate different textual representations of the AST: - the "pretty" which is nice to the eye with white spaces, indentations etc. - minified, with minimum white space (while still being valid) - debuggable which inserts comments based on AST annotations - source-map generation - being able to generate colored LaTeX/HTML code would be nice, but non-essential 2) ASTs should be statically safe: you should only be able construct values that represent valid programs, or get a typechecker/compiler error otherwise. Languages that have syntactic productions that can appear in one context but not in another need GADTs with type witnesses to achieve that. In fact, such languages are often used to motivate GADTs in the first place [2]. And while the problem in [2] could have been solved by splitting the Expr datatype into two (IntExpr and BoolExpr), in some languages this can't be done (or produces awkward syntax trees). 3) a quasi-quoter with support for anti-quotation and quoted patterns. This also saves a lot of typing *and* makes your code less error-prone and easier to read. What is better (to both write and read)?
[js|#x# = (function (a, b) {return {t1: a + b, t2: a*b};})(#x#, #y#);|] or ExprStmt def $ AssignExpr def x (CallExpr def (FuncExpr def Nothing [Id def "a", Id def "b"] $ ReturnStmt def $ ObjectLit [(PropId def $ Id def "t1", InfixExpr def OpAdd (VarRef def $ Id def "a") (VarRef def $ Id def "b")), PropId def $ Id def "t2", InfixExpr def OpMul (VarRef def $ Id def "a") (VarRef def $ Id def "b")])) [x, y]
The caveat here is that, to help ensure correctness, the quasi-quoter and the parser should share code as much code as possible. Ideally, there should be just one parser that has a switch for recognizing normal and quasi-quoted programs. However, that would require adding additional constructors representing anti-quotations to our AST. And with that the user might be able to generate invalid AST's and cause a run-time error. The solution to the last problem is to statically constrain all the values that are passed to, say, the pretty-printer so that they are guaranteed to be free of anti-quotes (see an example definition below). However, that, again, requires GADTs (e.g. have all the AST datatypes have an extra type parameter).
data EType = Complete | HasHoles type family Quoted a b :: * canHaveHolesT :: a -> b -> Quoted a b canHaveHolesT _ _ = undefined type instance Quoted HasHoles Complete = HasHoles type instance Quoted Complete HasHoles = HasHoles type instance Quoted HasHoles HasHoles = HasHoles type instance Quoted Complete Complete = HasHoles data Expr t where EInt :: Int -> Expr Complete EAdd :: Expr t1 -> Expr c2 -> Expr (Holes t1 t2) ... EQuote :: String -> Expr HasHoles
And then we could have a normal parser return a value 'Expr Complete' and a quasi-quotation parser retunr a value 'Expr HasHoles'. Similarly, the pretty printer function could have type 'Expr Complete -> Doc'. 4) We should be able to annotate ASTs with arbitrary values, and change the types of those values as we go. The most user friendly way, IMO, is to have the AST datatypes be polymorphic and have that type parameter as an extra field in every constructor. E.g.,
data Expr t a where EInt :: a -> Int -> Expr Complete a EAdd :: a -> Expr t1 a -> Expr c2 a -> Expr (Holes t1 t2) a ... EQuote :: a -> String -> Expr HasHoles a
Then we can use the functions in Traverseable to change types of annotations, and inspect the values by pattern-matching on constructors. 5) support for generic operations on syntax trees. Uniplate, which has been designed to work with ASTs, and is awesome for that purpose because it saves a lot of time. I use transform(Bi) and universe(Bi) all the time and it saves *a lot* of typing. Pretty much all my analysis/transformation code uses those four small-but-powerful function calls -- and, dare I say, it's quite elegant. Other useful, but not crucial features include: 1) diffs for ASTs (in the spirit of the 'gdiff' library, which, alas, doesn't work with polymorphic datatypes) 2) QuickCheck arbitrary instances for ASTs. No technical difficulty there, but writing instances that generate interesting programs and don't run out of memory is quite hard :) I wish 'Agata' was still supported, or there was some library that helps writing Arbitrary instances for ASTs. If you think there's another feature in mind that is missing from the list, please, let me know. The (perceived) challenges in implementing the functionality outlined above are as follows: 1) No multi-mode pretty-printing library. I think that the mutli-mode functionality could be implemented on top of an existing library by definining new combinators, but it would be nice to have a library that supports them out of the box. The particular features that I'm missing are: - "non-essential space/(soft-)line break" combinators that are interpreted as spaces/line-breaks in the "pretty" mode and as empty docs in the "minified" mode. - "comment" combinator which inserts the text in a comment only if the "debug" mode is on - being able to record the positions of AST nodes in the resulting text (for generating source maps). Not sure what would be a convenient interface for that. Note: I know that mainland-pretty has position information, but I don't think it's helpful for generating source maps. 2) The biggest problem is that there are two good reasons to use GADTs when specifying AST datatypes. However, uniplate doesn't work with GADTs and, as far as I know, no currently supported generic programming library does (to be precise, I need support for families of mutually recursive polymorphic GADTs). Am I missing some library, or is my understanding correct? If it's the latter, is there any fundamental limitation that prevents creating such a library? Maybe there are other (but still elegant) ways to satisfy my requirements without using GADTs? 3) 'gdiff' doesn't support polymorphic datatypes. Is there any other library that does? [1] http://www.serpentine.com/blog/2010/03/03/whats-in-a-parsing-library-1/ [2] http://en.wikibooks.org/wiki/Haskell/GADT PS: My attempts so far are in https://github.com/achudnov/language-nextgen/blob/master/Language/Nextgen/Sy... Regards, Andrey Chudnov