
Hi folks, I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much. The rationale is that Cabal needs to parse things, like .cabal files and currently we do not have a decent parser in the core libraries. By decent I mean one that can produce error messages with source locations and that doesn't have unpredictable memory use. The only parser in the core libraries at the moment is Text.ParserCombinators.ReadP from the base package and that fails my "decent" criteria on both counts. Its idea of an error message is (), and on some largish .cabal files we take 100s of MB to parse (I realise that the ReadP in the base package is a cutdown version so I don't mean to malign all ReadP-style libs out there). Partly due to the performance problem, the terrible .cabal file error messages, and partly because Doaitse Swierstra keeps asking me if .cabal files have a grammar, I've been writing a new .cabal parser. It uses an alex lexer and a parsec parser. It's fast and the error messages are pretty good. I have reverse engineered a grammar that closely matches the existing parser and .cabal files in the wild, though I'm not sure Doaitse will be satisfied with the approach I've taken to handling layout. Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages). I've been doing regression testing against hackage and I'm satisfied that the new parser matches close enough. I've uncovered all kinds of horrors with .cabal files in the wild relying on quirks of the old parser. I've made adjustments for most of them but I will be breaking a half dozen old packages (most of those don't actually build correctly because though their syntax errors are not picked up by the parser, they do cause failure eventually). So far I've just done the outline parser, not the individual field parsers. I'll be doing those next and then integrate. So this change is still a bit of a ways off, but I thought it'd be useful to warn people now. Duncan