Advance notice that I'd like to make Cabal depend on parsec

Hi folks,
I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much.
The rationale is that Cabal needs to parse things, like .cabal files and currently we do not have a decent parser in the core libraries. By decent I mean one that can produce error messages with source locations and that doesn't have unpredictable memory use. The only parser in the core libraries at the moment is Text.ParserCombinators.ReadP from the base package and that fails my "decent" criteria on both counts. Its idea of an error message is (), and on some largish .cabal files we take 100s of MB to parse (I realise that the ReadP in the base package is a cutdown version so I don't mean to malign all ReadP-style libs out there).
Partly due to the performance problem, the terrible .cabal file error messages, and partly because Doaitse Swierstra keeps asking me if .cabal files have a grammar, I've been writing a new .cabal parser. It uses an alex lexer and a parsec parser. It's fast and the error messages are pretty good. I have reverse engineered a grammar that closely matches the existing parser and .cabal files in the wild, though I'm not sure Doaitse will be satisfied with the approach I've taken to handling layout.
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).
I've been doing regression testing against hackage and I'm satisfied that the new parser matches close enough. I've uncovered all kinds of horrors with .cabal files in the wild relying on quirks of the old parser. I've made adjustments for most of them but I will be breaking a half dozen old packages (most of those don't actually build correctly because though their syntax errors are not picked up by the parser, they do cause failure eventually).
So far I've just done the outline parser, not the individual field parsers. I'll be doing those next and then integrate. So this change is still a bit of a ways off, but I thought it'd be useful to warn people now.
Duncan
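To illustrate the point about ReadP error reporting, here is a minimal sketch using only Text.ParserCombinators.ReadP from base. The `field` parser and the `parseField` helper are hypothetical names invented for this example and are not taken from Cabal. The sketch shows that a failed ReadP parse is just an empty result list, so the caller learns neither where nor why the input was rejected.

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isAlphaNum, isSpace)

-- Hypothetical parser for a single "name: value" line (not Cabal's real grammar).
field :: ReadP (String, String)
field = do
  name <- munch1 (\c -> isAlphaNum c || c == '-')
  skipSpaces
  _ <- char ':'
  skipSpaces
  value <- munch1 (not . isSpace)
  skipSpaces
  eof
  return (name, value)

-- ReadP signals failure with an empty list of results: there is no
-- source position and no description of what was expected.
parseField :: String -> Maybe (String, String)
parseField s = case readP_to_S field s of
  [(r, "")] -> Just r
  _         -> Nothing
```

In GHCi, `parseField "name: foo"` yields a `Just` value, while `parseField "name foo"` yields `Nothing` with no indication that the colon was the problem.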

On Thu, 2013-03-14 at 14:53 +0000, Duncan Coutts wrote:
Hi folks,
I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much.
It's already been pointed out to me that this also implies the following dependencies: text, deepseq, mtl, transformers
deepseq is a core package already I think, though ghc doesn't actually depend on it currently.
I should also say that I want to make Cabal depend on bytestring and text too.
--
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

On Thu, Mar 14, 2013 at 3:53 PM, Duncan Coutts wrote:
Hi folks,
I want to give you advance notice that I would like to make Cabal depend
on parsec. The implication is that GHC would therefore depend on parsec
and thus it would become a core package, rather than just a HP package.
So this would affect both GHC and the HP, though I hope not too much.
+1 from me, although the amount of potential knock-on work might be
discouraging. The current cabal-install bootstrap process (which is
currently pretty easy and is necessary at times) will get a bunch more deps
as a result of this change, no?
--
Gregory Collins

On Thu, 2013-03-14 at 16:06 +0100, Gregory Collins wrote:
On Thu, Mar 14, 2013 at 3:53 PM, Duncan Coutts wrote:
Hi folks,
I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much.
+1 from me, although the amount of potential knock-on work might be discouraging. The current cabal-install bootstrap process (which is currently pretty easy and is necessary at times) will get a bunch more deps as a result of this change, no?
Yes it will, but given that we do have a script it's not too bad I think. And overall I think it's worth it to have the better error messages, performance and memory use. Do you have any idea how slow it is to parse all the .cabal files on hackage, and how much memory that takes? You'd be horrified :-)
Duncan

This GHC dependency on Cabal is putting a rather troubling constraint
in Cabal's evolution, which in my opinion is a serious problem. When I
first took a look at the dependencies between GHC and Cabal I found it
a bit strange that GHC would depend on Cabal as I would expect GHC to
be as low in the dependency tree as possible to avoid exactly these
kinds of problems.
These GHC dependencies on Cabal are in fact small (see
http://hackage.haskell.org/trac/ghc/attachment/ticket/7740/ghc-2.png
for a summary) and with a little bit of refactoring it would be
possible to split these dependencies into a very small shared package
with minimal or no further dependencies. This would liberate Cabal to
make the necessary refactoring.
IMHO, the addition of these new dependencies to Cabal should go
together with splitting the GHC-Cabal shared dependencies into a
separate package so that there would be no additional coordination
needed from then on between these two development efforts (except when
dealing with this new package).

On Thu, 2013-03-14 at 12:22 -0300, Administrator wrote:
This GHC dependency on Cabal is putting a rather troubling constraint in Cabal's evolution, which in my opinion is a serious problem. When I first took a look at the dependencies between GHC and Cabal I found it a bit strange that GHC would depend on Cabal as I would expect GHC to be as low in the dependency tree as possible to avoid exactly these kinds of problems.
The problem is that a compiler is a rather sophisticated application and so though you'd like it to have minimal deps, it needs to do so much stuff that it ends up needing lots of deps to support its features. Things would be easier if that were not the case, and it's made harder by the fact that ghc is not just a program, but it's exposed as a library, which exposes all of its dependencies.
These GHC dependencies on Cabal are in fact small (see http://hackage.haskell.org/trac/ghc/attachment/ticket/7740/ghc-2.png for a summary) and with a little bit of refactoring it would be possible to split these dependencies into a very small shared package with minimal or no further dependencies. This would liberate Cabal to make the necessary refactoring.
Except that the bits of Cabal that ghc needs are exactly the bits that will now need parsec, text etc. The shared part would be the part that defines the InstalledPackageInfo and the parser for that. Also, though the ghc library has only relatively small dependencies on Cabal, the ghc build process uses Cabal extensively, and currently the system is that libraries that ghc needs to build get included as core libraries and shipped with ghc. That itself could change but it's also more work.
IMHO, the addition of these new dependencies to Cabal should go together with splitting the GHC-Cabal shared dependencies into a separate package so that there would be no additional coordination needed from then on between these two development efforts (except when dealing with this new package).
So I would consider this if I thought it'd make a difference. In particular at some point we'll want to split the Cabal lib into the bit that just defines types and parsers etc, and the part that is a build system. But even that wouldn't save us any dependencies in this situation.
Duncan

Yes I think that'd be a great plan. It's bizarre that GHC depends on *all* of Cabal, but only uses a tiny part of it (more or less the Package data type I think).
Simon

On Thu, 2013-03-14 at 16:44 +0000, Simon Peyton-Jones wrote:
Yes I think that'd be a great plan. It's bizarre that GHC depends on *all* of Cabal, but only uses a tiny part of it (more or less the Package data type I think).
The sensible way to split it (I think) would be like this:
cabal-lib:
    Distribution.*
    -- containing definitions of types and parsers & pretty printers
    -- including the InstalledPackageInfo
cabal-build-simple
    Distribution.Simple.*
    -- the build system for "Simple" packages
cabal
    -- the program, what is currently called cabal-install
And then the ghc package would only depend on the cabal-lib package. But it's that package that is going to use bytestring, text, parsec etc, for its type definitions and parser.
The InstalledPackageInfo and its parser is what ghc and ghc-pkg primarily use (though there's the opportunity to share code for handling package indexes) and that type and that parser are also going to end up using text and parsec etc.
It'd be possible to split things out further and have InstalledPackageInfo and the types it uses and a special parser just for that with fewer dependencies, but I'm not sure that's really worth it and it would duplicate things (the types and/or parsers shared by InstalledPackageInfo and the source package description).
So all in all, the split I suggest above makes sense for its own reasons but it wouldn't help ghc here, and a further split just to help ghc would be rather annoying.
Duncan

* Duncan Coutts
The InstalledPackageInfo and its parser is what ghc and ghc-pkg primarily use (though there's the opportunity to share code for handling package indexes) and that type and that parser are also going to end up using text and parsec etc.
Correct me if I'm wrong, but isn't it just a strange coincidence that InstalledPackageInfo is serialised in the format similar to .cabal format?
InstalledPackageInfos aren't supposed to be edited by hand and do not need good error reporting. They can be serialized using any serialization library.
(Then again, "any serialization library" like aeson would probably bring more dependencies than you're considering...)
Roman

On Thu, 2013-03-14 at 21:29 +0200, Roman Cheplyaka wrote:
* Duncan Coutts [2013-03-14 17:12:14+0000]
The InstalledPackageInfo and its parser is what ghc and ghc-pkg primarily use (though there's the opportunity to share code for handling package indexes) and that type and that parser are also going to end up using text and parsec etc.
Correct me if I'm wrong, but isn't it just a strange coincidence that InstalledPackageInfo is serialised in the format similar to .cabal format?
It's not a very strange coincidence. The type is not specific to ghc, it's defined in a compiler-neutral way by the original Cabal spec. So since both the source package and installed package info were defined in the Cabal spec, using the same kind of external syntax and sharing many of the same types, they both ended up in the Cabal lib and share the same parsers & pretty printers.
InstalledPackageInfos aren't supposed to be edited by hand and do not need good error reporting. They can be serialized using any serialization library.
Right, it doesn't need good error reporting (though it's nice if it's fast, which it isn't currently). The main advantage of the current arrangement is that the source and installed package descriptions get to share the same types and parser/pretty printer.
I think there's a slightly more general point here though. Why is it that we don't have any good parser in the core packages? It's not just Cabal that needs to parse things. We have two useless parsers in the base package, ReadS and ReadP. Haskell is famous for its parser combinators and yet our core infrastructure is stuck with only useless ones!
Duncan

On Thu, Mar 14, 2013 at 7:53 AM, Duncan Coutts wrote:
Hi folks,
I want to give you advance notice that I would like to make Cabal depend
on parsec. The implication is that GHC would therefore depend on parsec
and thus it would become a core package, rather than just a HP package.
So this would affect both GHC and the HP, though I hope not too much.
The rationale is that Cabal needs to parse things, like .cabal files and
currently we do not have a decent parser in the core libraries. By
decent I mean one that can produce error messages with source locations
and that doesn't have unpredictable memory use. The only parser in the
core libraries at the moment is Text.ParserCombinators.ReadP from the
base package and that fails my "decent" criteria on both counts. Its
idea of an error message is (), and on some largish .cabal files we take
100s of MB to parse (I realise that the ReadP in the base package is a
cutdown version so I don't mean to malign all ReadP-style libs out
there).
Partly due to the performance problem, the terrible .cabal file error
messages, and partly because Doaitse Swierstra keeps asking me if .cabal
files have a grammar, I've been writing a new .cabal parser. It uses an
alex lexer and a parsec parser. It's fast and the error messages are
pretty good. I have reverse engineered a grammar that closely matches
the existing parser and .cabal files in the wild, though I'm not sure
Doaitse will be satisfied with the approach I've taken to handling
layout.
Why did I choose parsec? Practicality dictates that I can only use
things in the core libraries, and the nearest thing we have to that is
the parser lib that is in the HP. I tried to use happy but I could not
construct a grammar/lexer combo to handle the layout (also, happy is not
exactly known for its great error messages).
Failed attempt aside for a moment, I think you should reconsider happy. Can
you learn how to do layout from reading the GHC source? The happy
documentation that explains how to attach a monad (you could use it to
communicate between alex and happy for layout info) is a bit misleading but
I have examples I can share with you. I haven't specifically tackled the
layout problem but I could try to make a parser if it would help.
One major benefit of using happy is that the productions of the grammar can
be analyzed for reduce/reduce and shift/reduce conflicts. The equivalent
analysis doesn't appear to be possible in parsec. In theory, applicative
parsers should allow for this but my understanding is that parsec does not
have this feature for its applicative subset.
Other benefits are: a) GHC can certainly use parsers generated by it, b) the
generated code uses common dependencies, c) it's fast, d) it's expressive.
What is it about happy parser errors that you don't like? Do you know
examples where parsec does a better job?
I have an alex + happy parser for a tiny functional language that I can
share with you if you'd like to give it another go. It doesn't support
layout at the moment, but I think I could add that.
Jason

On Thu, 2013-03-14 at 09:39 -0700, Jason Dagit wrote:
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).
Failed attempt aside for a moment, I think you should reconsider happy. Can you learn how to do layout from reading the GHC source?
Yes I looked at it, though Haskell's layout is a bit different.
The happy documentation that explains how to attach a monad (you could use it to communicate between alex and happy for layout info) is a bit misleading but I have examples I can share with you.
Yes, that's what I was doing. I've used happy with monadic lexers with feedback between the lexer and parser before, e.g. when I wrote the C parser now used in language-c.
I haven't specifically tackled the layout problem but I could try to make a parser if it would help.
One major benefit of using happy is that the productions of the grammar can be analyzed for reduce/reduce and shift/reduce conflicts.
Right, I know and that's great. For example there's no way I could have extended the C89 grammar I started with to cover C99 and GNU C extensions without the aid of that analysis.
In this case I could not for the life of me construct a grammar that didn't have conflicts. It's plausible that, now that I have worked out a grammar using parsec, I could have another go with happy and make it work, though I'd have to do the layout rather differently from how I do it with parsec. I was so pleased to finally have something work, I didn't feel like going back and trying it with happy again.
I'd be happy to show you the code I've got with parsec and you can have a go with happy.
The equivalent analysis doesn't appear to be possible in parsec. In theory, applicative parsers should allow for this but my understanding is that parsec does not have this feature for its applicative subset.
Right, it doesn't.
Other benefits are: a) GHC can certainly use parsers generated by it, b) the generated code uses common dependencies, c) it's fast, d) it's expressive.
Yes, I started with happy for all those reasons. The speed isn't a problem here. I'm using a fast lexer using alex and profiling indicates that still almost all the time is spent in the lexer and very little in the parser. (And that's after I submitted a patch to alex which gets us a 30% perf improvement.)
About dependencies. So if we got it working with happy, there is still the issue that we need to parse the individual fields. The way the .cabal (and other files like ghc-pkg input files) work is that we parse the outline and then use individual parsers on the fields. For the latter we use a type class with a parser and pretty printer. That approach using a type class more or less requires that we use a parser combinator approach, rather than a monolithic happy style parser.
And it's actually the field parsers that are a large part of the problem: they give us no error messages and their performance is atrocious (that's where we get the massive memory blowups). I think happy just isn't suitable there, so I'd want to use parsec (or any other decent combinator lib) for that part anyway.
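The type class arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `FieldValue`, its methods, the `Version` type and the `runField` helper are all hypothetical and are not Cabal's actual definitions. ReadP from base is used for the parser side simply to keep the sketch dependency-free.

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isDigit)
import Data.List (intercalate)

-- Hypothetical class: each field type carries its own parser and printer.
class FieldValue a where
  parseValue :: ReadP a
  showValue  :: a -> String

-- Hypothetical field type: a dotted version number such as 1.2.3.
newtype Version = Version [Int] deriving (Eq, Show)

instance FieldValue Version where
  parseValue = Version <$> sepBy1 (read <$> munch1 isDigit) (char '.')
  showValue (Version ns) = intercalate "." (map show ns)

-- Run the parser chosen by the result type over the whole field text.
runField :: FieldValue a => String -> Maybe a
runField s = case [x | (x, "") <- readP_to_S (parseValue <* eof) s] of
  [x] -> Just x
  _   -> Nothing
```

Because the parser is selected by the type of the field, each field parser has to be an ordinary value that can be composed with others, which is the style combinator libraries provide and a single generated grammar does not.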
What is it about happy parser errors that you don't like? Do you know examples where parsec does a better job?
Happy doesn't really give parser errors at all as such. It tells you where it failed and you can poke at the token stream and do what you like. It doesn't tell you what production you're in, what set of tokens it was expecting, nothing.
Parsec tells us what tokens it was expecting and it tells us what production it was in and it has code to take that info and generate reasonable error messages from it (which I've extended to include the line in question and a visual position indicator).
The reason ghc's parser error messages are so bad is exactly because happy doesn't really give us anything to work with. See frown for an example of how we can do better, while still using an LALR(1) approach.
Duncan
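As a rough illustration of the kind of error reporting described above, here is a small sketch using the parsec package. It is not the code from the new .cabal parser: the `fieldLine` grammar, the labels, and the `renderError` and `parseFieldLine` helpers are assumptions made up for this example. It labels productions with `<?>` and then prints the offending source line with a caret under the reported column.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical production: a field name, a colon, then a numeric value.
fieldLine :: Parser (String, Int)
fieldLine = do
  name <- many1 (letter <|> char '-') <?> "field name"
  spaces
  _ <- char ':' <?> "':' after field name"
  spaces
  value <- many1 digit <?> "numeric value"
  eof
  return (name, read value)

-- Render a parse error followed by the offending line and a caret
-- placed under the column that parsec reports.
renderError :: String -> ParseError -> String
renderError input err =
  unlines [show err, srcLine, replicate (col - 1) ' ' ++ "^"]
  where
    pos     = errorPos err
    col     = sourceColumn pos
    srcLine = case drop (sourceLine pos - 1) (lines input) of
                (l:_) -> l
                []    -> ""

-- Parse one line, turning any failure into the rendered message.
parseFieldLine :: String -> Either String (String, Int)
parseFieldLine input =
  either (Left . renderError input) Right
         (parse fieldLine "example.cabal" input)
```

For an input such as `"version 12"` the failure is reported at the position where the colon was expected, and the caret line points at that column.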

On 03/14/2013 03:53 PM, Duncan Coutts wrote:
Hi folks,
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).
Just thinking out loud here, but what about ditching the current format for something that's simpler to parse/generate? Like, say, JSON?
Regards,

On Thu, Mar 14, 2013 at 11:01 AM, Bardur Arantsson wrote:
On 03/14/2013 03:53 PM, Duncan Coutts wrote:
Hi folks,
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).
Just thinking out loud here, but what about ditching the current format for something that's simpler to parse/generate? Like, say, JSON?
I thought I heard someone say that most existing cabal files can be converted to valid yaml by adding one token at the start? If the change was that simple it might be doable. I think the trick is that we'd need to expose this by only treating the file as yaml if the minimum cabal version is >= 1.17 (or so).
In general these sorts of format changes are painful for users and I sense that now might be a bad time to change it (user morale is already a bit low with complaints of "cabal hell", let's not exacerbate that by breaking existing .cabal files).
Jason

On Thu, 2013-03-14 at 11:15 -0700, Jason Dagit wrote:
On Thu, Mar 14, 2013 at 11:01 AM, Bardur Arantsson wrote:
On 03/14/2013 03:53 PM, Duncan Coutts wrote:
Hi folks,
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).
Just thinking out loud here, but what about ditching the current format for something that's simpler to parse/generate? Like, say, JSON?
Of course .cabal files are mainly written by humans, not machines, so we should optimise for them. The grammar I've got now really isn't that bad. In fact if we wanted to simplify it we'd rip out the bits that are designed to make it easier to generate by programs: we'd eliminate the explicit {} syntax and just use layout. Allowing either is what makes the grammar more complex. But as I say, I'm satisfied that the grammar is ok.
I thought I heard someone say that most existing cabal files can be converted to valid yaml by adding one token at the start? If the change was that simple it might be doable. I think the trick is that we'd need to expose this by only treating the file as yaml if the minimum cabal version is >= 1.17 (or so).
I know people have compared it to yaml and suggested we just use yaml, but I don't think it's that close syntactically. I did look into this when I started and I think there are too many differences to make it practical to switch to yaml (or a subset).
In general these sorts of format changes are painful for users and I sense that now might be a bad time to change it (user morale is already a bit low with complaints of "cabal hell", let's not exacerbate that by breaking existing .cabal files).
Right. I'm satisfied the format is basically ok, we don't need any breaking changes.
--
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

On 03/14/2013 11:01 PM, Duncan Coutts wrote:
Of course .cabal files are mainly written by humans, not machines, so we should optimise for them.
I thought we were mostly talking about InstalledPackageInfo. That could be in $EASILY_PARSEABLE_FORMAT without really breaking anything, right?
Another option if GHC really also needs to parse .cabal files:
- Introduce a format for Cabal files that's trivial to hand-code a recursive descent parser for.
- Add a command in Cabal to generate that format from a .cabal file.
- Have "cabal sdist" automatically generate that file and put it into the uploaded archive.
Regards,

On Fri, 2013-03-15 at 05:19 +0100, Bardur Arantsson wrote:
I thought we were mostly talking about InstalledPackageInfo. That could be in $EASILY_PARSEABLE_FORMAT without really breaking anything, right?
In principle it could be any format. But it is a format specified in the Cabal spec, and shared between all the Haskell implementations. Unless there's a compelling reason to change all that, I'd rather not.
Another option if GHC really also needs to parse .cabal files:
That's ok, it doesn't. GHC uses Cabal when building ghc, but at runtime it's just using the InstalledPackageInfo type, parser (and perhaps some index utils).
--
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

On 03/15/2013 04:33 PM, Duncan Coutts wrote:
On Fri, 2013-03-15 at 05:19 +0100, Bardur Arantsson wrote:
On 03/14/2013 11:01 PM, Duncan Coutts wrote:
On Thu, 2013-03-14 at 11:15 -0700, Jason Dagit wrote:
On Thu, Mar 14, 2013 at 11:01 AM, Bardur Arantsson
wrote: On 03/14/2013 03:53 PM, Duncan Coutts wrote:
Hi folks,
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).
Just thinking out loud here, but what about ditching the current format for something that's simpler to parse/generate? Like, say, JSON?
Of course .cabal files are mainly written by humans, not machines, so we should optimise for them.
I thought we were mostly talking about InstalledPackageInfo. That could be in $EASILY_PARSEABLE_FORMAT without really breaking anything, right?
In principle it could be any format. But it is a format specified in the Cabal spec, and shared between all the Haskell implementations. Unless there's a compelling reason to change all that, I'd rather not.
Not having GHC core depend on parsec(*) sounds like a compelling reason to me...? (*) And the potential ensuing Cabal hell when a package depends on anything in GHC.*.

On 14 March 2013 22:53, Duncan Coutts wrote:
I've been doing regression testing against hackage and I'm satisfied that the new parser matches close enough. I've uncovered all kinds of horrors with .cabal files in the wild relying on quirks of the old parser. I've made adjustments for most of them but I will be breaking a half dozen old packages
When you say you've "made adjustments for" dodgy .cabal files in the wild, do you mean that you'll send those maintainers patches that make their cabal files less dodgy, or do you mean you've added hacks to your parser to reproduce the quirky behaviour? Conrad.

On Fri, 2013-03-15 at 12:37 +0800, Conrad Parker wrote:
On 14 March 2013 22:53, Duncan Coutts wrote:
I've been doing regression testing against hackage and I'm satisfied that the new parser matches close enough. I've uncovered all kinds of horrors with .cabal files in the wild relying on quirks of the old parser. I've made adjustments for most of them but I will be breaking a half dozen old packages
When you say you've "made adjustments for" dodgy .cabal files in the wild, do you mean that you'll send those maintainers patches that make their cabal files less dodgy, or do you mean you've added hacks to your parser to reproduce the quirky behaviour?
The latter, but the egregiousness of the hacks is actually not too bad in the end. I don't find it revolting. For the worst examples I didn't make adjustments and those ones will break. I think I've made a reasonable judgement about where to draw the line between the two. I can look into generating warnings in those cases (which is probably better than me emailing them). Duncan

Duncan Coutts wrote:
Hi folks,
I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much.
[..]
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).
Reuse is good, but the implication I'm worried about is this: Can I upgrade the parsec package installed on my system by doing a user install from hackage? Without an implementation of more flexible package installations (multiple versions installed simultaneously), any dependency of GHC has its version number essentially set in stone. From this point of view, this proposal is not about making Cabal depend on parsec, but about fixing the canonical version of parsec. Best regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

* Heinrich Apfelmus
Duncan Coutts wrote:
Hi folks,
I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much.
[..]
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).
Reuse is good, but the implication I'm worried about is this: Can I upgrade the parsec package installed on my system by doing a user install from hackage? Without an implementation of more flexible package installations (multiple versions installed simultaneously), any dependency of GHC has its version number essentially set in stone.
We've had that working for a long time. Right now I even have multiple installed versions of Cabal-the-library itself. It's not that Parsec would be automatically linked into each executable. It's just that ghc-the-program would have Parsec linked into it. Roman

On 14 Mar 2013, at 14:53, Duncan Coutts wrote:
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP.
I fully agree that a real parser is needed for Cabal files. I implemented one myself, many years ago, using the polyparse library, and using a hand-written lexer. Feel free to reuse it (attached, together with a sample program) if you like, although I expect it has bit-rotted a little over time. Regards, Malcolm

On Fri, 2013-03-15 at 12:57 +0000, Malcolm Wallace wrote:
On 14 Mar 2013, at 14:53, Duncan Coutts wrote:
Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP.
I fully agree that a real parser is needed for Cabal files. I implemented one myself, many years ago, using the polyparse library, and using a hand-written lexer. Feel free to reuse it (attached, together with a sample program) if you like, although I expect it has bit-rotted a little over time.
Thanks Malcolm. I should point out that I would also be perfectly happy to use polyparse. The practical constraint is that Cabal can only depend on other Core libs. My assumption was that moving parsec from HP to core was easier than adding polyparse into core. But if someone wanted to suggest ripping ReadP out of base and replacing it with polyparse, I would certainly not complain. Duncan

I'd love to have a proper parser and source-location-aware AST for the sake of
editor/IDE tools, so +1 from me. If you don't end up doing this after all,
I'd still like to see your parser in a separate package, although I
understand if you don't feel like maintaining two parsers especially given
the tedious process for verifying they work similarly. I guess it could
still be useful in the same way we find haskell-src-exts useful despite
some incompatibilities with GHC.
On Thu, Mar 14, 2013 at 3:53 PM, Duncan Coutts wrote:

Hi folks,

I want to give you advance notice that I would like to make Cabal depend
on parsec. The implication is that GHC would therefore depend on parsec
and thus it would become a core package, rather than just a HP package.
So this would affect both GHC and the HP, though I hope not too much.

The rationale is that Cabal needs to parse things, like .cabal files and
currently we do not have a decent parser in the core libraries. By
decent I mean one that can produce error messages with source locations
and that doesn't have unpredictable memory use. The only parser in the
core libraries at the moment is Text.ParserCombinators.ReadP from the
base package and that fails my "decent" criteria on both counts. Its
idea of an error message is (), and on some largish .cabal files we take
100s of MB to parse (I realise that the ReadP in the base package is a
cutdown version so I don't mean to malign all ReadP-style libs out
there).

Partly due to the performance problem, the terrible .cabal file error
messages, and partly because Doaitse Swierstra keeps asking me if .cabal
files have a grammar, I've been writing a new .cabal parser. It uses an
alex lexer and a parsec parser. It's fast and the error messages are
pretty good. I have reverse engineered a grammar that closely matches
the existing parser and .cabal files in the wild, though I'm not sure
Doaitse will be satisfied with the approach I've taken to handling
layout.

Why did I choose parsec? Practicality dictates that I can only use
things in the core libraries, and the nearest thing we have to that is
the parser lib that is in the HP. I tried to use happy but I could not
construct a grammar/lexer combo to handle the layout (also, happy is not
exactly known for its great error messages).

I've been doing regression testing against hackage and I'm satisfied
that the new parser matches close enough. I've uncovered all kinds of
horrors with .cabal files in the wild relying on quirks of the old
parser. I've made adjustments for most of them but I will be breaking a
half dozen old packages (most of those don't actually build correctly
because though their syntax errors are not picked up by the parser, they
do cause failure eventually).

So far I've just done the outline parser, not the individual field
parsers. I'll be doing those next and then integrate. So this change is
still a bit of a ways off, but I thought it'd be useful to warn people
now.

Duncan
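
The difference in error reporting described in the quoted message can be sketched roughly as follows. This is only an illustration, not the parser from the proposal: the one-line `name: value` field syntax, the file name `example.cabal`, and the helper names (`fieldReadP`, `fieldParsec`, `parseWithReadP`, `parseWithParsec`, `errorLocation`) are assumptions made up for the example, and it does not use alex or handle layout.

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP (ReadP, readP_to_S)
import qualified Text.ParserCombinators.ReadP as R
import Text.Parsec (ParseError, errorPos, parse, sourceColumn, sourceLine)
import qualified Text.Parsec as P
import Text.Parsec.String (Parser)

-- Hypothetical, simplified field syntax: a single "name: value" line.
-- This is NOT the real .cabal grammar.

-- ReadP version of the field parser.
fieldReadP :: ReadP (String, String)
fieldReadP = do
  name <- R.munch1 (\c -> isAlphaNum c || c == '-')
  R.skipSpaces
  _ <- R.char ':'
  R.skipSpaces
  value <- R.munch1 (/= '\n')
  R.eof
  return (name, value)

-- On failure readP_to_S just yields no results, so all we can
-- report is "it did not parse": no message, no source location.
parseWithReadP :: String -> Maybe (String, String)
parseWithReadP s = case readP_to_S fieldReadP s of
  [(result, "")] -> Just result
  _              -> Nothing

-- The same field parser written with parsec.
fieldParsec :: Parser (String, String)
fieldParsec = do
  name <- P.many1 (P.alphaNum P.<|> P.char '-')
  P.spaces
  _ <- P.char ':'
  P.spaces
  value <- P.many1 (P.noneOf "\n")
  P.eof
  return (name, value)

-- On failure parsec returns a ParseError that carries a source
-- position and a description of what was expected.
parseWithParsec :: String -> Either ParseError (String, String)
parseWithParsec = parse fieldParsec "example.cabal"

-- Extract (line, column) from a parsec error.
errorLocation :: ParseError -> (Int, Int)
errorLocation e = (sourceLine pos, sourceColumn pos)
  where pos = errorPos e
```

Loaded into GHCi, `parseWithReadP "name foo"` gives `Nothing`, whereas `parseWithParsec "name foo"` gives a `Left` whose `ParseError` can be shown to the user or passed to `errorLocation` to recover the line and column of the missing colon.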
participants (12): Administrator, Bardur Arantsson, Conrad Parker, dag.odenhall@gmail.com, Duncan Coutts, Duncan Coutts, Gregory Collins, Heinrich Apfelmus, Jason Dagit, Malcolm Wallace, Roman Cheplyaka, Simon Peyton-Jones