Advance notice that I'd like to make Cabal depend on parsec

Hi folks,

I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much.

The rationale is that Cabal needs to parse things, like .cabal files, and currently we do not have a decent parser in the core libraries. By decent I mean one that can produce error messages with source locations and that doesn't have unpredictable memory use. The only parser in the core libraries at the moment is Text.ParserCombinators.ReadP from the base package and that fails my "decent" criteria on both counts. Its idea of an error message is (), and on some largish .cabal files we take 100s of MB to parse (I realise that the ReadP in the base package is a cutdown version so I don't mean to malign all ReadP-style libs out there).

Partly due to the performance problem, partly due to the terrible .cabal file error messages, and partly because Doaitse Swierstra keeps asking me if .cabal files have a grammar, I've been writing a new .cabal parser. It uses an alex lexer and a parsec parser. It's fast and the error messages are pretty good. I have reverse engineered a grammar that closely matches the existing parser and .cabal files in the wild, though I'm not sure Doaitse will be satisfied with the approach I've taken to handling layout.

Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP. I tried to use happy but I could not construct a grammar/lexer combo to handle the layout (also, happy is not exactly known for its great error messages).

I've been doing regression testing against hackage and I'm satisfied that the new parser matches closely enough. I've uncovered all kinds of horrors with .cabal files in the wild relying on quirks of the old parser. I've made adjustments for most of them but I will be breaking a half dozen old packages (most of those don't actually build correctly because, though their syntax errors are not picked up by the parser, they do cause failure eventually).

So far I've just done the outline parser, not the individual field parsers. I'll be doing those next and then integrating.

So this change is still a bit of a ways off, but I thought it'd be useful to warn people now.

Duncan
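To make the "decent parser" criterion concrete, here is a minimal sketch contrasting the two libraries on a single field line. It is not the new Cabal parser: the `name: value` syntax, the file name `example.cabal`, and the helper names `fieldReadP`, `fieldParsec`, `readPResult` and `parsecResult` are all assumptions made up for illustration, and the parsec half needs the `parsec` package to be installed.

```haskell
import Data.Char (isAlphaNum)
import qualified Text.ParserCombinators.ReadP as R
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical "name: value" field line, far simpler than real .cabal syntax.
fieldReadP :: R.ReadP (String, String)
fieldReadP = do
  name <- R.munch1 (\c -> isAlphaNum c || c == '-')
  _ <- R.char ':'
  R.skipSpaces
  value <- R.munch1 (/= '\n')
  R.eof
  return (name, value)

-- The same grammar written with parsec combinators.
fieldParsec :: Parser (String, String)
fieldParsec = do
  name <- many1 (alphaNum <|> char '-') <?> "field name"
  _ <- char ':' <?> "':' after the field name"
  spaces
  value <- many1 (noneOf "\n") <?> "field value"
  eof
  return (name, value)

-- With ReadP a failed parse is just an empty result list:
-- there is nothing to say where or why the input was rejected.
readPResult :: String -> Maybe (String, String)
readPResult s = case R.readP_to_S fieldReadP s of
  [(field, "")] -> Just field
  _             -> Nothing

-- With parsec a failed parse carries a ParseError, from which we can
-- recover the (line, column) of the offending input.
parsecResult :: String -> Either (Int, Int) (String, String)
parsecResult s = case parse fieldParsec "example.cabal" s of
  Left err    -> Left (sourceLine (errorPos err), sourceColumn (errorPos err))
  Right field -> Right field
```

On the input `"build-depends base"` (missing colon), `readPResult` returns `Nothing`, while `parsecResult` returns `Left (1,14)`, pointing at the character where a colon was expected. Showing the `ParseError` itself would also give a human-readable message.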

On Thu, 2013-03-14 at 14:53 +0000, Duncan Coutts wrote:
> Hi folks,
>
> I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much.

It's already been pointed out to me that this also implies the following dependencies: text, deepseq, mtl, transformers.

deepseq is a core package already I think, though ghc doesn't actually depend on it currently.

I should also say that I want to make Cabal depend on bytestring and text too.

--
Duncan Coutts, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/

On Thu, Mar 14, 2013 at 3:53 PM, Duncan Coutts wrote:
> Hi folks,
>
> I want to give you advance notice that I would like to make Cabal depend
> on parsec. The implication is that GHC would therefore depend on parsec
> and thus it would become a core package, rather than just a HP package.
> So this would affect both GHC and the HP, though I hope not too much.

+1 from me, although the amount of potential knock-on work might be
discouraging. The current cabal-install bootstrap process (which is
currently pretty easy and is necessary at times) will get a bunch more deps
as a result of this change, no?
--
Gregory Collins

On Thu, 2013-03-14 at 16:06 +0100, Gregory Collins wrote:
> On Thu, Mar 14, 2013 at 3:53 PM, Duncan Coutts wrote:
>> Hi folks,
>>
>> I want to give you advance notice that I would like to make Cabal depend on parsec. The implication is that GHC would therefore depend on parsec and thus it would become a core package, rather than just a HP package. So this would affect both GHC and the HP, though I hope not too much.
>
> +1 from me, although the amount of potential knock-on work might be discouraging. The current cabal-install bootstrap process (which is currently pretty easy and is necessary at times) will get a bunch more deps as a result of this change, no?

Yes it will, but given that we do have a script it's not too bad I think. And overall I think it's worth it to have the better error messages, performance and memory use.

Do you have any idea how slow it is to parse all the .cabal files on hackage, and how much memory that takes? You'd be horrified :-)

Duncan

This GHC dependency on Cabal is putting a rather troubling constraint
in Cabal's evolution, which in my opinion is a serious problem. When I
first took a look at the dependencies between GHC and Cabal I found it
a bit strange that GHC would depend on Cabal as I would expect GHC to
be as low in the dependency tree as possible to avoid exactly these
kinds of problems.
These GHC dependencies on Cabal are in fact small (see
http://hackage.haskell.org/trac/ghc/attachment/ticket/7740/ghc-2.png
for a summary) and with a little bit of refactoring it would be
possible to split these dependencies into a very small shared package
with minimal or no further dependencies. This would liberate Cabal to
make the necessary refactoring.
IMHO, the addition of these new dependencies to Cabal should go
together with splitting the GHC-Cabal shared dependencies into a
separate package so that there would be no additional coordination
needed from then on between these two development efforts (except when
dealing with this new package).

On Thu, 2013-03-14 at 12:22 -0300, Administrator wrote:
> This GHC dependency on Cabal is putting a rather troubling constraint in Cabal's evolution, which in my opinion is a serious problem. When I first took a look at the dependencies between GHC and Cabal I found it a bit strange that GHC would depend on Cabal as I would expect GHC to be as low in the dependency tree as possible to avoid exactly these kinds of problems.

The problem is that a compiler is a rather sophisticated application and so though you'd like it to have minimal deps, it needs to do so much stuff that it ends up needing lots of deps to support its features. Things would be easier if that were not the case, and it's made harder by the fact that ghc is not just a program, but it's exposed as a library, which exposes all of its dependencies.

> These GHC dependencies on Cabal are in fact small (see http://hackage.haskell.org/trac/ghc/attachment/ticket/7740/ghc-2.png for a summary) and with a little bit of refactoring it would be possible to split these dependencies into a very small shared package with minimal or no further dependencies. This would liberate Cabal to make the necessary refactoring.

Except that the bits of Cabal that ghc needs are exactly the bits that will now need parsec, text etc. The shared part would be the part that defines the InstalledPackageInfo and the parser for that.

Also, though the ghc library has only relatively small dependencies on Cabal, the ghc build process uses Cabal extensively, and currently the system is that libraries that ghc needs to build get included as core libraries and shipped with ghc. That itself could change but it's also more work.

> IMHO, the addition of these new dependencies to Cabal should go together with splitting the GHC-Cabal shared dependencies into a separate package so that there would be no additional coordination needed from then on between these two development efforts (except when dealing with this new package).

So I would consider this if I thought it'd make a difference. In particular at some point we'll want to split the Cabal lib into the bit that just defines types and parsers etc, and the part that is a build system. But even that wouldn't save us any dependencies in this situation.

Duncan

Yes I think that'd be a great plan. It's bizarre that GHC depends on *all* of Cabal, but only uses a tiny part of it (more or less the Package data type I think).
Simon
| -----Original Message-----
| From: cabal-devel-bounces@haskell.org [mailto:cabal-devel-bounces@haskell.org]
| On Behalf Of Administrator
| Sent: 14 March 2013 15:23
| To: Duncan Coutts
| Cc: Lentczner; cabal-devel; Haskell Libraries; ghc-devs@haskell.org
| Subject: Re: Advance notice that I'd like to make Cabal depend on parsec

On Thu, 2013-03-14 at 16:44 +0000, Simon Peyton-Jones wrote:
> Yes I think that'd be a great plan. It's bizarre that GHC depends on *all* of Cabal, but only uses a tiny part of it (more or less the Package data type I think).

The sensible way to split it (I think) would be like this:

cabal-lib:
  Distribution.*
  -- containing definitions of types and parsers & pretty printers
  -- including the InstalledPackageInfo

cabal-build-simple
  Distribution.Simple.*
  -- the build system for "Simple" packages

cabal
  -- the program, what is currently called cabal-install

And then the ghc package would only depend on the cabal-lib package. But it's that package that is going to use bytestring, text, parsec etc, for its type definitions and parser.

The InstalledPackageInfo and its parser is what ghc and ghc-pkg primarily use (though there's the opportunity to share code for handling package indexes) and that type and that parser are also going to end up using text and parsec etc.

It'd be possible to split things out further and have InstalledPackageInfo and the types it uses and a special parser just for that with fewer dependencies, but I'm not sure that's really worth it and it would duplicate things (the types and/or parsers shared by InstalledPackageInfo and the source package description).

So all in all, the split I suggest above makes sense for its own reasons but it wouldn't help ghc here, and a further split just to help ghc would be rather annoying.

Duncan
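The argument that the split would not help ghc can be checked mechanically by following dependencies transitively. The sketch below is only an illustration: the package names come from this thread, but the exact edges in `directDeps` (in particular what `cabal`, `parsec`, `text` and `mtl` depend on) are simplified assumptions, and `directDeps` and `transitiveDeps` are hypothetical helper names.

```haskell
import Data.List (sort)

type Package = String

-- Direct dependencies under the proposed split.
-- The edges are simplified assumptions for illustration only.
directDeps :: Package -> [Package]
directDeps "ghc"                = ["cabal-lib"]
directDeps "cabal-lib"          = ["bytestring", "text", "parsec"]
directDeps "cabal-build-simple" = ["cabal-lib"]
directDeps "cabal"              = ["cabal-lib", "cabal-build-simple"]
directDeps "parsec"             = ["text", "mtl"]
directDeps "text"               = ["deepseq"]
directDeps "mtl"                = ["transformers"]
directDeps _                    = []

-- Every package reachable from the given one, via a simple worklist
-- traversal that skips packages it has already visited.
transitiveDeps :: Package -> [Package]
transitiveDeps pkg = go [] (directDeps pkg)
  where
    go seen [] = sort seen
    go seen (p : ps)
      | p `elem` seen = go seen ps
      | otherwise     = go (p : seen) (directDeps p ++ ps)
```

Under these assumed edges, `transitiveDeps "ghc"` no longer contains `cabal-build-simple`, so the build system is split off, but it still contains `parsec`, `text` and the rest, because they arrive through `cabal-lib`.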

* Duncan Coutts [2013-03-14 17:12:14+0000]
> The InstalledPackageInfo and its parser is what ghc and ghc-pkg primarily use (though there's the opportunity to share code for handling package indexes) and that type and that parser are also going to end up using text and parsec etc.

Correct me if I'm wrong, but isn't it just a strange coincidence that InstalledPackageInfo is serialised in the format similar to .cabal format?

InstalledPackageInfos aren't supposed to be edited by hand and do not need good error reporting. They can be serialized using any serialization library. (Then again, "any serialization library" like aeson would probably bring more dependencies than you're considering...)

Roman

On Thu, 2013-03-14 at 21:29 +0200, Roman Cheplyaka wrote:
> * Duncan Coutts [2013-03-14 17:12:14+0000]
>> The InstalledPackageInfo and its parser is what ghc and ghc-pkg primarily use (though there's the opportunity to share code for handling package indexes) and that type and that parser are also going to end up using text and parsec etc.
>
> Correct me if I'm wrong, but isn't it just a strange coincidence that InstalledPackageInfo is serialised in the format similar to .cabal format?

It's not a very strange coincidence. The type is not specific to ghc, it's defined in a compiler-neutral way by the original Cabal spec. So since both the source package and installed package info were defined in the Cabal spec, using the same kind of external syntax and sharing many of the same types, they both ended up in the Cabal lib and share the same parsers & pretty printers.

> InstalledPackageInfos aren't supposed to be edited by hand and do not need good error reporting. They can be serialized using any serialization library.

Right, it doesn't need good error reporting (though it's nice if it's fast, which it isn't currently). The main advantage of the current arrangement is that the source and installed package descriptions get to share the same types and parser/pretty printer.

I think there's a slightly more general point here though. Why is it that we don't have any good parser in the core packages? It's not just Cabal that needs to parse things. We have two useless parsers in the base package, ReadS and ReadP. Haskell is famous for its parser combinators and yet our core infrastructure is stuck with only useless ones!

Duncan
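As a rough illustration of what "sharing the same types and parser/pretty printer" buys, here is a small sketch using only the base package. It is not Cabal's actual code: `PackageId`, `Field`, `renderFields`, `parseFields`, `packageIdFields` and `packageIdFromFields` are hypothetical, drastically simplified stand-ins, and the `name: value` line format is an assumption standing in for the real external syntax.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical stand-in for a type used by both the source package
-- description and the installed package info.
data PackageId = PackageId { pkgName :: String, pkgVersion :: String }
  deriving (Eq, Show)

type Field = (String, String)

-- One pretty printer for the shared "name: value" external syntax.
renderFields :: [Field] -> String
renderFields fs = unlines [name ++ ": " ++ value | (name, value) <- fs]

-- One parser for the same syntax. On failure, Left carries the
-- 1-based number of the first line that is not a valid field.
parseFields :: String -> Either Int [Field]
parseFields = mapM parseLine . zip [1 ..] . lines
  where
    parseLine (n, l) = case break (== ':') l of
      (name, ':' : rest) | not (null name) -> Right (name, trim rest)
      _                                    -> Left n
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- Conversions written once and reused by either kind of description.
packageIdFields :: PackageId -> [Field]
packageIdFields (PackageId n v) = [("name", n), ("version", v)]

packageIdFromFields :: [Field] -> Maybe PackageId
packageIdFromFields fs = PackageId <$> lookup "name" fs <*> lookup "version" fs
```

Because rendering and parsing go through the same `Field` layer, a value written by one tool round-trips through the other without a second format to maintain. A separate serialisation for installed packages would mean duplicating `packageIdFields` and its inverse for that format.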

On 14 Mar 2013, at 14:53, Duncan Coutts wrote:
> Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP.

I fully agree that a real parser is needed for Cabal files. I implemented one myself, many years ago, using the polyparse library, and using a hand-written lexer. Feel free to reuse it (attached, together with a sample program) if you like, although I expect it has bit-rotted a little over time.

Regards,
Malcolm

On Fri, 2013-03-15 at 12:57 +0000, Malcolm Wallace wrote:
> On 14 Mar 2013, at 14:53, Duncan Coutts wrote:
>> Why did I choose parsec? Practicality dictates that I can only use things in the core libraries, and the nearest thing we have to that is the parser lib that is in the HP.
>
> I fully agree that a real parser is needed for Cabal files. I implemented one myself, many years ago, using the polyparse library, and using a hand-written lexer. Feel free to reuse it (attached, together with a sample program) if you like, although I expect it has bit-rotted a little over time.

Thanks Malcolm.

I should point out that I would also be perfectly happy to use polyparse. The practical constraint is that Cabal can only depend on other Core libs. My assumption was that moving parsec from HP to core was easier than adding polyparse into core. But if someone wanted to suggest ripping ReadP out of base and replacing it with polyparse, I would certainly not complain.

Duncan