aeson and dlist in HP 2013.4.0.0

older
2013.4.0.0 proposal - getting to...

Mark Lentczner

17 Nov 2013 17 Nov '13

8:20 p.m.

We seem to have several interlocking decisions to make about *aeson* and *dlist* in the platform: *Background:* *let me know if I got any of this wrong* - There is clear agreement to have *aeson* in the platform. - *aeson* 0.6.2.1 would require *dlist* & *blaze-builder* to be added. - There is a patch to *aeson*, already in head, that makes *aeson* use the builder in *bytestring*, and drop the dependency on *blaze-builder*. This is good, as a *bytestring* in GHC 7.8 will make *blaze-builder*obsolete. - *aeson* head uses a new Scientific type for arbitrary precision floating point. This will break users of the Number constructor of the Value type. However, it is believed that most users are probably using existing functions to parse to an from standard numeric types when needed, and those continue to work. Unclear how much impact this will have. Also, Scientific could be in it's own package (which would need to be added to the platform), or simply exposed from *aeson*. - *dlist* is stable, but could use some love - which was lovingly offered, and we could have a bump which adds some useful typeclass instances. - It was noted that *dlist* could be replaced by Endo, or explicit use of simple types. There didn't seem to be much support for the idea of altering *aeson* to do so. *Options for aeson:* 1. skip it in this release 2. include *aeson* 0.6.2.1 - requiring both *dlist* and *blaze-builder* 3. include *aeson* 0.6.2.x, a version with the patch that uses *bytestring*'s builder, and so require only *dlist* 4. include *aeson* 0.7.0.0 - requiring *dlist* and possibly *scientific* *Options for dlist, if required by the aeson choice, or because we now like it anyway:* 1. include *dlist* 0.5 2. include *dlist* 0.6, with new typeclass instances added *Options for scientific, if required by the aeson choice:* 1. include Scientific type in *aeson* 2. include a new *scientific* pacakge *Discussion:* I think the best option, if we are ready to embrace aeson, is jump in with both feet: aeson 0.7.0.0, & dlist 0.6. Leaving the issue of is scientific ready to be included as it's own package, or should it just be exported by aeson for now. Thoughts? — Mark

Attachments:

attachment.html (text/html — 2.9 KB)

Show replies by date

Sean Leather

18 Nov 18 Nov

8:04 a.m.

On Sun, Nov 17, 2013 at 10:20 PM, Mark Lentczner wrote:

...

Thoughts?

I'm fine with any option regarding dlist. Just to update people on my planned changes for v0.6, here is the ChangeLoghttps://github.com/spl/dlist/blob/master/ChangeLog.md : * Maintenance and development taken over by Sean Leather * Migrate repository from http://code.haskell.org/~dons/code/dlist/ to https://github.com/spl/dlist * Add `Eq`, `Ord`, `Read`, `Show`, `Alternative`, `Foldable`, `Traversable` instances * Deprecate functions in favor of their type class equivalents: `concat`, `map`, `foldr` * Deprecate `DL`, `unDL` and add `apply` ([#4]( https://github.com/spl/dlist/issues/4)) * Stop supporting `base < 2` * Update tests to run `cabal test` using parallel QuickCheck (`pqc`) * Add scripts for running `hpc` * Update documentation Other than minor documentation updates, all of these changes are implemented. I have no further plans for v0.6. Regards, Sean

Joachim Breitner

9:21 a.m.

Hi, Am Montag, den 18.11.2013, 10:04 +0200 schrieb Sean Leather:

...

* Maintenance and development taken over by Sean Leather * Migrate repository from http://code.haskell.org/~dons/code/dlist/ to https://github.com/spl/dlist * Add `Eq`, `Ord`, `Read`, `Show`, `Alternative`, `Foldable`, `Traversable` instances

Given that the point of dlist is to speed up code where lists are insufficient, would it make sense to only provide those instances that can be implemented without converting to and from lists? If there are instances that cannot be implemented idiomatically with dlists, maybe they should be left out, to signal the user that he will not get the benefits of DLists for these. In particular, it seems that the Show instance could be improved. Currently it says showsPrec p dl = showParen (p > 10) $ showString "fromList " . shows (toList dl) i.e. goes via a list. But this is constructing a ShowS (which is String -> String) value, i.e. another difference list. Shouldn’t it be possible to stay in the world of difference lists when implementing this? (Or maybe I’m overly worried, and I admit that I did not run benchmarks so far.) Greetings, Joachim -- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0x4743206C Debian Developer: nomeata@debian.org

Sean Leather

3:15 p.m.

Hi Joachim, I did not want to derail the conversation about the pros and cons of dlist, so I started a separate thread about it on the haskell-platform list [1], but perhaps I should have included the libraries list [2]. Since a few people have mentioned concerns about dlist/list conversion, I will respond below. On Mon, Nov 18, 2013 at 11:21 AM, Joachim Breitner wrote:

...

Am Montag, den 18.11.2013, 10:04 +0200 schrieb Sean Leather:

...
* Maintenance and development taken over by Sean Leather * Migrate repository from http://code.haskell.org/~dons/code/dlist/ to https://github.com/spl/dlist * Add `Eq`, `Ord`, `Read`, `Show`, `Alternative`, `Foldable`, `Traversable` instances

Given that the point of dlist is to speed up code where lists are insufficient,

To be a bit more precise, it is not that lists are "insufficient," the problem is the `(++)` operator (a.k.a. append). To be even more precise, the problem is left-nested appends, e.g. the expression `(x ++ y) ++ z` may have a worse traversal time than `x ++ (y ++ z)`. Such an arrangement can (and probably will) result in multiple traversals of the left argument(s). would it make sense to only provide those instances that

...

can be implemented without converting to and from lists?

In my opinion, no. It makes sense to have as many reasonable instances as possible to make the library more attractive and usable. The fact that the instances convert to and from lists does not detract from their usefulness because the conversions are not necessarily inefficient (see next response). If there are instances that cannot be implemented idiomatically with

...

dlists, maybe they should be left out, to signal the user that he will not get the benefits of DLists for these.

I think it is an unproven myth that conversion between lists and dlists is always inefficient. Consider the conversion functions:

...

fromList :: [a] -> DList a fromList = DL . (++)

...

toList :: DList a -> [a] toList = ($[]) . unDL

Converting from a list is like prepending (++) to a list. This introduces a linear traversal of the argument (assuming a complete evaluation of the converted list). Converting to a list is like "finishing it off" with an empty list. The operation itself is trivial, but traversing the result would evaluate all the `(++)` from the previous `fromList`s. Fortunately, all of the `fromList`ed lists are left-arguments to `(++)`, so each will only be traversed once (which is the primary reason of dlists). Looking at the instances in the master branch, most of them use `toList`, which we have established is trivial. The only instances to use `fromList` are `Read` and `Traversable`. Each of these reuses the list instance and involves at least one list traversal, so the extra traversal implied by `fromList` means a constant factor increase in time. In particular, it seems that the Show instance could be improved.

...

Currently it says showsPrec p dl = showParen (p > 10) $ showString "fromList " . shows (toList dl) i.e. goes via a list. But this is constructing a ShowS (which is String -> String) value, i.e. another difference list. Shouldn’t it be possible to stay in the world of difference lists when implementing this?

Perhaps you might expect this:

...

showsPrec' :: Int -> DList Char -> ShowS showsPrec' p dl = showParen (p > 10) $ showString "fromList " . unDL dl

But that won't work, because we have a `Show a => DList a`, not a `DList Char`. The `Show` instance for lists is for `Show a => [a]`, not `[Char]`. See Bas' dstring library for `DList Char`. The underlying representation of dlists is `[a] -> [a]`, so what we might want is a function with the type `Show a => ([a] -> [a]) -> ShowS`. But we don't need or want to map `String` to `Show a => [a]` as one might think the higher-order function requires. What we do is finish off the function with `[]` (which is appended to the end of the resulting list). Then, we're left with something of type `Show a => [a]`, to which we can apply `showList`. Looking back at the instance, `toList` finishes the function and `shows` for `[a]` is equivalent to `showList` for `a`. (Or maybe I’m overly worried, and I admit that I did not run benchmarks

...

so far.)

I definitely think benchmarks [3] would help resolve these questions. And there might be cases where rewrite rules and fusion play a role. But until then, I remain unconvinced that the added instances are anything but helpful. Regards, Sean [1] http://projects.haskell.org/pipermail/haskell-platform/2013-November/002750.... [2] I assumed the discussion for a library being added (or not) to the Platform should happen on haskell-platform and not libraries. But perhaps libraries has more eyes and interest. The fact that many of these emails are being sent to both lists adds to my confusion about where discussion should take place. [3] https://github.com/spl/dlist/issues/3

John Lato

19 Nov 19 Nov

1:32 a.m.

Hi Sean, Since I've raised this issue before as well, I decided to write some tests. At this time, I've written/run criterion tests, and I have some hand-wavey theoretical arguments. The theoretical stuff came first (presented below) and isn't influenced by the test results, which I haven't attempted to analyze beyond the most superficial level. If anyone else really cares strongly, please feel free to develop this further. https://github.com/JohnLato/dlist-test/ http://htmlpreview.github.io/?https://github.com/JohnLato/dlist-test/blob/ma... I think more research is warranted, since there are a few anomalies I can't currently explain. On Mon, Nov 18, 2013 at 7:15 AM, Sean Leather wrote:

...

Hi Joachim,

On Mon, Nov 18, 2013 at 11:21 AM, Joachim Breitner wrote:

Am Montag, den 18.11.2013, 10:04 +0200 schrieb Sean Leather:

...
...
* Maintenance and development taken over by Sean Leather * Migrate repository from http://code.haskell.org/~dons/code/dlist/ to https://github.com/spl/dlist * Add `Eq`, `Ord`, `Read`, `Show`, `Alternative`, `Foldable`, `Traversable` instances

Given that the point of dlist is to speed up code where lists are insufficient,

To be a bit more precise, it is not that lists are "insufficient," the problem is the `(++)` operator (a.k.a. append). To be even more precise, the problem is left-nested appends, e.g. the expression `(x ++ y) ++ z` may have a worse traversal time than `x ++ (y ++ z)`. Such an arrangement can (and probably will) result in multiple traversals of the left argument(s).

would it make sense to only provide those instances that

...
can be implemented without converting to and from lists?

In my opinion, no. It makes sense to have as many reasonable instances as possible to make the library more attractive and usable. The fact that the instances convert to and from lists does not detract from their usefulness because the conversions are not necessarily inefficient (see next response).

If there are instances that cannot be implemented idiomatically with

...
dlists, maybe they should be left out, to signal the user that he will not get the benefits of DLists for these.

I think it is an unproven myth that conversion between lists and dlists is always inefficient. Consider the conversion functions:

...
fromList :: [a] -> DList a fromList = DL . (++)

...
toList :: DList a -> [a] toList = ($[]) . unDL

Converting from a list is like prepending (++) to a list. This introduces a linear traversal of the argument (assuming a complete evaluation of the converted list).

Converting to a list is like "finishing it off" with an empty list. The operation itself is trivial, but traversing the result would evaluate all the `(++)` from the previous `fromList`s. Fortunately, all of the `fromList`ed lists are left-arguments to `(++)`, so each will only be traversed once (which is the primary reason of dlists).

This overlooks the cost of evaluating the DList function itself, which can be significant. DLists are usually formed by snoc'ing/appending k elements/chunks, which means that the total DList is a composition of k separate functions. This structure must be traversed in order to evaluate the resulting list, which makes 'toList' have complexity O(k). If the DList was formed by repeated 'snoc' calls (maybe a common case, maybe not), k==n. calling 'toList' isn't so bad on its own, as typically it only happens once. My concern stems from situations such as using DList as a key in a map, for which many comparisons will be performed and toList will be called multiple times on some elements. Even considering this, I'm not particularly opposed to these instances, but I think users should be aware that they can lead to repeated non-trivial computations. John L.

Sean Leather

9:37 a.m.

Hi John, On Tue, Nov 19, 2013 at 3:32 AM, John Lato wrote:

...

Since I've raised this issue before as well, I decided to write some tests.

At this time, I've written/run criterion tests, and I have some hand-wavey theoretical arguments. The theoretical stuff came first (presented below) and isn't influenced by the test results, which I haven't attempted to analyze beyond the most superficial level. If anyone else really cares strongly, please feel free to develop this further.

https://github.com/JohnLato/dlist-test/

http://htmlpreview.github.io/?https://github.com/JohnLato/dlist-test/blob/ma...

Great! Thanks! I think more research is warranted, since there are a few anomalies I can't

...

currently explain.

Definitely. On Mon, Nov 18, 2013 at 7:15 AM, Sean Leather wrote:

...

...
Converting from a list is like prepending (++) to a list. This introduces a linear traversal of the argument (assuming a complete evaluation of the converted list).

Converting to a list is like "finishing it off" with an empty list. The operation itself is trivial, but traversing the result would evaluate all the `(++)` from the previous `fromList`s. Fortunately, all of the `fromList`ed lists are left-arguments to `(++)`, so each will only be traversed once (which is the primary reason of dlists).

This overlooks the cost of evaluating the DList function itself, which can be significant. DLists are usually formed by snoc'ing/appending k elements/chunks, which means that the total DList is a composition of k separate functions. This structure must be traversed in order to evaluate the resulting list, which makes 'toList' have complexity O(k). If the DList was formed by repeated 'snoc' calls (maybe a common case, maybe not), k==n.

True, and nicely put. calling 'toList' isn't so bad on its own, as typically it only happens

...

once. My concern stems from situations such as using DList as a key in a map, for which many comparisons will be performed and toList will be called multiple times on some elements.

Indeed. I would hope there would be sharing in those cases, but I'm not sure right now. Even considering this, I'm not particularly opposed to these instances, but

...

I think users should be aware that they can lead to repeated non-trivial computations.

Agreed. Regards, Sean

Don Stewart

26 Nov 26 Nov

1:30 p.m.

Indeed, avoiding expensive instances was part of the design On Nov 18, 2013 5:21 PM, "Joachim Breitner" wrote:

...

Hi,

Am Montag, den 18.11.2013, 10:04 +0200 schrieb Sean Leather:

...
* Maintenance and development taken over by Sean Leather * Migrate repository from http://code.haskell.org/~dons/code/dlist/ to https://github.com/spl/dlist * Add `Eq`, `Ord`, `Read`, `Show`, `Alternative`, `Foldable`, `Traversable` instances

Given that the point of dlist is to speed up code where lists are insufficient, would it make sense to only provide those instances that can be implemented without converting to and from lists?

If there are instances that cannot be implemented idiomatically with dlists, maybe they should be left out, to signal the user that he will not get the benefits of DLists for these.

In particular, it seems that the Show instance could be improved. Currently it says showsPrec p dl = showParen (p > 10) $ showString "fromList " . shows (toList dl) i.e. goes via a list. But this is constructing a ShowS (which is String -> String) value, i.e. another difference list. Shouldn’t it be possible to stay in the world of difference lists when implementing this?

(Or maybe I’m overly worried, and I admit that I did not run benchmarks so far.)

Greetings, Joachim

-- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0x4743206C Debian Developer: nomeata@debian.org

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Ivan Lazar Miljenovic

18 Nov 18 Nov

10:06 a.m.

On 18 November 2013 19:04, Sean Leather wrote:

...

On Sun, Nov 17, 2013 at 10:20 PM, Mark Lentczner wrote:

...
Thoughts?

I'm fine with any option regarding dlist.

Just to update people on my planned changes for v0.6, here is the ChangeLog:

* Maintenance and development taken over by Sean Leather * Migrate repository from http://code.haskell.org/~dons/code/dlist/ to https://github.com/spl/dlist * Add `Eq`, `Ord`, `Read`, `Show`, `Alternative`, `Foldable`, `Traversable` instances * Deprecate functions in favor of their type class equivalents: `concat`, `map`, `foldr` * Deprecate `DL`, `unDL` and add `apply`

The only possible issue I can think of with this from my own experience in dealing with my own code with abstract data types with no exposed constructors: it made some debugging issues difficult to diagnose because the equivalent of fromList and toList led to some subtle behaviours. As such, what about having an .Internals module that exposes these? (Admittedly in my case, the issue was more because of the behaviour of show and read, which isn't as big a deal with a datatype that doesn't change the ordering of list elements.)

...

([#4](https://github.com/spl/dlist/issues/4)) * Stop supporting `base < 2` * Update tests to run `cabal test` using parallel QuickCheck (`pqc`) * Add scripts for running `hpc` * Update documentation

Other than minor documentation updates, all of these changes are implemented. I have no further plans for v0.6.

Regards, Sean

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com http://IvanMiljenovic.wordpress.com

Sean Leather

3:37 p.m.

Hi Ivan, On Mon, Nov 18, 2013 at 12:06 PM, Ivan Lazar Miljenovic wrote:

...

On 18 November 2013 19:04, Sean Leather wrote:

...
* Deprecate `DL`, `unDL` and add `apply`

The only possible issue I can think of with this from my own experience in dealing with my own code with abstract data types with no exposed constructors: it made some debugging issues difficult to diagnose because the equivalent of fromList and toList led to some subtle behaviours. As such, what about having an .Internals module that exposes these?

I think the type DList should be abstract. Sometimes, there are occasions in which it is useful to have access to constructors. Examples include debugging (yours), implementing functions not included in the library (e.g. this comes up sometimes with Data.Map/Data.Set), and optimization. In the case of DList, the underlying type is so simple that exposing it does not add enough value to warrant a second module. My guess is that the most likely reason for using the constructor is to bypass `fromList`, but this is much more likely to lead to unexpected values [1] than anything useful (or optimized). Regards, Sean [1] https://github.com/spl/dlist/issues/4

Niklas Hambüchen

4:21 p.m.

On 18/11/13 15:37, Sean Leather wrote:

...

Examples include debugging (yours), implementing functions not included in the library (e.g. this comes up sometimes with Data.Map/Data.Set), and optimization.

...

In the case of DList, the underlying type is so simple that exposing it does not add enough value to warrant a second module.

These two points you give seem very contradicting.

...

From my experience, whatever it is, in the end somebody wants to give an instance to it.

I would therefore vote to expose internals of *all* data types exposed. Not doing so is assuming the ability to foretell everything a user will ever do with it, which is never true.

Peter Simons

4:40 p.m.

Niklas Hambüchen writes:

...

I would therefore vote to expose internals of *all* data types exposed. Not doing so is assuming the ability to foretell everything a user will ever do with it, which is never true.

I agree. I had plenty of trouble because libraries hid types and/or functions that I needed access to, but I never had any trouble because a library exposed a type that I didn't need access to. Take care, Peter

Gregory Collins

5:14 p.m.

On Mon, Nov 18, 2013 at 5:40 PM, Peter Simons wrote:

...

...but I never had any trouble because a library exposed a type that I didn't need access to.

I've definitely had the experience of pattern matching on somebody's constructor and then having my code break when a new record field was added. The convention I use, and which I have seen others use, is that datatypes are exported abstractly in "public" APIs but are also (optionally) exported from an ".Internal" module. The "contract" you give to API users about what kinds of changes should be breaking can then be a little stronger in the public API. Consumers of modules marked "internal" deserve what they get :) Other concrete examples where this has helped me as a library author in the past: In the snap-core web programming package, I have redefined the "Snap" monad half a dozen times but consumers of the public library API have to my recollection never been affected by these changes. This would not be possible without the datatype abstraction. As others have posted, keeping datatypes abstract also lets you maintain invariants that unfettered access to the datatype constructors might allow you to violate. For dozens of reasons it's a software engineering "best practice". G -- Gregory Collins

Erik Hesselink

6:44 p.m.

On Monday, November 18, 2013, Peter Simons wrote:

...

Niklas Hambüchen writes:

...
I would therefore vote to expose internals of *all* data types exposed. Not doing so is assuming the ability to foretell everything a user will ever do with it, which is never true.

I agree. I had plenty of trouble because libraries hid types and/or functions that I needed access to, but I never had any trouble because a library exposed a type that I didn't need access to.

+1 Erik

Christian Maeder

5:07 p.m.

Am 18.11.2013 17:21, schrieb Niklas Hambüchen:

...

On 18/11/13 15:37, Sean Leather wrote:

...
Examples include debugging (yours), implementing functions not included in the library (e.g. this comes up sometimes with Data.Map/Data.Set), and optimization.

...

...
In the case of DList, the underlying type is so simple that exposing it does not add enough value to warrant a second module.

These two points you give seem very contradicting.

...
From my experience, whatever it is, in the end somebody wants to give an instance to it.

I would therefore vote to expose internals of *all* data types exposed.

You do not believe in abstract data types, do you? With the constructor exposed you can spoil invariants, i.e. for DList: wrongEmptyList = DL $ const [] toList $ append wrongEmptyList $ fromList [1] yields "[]". Cheers Christian

...

Not doing so is assuming the ability to foretell everything a user will ever do with it, which is never true. _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Niklas Hambüchen

6 p.m.

On 18/11/13 17:07, Christian Maeder wrote:

...

You do not believe in abstract data types, do you?

With the constructor exposed you can spoil invariants

I understand it that everybody in this thread is talking about exporting it in an .Internal module as is common practice. Like Gregory said: Consumers of modules marked "internal" deserve what they get.

Sean Leather

7:54 p.m.

On Mon, Nov 18, 2013 at 6:21 PM, Niklas Hambüchen wrote:

...

On 18/11/13 15:37, Sean Leather wrote:

...
Examples include debugging (yours), implementing functions not included in the library (e.g. this comes up sometimes with Data.Map/Data.Set), and optimization.

...

...
In the case of DList, the underlying type is so simple that exposing it does not add enough value to warrant a second module.

These two points you give seem very contradicting.

The points are contradictory, but my message was not. I accept that there are reasons to expose constructors, but in this case, I think (1) the reason to hide the DList constructor is strong and (2) the reasons to expose the constructor are weak. Of course, this is somewhat subjective, but I have given evidence why the constructor should be hidden, and I have not seen any non-speculative evidence why it should be exposed.

...

From my experience, whatever it is, in the end somebody wants to give an instance to it.

In case you haven't seen the code [1], none of the included instances need the constructor. And, since dlists are easily (and, as I argued earlier, probably efficiently) converted to/from lists, those supposed instances that somebody wants to give can be implemented in the same way. Access to the constructor does not help there. I would therefore vote to expose internals of *all* data types exposed.

...

Not doing so is assuming the ability to foretell everything a user will ever do with it, which is never true.

Actually, I do want to prevent one thing that a user will do with the library: construct a `DList` from an arbitrary `[a] -> [a]` function. That's the point, and the library user loses little and gains a strong assurance of safety. I think I have said enough for this thread. If anyone can come up with a concrete (and safe) example of using the DList constructor outside the library, I would consider that as evidence for exposing it. It's exposed in v0.5 and will be deprecated in v0.6, so there's plenty of time to try. It's always possible to change the code later, too. As I said before [2], I think dlist is perfectly suitable for the Platform, but since it has so far been considered (1) only as a minor dependency to aeson (the formally proposed addition) and (2) somewhat controversial, the aeson proposal might be better simplified to aeson-without-dlist. Regardless of the final choice, however, I'm happy to develop and maintain dlist. Regards, Sean [1] https://github.com/spl/dlist/blob/master/Data/DList.hs [2] http://projects.haskell.org/pipermail/haskell-platform/2013-November/002750....

Duncan Coutts

20 Nov 20 Nov

3:37 p.m.

On Sun, 2013-11-17 at 12:20 -0800, Mark Lentczner wrote:

...

*Options for aeson:*

1. skip it in this release 2. include *aeson* 0.6.2.1 - requiring both *dlist* and *blaze-builder* 3. include *aeson* 0.6.2.x, a version with the patch that uses *bytestring*'s builder, and so require only *dlist* 4. include *aeson* 0.7.0.0 - requiring *dlist* and possibly *scientific*

I would strongly argue against including blaze-builder in the platform. The whole point of the work Simon Meier (and Johan and myself) have done over the last year or so with adding Builder into the bytestring package was so that we could get a good implementation with a good API into the platform (and in the appropriate place). It would be silly at this point to confuse matters for users. So I don't mind if there is an aeson-0.6.2.x or a 0.7.x release, so long as it's using the bytestring builder and not blaze-builder. And of course note that the current 2013.2 HP already includes bytestring-0.10 which has the new builder API. -- Duncan Coutts, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

Bryan O'Sullivan

5:20 p.m.

On Wed, Nov 20, 2013 at 7:37 AM, Duncan Coutts wrote:

...

I would strongly argue against including blaze-builder in the platform.

Right, I don't think anyone wants that. It's no longer a dependency of the current development version of aeson, if the platform bytestring is new enough.

Mark Lentczner

26 Nov 26 Nov

4:19 a.m.

My view of the conversation so far: 1) Aeson option 3 or 4 is desirable - we all want aeson, we don't want blaze-builder 2) dlist has generated a fair bit of discussion, but no clear agreement. 3) no one has weighed in on scientific We need to come to rough consensus on these issues soon if we are to get to aeson in this round. My opinion, with my "HP Release Manager" hat on: I'd like to see a version of aeson-0.7 that didn't use dlist (just, ick, duplicated what it used in it's innards), and either had the Scientific removed, or as part of aeson itself (rather than as a separate package). I recognize the sub-standardness of this solution, though it is forward compatible with future evolutions (such as exposing dlist, and/or scientific). - Mark

Gregory Collins

12:46 p.m.

Is there a way to include packages as second-class in the platform? I.e. "this package isn't 'blessed' but for now it's required by a package we really do want." Then we could work later to either bring the dependencies into the platform or rework the platform packages to exclude the non-platform dependency. On Tue, Nov 26, 2013 at 5:19 AM, Mark Lentczner wrote:

...

My view of the conversation so far:

1) Aeson option 3 or 4 is desirable - we all want aeson, we don't want blaze-builder 2) dlist has generated a fair bit of discussion, but no clear agreement. 3) no one has weighed in on scientific

We need to come to rough consensus on these issues soon if we are to get to aeson in this round.

My opinion, with my "HP Release Manager" hat on: I'd like to see a version of aeson-0.7 that didn't use dlist (just, ick, duplicated what it used in it's innards), and either had the Scientific removed, or as part of aeson itself (rather than as a separate package). I recognize the sub-standardness of this solution, though it is forward compatible with future evolutions (such as exposing dlist, and/or scientific).

- Mark

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

-- Gregory Collins

Mark Lentczner

4:01 p.m.

On Tue, Nov 26, 2013 at 4:46 AM, Gregory Collins wrote:

...

Is there a way to include packages as second-class in the platform?

We quasi-did that with primitives. But as I recall, the discussion at the time acknowledged the ineffectiveness of the approach: The point of the platform is that if you compile against the platform, it should be stable - and there is really no effective way to keep people from compiling against them, or even warn them: We could hide them in the package database, but cabal will "unearth" them, without warning, if your project requires it. I suppose this points at an interesting possible cabal enhancement: If it chooses a package that is hidden in the db, it warns you at configure time, even though at build time it is going to use it. - Mark

Bryan O'Sullivan

27 Nov 27 Nov

5:27 p.m.

On Mon, Nov 25, 2013 at 8:19 PM, Mark Lentczner wrote:

...

1) Aeson option 3 or 4 is desirable - we all want aeson, we don't want blaze-builder 2) dlist has generated a fair bit of discussion, but no clear agreement. 3) no one has weighed in on scientific

I spent a bit of time digging into aeson's use of dlist this morning (I didn't add it myself, hence the need to dig), and it actually uses almost the entire API. I do not plan to duplicate dlist into aeson, so the remaining options are to bring dlist into the platform (which is fine by me), or omit aeson. The dlist package is somewhat widely used, and it's very stable, so I am unfussed about its inclusion. Regarding scientific, it solves a real problem with correctness in previous versions of aeson. It is admittedly both new and nichey, but I would be fine to see it go into the platform for the same reason we were okay including the primitive package. It is depended upon by both the latest or HEAD versions of attoparsec and aeson, so the option to fold its code into aeson doesn't really exist.

Bardur Arantsson

6:27 p.m.

On 2013-11-27 18:27, Bryan O'Sullivan wrote:

...

On Mon, Nov 25, 2013 at 8:19 PM, Mark Lentczner wrote:

...
1) Aeson option 3 or 4 is desirable - we all want aeson, we don't want blaze-builder 2) dlist has generated a fair bit of discussion, but no clear agreement. 3) no one has weighed in on scientific

I spent a bit of time digging into aeson's use of dlist this morning (I didn't add it myself, hence the need to dig), and it actually uses almost the entire API. I do not plan to duplicate dlist into aeson, so the remaining options are to bring dlist into the platform (which is fine by me), or omit aeson. The dlist package is somewhat widely used, and it's very stable, so I am unfussed about its inclusion.

I think the objection people have is that there are (were?) some last-minute changes to dlist as indicated here: http://permalink.gmane.org/gmane.comp.lang.haskell.libraries/20698 I think many of the objections were raised before the details of the changes were known, so meh. +0, I guess. (I was one of the original objectors based on the requirement for API stability before inclusion in the HP.)

...

Regarding scientific, it solves a real problem with correctness in previous versions of aeson. It is admittedly both new and nichey, but I would be fine to see it go into the platform for the same reason we were okay including the primitive package. It is depended upon by both the latest or HEAD versions of attoparsec and aeson, so the option to fold its code into aeson doesn't really exist.

Mark Lentczner

7:28 p.m.

Okay - to put a concrete proposal on this: 1) Include dlist-0.6, which would have the additions that Sean L. has proposed 2) Include scientific as package 3) Include an aeson that is updated to use the above two packages (and won't use blaze-builder) *Sean* - do you have a time frame for such changes? *Bas* - do you feel the API of scientific is solid enough to go into the platform? Are you willing to commit to the stability needs of the platform(*) *Bryan* - are you ready to turn out an aeson version that uses the above once they are out? *Everyone else* - After all the conversations, do we feel the rough consensus is to go ahead with these. In particular the changes that Sean proposed for dlist, and the inclusion of scientific. I realize this is Thanksgiving week in the U.S. - but let's hope we can come to a close on this by next Tuesday - Dec 3rd. Preferably sooner! - Mark (*): Meaning, the package would remain backward compatible for a few revs with ample deprecation time before altering APIs that would break code -- even through major version number changes.

Carter Schonwald

9:40 p.m.

I'm looking at the scientific package, and I thought to myself "huh, it doesn't seem to normalize the numbers to scientific notation" it has the right semantics for how printing and operations *mean*, but it doesn't seem to internally keep them as "normalized" floats. heres a snippet of a ghci transcript to illustrate this Prelude Data.Scientific> coefficient $ scientific 1000 2 1000 Prelude Data.Scientific> coefficient $ scientific 1 2 1 Prelude Data.Scientific> scientific 1000 2 100000.0 Prelude Data.Scientific> scientific 100 3 100000.0 Prelude Data.Scientific> coefficient $ scientific 10 4 10 Prelude Data.Scientific> scientific 10 4 100000.0 Prelude Data.Scientific> scientific 10 4 == scientific 1 5 True shoudn't an "exact" model of scientific notation format numbers keep them in normalized form? (ie clip the trailing zeros and such) On Wed, Nov 27, 2013 at 2:28 PM, Mark Lentczner wrote:

...

Okay - to put a concrete proposal on this:

1) Include dlist-0.6, which would have the additions that Sean L. has proposed 2) Include scientific as package 3) Include an aeson that is updated to use the above two packages (and won't use blaze-builder)

*Sean* - do you have a time frame for such changes? *Bas* - do you feel the API of scientific is solid enough to go into the platform? Are you willing to commit to the stability needs of the platform(*) *Bryan* - are you ready to turn out an aeson version that uses the above once they are out?

*Everyone else* - After all the conversations, do we feel the rough consensus is to go ahead with these. In particular the changes that Sean proposed for dlist, and the inclusion of scientific.

I realize this is Thanksgiving week in the U.S. - but let's hope we can come to a close on this by next Tuesday - Dec 3rd. Preferably sooner!

- Mark

(*): Meaning, the package would remain backward compatible for a few revs with ample deprecation time before altering APIs that would break code -- even through major version number changes.

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Henning Thielemann

10:13 p.m.

On Wed, 27 Nov 2013, Carter Schonwald wrote:

...

I'm looking at the scientific package, and I thought to myself "huh, it doesn't seem to normalize the numbers to scientific notation" it has the right semantics for how printing and operations *mean*, but it doesn't seem to internally keep them as "normalized" floats.

The mantissa is stored as Integer. I think that is the only way to keep all digits. But with that representation it is certainly not sensible to normalize to the most significant digit. A reasonable normalization might be to divide the mantissa by a power of 10 such that the mantissa is not a multiple of 10.

Carter Schonwald

10:18 p.m.

agreed, I think that would be a sensible choice. The key point being, every representable number should have a unique representation. On Wed, Nov 27, 2013 at 5:13 PM, Henning Thielemann < lemming@henning-thielemann.de> wrote:

...

On Wed, 27 Nov 2013, Carter Schonwald wrote:

I'm looking at the scientific package, and I thought to myself "huh, it

...
doesn't seem to normalize the numbers to scientific notation" it has the right semantics for how printing and operations *mean*, but it doesn't seem to internally keep them as "normalized" floats.

The mantissa is stored as Integer. I think that is the only way to keep all digits. But with that representation it is certainly not sensible to normalize to the most significant digit. A reasonable normalization might be to divide the mantissa by a power of 10 such that the mantissa is not a multiple of 10.

Bryan O'Sullivan

10:32 p.m.

On Wed, Nov 27, 2013 at 11:28 AM, Mark Lentczner wrote:

...

1) Include dlist-0.6, which would have the additions that Sean L. has proposed 2) Include scientific as package 3) Include an aeson that is updated to use the above two packages (and won't use blaze-builder)

4) Include a suitably new version of attoparsec (as yet unreleased)

Christian Maeder

28 Nov 28 Nov

9:24 a.m.

Hi, If the proposal is to include (at least) 3 packages that are not ready yet (on hackage) and attoparsec raises the question about how two different parser packages should coexist in the HP then, I think, aeson disqualifies itself to be included into the next HP 2013.4.0.0. Let's discuss and reconsider it in 2014. Cheers Christian Am 27.11.2013 23:32, schrieb Bryan O'Sullivan:

...

On Wed, Nov 27, 2013 at 11:28 AM, Mark Lentczner mailto:mark.lentczner@gmail.com> wrote:

1) Include dlist-0.6, which would have the additions that Sean L. has proposed 2) Include scientific as package 3) Include an aeson that is updated to use the above two packages (and won't use blaze-builder)

4) Include a suitably new version of attoparsec (as yet unreleased)

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Roman Cheplyaka

10:40 a.m.

* Christian Maeder [2013-11-28 10:24:49+0100]

...

and attoparsec raises the question about how two different parser packages should coexist in the HP

I must have missed that — what is the question, exactly? Roman

Christian Maeder

11:27 a.m.

Am 28.11.2013 11:40, schrieb Roman Cheplyaka:

...

* Christian Maeder [2013-11-28 10:24:49+0100]

...
and attoparsec raises the question about how two different parser packages should coexist in the HP

I must have missed that — what is the question, exactly?

What is the recommended parser package of the HP to use when I start writing parsers? C.

...

Roman

Roman Cheplyaka

1:52 p.m.

* Christian Maeder [2013-11-28 12:27:30+0100]

...

Am 28.11.2013 11:40, schrieb Roman Cheplyaka:

...
* Christian Maeder [2013-11-28 10:24:49+0100]

...
and attoparsec raises the question about how two different parser packages should coexist in the HP

I must have missed that — what is the question, exactly?

What is the recommended parser package of the HP to use when I start writing parsers?

Depends on what kind of parser you're writing. Parsec is for things that humans typically write — like programs and config files. When the speed is not an issue, and good error messages are required. Attoparsec is for machine-generated formats, where you need performance but error messages are not so critical. So there's not that much overlap between these packages, although they both deal with parsing. Roman

Erik Hesselink

11:10 a.m.

On Thu, Nov 28, 2013 at 10:24 AM, Christian Maeder wrote:

...

If the proposal is to include (at least) 3 packages that are not ready yet (on hackage) and attoparsec raises the question about how two different parser packages should coexist in the HP then, I think, aeson disqualifies itself to be included into the next HP 2013.4.0.0. Let's discuss and reconsider it in 2014.

I think this is a bad idea, since nothing will have changed substantially between now and 2014, so we'll only be postponing a discussion we might as well have now. Erik

Christian Maeder

11:45 a.m.

Am 28.11.2013 12:10, schrieb Erik Hesselink:

...

On Thu, Nov 28, 2013 at 10:24 AM, Christian Maeder wrote:

...
If the proposal is to include (at least) 3 packages that are not ready yet (on hackage) and attoparsec raises the question about how two different parser packages should coexist in the HP then, I think, aeson disqualifies itself to be included into the next HP 2013.4.0.0. Let's discuss and reconsider it in 2014.

I think this is a bad idea, since nothing will have changed substantially between now and 2014, so we'll only be postponing a discussion we might as well have now.

I don't mind discussing (now or later), but shouldn't the exact implications be made clear before? Was the inclusion of attoparsec discussed before? (I may have missed it, though.) If the packages to be updated (aeson) or extended (dlist) or made "suitably" new (attoparsec) are not on hackage yet, how can the "requirement for API stability before inclusion in the HP" be met? Cheers Christian

...

Erik

Sven Panne

12:42 p.m.

Just two add my 2c: Given all these new packages which would need to be pulled into the HP just for aeson, let's not include aeson for 2013.4.0.0 and release 2013.4.0.0 soon without the need for lengthy discussions. One of the main points of the HP is stability, and doing things in a useless rush now just to get one more package in is not the right way to proceed IMHO. We had *tons* of discussions in the past about minor details and now we should pull lots of largely unreviewed/unreleased/unstable things in a hurry? That would miss the main motivation behind the HP. IMHO a package which is (about to be) included in the API has quite a few constraints: Its API has to be stable, but its dependencies have to be stable, too, and they have to be in the HP as well. You can't simply use tons of other stuff just because it's convenient. Things have to be more self-contained than in "normal" packages. Yes, that's a bit of a burden for the maintainer, but OTOH that's the price one has to play for almost universal adoption of one's package.

Bas van Dijk

4:42 p.m.

On 28 November 2013 13:42, Sven Panne wrote:

...

Just two add my 2c: Given all these new packages which would need to be pulled into the HP just for aeson, let's not include aeson for 2013.4.0.0 and release 2013.4.0.0 soon without the need for lengthy discussions.

As the proposer for inclusion of aeson in the HP I'm beginning to agree. There's another reason I would like to postpone the aeson inclusion: I just started working on improving the encoding performance of aeson. This requires some significant changes to the API. Therefore I think it would be better to see how well this new API works out. If it works out, release it as aeson-7 (or aeson-8) and include that release in the HP after next. This way we have time to discuss the new dependencies and the HP remains stable. The following is a brief explanation of the new aeson API (you can stop reading here if you're not interested in it): The idea is to use the same trick that is used in the upcoming binary package[1]. First of all toJSON will return a JsonBuilder instead of a Value: class ToJSON a where toJSON :: a -> JsonBuilder A JsonBuilder is basically a difference list: newtype JsonBuilder = JsonBuilder (IStream -> IStream) instance Monoid JsonBuilder where ... The "list", here represented as an IStream, is a sequence of instructions to the encoder: data IStream = INull IStream | ITrue IStream | IFalse IStream | IDoubleQuote IStream | IChar {-# UNPACK #-} !Char IStream | IString !String IStream | IText !Text IStream | IInt {-# UNPACK #-} !Int IStream | IInt8 {-# UNPACK #-} !Int8 IStream | IInt16 {-# UNPACK #-} !Int16 IStream | IInt32 {-# UNPACK #-} !Int32 IStream | IInt64 {-# UNPACK #-} !Int64 IStream | IWord {-# UNPACK #-} !Word IStream | IWord8 {-# UNPACK #-} !Word8 IStream | IWord16 {-# UNPACK #-} !Word16 IStream | IWord32 {-# UNPACK #-} !Word32 IStream | IWord64 {-# UNPACK #-} !Word64 IStream | IFloat {-# UNPACK #-} !Float IStream | IDouble {-# UNPACK #-} !Double IStream | IInteger !Integer IStream | IScientific !Scientific IStream | IComma IStream | IBeginArray IStream | IEndArray IStream | IBeginObject IStream | IEndObject IStream | IColon IStream | IValue !Value IStream -- Fused: | IBeginObject_IDoubleQuote IStream | IComma_IDoubleQuote IStream -- TODO; more | IEnd Converting a JsonBuilder to a Builder (note that I'm using the new bytestring Builder here) is simply a matter of executing the right Builder for each instruction: toBuilder :: JsonBuilder -> Builder toBuilder (JsonBuilder g) = go (g IEnd) where go :: IStream -> Builder go is = case is of INull is' -> nullB <> go is' ITrue is' -> trueB <> go is' IFalse is' -> falseB <> go is' IDoubleQuote is' -> char8 '"' <> go is' IChar c is' -> char c <> go is' IString cs is' -> string cs <> go is' IText t is' -> text t <> go is' IInt i is' -> intDec i <> go is' IInt8 i8 is' -> int8Dec i8 <> go is' IInt16 i16 is' -> int16Dec i16 <> go is' IInt32 i32 is' -> int32Dec i32 <> go is' IInt64 i64 is' -> int64Dec i64 <> go is' IWord w is' -> wordDec w <> go is' IWord8 w8 is' -> word8Dec w8 <> go is' IWord16 w16 is' -> word16Dec w16 <> go is' IWord32 w32 is' -> word32Dec w32 <> go is' IWord64 w64 is' -> word64Dec w64 <> go is' IFloat f is' -> floatDec f <> go is' IDouble d is' -> doubleDec d <> go is' IInteger i is' -> integerDec i <> go is' IScientific s is' -> fromScientific s <> go is' IComma is' -> char8 ',' <> go is' IBeginArray is' -> char8 '[' <> go is' IEndArray is' -> char8 ']' <> go is' IBeginObject is' -> char8 '{' <> go is' IEndObject is' -> char8 '}' <> go is' IColon is' -> char8 ':' <> go is' IValue v is' -> fromValue v <> go is' -- Fused: IBeginObject_IDoubleQuote is'-> fixed2('{','"')<> go is' IComma_IDoubleQuote is'-> fixed2(',','"')<> go is' -- TODO: more IEnd -> mempty nullB :: Builder nullB = fixed4 ('n',('u',('l','l'))) {-# INLINE nullB #-} trueB :: Builder trueB = fixed4 ('t',('r',('u','e'))) {-# INLINE trueB #-} falseB :: Builder falseB = fixed5 ('f',('a',('l',('s','e')))) {-# INLINE falseB #-} fixed2 :: (Char, Char) -> Builder fixed2 = P.primFixed (P.char8 >*< P.char8) {-# INLINE fixed2 #-} fixed4 :: (Char, (Char, (Char, Char))) -> Builder fixed4 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8) {-# INLINE fixed4 #-} fixed5 :: (Char, (Char, (Char, (Char, Char)))) -> Builder fixed5 = P.primFixed (P.char8 >*< P.char8 >*< P.char8 >*< P.char8 >*< P.char8) {-# INLINE fixed5 #-} This representation allows a lot of optimizations. For example we can define rewrite rules that "fuse" the Builders of common sequences like: {-# RULES "IBeginObject_IDoubleQuote" forall is. IBeginObject (IDoubleQuote is) = IBeginObject_IDoubleQuote is #-} {-# RULES "IComma_IDoubleQuote" forall is. IComma (IDoubleQuote is) = IComma_IDoubleQuote is #-} The encoder can handle these common sequences more efficiently. Of course the JsonBuilder is abstract to the user. There will be a safe API to construct well-formed JsonBuilders. (While writing this I realize that users will be able to use the Monoid instance for JsonBuilders which is undesirable. I will solve this by wrapping the JsonBuilder returned from toJSON in another newtype which doesn't have a Monoid instance) What do we loose? In the current API of aeson, toJSON will directly return a Value. This Value can then be inspected or extended. In order to do the same in the new API the JsonBuilder first has to be parsed to a Value which is less efficient. However, if the new API proves to be significantly more efficient for encoding I think this extra parsing cost is warranted since it's far less common than encoding. A first version of this API will soon be ready and I will push that to my github. Hopefully I can come up with some convincing benchmarks! Bas

Bardur Arantsson

7:19 p.m.

On 2013-11-28 17:42, Bas van Dijk wrote:

...

On 28 November 2013 13:42, Sven Panne wrote:

...
Just two add my 2c: Given all these new packages which would need to be pulled into the HP just for aeson, let's not include aeson for 2013.4.0.0 and release 2013.4.0.0 soon without the need for lengthy discussions.

As the proposer for inclusion of aeson in the HP I'm beginning to agree.

There's another reason I would like to postpone the aeson inclusion: I just started working on improving the encoding performance of aeson. This requires some significant changes to the API. Therefore I think it would be better to see how well this new API works out. If it works out, release it as aeson-7 (or aeson-8) and include that release in the HP after next. This way we have time to discuss the new dependencies and the HP remains stable.

[--snip lots of interesting details--] You mentioned generating JSON, so I just thought I'd mention that it might also be possible to speed up *parsing* hugely, assuming that only a few fields/values are needed/evaluated. There's a very interesting paper called "Semi-Indexing Semi-Structured Data in Tiny Space" (G. Ottaviano, R. Grossi, 2011) which basically skips the whole "parsing" overhead in favor of only "scanning" overhead in its approach to parsing -- which seems to compare very favorably to C/C++ code for parsing JSON. In addition it uses space-efficient data structures for all intermediate data, so it may even pay to build a semi-index and then use that to parse even in one-off situations. (Credit where credit's due: I think it was Edward Kmett who posted a comment with this reference on Reddit. I think he mentioned something about pursuing this for Lens in his Copious Spare Time(TM)?). I'm not sure how this work could be integrated with Aeson, but I'm betting somebody out there has good ideas. Aeson is already extremely good, but let's make it even better! ... and by "us", I mean "you, dear Haskell community". Regards,

Sean Leather

1:35 p.m.

On Wed, Nov 27, 2013 at 9:28 PM, Mark Lentczner wrote:

...

Okay - to put a concrete proposal on this:

1) Include dlist-0.6, which would have the additions that Sean L. has proposed

[...]

...

*Sean* - do you have a time frame for such changes?

[...] dlist-0.6 is ready for imminent release. Just waiting on the Hackage admins. I also ran the test suite in the aeson HEAD with the dlist HEAD just to check. Regards, Sean

4232

Age (days ago)

4243

Last active (days ago)

List overview

Download

37 comments

19 participants

participants (19)

Bardur Arantsson
Bas van Dijk
Bryan O'Sullivan
Carter Schonwald
Christian Maeder
Don Stewart
Duncan Coutts
Erik Hesselink
Gregory Collins
Henning Thielemann
Ivan Lazar Miljenovic
Joachim Breitner
John Lato
Mark Lentczner
Niklas Hambüchen
Peter Simons
Roman Cheplyaka
Sean Leather
Sven Panne