Proposal: Allow gunfold for Data.Map, Data.IntMap, etc.

I would like to propose improving the Data instances for a number of currently completely opaque data types in the containers package, by using virtual constructors. The instance for Data.Map already uses fromList for gfoldl, it just stops there. Extending it to be able to gunfold and mention the name of that constructor would enable generic traversal libraries like uniplate, etc. to work over the contents of the Map, rather than bailing out in fear or crashing at the sight of a mkNoRepType. An example of the changes for Data.Map are highlighted below. instance (Data k, Data a, Ord k) => Data (Map k a) where gfoldl f z m = z fromList `f` toList m toConstr _ = fromListConstr gunfold k z c = case constrIndex c of 1 -> k (z fromList) _ -> error "gunfold" dataTypeOf _ = mapDataType dataCast2 f = gcast2 f fromListConstr :: Constr fromListConstr = mkConstr mapDataType "fromList" [] Prefix mapDataType :: DataType mapDataType = mkDataType "Data.Map.Map" [fromListConstr] I've used this approach for years on my own libraries to great effect. Discussion Period: 3 Weeks (I added a week for ICFP) -Edward

Hi Edward,
I would like to propose improving the Data instances for a number of currently completely opaque data types in the containers package, by using virtual constructors.
The instance for Data.Map already uses fromList for gfoldl, it just stops there.
Extending it to be able to gunfold and mention the name of that constructor would enable generic traversal libraries like uniplate, etc. to work over the contents of the Map, rather than bailing out in fear or crashing at the sight of a mkNoRepType.
An example of the changes for Data.Map are highlighted below.
instance (Data k, Data a, Ord k) => Data (Map k a) where gfoldl f z m = z fromList `f` toList m toConstr _ = fromListConstr gunfold k z c = case constrIndex c of 1 -> k (z fromList) _ -> error "gunfold" dataTypeOf _ = mapDataType dataCast2 f = gcast2 f
fromListConstr :: Constr fromListConstr = mkConstr mapDataType "fromList" [] Prefix
mapDataType :: DataType mapDataType = mkDataType "Data.Map.Map" [fromListConstr]
I've used this approach for years on my own libraries to great effect.
+1 here. I am not very familiar with the Data instances -- is it true that the parameter of the `fromList` in the Data instance will often be sorted (i.e., result of `toList` or `filter . toList`)? If so, we could use fromMaybeAscList which would look like fromMaybeAscList list | isDistinctAsc list = fromDistinctAscList list | otherwise = fromList list There is a big gain in using a linear-time fromDistinctAscList over O(N log N) fromList, but there is a linear-time check and the list must be kept around until isDistinctAsc finishes. Cheers, Milan

On Wed, Aug 29, 2012 at 3:24 PM, Milan Straka
Hi Edward,
I would like to propose improving the Data instances for a number of currently completely opaque data types in the containers package, by using virtual constructors.
The instance for Data.Map already uses fromList for gfoldl, it just stops there.
Extending it to be able to gunfold and mention the name of that constructor would enable generic traversal libraries like uniplate, etc. to work over the contents of the Map, rather than bailing out in fear or crashing at the sight of a mkNoRepType.
An example of the changes for Data.Map are highlighted below.
instance (Data k, Data a, Ord k) => Data (Map k a) where gfoldl f z m = z fromList `f` toList m toConstr _ = fromListConstr gunfold k z c = case constrIndex c of 1 -> k (z fromList) _ -> error "gunfold" dataTypeOf _ = mapDataType dataCast2 f = gcast2 f
fromListConstr :: Constr fromListConstr = mkConstr mapDataType "fromList" [] Prefix
mapDataType :: DataType mapDataType = mkDataType "Data.Map.Map" [fromListConstr]
I've used this approach for years on my own libraries to great effect.
+1 here.
I am not very familiar with the Data instances -- is it true that the parameter of the `fromList` in the Data instance will often be sorted (i.e., result of `toList` or `filter . toList`)? If so, we could use fromMaybeAscList which would look like fromMaybeAscList list | isDistinctAsc list = fromDistinctAscList list | otherwise = fromList list There is a big gain in using a linear-time fromDistinctAscList over O(N log N) fromList, but there is a linear-time check and the list must be kept around until isDistinctAsc finishes.
The users of Data.Data could in theory do anything they want to the keys, but I do confess for most scenarios they'll come back to you ordered. Hrmm. A more nuanced fromList construction could definitely help, though I suppose that could apply in the general case as well. We should be able to fuse this "try to construct linearly, but fall back on N-log-N" version of fromList in one pass even for normal uses of fromList. e.g. assume that you are constructing a sorted tree until you find a key out of order, then take the tree you've built so far and union it appropriately with the slower constructed fromList of the remainder. That way you don't have to retain the storage for both the list and the map, and we only force the list once. -Edward

On Wed, Aug 29, 2012 at 4:54 PM, Edward Kmett
We should be able to fuse this "try to construct linearly, but fall back on N-log-N" version of fromList in one pass even for normal uses of fromList.
e.g. assume that you are constructing a sorted tree until you find a key out of order, then take the tree you've built so far and union it appropriately with the slower constructed fromList of the remainder. That way you don't have to retain the storage for both the list and the map, and we only force the list once.
Could you instead of falling back to the slower fromList just keep building new trees and then union them all in the end. Real world data is often is semi-sorted order. I guess we would need to detect some worst case scenarios (i.e. sorted in reverse order) and fall back to the slower fromList in those cases. -- Johan

This might pay off as well, but I am leery that its a rather tricky
balancing act and will take a lot of profiling to find the right balance in
practical performance vs. asymptotic bounds.
Should we split the notion of improving the performance of fromList into a
separate project/proposal
that simply exposes synergies with this one?
-Edward
On Wed, Aug 29, 2012 at 8:02 PM, Johan Tibell
On Wed, Aug 29, 2012 at 4:54 PM, Edward Kmett
wrote: We should be able to fuse this "try to construct linearly, but fall back on N-log-N" version of fromList in one pass even for normal uses of fromList.
e.g. assume that you are constructing a sorted tree until you find a key out of order, then take the tree you've built so far and union it appropriately with the slower constructed fromList of the remainder. That way you don't have to retain the storage for both the list and the map, and we only force the list once.
Could you instead of falling back to the slower fromList just keep building new trees and then union them all in the end. Real world data is often is semi-sorted order. I guess we would need to detect some worst case scenarios (i.e. sorted in reverse order) and fall back to the slower fromList in those cases.
-- Johan

On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett
This might pay off as well, but I am leery that its a rather tricky balancing act and will take a lot of profiling to find the right balance in practical performance vs. asymptotic bounds.
Should we split the notion of improving the performance of fromList into a separate project/proposal that simply exposes synergies with this one?
Sure. Lets continue this thread on its original topic. I'm +1 on the proposal.

On Wed, Aug 29, 2012 at 06:06:25PM -0700, Johan Tibell wrote:
On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett
wrote: This might pay off as well, but I am leery that its a rather tricky balancing act and will take a lot of profiling to find the right balance in practical performance vs. asymptotic bounds.
Should we split the notion of improving the performance of fromList into a separate project/proposal that simply exposes synergies with this one?
Sure. Lets continue this thread on its original topic. I'm +1 on the proposal.
+1 from me too. A smarter fromList also sounds cool but I agree that should be a separate proposal, if only to avoid derailing this one. -Brent

For what it is worth, Milan already implemented the smarter fromList and it
is considerably faster. =)
If I'm reading his numbers right, its ~50% faster than the original in the
worst case, and its about ~10x faster on large sorted inputs, and within 5%
of the speed of fromDistinctAscList.
Overall, it is a huge win.
-Edward
On Thu, Aug 30, 2012 at 9:50 AM, Brent Yorgey
On Wed, Aug 29, 2012 at 06:06:25PM -0700, Johan Tibell wrote:
On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett
wrote: This might pay off as well, but I am leery that its a rather tricky balancing act and will take a lot of profiling to find the right balance in practical performance vs. asymptotic bounds.
Should we split the notion of improving the performance of fromList into a separate project/proposal that simply exposes synergies with this one?
Sure. Lets continue this thread on its original topic. I'm +1 on the proposal.
+1 from me too. A smarter fromList also sounds cool but I agree that should be a separate proposal, if only to avoid derailing this one.
-Brent
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

As the deadline has long since past and this appears to have been
relatively non-controversial, what do we need to do to get these into
containers?
-Edward
On Sat, Sep 1, 2012 at 8:23 AM, Edward Kmett
For what it is worth, Milan already implemented the smarter fromList and it is considerably faster. =)
If I'm reading his numbers right, its ~50% faster than the original in the worst case, and its about ~10x faster on large sorted inputs, and within 5% of the speed of fromDistinctAscList.
Overall, it is a huge win.
-Edward
On Thu, Aug 30, 2012 at 9:50 AM, Brent Yorgey
wrote: On Wed, Aug 29, 2012 at 06:06:25PM -0700, Johan Tibell wrote:
On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett
wrote: This might pay off as well, but I am leery that its a rather tricky balancing act and will take a lot of profiling to find the right balance in practical performance vs. asymptotic bounds.
Should we split the notion of improving the performance of fromList into a separate project/proposal that simply exposes synergies with this one?
Sure. Lets continue this thread on its original topic. I'm +1 on the proposal.
+1 from me too. A smarter fromList also sounds cool but I agree that should be a separate proposal, if only to avoid derailing this one.
-Brent
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Hi Edward,
As the deadline has long since past and this appears to have been relatively non-controversial, what do we need to do to get these into containers?
the best way would be to create the patch and then create a pull request at https://github.com/haskell/containers . Johan and me will be notified then, will review the code and merge it. Cheers, Milan
-Edward
On Sat, Sep 1, 2012 at 8:23 AM, Edward Kmett
wrote: For what it is worth, Milan already implemented the smarter fromList and it is considerably faster. =)
If I'm reading his numbers right, its ~50% faster than the original in the worst case, and its about ~10x faster on large sorted inputs, and within 5% of the speed of fromDistinctAscList.
Overall, it is a huge win.
-Edward
On Thu, Aug 30, 2012 at 9:50 AM, Brent Yorgey
wrote: On Wed, Aug 29, 2012 at 06:06:25PM -0700, Johan Tibell wrote:
On Wed, Aug 29, 2012 at 5:57 PM, Edward Kmett
wrote: This might pay off as well, but I am leery that its a rather tricky balancing act and will take a lot of profiling to find the right balance in practical performance vs. asymptotic bounds.
Should we split the notion of improving the performance of fromList into a separate project/proposal that simply exposes synergies with this one?
Sure. Lets continue this thread on its original topic. I'm +1 on the proposal.
+1 from me too. A smarter fromList also sounds cool but I agree that should be a separate proposal, if only to avoid derailing this one.
-Brent
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

+1
Pedro
On Wed, Aug 29, 2012 at 6:34 PM, Edward Kmett
I would like to propose improving the Data instances for a number of currently completely opaque data types in the containers package, by using virtual constructors.
The instance for Data.Map already uses fromList for gfoldl, it just stops there.
Extending it to be able to gunfold and mention the name of that constructor would enable generic traversal libraries like uniplate, etc. to work over the contents of the Map, rather than bailing out in fear or crashing at the sight of a mkNoRepType.
An example of the changes for Data.Map are highlighted below.
instance (Data k, Data a, Ord k) => Data (Map k a) where gfoldl f z m = z fromList `f` toList m toConstr _ = fromListConstr gunfold k z c = case constrIndex c of 1 -> k (z fromList) _ -> error "gunfold" dataTypeOf _ = mapDataType dataCast2 f = gcast2 f
fromListConstr :: Constr fromListConstr = mkConstr mapDataType "fromList" [] Prefix
mapDataType :: DataType mapDataType = mkDataType "Data.Map.Map" [fromListConstr]
I've used this approach for years on my own libraries to great effect.
Discussion Period: 3 Weeks
(I added a week for ICFP)
-Edward
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
participants (5)
-
Brent Yorgey
-
Edward Kmett
-
Johan Tibell
-
José Pedro Magalhães
-
Milan Straka