Suitable structure to represents lots of similar lists

I have a situation where I have a bunch of lists and I'll frequently be making new lists from the old ones by applying map and filter. The map will be applying a function that's effectively the identity on most elements of the list, and filter will be using a function that usually gives True. This means that there is potential for a large amount of sharing between these lists that could be exploited, something that ordinary lists would do badly. Does anyone have a recommendation for a pure functional data structure for this case? (It reminds me a bit of my own antidiagonal type, but that's not well adapted to the highly dynamic situation I'm describing.) -- Dan

I've been meaning to generalize Data.Rope in my rope library to use Vector
rather than ByteString.
Ultimately it looks like a FingerTree of Vector's using length as the
monoid.
The vectors can be sliced cheaply and the fingertree as a whole supports
cheap splicing.
-Edward Kmett
On Wed, Apr 7, 2010 at 6:22 PM, Dan Piponi
I have a situation where I have a bunch of lists and I'll frequently be making new lists from the old ones by applying map and filter. The map will be applying a function that's effectively the identity on most elements of the list, and filter will be using a function that usually gives True. This means that there is potential for a large amount of sharing between these lists that could be exploited, something that ordinary lists would do badly. Does anyone have a recommendation for a pure functional data structure for this case?
(It reminds me a bit of my own antidiagonal type, but that's not well adapted to the highly dynamic situation I'm describing.) -- Dan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Dan Piponi wrote:
I have a situation where I have a bunch of lists and I'll frequently be making new lists from the old ones by applying map and filter. The map will be applying a function that's effectively the identity on most elements of the list, and filter will be using a function that usually gives True. This means that there is potential for a large amount of sharing between these lists that could be exploited, something that ordinary lists would do badly. Does anyone have a recommendation for a pure functional data structure for this case?
(It reminds me a bit of my own antidiagonal type, but that's not well adapted to the highly dynamic situation I'm describing.)
I'm not sure whether these general properties of your maps and filters can actually be exploited. The thing is that you still have to "touch" every element anyway, so you can as well allocate a new cons cell and garbage collect the old one while you're at it. But if you can skip large contiguous parts of the lists, then sharing may be worth thinking about. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

Id doesn´t have to create a copy of the original object ( I infer this from
referential transparency) so the new list must store the same original
reference. Any pure structure would conserve references after id. filter as
far as I know. Am I wrong?
2010/4/8 Dan Piponi
I have a situation where I have a bunch of lists and I'll frequently be making new lists from the old ones by applying map and filter. The map will be applying a function that's effectively the identity on most elements of the list, and filter will be using a function that usually gives True. This means that there is potential for a large amount of sharing between these lists that could be exploited, something that ordinary lists would do badly. Does anyone have a recommendation for a pure functional data structure for this case?
(It reminds me a bit of my own antidiagonal type, but that's not well adapted to the highly dynamic situation I'm describing.) -- Dan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I think Dan is talking about sharing the spine of the lists...
How about representing the lists using something along the lines of:
data List a = Nil | Leaf a | Cat (List a) (List a)
data Transformed a = Changed a | Unchanged a
extract :: Transformed a -> a
extract (Unchanged a) = a
extract (Changed a') = a'
-- If the first argument returns Nothing, that means 'identity'
mapShared :: (a -> Transformed a) -> List a -> List a
mapShared f xs = extract (mapShared' f xs)
cat :: List a -> Transformed (List a) -> Transformed (List a) ->
Transformed (List a)
cat xs (Unchanged _) (Unchanged _) = Unchanged xs
cat xs (Changed ys') (Unchanged zs) = Changed (Cat ys' zs)
cat xs (Unchanged ys) (Changed zs') = Changed (Cat ys zs')
cat xs (Changed ys') (Changed zs') = Changed (Cat ys' zs')
mapShared' :: (a -> Transformed a) -> List a -> Transformed (List a)
mapShared' f xs@Nil = Unchanged xs
mapShared' f xs@(Leaf a) = case f a of { Unchanged _ -> Unchanged xs ;
Changed a' -> Changed (Leaf a') }
mapShared' f xs@(Cat ys zs) = cat xs (mapShared' f ys) (mapShared' f zs)
filterShared :: (a -> Bool) -> List a -> List a
filterShared p xs = original xs (filterShared' p xs)
filterShared' :: (a -> Bool) -> List a -> Transformed (List a)
filterShared' p xs@Nil = Unchanged xs
filterShared' p xs@(Leaf x) = if p x then Unchanged xs else Changed Nil
filterShared' p xs@(Cat ys zs) = cat xs (filterShared' p ys)
(filterShared' p zs)
Perhaps this can be made into a monad or something like that, but I am
not sure... Perhaps rebalancing (or generally using a finger tree)
would also do well.
So, looks like we preserve whole 'subtrees' shared if they were not
'changed' by map or filter.
2010/4/8 Alberto G. Corona
Id doesn´t have to create a copy of the original object ( I infer this from referential transparency) so the new list must store the same original reference. Any pure structure would conserve references after id. filter as far as I know. Am I wrong?
2010/4/8 Dan Piponi
I have a situation where I have a bunch of lists and I'll frequently be making new lists from the old ones by applying map and filter. The map will be applying a function that's effectively the identity on most elements of the list, and filter will be using a function that usually gives True. This means that there is potential for a large amount of sharing between these lists that could be exploited, something that ordinary lists would do badly. Does anyone have a recommendation for a pure functional data structure for this case?
(It reminds me a bit of my own antidiagonal type, but that's not well adapted to the highly dynamic situation I'm describing.) -- Dan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Eugene Kirpichov Senior Developer, JetBrains

2010/4/8 Eugene Kirpichov
I think Dan is talking about sharing the spine of the lists...
How about representing the lists using something along the lines of:
data List a = Nil | Leaf a | Cat (List a) (List a)
data Transformed a = Changed a | Unchanged a extract :: Transformed a -> a extract (Unchanged a) = a extract (Changed a') = a'
-- If the first argument returns Nothing, that means 'identity' Whoops, this is an artifact of my brief thought to use Maybe instead of Transformed...
mapShared :: (a -> Transformed a) -> List a -> List a mapShared f xs = extract (mapShared' f xs)
cat :: List a -> Transformed (List a) -> Transformed (List a) -> Transformed (List a) cat xs (Unchanged _) (Unchanged _) = Unchanged xs cat xs (Changed ys') (Unchanged zs) = Changed (Cat ys' zs) cat xs (Unchanged ys) (Changed zs') = Changed (Cat ys zs') cat xs (Changed ys') (Changed zs') = Changed (Cat ys' zs')
mapShared' :: (a -> Transformed a) -> List a -> Transformed (List a) mapShared' f xs@Nil = Unchanged xs mapShared' f xs@(Leaf a) = case f a of { Unchanged _ -> Unchanged xs ; Changed a' -> Changed (Leaf a') } mapShared' f xs@(Cat ys zs) = cat xs (mapShared' f ys) (mapShared' f zs)
filterShared :: (a -> Bool) -> List a -> List a filterShared p xs = original xs (filterShared' p xs)
filterShared' :: (a -> Bool) -> List a -> Transformed (List a) filterShared' p xs@Nil = Unchanged xs filterShared' p xs@(Leaf x) = if p x then Unchanged xs else Changed Nil filterShared' p xs@(Cat ys zs) = cat xs (filterShared' p ys) (filterShared' p zs)
Perhaps this can be made into a monad or something like that, but I am not sure... Perhaps rebalancing (or generally using a finger tree) would also do well.
So, looks like we preserve whole 'subtrees' shared if they were not 'changed' by map or filter.
2010/4/8 Alberto G. Corona
: Id doesn´t have to create a copy of the original object ( I infer this from referential transparency) so the new list must store the same original reference. Any pure structure would conserve references after id. filter as far as I know. Am I wrong?
2010/4/8 Dan Piponi
I have a situation where I have a bunch of lists and I'll frequently be making new lists from the old ones by applying map and filter. The map will be applying a function that's effectively the identity on most elements of the list, and filter will be using a function that usually gives True. This means that there is potential for a large amount of sharing between these lists that could be exploited, something that ordinary lists would do badly. Does anyone have a recommendation for a pure functional data structure for this case?
(It reminds me a bit of my own antidiagonal type, but that's not well adapted to the highly dynamic situation I'm describing.) -- Dan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Eugene Kirpichov Senior Developer, JetBrains
-- Eugene Kirpichov Senior Developer, JetBrains

2010/4/8 Eugene Kirpichov
I think Dan is talking about sharing the spine of the lists...
How about representing the lists using something along the lines of:
data List a = Nil | Leaf a | Cat (List a) (List a)
data Transformed a = Changed a | Unchanged a extract :: Transformed a -> a extract (Unchanged a) = a extract (Changed a') = a'
-- If the first argument returns Nothing, that means 'identity' mapShared :: (a -> Transformed a) -> List a -> List a mapShared f xs = extract (mapShared' f xs)
cat :: List a -> Transformed (List a) -> Transformed (List a) -> Transformed (List a) cat xs (Unchanged _) (Unchanged _) = Unchanged xs cat xs (Changed ys') (Unchanged zs) = Changed (Cat ys' zs) cat xs (Unchanged ys) (Changed zs') = Changed (Cat ys zs') cat xs (Changed ys') (Changed zs') = Changed (Cat ys' zs')
mapShared' :: (a -> Transformed a) -> List a -> Transformed (List a) mapShared' f xs@Nil = Unchanged xs mapShared' f xs@(Leaf a) = case f a of { Unchanged _ -> Unchanged xs ; Changed a' -> Changed (Leaf a') } mapShared' f xs@(Cat ys zs) = cat xs (mapShared' f ys) (mapShared' f zs)
filterShared :: (a -> Bool) -> List a -> List a filterShared p xs = original xs (filterShared' p xs) Aargh. I meant: filterShared p xs = extract (filterShared' p xs)
filterShared' :: (a -> Bool) -> List a -> Transformed (List a) filterShared' p xs@Nil = Unchanged xs filterShared' p xs@(Leaf x) = if p x then Unchanged xs else Changed Nil filterShared' p xs@(Cat ys zs) = cat xs (filterShared' p ys) (filterShared' p zs)
Perhaps this can be made into a monad or something like that, but I am not sure... Perhaps rebalancing (or generally using a finger tree) would also do well.
So, looks like we preserve whole 'subtrees' shared if they were not 'changed' by map or filter.
2010/4/8 Alberto G. Corona
: Id doesn´t have to create a copy of the original object ( I infer this from referential transparency) so the new list must store the same original reference. Any pure structure would conserve references after id. filter as far as I know. Am I wrong?
2010/4/8 Dan Piponi
I have a situation where I have a bunch of lists and I'll frequently be making new lists from the old ones by applying map and filter. The map will be applying a function that's effectively the identity on most elements of the list, and filter will be using a function that usually gives True. This means that there is potential for a large amount of sharing between these lists that could be exploited, something that ordinary lists would do badly. Does anyone have a recommendation for a pure functional data structure for this case?
(It reminds me a bit of my own antidiagonal type, but that's not well adapted to the highly dynamic situation I'm describing.) -- Dan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Eugene Kirpichov Senior Developer, JetBrains
-- Eugene Kirpichov Senior Developer, JetBrains

Eugene Kirpichov wrote:
I think Dan is talking about sharing the spine of the lists...
How about representing the lists using something along the lines of:
data List a = Nil | Leaf a | Cat (List a) (List a)
data Transformed a = Changed a | Unchanged a [...]
cat :: List a -> Transformed (List a) -> Transformed (List a) -> Transformed (List a) cat xs (Unchanged _) (Unchanged _) = Unchanged xs cat xs (Changed ys') (Unchanged zs) = Changed (Cat ys' zs) cat xs (Unchanged ys) (Changed zs') = Changed (Cat ys zs') cat xs (Changed ys') (Changed zs') = Changed (Cat ys' zs')
mapShared' :: (a -> Transformed a) -> List a -> Transformed (List a) mapShared' f xs@Nil = Unchanged xs mapShared' f xs@(Leaf a) = case f a of { Unchanged _ -> Unchanged xs ; Changed a' -> Changed (Leaf a') } mapShared' f xs@(Cat ys zs) = cat xs (mapShared' f ys) (mapShared' f zs)
[...]
So, looks like we preserve whole 'subtrees' shared if they were not 'changed' by map or filter.
Yes, but do you actually gain in terms of space usage? Oh! It appears to me that sometimes you do, namely when the list was heavily shared before applying map and filter. But if it's used ephemerally, you don't gain anything. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com

2010/4/9 Heinrich Apfelmus
Eugene Kirpichov wrote:
I think Dan is talking about sharing the spine of the lists...
How about representing the lists using something along the lines of:
data List a = Nil | Leaf a | Cat (List a) (List a)
data Transformed a = Changed a | Unchanged a [...]
cat :: List a -> Transformed (List a) -> Transformed (List a) -> Transformed (List a) cat xs (Unchanged _) (Unchanged _) = Unchanged xs cat xs (Changed ys') (Unchanged zs) = Changed (Cat ys' zs) cat xs (Unchanged ys) (Changed zs') = Changed (Cat ys zs') cat xs (Changed ys') (Changed zs') = Changed (Cat ys' zs')
mapShared' :: (a -> Transformed a) -> List a -> Transformed (List a) mapShared' f xs@Nil = Unchanged xs mapShared' f xs@(Leaf a) = case f a of { Unchanged _ -> Unchanged xs ; Changed a' -> Changed (Leaf a') } mapShared' f xs@(Cat ys zs) = cat xs (mapShared' f ys) (mapShared' f zs)
[...]
So, looks like we preserve whole 'subtrees' shared if they were not 'changed' by map or filter.
Yes, but do you actually gain in terms of space usage?
Oh! It appears to me that sometimes you do, namely when the list was heavily shared before applying map and filter. But if it's used ephemerally, you don't gain anything.
Yes, in an ephemeral scenario (i.e. compute sum (mapShared (mapShared (filterShared .... ( xs) )))) we seemingly don't gain anything at all, but it is precisely the scenario where we don't need sharing :) We do gain something if the program's working set of live objects includes many results of mapping and filtering the same list at once: then their sublists will be shared.
Regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Eugene Kirpichov Senior Developer, JetBrains

Dan Piponi
I have a situation where I have a bunch of lists and I'll frequently be making new lists from the old ones by applying map and filter.
(While keeping the old ones around, I presume?) One option (or source of inspiration) might be lazy bytestrings, which are stored as lists of chunks, each chunk being a strict bytesting. A strict bytestring is a pointer to an array, an offset and a length. Thus, your initial lists could be contigous arrays, derivied list would be a sequence of chunks, each chunk either containing substrings of the original list or modified elements. So if f is id for positions 0..3 and 9..10, you could have: orig = Chunk (PS v 0 10) Empty map' f orig => Chunk (PS v 0 3) (Chunk (PS w 0 4) (Chunk (PS v 9 10) Empty)) Possibly vector generalizes this to datatypes beyond Word8, I haven't looked. -k -- If I haven't seen further, it is by standing in the footprints of giants
participants (6)
-
Alberto G. Corona
-
Dan Piponi
-
Edward Kmett
-
Eugene Kirpichov
-
Heinrich Apfelmus
-
Ketil Malde