Re: [Haskell-cafe] strange stack overflow with Data.Map

On Thu, Dec 29, 2005 at 01:42:29PM +0100, Christian Maeder wrote:
David Roundy wrote:
stats elems = foldl add_elem Map.empty elems add_elem m x = Map.insertWith (+) x 1 m
main = print $ stats $ take 1000000 $ repeat 1
This program has a space leak and runs out of stack space. I'm guessing that I'm being bit here by an unnatural amount of laziness in Map.insertWith, but I can't see how to fix this. :( I'm sure it's something elementary...
try:
stats = foldl' add_elem Map.empty add_elem m x = myinsertWith (+) x 1 m
myinsertWith f k v m = case Map.lookup k m of Nothing -> Map.insert k v m Just w -> let r = f v w in r `seq` Map.insert k r m
Thanks, that does it! I had tried something like that, but without the foldl', and had tried foldl', but without the myinsertWith. The reason I didn't spend much time using this implementation is that for large maps, it will take twice as long as Map.insertWith, so I assumed (incorrectly) that Map.insertWith should be written so that it behaves in a non-leaky manner. Should the Map modules have an additional Map.insertWith' that behaves strictly, or might it be the case that you always want strict behavior when using insertWith? -- David Roundy http://www.darcs.net

On 12/29/05, David Roundy
On Thu, Dec 29, 2005 at 01:42:29PM +0100, Christian Maeder wrote:
stats elems = foldl add_elem Map.empty elems add_elem m x = Map.insertWith (+) x 1 m
main = print $ stats $ take 1000000 $ repeat 1
This program has a space leak and runs out of stack space. I'm guessing that I'm being bit here by an unnatural amount of laziness in Map.insertWith, but I can't see how to fix this. :( I'm sure it's something elementary...
David Roundy wrote: try:
stats = foldl' add_elem Map.empty add_elem m x = myinsertWith (+) x 1 m
myinsertWith f k v m = case Map.lookup k m of Nothing -> Map.insert k v m Just w -> let r = f v w in r `seq` Map.insert k r m
Thanks, that does it! I had tried something like that, but without the foldl', and had tried foldl', but without the myinsertWith. The reason I didn't spend much time using this implementation is that for large maps, it will take twice as long as Map.insertWith, so I assumed (incorrectly) that Map.insertWith should be written so that it behaves in a non-leaky manner.
I'm not sure I understand this ... Do you say that the leaky program is faster than the non-leaky one ?
Should the Map modules have an additional Map.insertWith' that behaves strictly, or might it be the case that you always want strict behavior when using insertWith?
IMO, there is always a possibility for needing the lazy version. I also have to say that making a strict version for each "...With" does not fill my heart with joy. :) Such cases call for cheapness analysis, or optimistic evaluation, or some other kind of solution implemented at the compiler level, IMHO. Cheers, JP.

On Thu, Dec 29, 2005 at 04:24:02PM +0100, Jean-Philippe Bernardy wrote:
On 12/29/05, David Roundy
wrote: On Thu, Dec 29, 2005 at 01:42:29PM +0100, Christian Maeder wrote:
stats elems = foldl add_elem Map.empty elems add_elem m x = Map.insertWith (+) x 1 m
main = print $ stats $ take 1000000 $ repeat 1
This program has a space leak and runs out of stack space. I'm guessing that I'm being bit here by an unnatural amount of laziness in Map.insertWith, but I can't see how to fix this. :( I'm sure it's something elementary...
David Roundy wrote: try:
stats = foldl' add_elem Map.empty add_elem m x = myinsertWith (+) x 1 m
myinsertWith f k v m = case Map.lookup k m of Nothing -> Map.insert k v m Just w -> let r = f v w in r `seq` Map.insert k r m
Thanks, that does it! I had tried something like that, but without the foldl', and had tried foldl', but without the myinsertWith. The reason I didn't spend much time using this implementation is that for large maps, it will take twice as long as Map.insertWith, so I assumed (incorrectly) that Map.insertWith should be written so that it behaves in a non-leaky manner.
I'm not sure I understand this ... Do you say that the leaky program is faster than the non-leaky one ?
No, what I mean is that Map.insertWith only traverses the Map once, while myinsertWith has to traverse it twice, so for a large map, each call to myinsertWith will take twice as long as Map.insertWith. Or to put it a different way, if myinsertWith were part of the Map module, it could be twice as fast.
Should the Map modules have an additional Map.insertWith' that behaves strictly, or might it be the case that you always want strict behavior when using insertWith?
IMO, there is always a possibility for needing the lazy version. I also have to say that making a strict version for each "...With" does not fill my heart with joy. :) Such cases call for cheapness analysis, or optimistic evaluation, or some other kind of solution implemented at the compiler level, IMHO.
I agree, there is a possibility for wanting the lazy version. If you only want to provite one insertWith, I suppose you'd want to figure out whether the lazy or strict version is more commonly needed (or "safer"), and make people wanting the other version implement it themselves. In my opinion, the strict version seems nicer, largely because HNF is cheap for most "large" computations I do, so strictness wouldn't cost much. On the subject of excessive functions, what are the uses of insertWithKey? It seems like anything you do with insertWithKey you could just as efficiently do with insertWith... it seems pretty trivial to write insertWithKey f k x = insertWith (f k) k x There may be a use to all the WithKey variants, but I'd much rather have strict varients... -- David Roundy http://www.darcs.net

On 12/29/05, David Roundy
On the subject of excessive functions, what are the uses of insertWithKey? It seems like anything you do with insertWithKey you could just as efficiently do with insertWith... it seems pretty trivial to write
insertWithKey f k x = insertWith (f k) k x
This is the case if the key type has structural equality. Otherwise, the key passed to the combining function, which comes from the map, can have a different value. Not that I support this style of coding, but dropping the withKey functions cannot be done so lightly.
There may be a use to all the WithKey variants, but I'd much rather have strict varients...
It's a change to consider. We might want to deprecate the withKey variants (and implicitly the non-structural-equalilty types for keys). The rationale behind this is that the components of the map are already separated in to Key and Value, so the chosen key have no reason to carry extra information: it's what the value is for. How do you people feel about that? Cheers, JP.

Jean-Philippe Bernardy wrote:
insertWithKey f k x = insertWith (f k) k x
This is the case if the key type has structural equality. Otherwise, the key passed to the combining function, which comes from the map, can have a different value. Not that I support this style of coding,
I don't think, that we need to consider old and new keys (different but yielding EQ).
It's a change to consider. We might want to deprecate the withKey variants (and implicitly the non-structural-equalilty types for keys).
At least {fold,map,filter,partition}WithKey functions are (I think) more useful than {insert,adjust,update}WithKey, so there will be no general rule for withKey-variants. (I've never used {union,difference,intersection]WithKey, how important are these?) I would not impose an implicit requirement for strong structural equality/order on the keys. But it would suffice to document which of two "equal" keys is dropped or kept. Maybe it is even possible to keep "Set a" somehow in sync with "Map a ()" or an identity "Map a a". (And "Set (MapEntryPair a b)" to "Map a b") On the other hand, if we would forget about biasing also for sets than insertion could be optimized to do nothing with sets that contain already the element. (Maybe that should even be changed so in Data.Set for efficiency reasons. An extra function might do the job of actually replacing an equal element.) Christian P.S. left-biasing for Set.intersection still needs to be checked in (I've sent a patch some time ago)

On Friday 30 Dec 2005 10:33 am, Christian Maeder wrote:
I would not impose an implicit requirement for strong structural equality/order on the keys. But it would suffice to document which of two "equal" keys is dropped or kept. Maybe it is even possible to keep "Set a" somehow in sync with "Map a ()" or an identity "Map a a". (And "Set (MapEntryPair a b)" to "Map a b")
On the other hand, if we would forget about biasing also for sets than insertion could be optimized to do nothing with sets that contain already the element. (Maybe that should even be changed so in Data.Set for efficiency reasons. An extra function might do the job of actually replacing an equal element.)
I think it's going to be very difficult to properly address all these issues without creating a huge API with many different flavours of essentially the same functions. I agonised over considerations like this for Data.Tree.AVL. Eventually decided it was a hopeless task and deleted all abstracted Map/Set related functions and instead concentrated on providing just a raw API for AVL trees (only) that gave users the freedom to make their own decisions about this kind of thing (primarily by requiring them to supply the appropriate "combining comparison" as an argument). Even so, the AVL API so still somewhat big for a single data structure library (but I don't think there's much redundancy there). Whilst this made life somewhat simpler for me, it still leaves the question of what an API that is simple, abstracted and time/space efficient should look like. Perhaps these are contradictory goals and it's just not possible to produce such thing. We'll have to see what Jean-Philippe and anybody else who wants to contribute come up with. So best of luck :-) As for the structural equality issue, I have always assumed that if (==) returned True or `compare` returned EQ this implied structural equality. But I'm not sure that's documented anywhere, and as it happens it's not true for the Eq and Ord instances defined AVL trees. This is something that had been worrying me (maybe I should remove these instances). Regards -- Adrian Hey

On 30/12/05, Adrian Hey
As for the structural equality issue, I have always assumed that if (==) returned True or `compare` returned EQ this implied structural equality. But I'm not sure that's documented anywhere, and as it happens it's not true for the Eq and Ord instances defined AVL trees. This is something that had been worrying me (maybe I should remove these instances).
This isn't even quite true for all the prelude types. In particular -0.0 and 0.0 are distinct Float values structurally, but will compare equal, though one might make exceptions here due to floating point values being thought of as some approximation of real numbers where the laws are satisfied. However, I think it's safe to only require that (==) be a suitable equivalence relation on the values of a type. This is especially true when the representation is hidden, so that no stronger tests for comparison could be constructed by the library user. It seems like it would usually be handy to have instances of Eq not differentiate between values which "represent the same thing" in different ways. Earlyish versions of Miranda had 'laws' which would be applied automatically to normalise values of a particular type, so as to get a type which wasn't freely generated by its constructors. They were quite strong, and one could easily enforce a variety of invariants, like keeping lists sorted and trees balanced. At some point they were removed, but I haven't seen a really good explanation as to why this happened. I suppose that there are other ways to achieve those effects, but it's neat to be able to use pattern matching on what would otherwise have to be an abstract data type. - Cale

On Friday 30 Dec 2005 10:33 am, Christian Maeder wrote:
P.S. left-biasing for Set.intersection still needs to be checked in (I've sent a patch some time ago)
Perhaps I'm missing something, but union seems to have the same problem. Maybe someone should review all this code to see if implementations are consistent with the Haddock (preferably someone who understands more about how it works than I do :-). Regards -- Adrian Hey

Yes, union suffers from the same problem; both in Set and Map as well.
The testing code can now detect problems of left-biasing in presence
of non-structural equality. (That's what I wanted to implement before
applying the fix).
On 12/31/05, Adrian Hey
On Friday 30 Dec 2005 10:33 am, Christian Maeder wrote:
P.S. left-biasing for Set.intersection still needs to be checked in (I've sent a patch some time ago)
Perhaps I'm missing something, but union seems to have the same problem. Maybe someone should review all this code to see if implementations are consistent with the Haddock (preferably someone who understands more about how it works than I do :-).
Regards -- Adrian Hey

On Sunday 01 Jan 2006 3:28 pm, Jean-Philippe Bernardy wrote:
Yes, union suffers from the same problem; both in Set and Map as well. The testing code can now detect problems of left-biasing in presence of non-structural equality. (That's what I wanted to implement before applying the fix).
Actually I was wondering whether or not the term "left-biasing" was particularly useful. It could be misleading for some functions. For example.. -- | /O(n*log n)/. Create a set from a list of elements. fromList :: Ord a => [a] -> Set a fromList xs = foldlStrict ins empty xs where ins t x = insert x t My interpretation of left biasing would be that if the input list contains multiple occurences of "equal" values then the first occurence is the one which is inserted in the Set and the rest are discarded. But AFAICS the this is not the case with the current Data.Set implementation (or with the clone I just produced). Both are right biased (at least according to my intuitive understanding of the term). I think to get left biasing we need to either.. Use foldr (instead of foldl) with insert as it is. or.. Use foldl but with a "right biased" insert function (I.E. One which prefers to retain existing elements). BTW, I think I would prefer the default behaviour of insert to be to retain existing elements because this allows the optimisation of not updating the data structure representing the set at all (which should help keep heap burn rate down). But I guess this is in effect assuming equality implies structural equality so others may not like it. Sigh.. <whine> I really wish this whole issue of how to properly deal with "things that are equal but may still be different somehow" would just go away :-) It's hard to be precise about what choice you've made, and whatever choice you do make usually seems arbitrary. Even when it doesn't make any semantic difference, the choice you make may well have an observable effect on space and time behavior. </whine> In the toy FPL I implemented a while ago I actually had combining comparison hardwired into the rts. This tested for structural equality and if two values (acyclic graphs) were equal it replaced the "new" one with an indirection node pointing to the "old" one (I could tell their relative age by the addresses). But I still can't quite decide if this was an elegant solution or an evil hack :-) Regards -- Adrian Hey

On Mon, Jan 02, 2006 at 05:28:39PM +0000, Adrian Hey wrote:
BTW, I think I would prefer the default behaviour of insert to be to retain existing elements because this allows the optimisation of not updating the data structure representing the set at all (which should help keep heap burn rate down).
I hope you don't propose this for Map. It would break some nice uses, like representing variable scopes in an interpreter. The "replacing" insert allows you to easily implement hiding variables in outer scopes. Best regards Tomasz -- I am searching for a programmer who is good at least in some of [Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

On Monday 02 Jan 2006 7:42 pm, Tomasz Zielonka wrote:
On Mon, Jan 02, 2006 at 05:28:39PM +0000, Adrian Hey wrote:
BTW, I think I would prefer the default behaviour of insert to be to retain existing elements because this allows the optimisation of not updating the data structure representing the set at all (which should help keep heap burn rate down).
I hope you don't propose this for Map. It would break some nice uses, like representing variable scopes in an interpreter. The "replacing" insert allows you to easily implement hiding variables in outer scopes.
You mean am I proposing that Maps should retain old association values and discard the new ones? If so the answer is no, of course not. But I see no reason why they shouldn't do this with keys, which are "equal" according to the instance of Ord. (If this is any way dangerous then the key type should not be an instance of Ord IMO.) Regards -- Adrian Hey

Adrian Hey wrote:
On Monday 02 Jan 2006 7:42 pm, Tomasz Zielonka wrote:
I hope you don't propose this for Map. It would break some nice uses, like representing variable scopes in an interpreter. The "replacing" insert allows you to easily implement hiding variables in outer scopes.
You mean am I proposing that Maps should retain old association values and discard the new ones? If so the answer is no, of course not.
Actually, for maps it makes (even more) sense to have an additional "insert-only-if-not-member" function, and be it only for (defining) "fromList". Maybe such a function should simply be called "add" to encourage its use. {- | insert pair only if key is not a member yet (left-biased) leave map unchanged otherwise. -} add :: Ord a => Map a b -> a -> b -> Map a b Properties: insert k v m = add (delete k m) k v add m k v = if member k m then m else insert k v m (On sets "add" and "insert" could be used interchangeably) For maps I could also imagine that it is possible to supply a more efficient implementation based on arrays if pairs are inserted at most once (by only "adding" and never deleting). (To really avoid array duplication then, the changing key sets would need to be kept separately apart from further requirements on the keys for array access) Cheers Christian

Christian Maeder writes:
Adrian Hey wrote:
You mean am I proposing that Maps should retain old association values and discard the new ones? If so the answer is no, of course not.
Actually, for maps it makes (even more) sense to have an additional "insert-only-if-not-member" function, and be it only for (defining) "fromList". Maybe such a function should simply be called "add" to encourage its use.
{- | insert pair only if key is not a member yet (left-biased) leave map unchanged otherwise. -} add :: Ord a => Map a b -> a -> b -> Map a b
Properties:
insert k v m = add (delete k m) k v
add m k v = if member k m then m else insert k v m
Or, add m k v = insertWith (\_ oldV -> oldV) k v m
--
David Menendez

On Tue, Jan 03, 2006 at 12:03:06AM +0000, Adrian Hey wrote:
On Monday 02 Jan 2006 7:42 pm, Tomasz Zielonka wrote:
I hope you don't propose this for Map. It would break some nice uses, like representing variable scopes in an interpreter. The "replacing" insert allows you to easily implement hiding variables in outer scopes.
You mean am I proposing that Maps should retain old association values and discard the new ones? If so the answer is no, of course not.
Good. I was pretty sure that you are not proposing it, but I wanted to be sure.
But I see no reason why they shouldn't do this with keys, which are "equal" according to the instance of Ord. (If this is any way dangerous then the key type should not be an instance of Ord IMO.)
No objections here. Best regards Tomasz -- I am searching for a programmer who is good at least in some of [Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Jean-Philippe Bernardy wrote:
Yes, union suffers from the same problem; both in Set and Map as well.
In Data.Map "left-biasing" is correct wrt the values (but if "Map a ()" should correspond to "Set a" then the keys need also be considered.)
The testing code can now detect problems of left-biasing in presence of non-structural equality. (That's what I wanted to implement before applying the fix).
Good! The problem in Data.Set seems to be limited to insert, union, unions, intersection, fromList and fromAscList. I find only union and intersection more problematic because biasing depends on the size and is thus hard to predict (if needed). Currently, Set.fromList is right-biased (and the only function that uses Set.insert). How about an additional left-biased function of type "Set a -> a -> Set a"? What name for it? C.

Adrian Hey wrote:
On Friday 30 Dec 2005 10:33 am, Christian Maeder wrote:
P.S. left-biasing for Set.intersection still needs to be checked in (I've sent a patch some time ago)
Perhaps I'm missing something, but union seems to have the same problem. Maybe someone should review all this code to see if implementations are consistent with the Haddock (preferably someone who understands more about how it works than I do :-).
Yes, you're right! union is not left-biased although that is explicitly documented so in the header. Either we give up the control of biasing or fix union, too. (I've found no other function were biasing would matter, unions will be ok if union is.) I could supply a patch for union if needed. Cheers Christian

David Roundy wrote:
Should the Map modules have an additional Map.insertWith' that behaves strictly, or might it be the case that you always want strict behavior when using insertWith?
I think so. Once strictness is lost, there's nothing the user of a library could do about it. If a container is too strict however, lazyness can be recovered by wrapping values in an additional data type. So strict variants of updating functions would be a big win, and if two versions of every function are deemed too much of a burden, I'd rather get rid of the lazy version. Udo. -- Ours is a world where people don't know what they want and are willing to go through hell to get it. -- Don Marquis

On 12/29/05, Udo Stenzel
and if two versions of every function are deemed too much of a burden, I'd rather get rid of the lazy version.
Huh, then why use haskell, there are strict purely functional languages? /snarky philosophical comment -David Mercer Tucson, AZ

Laziness and strictness are both important depending on the situation.
I'd recommend keeping both variants around. Having to wrap values in
an extra data type just to keep laziness sort of defeats the point of
Haskell being lazy in the first place. It also makes it somewhat
awkward to use in the cases where laziness is important, which is, I
suspect, quite a lot of the time where the codomain of the Map is a
recursive datatype.
Having lazy and strict variants of data structure libraries (i.e.
splitting the modules into *.Lazy and *.Strict) seems like a good idea
though, but I can imagine some cases arising where it would be
convenient to swap between the two, so it would be good to keep the
type the same anyway. The parent module would then reexport one of the
variants. (Obviously the lazy one, since this is Haskell, right? ;)
- Cale
On 29/12/05, Udo Stenzel
David Roundy wrote:
Should the Map modules have an additional Map.insertWith' that behaves strictly, or might it be the case that you always want strict behavior when using insertWith?
I think so. Once strictness is lost, there's nothing the user of a library could do about it. If a container is too strict however, lazyness can be recovered by wrapping values in an additional data type. So strict variants of updating functions would be a big win, and if two versions of every function are deemed too much of a burden, I'd rather get rid of the lazy version.
Udo. -- Ours is a world where people don't know what they want and are willing to go through hell to get it. -- Don Marquis
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux)
iD8DBQFDtAMdc1ZCC9bsOpURAs7ZAJ9pS/L7pf9tpRoTGi3HDR0gyindOQCfakED Xdpm15vvyqF51QAmCJeCm1U= =tZhs -----END PGP SIGNATURE-----
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
participants (9)
-
Adrian Hey
-
Cale Gibbard
-
Christian Maeder
-
David Menendez
-
David Mercer
-
David Roundy
-
Jean-Philippe Bernardy
-
Tomasz Zielonka
-
Udo Stenzel