Re: Libraries Digest, Vol 93, Issue 21

Message: 3 Date: Sun, 22 May 2011 06:52:49 -0400 From: "Edward Z. Yang"
Subject: Re: Proposal: Value strict versions of containers To: Milan Straka Cc: libraries Message-ID: <1306061483-sup-1713@ezyang> Content-Type: text/plain; charset=UTF-8 Excerpts from Milan Straka's message of Sun May 22 06:50:07 -0400 2011:
- the types Data.IntMap.IntMap, Data.IntMap.Lazy.IntMap and Data.IntMap.Strict.IntMap should be equal -- so I can use strict methods on IntMap someone else created with lazy methods. That is simple in absence of type classes.
Comment: This would mean you would get no static assurances against mixing up a strict IntMap with a lazy IntMap; for all intents and purposes, this is equivalent to implementing strict versions of all functions on top of a lazy data type.
Another consequence to this design is that, should someone want to have
class MapC m key value | m -> key, m -> value where
it becomes impossible to directly write
instance MapC (Data.IntMap.Strict.IntMap Int a) Int a where
instance MapC (Data.IntMap.Lazy.IntMap Int a) Int a where
because the two instances are the same. Regardless, +1 for Milan's proposal either with one type or separate types for strict and lazy. John Lato

On Mon, May 23, 2011 at 5:13 AM, John Lato
Message: 3 Date: Sun, 22 May 2011 06:52:49 -0400 From: "Edward Z. Yang"
Subject: Re: Proposal: Value strict versions of containers To: Milan Straka Cc: libraries Message-ID: <1306061483-sup-1713@ezyang> Content-Type: text/plain; charset=UTF-8 Excerpts from Milan Straka's message of Sun May 22 06:50:07 -0400 2011:
- the types Data.IntMap.IntMap, Data.IntMap.Lazy.IntMap and Data.IntMap.Strict.IntMap should be equal -- so I can use strict methods on IntMap someone else created with lazy methods. That is simple in absence of type classes.
Comment: This would mean you would get no static assurances against mixing up a strict IntMap with a lazy IntMap; for all intents and purposes, this is equivalent to implementing strict versions of all functions on top of a lazy data type.
Another consequence to this design is that, should someone want to have
class MapC m key value | m -> key, m -> value where
it becomes impossible to directly write
instance MapC (Data.IntMap.Strict.IntMap Int a) Int a where
instance MapC (Data.IntMap.Lazy.IntMap Int a) Int a where
because the two instances are the same.
Regardless, +1 for Milan's proposal either with one type or separate types for strict and lazy.
I'm perfectly fine with adding a separate module that exports the strict versions of the functions. The main problem with the separate versions is that a strict value container is a valid instance of almost no interesting classes. It violates the Functor laws, the Foldable laws, and Traversable laws, because of its treatment of bottoms. I would be +1 for the single type version, but I would have serious misgivings about a version with two types as the temptation to add those bogus instances would be strong. On the other hand, a case where it would seem to make sense to have a version with two types would be were you make two versions, split on whether or not they are both spine-strict or not, rather than if they are value-strict. In that case, both are valid instances of Functor, Foldable, Traversable, but the spine-lazy version will tend to be pretty slow due to the fact that most of its functions will have to keep dispatching via indirect jumps as they traverse down the spine. -Edward

On Tue, May 31, 2011 at 2:08 AM, Edward Kmett
On Mon, May 23, 2011 at 5:13 AM, John Lato
wrote: Message: 3 Date: Sun, 22 May 2011 06:52:49 -0400 From: "Edward Z. Yang"
Subject: Re: Proposal: Value strict versions of containers To: Milan Straka Cc: libraries Message-ID: <1306061483-sup-1713@ezyang> Content-Type: text/plain; charset=UTF-8 Excerpts from Milan Straka's message of Sun May 22 06:50:07 -0400 2011:
- the types Data.IntMap.IntMap, Data.IntMap.Lazy.IntMap and Data.IntMap.Strict.IntMap should be equal -- so I can use strict methods on IntMap someone else created with lazy methods. That is simple in absence of type classes.
Comment: This would mean you would get no static assurances against mixing up a strict IntMap with a lazy IntMap; for all intents and purposes, this is equivalent to implementing strict versions of all functions on top of a lazy data type.
Another consequence to this design is that, should someone want to have
class MapC m key value | m -> key, m -> value where
it becomes impossible to directly write
instance MapC (Data.IntMap.Strict.IntMap Int a) Int a where
instance MapC (Data.IntMap.Lazy.IntMap Int a) Int a where
because the two instances are the same.
Regardless, +1 for Milan's proposal either with one type or separate types for strict and lazy.
I'm perfectly fine with adding a separate module that exports the strict versions of the functions.
The main problem with the separate versions is that a strict value container is a valid instance of almost no interesting classes.
It violates the Functor laws, the Foldable laws, and Traversable laws, because of its treatment of bottoms. I would be +1 for the single type version, but I would have serious misgivings about a version with two types as the temptation to add those bogus instances would be strong.
Programmers using strict containers need to make their own proofs for bottom anyway, so I don't see it as particularly onerous to do so for these instances provided that they're correct for non-bottom values and that it's documented. With a strict container, I'd rather have a Functor that doesn't handle bottoms properly than no Functor at all. And with a single Map with strict and lazy interfaces, then wouldn't there properly be no strict Functor at all? If the base map is strict, then the Functor instance would have to go through boxed values to handle bottom properly. If the base map is lazy, the Functor instance would be lazy too, which would negate the benefits of a strict interface. Is this correct? John Lato

On Tue, May 31, 2011 at 6:03 AM, John Lato
On Tue, May 31, 2011 at 2:08 AM, Edward Kmett
wrote: It violates the Functor laws, the Foldable laws, and Traversable laws, because of its treatment of bottoms. I would be +1 for the single type version, but I would have serious misgivings about a version with two types as the temptation to add those bogus instances would be strong.
With a strict container, I'd rather have a Functor that doesn't handle
bottoms properly than no Functor at all.
This is where our fundamental disagreement lies. I find such an instance abhorrent, because now I am forced to second guess whether or not I have a real Functor instance everywhere. Your local strictness consideration infects all the third party code that just needed functoriality. And with a single Map with strict and lazy interfaces, then wouldn't there
properly be no strict Functor at all? If the base map is strict, then the Functor instance would have to go through boxed values to handle bottom properly.
I would argue that the Functor instance would have to be the non-strict version, because the other violates the laws. This is why I advocated in favor of the single-data-type two-module approach. I don't really like polymorphic value-strict containers because their benefits are very fragile. You lose the benefit if you store tuples in them or any other composite data type else that risks capturing a closure. This is why I advocated for a single data type with two modules, as is used by Data.HashMap. In general, I find that putting a ! annotation on a polymorphic type is a bad idea. It doesn't ensure that it is fully evaluated, merely that it is in whnf. The reason I don't object to the ! annotation on the key in Data.Map and Data.Set is the very limited justification that the compare function is required to be able to do something with the key, so it'll need to do look at it somehow (discounting the () case). Using a value-strict map underneath strikes me as a bad solution, because the non-strict case requires an extra layer of indirection, ensuring crappy performance for the lazy version. While both versions can perform well when there is no 'underlying container' involved, just one type. Using strict methods to plug leaks is fine, but it really is the exception rather than the rule and so the feature should "pay its own way" -- to borrow a phrase from Kent Dybvig -- rather than impose a tax on every other use case. The reason I don't like the two-data-types solution, is that I really don't think that in addition to making up pseudo-Functor/Foldable/Traversable instances that I need to second guess in any code that uses them, it also eliminates the ability to fix up leaks on a case by case basis. If you use some 'underlying value-strict' container, then I can't just switch back and forth, instead I must map a Boxed type through, and wrap it in a newtype wrapper to call a single lazy method, or unwrap, and map the unboxing method through everywhere in order to force a single value. Both of these strike me as remarkably poor solutions. With the two module, one type approach, I can happily use the default Map type, and add a ' here or there (or use a qualified Strict.foo instead of foo) to fix leaky behavior.
If the base map is lazy, the Functor instance would be lazy too, which would negate the benefits of a strict interface. Is this correct?
Yes it would be lazy, but I don't see how it negates any benefits from your strict API. Just expose a Strict.map. -Edward

On Tue, May 31, 2011 at 6:16 PM, Edward Kmett
On Tue, May 31, 2011 at 6:03 AM, John Lato
wrote: On Tue, May 31, 2011 at 2:08 AM, Edward Kmett
wrote: It violates the Functor laws, the Foldable laws, and Traversable laws, because of its treatment of bottoms. I would be +1 for the single type version, but I would have serious misgivings about a version with two types as the temptation to add those bogus instances would be strong.
With a strict container, I'd rather have a Functor that doesn't handle
bottoms properly than no Functor at all.
This is where our fundamental disagreement lies. I find such an instance abhorrent, because now I am forced to second guess whether or not I have a real Functor instance everywhere. Your local strictness consideration infects all the third party code that just needed functoriality.
I hope we can agree to disagree on this, although I would like to state that I'm generally in favor of law-abiding instances. Except for the particular case of bottoms and strict types.
And with a single Map with strict and lazy interfaces, then wouldn't there
properly be no strict Functor at all? If the base map is strict, then the Functor instance would have to go through boxed values to handle bottom properly.
I would argue that the Functor instance would have to be the non-strict version, because the other violates the laws. This is why I advocated in favor of the single-data-type two-module approach.
Yes, that's what I meant.
I don't really like polymorphic value-strict containers because their benefits are very fragile. You lose the benefit if you store tuples in them or any other composite data type else that risks capturing a closure. This is why I advocated for a single data type with two modules, as is used by Data.HashMap. In general, I find that putting a ! annotation on a polymorphic type is a bad idea. It doesn't ensure that it is fully evaluated, merely that it is in whnf. The reason I don't object to the ! annotation on the key in Data.Map and Data.Set is the very limited justification that the compare function is required to be able to do something with the key, so it'll need to do look at it somehow (discounting the () case).
Using a value-strict map underneath strikes me as a bad solution, because the non-strict case requires an extra layer of indirection, ensuring crappy performance for the lazy version. While both versions can perform well when there is no 'underlying container' involved, just one type. Using strict methods to plug leaks is fine, but it really is the exception rather than the rule and so the feature should "pay its own way" -- to borrow a phrase from Kent Dybvig -- rather than impose a tax on every other use case.
The reason I don't like the two-data-types solution, is that I really don't think that in addition to making up pseudo-Functor/Foldable/Traversable instances that I need to second guess in any code that uses them, it also eliminates the ability to fix up leaks on a case by case basis.
I don't like the two-data-types solution because it's duplication and is much more work to maintain. Not that I'll be maintaining it, but I still find it ugly.
If you use some 'underlying value-strict' container, then I can't just switch back and forth, instead I must map a Boxed type through, and wrap it in a newtype wrapper to call a single lazy method, or unwrap, and map the unboxing method through everywhere in order to force a single value. Both of these strike me as remarkably poor solutions.
I'm not backing any particular approach, I was simply trying to explore the ramifications of different designs which have been proposed. I think most people have expressed a preference for the Data.HashMap style of implementation now anyway. Personally, what I would like to see are hooks for reduction strategies. I often want this for tuples.
With the two module, one type approach, I can happily use the default Map type, and add a ' here or there (or use a qualified Strict.foo instead of foo) to fix leaky behavior.
If the base map is lazy, the Functor instance would be lazy too, which would negate the benefits of a strict interface. Is this correct?
Yes it would be lazy, but I don't see how it negates any benefits from your strict API. Just expose a Strict.map.
That only works if you use Strict.map instead of fmap. If you're using a third-party algorithm, that may not be an option. John
participants (2)
-
Edward Kmett
-
John Lato