Faster `elem` "Hack" for Data.Set?

In a recent "Folding the unfoldable" "gist": https://oleg.fi/gists/posts/2022-01-25-folding-unfoldable.html Oleg describes a way to make unboxed Vectors Foldable: There is another way to make Foldable work, with a data Hack a b where Hack :: U.Vector a -> Hack a a This is a two type-parameter wrapper, but the types are always the same! (I wish that could be a newtype). The Foldable instance is simply: instance U.Unbox a => Foldable (Hack a) where foldr f z (Hack v) = U.foldr f z v foldl' f z (Hack v) = U.foldl' f z v In the associated Reddit thread https://www.reddit.com/r/haskell/comments/sd6gel/comment/hudmsis/?utm_source=share&utm_medium=web2x&context=3 it was observed that a similar approach can be used to give Data.Set a performant `elem` method (code copied below my signature). It is somewhat tempting to consider whether the "Hack" ought to be built-in directly into the real Data.Set: https://www.reddit.com/r/haskell/comments/sd6gel/comment/huil070/?utm_source=share&utm_medium=web2x&context=3 type Set a = SetImpl a a data SetImpl a b where Bin :: {-# UNPACK #-} !Size -> !a -> !(SetImpl a) -> !(SetImpl a) -> SetImpl a a Tip :: SetImpl a a type Size = Int instance Ord a => Foldable (SetImpl a) where ... elem _ Tip = False elem x (Bin _ y l r) = case compare x y of EQ -> True LT -> elem x l GT -> elem x r ... This representation does not seem to carry any obvious runtime overhead, and gives Set a performant `elem` method. The only change from the status quo would be that folds would not be available for empty or singleton sets with non-Ord elements (currently possible with Set). How much of a loss would it be to constrain the element type of empty and singleton sets used in folds? Are there other issue that make the above impractical? -- Viktor. type OSet a = OrdSet a a data OrdSet a b where OSet :: Set.Set a -> OrdSet a a instance Ord a => Foldable (OrdSet a) where fold (OSet xs) = fold xs foldMap f (OSet xs) = foldMap f xs foldMap' f (OSet xs) = foldMap' f xs foldr f z (OSet xs) = foldr f z xs foldr' f z (OSet xs) = foldr' f z xs foldl f z (OSet xs) = foldl f z xs foldl' f z (OSet xs) = foldl' f z xs foldr1 f (OSet xs) = foldr1 f xs foldl1 f (OSet xs) = foldl1 f xs toList (OSet xs) = toList xs null (OSet xs) = null xs length (OSet xs) = length xs -- The point of the exercise is `elem` elem e (OSet xs) = Set.member e xs maximum (OSet xs) = maximum xs minimum (OSet xs) = minimum xs sum (OSet xs) = sum xs product (OSet xs) = product xs instance Eq a => Eq (OSet a) where (OSet xs) == (OSet ys) = xs == ys instance (Eq a, Ord a) => Ord (OSet a) where compare (OSet xs) (OSet ys) = compare xs ys empty :: Ord a => OSet a empty = OSet Set.empty singleton :: Ord a => a -> OSet a singleton = OSet . Set.singleton fromList :: Ord a => [a] -> OSet a fromList = OSet . Set.fromList (\\) :: Ord a => OSet a -> OSet a -> OSet a (OSet xs) \\ (OSet ys) = OSet (xs Set.\\ ys) alterF :: (Ord a, Functor f) => (Bool -> f Bool) -> a -> OSet a -> f (OSet a) alterF f x (OSet xs) = fmap OSet $ Set.alterF f x xs ...

The tighter `Foldable` instance probably wouldn't hurt too much in
practice. I have some other concerns, however. One is that
`containers` has a long history of attempting to remain compatible
with potential future Haskell 98-like Haskell implementations (likely
with a few days of work to patch up mistakes when the time comes).
Using a GADT to define Set would break that quite thoroughly. My
second concern is more practical:
type Set a = SetImpl a a
is all sorts of problematic. It forces anyone who wants to use what's
always been the `Set` type constructor (Set :: Type -> Type) to dig
into the implementation and use `SetImpl :: Type -> Type -> Type`
instead. That's quite a different beast. I don't know all the things
that will break, but they'll surely include
https://hackage.haskell.org/package/constrained-monads-0.5.0.0/docs/Control-...
. I think the real answer is to remove `elem` from `Foldable` and put
it somewhere more appropriate.
On Sat, Jan 29, 2022 at 7:26 PM Viktor Dukhovni
In a recent "Folding the unfoldable" "gist":
https://oleg.fi/gists/posts/2022-01-25-folding-unfoldable.html
Oleg describes a way to make unboxed Vectors Foldable:
There is another way to make Foldable work, with a
data Hack a b where Hack :: U.Vector a -> Hack a a
This is a two type-parameter wrapper, but the types are always the same! (I wish that could be a newtype). The Foldable instance is simply:
instance U.Unbox a => Foldable (Hack a) where foldr f z (Hack v) = U.foldr f z v foldl' f z (Hack v) = U.foldl' f z v
In the associated Reddit thread
it was observed that a similar approach can be used to give Data.Set a performant `elem` method (code copied below my signature).
It is somewhat tempting to consider whether the "Hack" ought to be built-in directly into the real Data.Set:
type Set a = SetImpl a a data SetImpl a b where Bin :: {-# UNPACK #-} !Size -> !a -> !(SetImpl a) -> !(SetImpl a) -> SetImpl a a Tip :: SetImpl a a type Size = Int
instance Ord a => Foldable (SetImpl a) where ... elem _ Tip = False elem x (Bin _ y l r) = case compare x y of EQ -> True LT -> elem x l GT -> elem x r ...
This representation does not seem to carry any obvious runtime overhead, and gives Set a performant `elem` method. The only change from the status quo would be that folds would not be available for empty or singleton sets with non-Ord elements (currently possible with Set).
How much of a loss would it be to constrain the element type of empty and singleton sets used in folds? Are there other issue that make the above impractical?
-- Viktor.
type OSet a = OrdSet a a data OrdSet a b where OSet :: Set.Set a -> OrdSet a a
instance Ord a => Foldable (OrdSet a) where fold (OSet xs) = fold xs foldMap f (OSet xs) = foldMap f xs foldMap' f (OSet xs) = foldMap' f xs foldr f z (OSet xs) = foldr f z xs foldr' f z (OSet xs) = foldr' f z xs foldl f z (OSet xs) = foldl f z xs foldl' f z (OSet xs) = foldl' f z xs foldr1 f (OSet xs) = foldr1 f xs foldl1 f (OSet xs) = foldl1 f xs toList (OSet xs) = toList xs null (OSet xs) = null xs length (OSet xs) = length xs -- The point of the exercise is `elem` elem e (OSet xs) = Set.member e xs maximum (OSet xs) = maximum xs minimum (OSet xs) = minimum xs sum (OSet xs) = sum xs product (OSet xs) = product xs
instance Eq a => Eq (OSet a) where (OSet xs) == (OSet ys) = xs == ys
instance (Eq a, Ord a) => Ord (OSet a) where compare (OSet xs) (OSet ys) = compare xs ys
empty :: Ord a => OSet a empty = OSet Set.empty
singleton :: Ord a => a -> OSet a singleton = OSet . Set.singleton
fromList :: Ord a => [a] -> OSet a fromList = OSet . Set.fromList
(\\) :: Ord a => OSet a -> OSet a -> OSet a (OSet xs) \\ (OSet ys) = OSet (xs Set.\\ ys)
alterF :: (Ord a, Functor f) => (Bool -> f Bool) -> a -> OSet a -> f (OSet a) alterF f x (OSet xs) = fmap OSet $ Set.alterF f x xs
... _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On 1/30/22 9:59 AM, David Feuer wrote: [snip]
. I think the real answer is to remove `elem` from `Foldable` and put it somewhere more appropriate.
In a separate class? That would only help `elem` but not any other methods of Foldable. I'd rather love to see MonoFoldable[1] to be moved to base then. [1]: https://hackage.haskell.org/package/mono-traversable-1.0.15.3/docs/Data-Mono...

MonoFoldable isn't the answer either. That also has the problem of
requiring the same constraint for folding as for membership tests.
On Sun, Jan 30, 2022, 12:12 AM PHO
On 1/30/22 9:59 AM, David Feuer wrote: [snip]
. I think the real answer is to remove `elem` from `Foldable` and put it somewhere more appropriate.
In a separate class? That would only help `elem` but not any other methods of Foldable. I'd rather love to see MonoFoldable[1] to be moved to base then.
[1]:
https://hackage.haskell.org/package/mono-traversable-1.0.15.3/docs/Data-Mono... _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

I haven't found a downside to requiring that the elements of sets be `Ord` to be `MonoFoldable`. Yes, the dictionary is ignored by all the generic methods, but you still needed it to construct the `Set`.
`MonoFoldable` includes some questionable design choices, but the `Ord` requirement for `Set` is not one.
Sent from my phone with K-9 Mail.
On 30 January 2022 05:19:46 UTC, David Feuer
MonoFoldable isn't the answer either. That also has the problem of requiring the same constraint for folding as for membership tests.
On Sun, Jan 30, 2022, 12:12 AM PHO
wrote: On 1/30/22 9:59 AM, David Feuer wrote: [snip]
. I think the real answer is to remove `elem` from `Foldable` and put it somewhere more appropriate.
In a separate class? That would only help `elem` but not any other methods of Foldable. I'd rather love to see MonoFoldable[1] to be moved to base then.
[1]:
https://hackage.haskell.org/package/mono-traversable-1.0.15.3/docs/Data-Mono... _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Sat, Jan 29, 2022 at 07:59:51PM -0500, David Feuer wrote:
One is that `containers` has a long history of attempting to remain compatible with potential future Haskell 98-like Haskell implementations (likely with a few days of work to patch up mistakes when the time comes). Using a GADT to define Set would break that quite thoroughly.
Yes, that's definitely an issue.
My second concern is more practical:
type Set a = SetImpl a a
is all sorts of problematic. It forces anyone who wants to use what's always been the `Set` type constructor (Set :: Type -> Type) to dig into the implementation and use `SetImpl :: Type -> Type -> Type` instead. That's quite a different beast.
Well, they could use the unsaturated type alias in some cases, but there are likely situations where that's not an option.
I don't know all the things that will break, but they'll surely include https://hackage.haskell.org/package/constrained-monads-0.5.0.0/docs/Control-....
Yes, that would break irreperably
I think the real answer is to remove `elem` from `Foldable` and put it somewhere more appropriate.
Perhaps so, though my guess is that this is unlikely to happen... Overall, your objections are likely sufficient. -- Viktor.

On Sat, Jan 29, 2022 at 07:25:16PM -0500, Viktor Dukhovni wrote:
it was observed that a similar approach can be used to give Data.Set a performant `elem` method
But why? It is impossible to program polymorphically in a meaningful way with Foldable, because Foldable has no coherent operational semantics. We discussed this at: https://github.com/haskell/core-libraries-committee/issues/20 If we could "improve" Set's `elem` instance what would that gain us? That we could write a Foldable-polymorphic function that works well on Set and badly on []? Then wy not just specialise to Set in the first place? Trying to "improve" Foldable instances seems like a fool's errand to me, especially if it trades off simplicity somewhere else. Tom
participants (5)
-
David Feuer
-
Keith
-
PHO
-
Tom Ellis
-
Viktor Dukhovni