O(n) algorithm for determining subset

Can someone tell me if this is correct. I'm guessing that if I represent two sets of integers by Word32 (where ith bit set means i is in the set), then an algorithm to determine if x is a subset of y would go something like (y - x) == (y `exclusiveOr` x)

It's simpler: "x&~y == 0"
2009/11/15 Michael Mossey
Can someone tell me if this is correct. I'm guessing that if I represent two sets of integers by Word32 (where ith bit set means i is in the set), then an algorithm to determine if x is a subset of y would go something like
(y - x) == (y `exclusiveOr` x)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Eugene Kirpichov Web IR developer, market.yandex.ru

On Mon, 16 Nov 2009 00:03:54 +0300 Eugene Kirpichov
Can someone tell me if this is correct. I'm guessing that if I represent two sets of integers by Word32 (where ith bit set means i is in the set), then an algorithm to determine if x is a subset of y would go something like
(y - x) == (y `exclusiveOr` x)
EK> It's simpler: "x&~y == 0" Depending on the OP data set the simple bit vector may be a good strategy, but I wonder if an inversion list would be better? Do you expect long runs of adjacent numbers and gaps? Inversion lists tend to encode those much more efficiently than bit vectors. I'm just now learning Haskell so I don't know enough to show an implementation but inversion lists are very simple conceptually. If your set is (2,3,4,5,8,9,10) the inversion list would be (2,6,8,11). It's easy to generate it from a set of Ord elements. I'd be curious to see an implementation of the above and other set operations with inversion lists, in fact, so I can learn from it. I couldn't find any implementation online for Haskell. Thanks Ted

Ted Zlatanov schrieb:
On Mon, 16 Nov 2009 00:03:54 +0300 Eugene Kirpichov
wrote: EK> 2009/11/15 Michael Mossey
: Can someone tell me if this is correct. I'm guessing that if I represent two sets of integers by Word32 (where ith bit set means i is in the set), then an algorithm to determine if x is a subset of y would go something like
(y - x) == (y `exclusiveOr` x)
EK> It's simpler: "x&~y == 0"
Depending on the OP data set the simple bit vector may be a good strategy, but I wonder if an inversion list would be better? Do you expect long runs of adjacent numbers and gaps? Inversion lists tend to encode those much more efficiently than bit vectors.
I'm just now learning Haskell so I don't know enough to show an implementation but inversion lists are very simple conceptually. If your set is (2,3,4,5,8,9,10) the inversion list would be (2,6,8,11). It's easy to generate it from a set of Ord elements.
Is "inversion list" = "Gray code" ?

On Thu, 19 Nov 2009 01:13:23 +0100 Henning Thielemann
On Mon, 16 Nov 2009 00:03:54 +0300 Eugene Kirpichov
wrote: EK> 2009/11/15 Michael Mossey
: Can someone tell me if this is correct. I'm guessing that if I represent two sets of integers by Word32 (where ith bit set means i is in the set), then an algorithm to determine if x is a subset of y would go something like
(y - x) == (y `exclusiveOr` x)
EK> It's simpler: "x&~y == 0"
Depending on the OP data set the simple bit vector may be a good strategy, but I wonder if an inversion list would be better? Do you expect long runs of adjacent numbers and gaps? Inversion lists tend to encode those much more efficiently than bit vectors.
I'm just now learning Haskell so I don't know enough to show an implementation but inversion lists are very simple conceptually. If your set is (2,3,4,5,8,9,10) the inversion list would be (2,6,8,11). It's easy to generate it from a set of Ord elements.
HT> Is "inversion list" = "Gray code" ? No. An inversion list contains the boundaries of all the continuous intervals in the input. In other words, any time you see input of [n,n+1, n+2...n+k,n+m...] where m > k+1 your inversion list gets the entries n and n+k+1. As I said I'm just starting to learn Haskell so I can't give a proper implementation: isSuccessor:: (Num a) => a -> a -> Bool isSuccessor x y = x+1 == y inversion :: (Num a) => [a] -> [a] inversion [] = [] inversion (x:xs) = x:(inversion . dropWhile isSuccessor xs) The above is obviously broken, but the idea is to drop successive elements. I don't know how to do that in a predicate for dropWhile so I may need a different function from Data.List or elsewhere. Ted

Am Donnerstag 19 November 2009 03:57:56 schrieb Ted Zlatanov:
On Thu, 19 Nov 2009 01:13:23 +0100 Henning Thielemann
wrote: HT> Ted Zlatanov schrieb:
On Mon, 16 Nov 2009 00:03:54 +0300 Eugene Kirpichov
wrote: EK> 2009/11/15 Michael Mossey
: Can someone tell me if this is correct. I'm guessing that if I represent two sets of integers by Word32 (where ith bit set means i is in the set), then an algorithm to determine if x is a subset of y would go something like
(y - x) == (y `exclusiveOr` x)
EK> It's simpler: "x&~y == 0"
Depending on the OP data set the simple bit vector may be a good strategy, but I wonder if an inversion list would be better? Do you expect long runs of adjacent numbers and gaps? Inversion lists tend to encode those much more efficiently than bit vectors.
I'm just now learning Haskell so I don't know enough to show an implementation but inversion lists are very simple conceptually. If your set is (2,3,4,5,8,9,10) the inversion list would be (2,6,8,11). It's easy to generate it from a set of Ord elements.
HT> Is "inversion list" = "Gray code" ?
No. An inversion list contains the boundaries of all the continuous intervals in the input. In other words, any time you see input of [n,n+1, n+2...n+k,n+m...] where m > k+1 your inversion list gets the entries n and n+k+1.
As I said I'm just starting to learn Haskell so I can't give a proper implementation:
isSuccessor:: (Num a) => a -> a -> Bool isSuccessor x y = x+1 == y
inversion :: (Num a) => [a] -> [a] inversion [] = [] inversion (x:xs) = x:(inversion . dropWhile isSuccessor xs)
The above is obviously broken, but the idea is to drop successive elements. I don't know how to do that in a predicate for dropWhile so I may need a different function from Data.List or elsewhere.
Ted
like: h :: Num a => a -> [a] -> [a] h x (y:ys) | x+1 == y = x:ys h x zs = x:(x+1):zs invlist = foldr h [] ghci> invlist [2,3,4,5,8,9,10] [2,6,8,11] ghci> invlist [4] [4,5] ghci> invlist [1,2,3,4,7,8,9,13,14,15,20,22] [1,5,7,10,13,16,20,21,22,23] ?

On Thu, 19 Nov 2009 04:54:13 +0100 Daniel Fischer

On Fri, 20 Nov 2009 15:30:49 -0600 Ted Zlatanov

Hi Ted, Some tips:
invlist_negate [] = [0] invlist_negate (0:xs) = xs invlist_negate xs = 0:xs
You are doing this for generic Num instances, so 0 probably isn't the lower bound. Haskell has another type class for this: Bounded. Then you can use minBound instead of 0. Also the first line is just a special case of the last one. invlist_negate :: (Bounded a, Num a) => [a] -> [a] invlist_negate (minBound : xs) = xs invlist_negate xs = minBound : xs Try doing invlist_member together with a function invlist_notMember (just like there is notElem for lists), I think that would clean things up a bit. Keep on going, there's lots of fun ahead! greetings, Sjoerd -- Sjoerd Visscher sjoerd@w3future.com

You probably don't want that minBound in the pattern, but rather as a
comparison in a guard.
Dan
On Tue, Dec 1, 2009 at 6:14 PM, Sjoerd Visscher
Hi Ted,
Some tips:
invlist_negate [] = [0] invlist_negate (0:xs) = xs invlist_negate xs = 0:xs
You are doing this for generic Num instances, so 0 probably isn't the lower bound. Haskell has another type class for this: Bounded. Then you can use minBound instead of 0. Also the first line is just a special case of the last one.
invlist_negate :: (Bounded a, Num a) => [a] -> [a] invlist_negate (minBound : xs) = xs invlist_negate xs = minBound : xs
Try doing invlist_member together with a function invlist_notMember (just like there is notElem for lists), I think that would clean things up a bit.
Keep on going, there's lots of fun ahead!
greetings, Sjoerd
-- Sjoerd Visscher sjoerd@w3future.com
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am Dienstag 01 Dezember 2009 23:31:10 schrieb Ted Zlatanov:
On Fri, 20 Nov 2009 15:30:49 -0600 Ted Zlatanov
wrote: TZ> A nice property of inversion lists is that inverting them simply TZ> requires removing or adding the minimum possible value at the beginning TZ> of the list. A membership test requires traversal but since the list is TZ> sorted we know when to stop if it doesn't exist. Set union and TZ> difference are pretty tricky but at least can stop early if one of two TZ> sets is finite. As I learn more Haskell I'll try implementing these TZ> step by step. Is there any interest in making this an actual module or TZ> is it not very useful in the context of Haskell?
OK, here's my current state. The first two functions are not mine, of course (the original invlist' was called h, that's all).
invlist_negate just appends 0 or removes it. I think the first condition (on an empty list) is not necessary. Should I keep it for clarity?
I don't think it's necessary. As Sjoerd mentioned, the use of 0 is problematic.
For invlist_member I basically pass the current state down into the recursion chain on invlist_member'. The function will end early if it can, or it will scan all the way down the list in the worst case (when the goal value is greater than the last interval marker). I think I could do it with a foldl but I wasn't able to figure it out.
Any suggestions are welcome. I can definitely reimplement the invlist function similarly to invlist_member, by passing an exclusion state boolean down, which will make the code longer. I'm not sure if it will be bad for performance. I think it will be better because I'll be able to do it lazily as opposed to the foldr below which needs a finite list.
No, quite the opposite. foldr is wonderful for lazy list processing. I just need to make my function a wee bit lazier: Prelude> let h x zs = x:case zs of { (y:ys) | x+1 == y -> ys; _ -> (x+1):zs; } Prelude> let invlist xs = foldr h [] xs Prelude> invlist [1,2,3,7,8,12,13,14,20] [1,4,7,9,12,15,20,21] Prelude> take 10 $ invlist [2,5 .. ] [2,3,5,6,8,9,11,12,14,15] Prelude> take 10 $ invlist [2,4 .. ] [2,3,4,5,6,7,8,9,10,11] Prelude> take 10 $ invlist [2 .. ] [2^CInterrupted. Prelude> take 10 $ invlist [2 :: Data.Int.Int16 .. ] [2,-32768] -- That's a problem here Hadn't thought about infinite (or even long) lists when I wrote it.
Can I do it with a direct foldl instead?
No, foldl cannot produce anything before the whole list has been traversed, so it can't deal with infinite lists at all.
I need to look at it carefully but maybe someone has a suggestion.
I plan to do set operations after I get the basics right :)
This should really have been in comp.lang.haskell.beginner. Sorry for the elementary stuff, I'm keeping the thread in c.l.h.cafe only to avoid double postings at this point.
Thanks Ted
invlist' :: Num a => a -> [a] -> [a] invlist' x (y:ys)
| x+1 == y = x:ys
invlist' x zs = x:(x+1):zs
invlist' :: Integral a => a -> [a] -> [a] invlist' x zs = x:ws where ws = case zs of (y:ys) | x+1 == y -> ys _ -> (x+1):zs -- we could use Num here, but for an implementation of sets via inversion lists, -- Integral is the appropriate setting. Perhaps even better use Integral and Bounded invlist' :: (Integral a, Bounded a) => a -> [a] -> [a] invlist' x _ | x == maxBound = [x] invlist' x zs = x:ws where ws = case zs of (y:ys) | x+1 == y -> ys _ -> (x+1):zs Now: Prelude> invlist [2 :: Data.Int.Int8 .. ] [2] :D
Don't forget a type signature here. Otherwise you'll get bitten by the monomorphism restriction. invlist :: (Integral a, Bounded a) => [a] -> [a]
invlist = foldr invlist' []
Problem. That is only sensible if we only consider sets of nonnegative numbers. For Integral and Bounded, invlist_negate (x:xs) | x == minBound = xs invlist_negate xs = minBound:xs
invlist_negate [] = [0] invlist_negate (0:xs) = xs invlist_negate xs = 0:xs
invlist_member _ [] = False
-- bootstrap membership test with a False exclusion state (an inversion list begins with the first included value) invlist_member goal (x:xs) = invlist_member' goal False (x:xs)
invlist_member' goal exclude (x:xs) = if goal < x then exclude else invlist_member' goal (not exclude) xs -- flip the exclusion state of the list
invlist_member' _ exclude _ = exclude -- if we can't match one more element, return the current exclusion state
invlist_member :: (Integral a, Bounded a) => a -> [a] -> Bool invlist_member goal = foldr (\n b -> if goal < n then False else not b) False *maybe* I'll think about unions, intersections and other set operations as foldr's.

On Wed, 2 Dec 2009 01:12:23 +0100 Daniel Fischer
participants (7)
-
Daniel Fischer
-
Daniel Peebles
-
Eugene Kirpichov
-
Henning Thielemann
-
Michael Mossey
-
Sjoerd Visscher
-
Ted Zlatanov