
Hi Cafe, We are lucky to have a plethora of data structures out there. But it does make choosing one off hackage difficult at times. In this case I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the choice there), but an efficient representation for a list of bits (cons,head,tail). Let's say that you want to represent tree indices as you walk down a binary tree. [Bool] is a simple choice, you only add to the front of the list (0/1 = Left/Right), sharing the tails. But [Bool] is quite space inefficient. Something like [Int] would allow packing the bits more efficiently. A Lazy ByteString could amortize the space overhead even more... but in both cases there's a tiny bit of work to do in wrapping those structures for per-bit access. That's probably the right thing but I wanted to check to see if there's something else recommended, perhaps more off-the-shelf. What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#se... (I don't know if it's possible to do better. From quick googling GMP seems to use an array of "limbs" rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...) Advice appreciated! Thanks, -Ryan

I am not aware of such a library, but IMHO this code will be very simple.
data Bits b => BitList b = BitList Int {- number of used bits in the next component -} b [b] Write an isomorphism between @BitList b@ and @ListStep (BitList b)@ where data ListStep e rc = Nil | Cons e rc
On 07.10.11 17:52, Ryan Newton wrote:
Hi Cafe,
We are lucky to have a plethora of data structures out there. But it does make choosing one off hackage difficult at times. In this case I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the choice there), but an efficient representation for a list of bits (cons,head,tail).
Let's say that you want to represent tree indices as you walk down a binary tree. [Bool] is a simple choice, you only add to the front of the list (0/1 = Left/Right), sharing the tails. But [Bool] is quite space inefficient.
Something like [Int] would allow packing the bits more efficiently. A Lazy ByteString could amortize the space overhead even more... but in both cases there's a tiny bit of work to do in wrapping those structures for per-bit access. That's probably the right thing but I wanted to check to see if there's something else recommended, perhaps more off-the-shelf.
What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors: http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#se...
(I don't know if it's possible to do better. From quick googling GMP seems to use an array of "limbs" rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...)
Advice appreciated!
Thanks, -Ryan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Yep, it is simple. But I prefer to only use well-tested data structure
libraries where I can! Here's an example simple implementation (partial --
missing some common functions):
module Data.BitList
( BitList
, cons, head, tail, empty
, pack, unpack, length, drop
)
where
import Data.Int
import Data.Bits
import Prelude as P hiding (head,tail,drop,length)
import qualified Data.List as L
import Test.HUnit
data BitList = One {-# UNPACK #-} !Int {-# UNPACK #-} !Int64
| More {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitList
instance Show BitList where
show bl = "BitList " ++ show (map (\b -> case b of True -> '1'; False ->
'0') (unpack bl))
-- show bl = "pack " ++ show (unpack bl)
empty :: BitList
empty = One 0 0
cons :: Bool -> BitList -> BitList
cons True x@(One 64 _ ) = More 1 1 x
cons False x@(One 64 _ ) = More 1 0 x
cons True x@(More 64 bv _) = More 1 1 x
cons False x@(More 64 bv _) = More 1 0 x
cons True (One i bv) = One (i+1) (bv `setBit` i)
cons False (One i bv) = One (i+1) (bv )
cons True (More i bv r) = More (i+1) (bv `setBit` i) r
cons False (More i bv r) = More (i+1) (bv ) r
-- TODO: May consider (More 0 _ _) representation to reduce extra
-- allocation when size of the BitList is fluctuating back and forth.
head :: BitList -> Bool
head (One 0 _ ) = error "tried to take head of an empty BitList"
head (More 0 _ r) = error "BitList: data structure invariant failure!"
head (One i bv ) = bv `testBit` (i-1)
head (More i bv r) = bv `testBit` (i-1)
tail :: BitList -> BitList
tail (One 0 _ ) = error "tried to take the tail of an empty BitList"
tail (One i bv ) = One (i-1) bv
tail (More 1 bv r) = r
tail (More i bv r) = More (i-1) bv r
pack :: [Bool] -> BitList
pack [] = One 0 0
pack (h:t) = cons h (pack t)
unpack :: BitList -> [Bool]
unpack (One 0 _) = []
unpack (One i bv) = (bv `testBit` (i-1)) : unpack (One (i-1) bv)
unpack (More 0 _ r) = unpack r
unpack (More i bv r) = (bv `testBit` (i-1)) : unpack (More (i-1) bv r)
drop :: Int -> BitList -> BitList
drop 0 bl = bl
drop n bl | n >= 64 = case bl of
One _ _ -> error "drop: not enough elements in BitList"
More i _ r -> drop (n-i) r
drop n bl = case bl of
One i bv -> One (i-n) bv
More i bv r -> More (i-n) bv r
length :: BitList -> Int
length (One i _) = i
length (More i _ r) = i + length r
-- TODO: index, take, etc
-- TODO: functor instance, etc.
--------------------------------------------------------------------------------
-- Testing:
t1 = pack (L.concat$ L.replicate 10 [True,False,True])
t2 = L.length $ unpack $ pack $ replicate 1000 True
t3 = pack $ replicate 1000 True
t4 = drop 500 t3
p3 = L.and (unpack t3)
p4 = L.and (unpack t4)
t5 = iterate tail t4 !! 250
t5a = length t5
t5b = L.length (unpack t5)
tests :: Test
tests =
TestList
[
show t1 ~=? "BitList \"101101101101101101101101101101\""
, t2 ~=? 1000
, t5a ~=? 250
, t5b ~=? 250
, p3 ~=? True
, p4 ~=? True
]
-- TODO: QuickCheck
On Sun, Oct 9, 2011 at 7:50 AM, Roman Beslik
I am not aware of such a library, but IMHO this code will be very simple.
data Bits b => BitList b = BitList Int {- number of used bits in the next component -} b [b] Write an isomorphism between @BitList b@ and @ListStep (BitList b)@ where data ListStep e rc = Nil | Cons e rc
On 07.10.11 17:52, Ryan Newton wrote:
Hi Cafe,
We are lucky to have a plethora of data structures out there. But it does make choosing one off hackage difficult at times. In this case I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the choice there), but an efficient representation for a list of bits (cons,head,tail).
Let's say that you want to represent tree indices as you walk down a binary tree. [Bool] is a simple choice, you only add to the front of the list (0/1 = Left/Right), sharing the tails. But [Bool] is quite space inefficient.
Something like [Int] would allow packing the bits more efficiently. A Lazy ByteString could amortize the space overhead even more... but in both cases there's a tiny bit of work to do in wrapping those structures for per-bit access. That's probably the right thing but I wanted to check to see if there's something else recommended, perhaps more off-the-shelf.
What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors:
http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#se... (I don't know if it's possible to do better. From quick googling GMP seems to use an array of "limbs" rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...)
Advice appreciated!
Thanks, -Ryan
_______________________________________________ Haskell-Cafe mailing listHaskell-Cafe@haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Yes, if you do not use high-level concepts and optimize everything by hand, it requires a lot of testing. :)

Must it be a list?
What about a Bloom Filter?
On Sun, Oct 9, 2011 at 9:11 AM, Roman Beslik
Yes, if you do not use high-level concepts and optimize everything by hand, it requires a lot of testing. :)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- -- Regards, KC

On Sun, Oct 9, 2011 at 6:18 AM, Ryan Newton
Yep, it is simple. But I prefer to only use well-tested data structure libraries where I can! Here's an example simple implementation (partial -- missing some common functions):
module Data.BitList ( BitList , cons, head, tail, empty , pack, unpack, length, drop ) where
import Data.Int import Data.Bits import Prelude as P hiding (head,tail,drop,length) import qualified Data.List as L import Test.HUnit
data BitList = One {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 | More {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitList
I suggest data BitTail = Zero | More {-# UNPACK #-} !Int64 BitTail data BitList = Head {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 BitTail empty = Head 0 0 Zero or else just data BitList = Head {-# UNPACK #-} !Int {-# UNPACK #-} !Int64 [Int64] empty = Head 0 0 [] length (Head n _ xs) = n + 64 * List.length xs unpack :: BitList -> [Bool]
unpack (One 0 _) = [] unpack (One i bv) = (bv `testBit` (i-1)) : unpack (One (i-1) bv) unpack (More 0 _ r) = unpack r unpack (More i bv r) = (bv `testBit` (i-1)) : unpack (More (i-1) bv r)
I'd implement as view :: BitList -> Maybe (Bool, BitList) view (One 0 _) = Nothing view bl = Just (head bl, tail bl) unpack = unfoldr view
drop :: Int -> BitList -> BitList drop 0 bl = bl drop n bl | n >= 64 = case bl of One _ _ -> error "drop: not enough elements in BitList" More i _ r -> drop (n-i) r drop n bl = case bl of One i bv -> One (i-n) bv More i bv r -> More (i-n) bv r
This is wrong. drop 5 (More 1 0 (One 64 0)) -> More (-4) 0 (One 64 0) Fixed version (also gives same behavior as List.drop when n > length l) drop :: Int -> BitList -> BitList drop n (One i bv) | n >= i = empty | otherwise = One (i - n) bv drop n (More i bv r) | n >= i = drop (n - i) r | otherwise = More (i - n) bv r -- ryan

data *(Bits b) =>* BitList b Is deprecated and soon to be removed from the language.
I fail to understand. Why not just:
data BitList b = Nil | BitList Int b (BitList b) ??
2011/10/9 Roman Beslik
I am not aware of such a library, but IMHO this code will be very simple.
data Bits b => BitList b = BitList Int {- number of used bits in the next component -} b [b] Write an isomorphism between @BitList b@ and @ListStep (BitList b)@ where data ListStep e rc = Nil | Cons e rc
On 07.10.11 17:52, Ryan Newton wrote:
Hi Cafe,
We are lucky to have a plethora of data structures out there. But it does make choosing one off hackage difficult at times. In this case I'm *not* looking for a O(1) access bit vector (Data.Vector.Unboxed seems to be the choice there), but an efficient representation for a list of bits (cons,head,tail).
Let's say that you want to represent tree indices as you walk down a binary tree. [Bool] is a simple choice, you only add to the front of the list (0/1 = Left/Right), sharing the tails. But [Bool] is quite space inefficient.
Something like [Int] would allow packing the bits more efficiently. A Lazy ByteString could amortize the space overhead even more... but in both cases there's a tiny bit of work to do in wrapping those structures for per-bit access. That's probably the right thing but I wanted to check to see if there's something else recommended, perhaps more off-the-shelf.
What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors:
http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#se... (I don't know if it's possible to do better. From quick googling GMP seems to use an array of "limbs" rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...)
Advice appreciated!
Thanks, -Ryan
_______________________________________________ Haskell-Cafe mailing listHaskell-Cafe@haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Sun, Oct 9, 2011 at 12:11 PM, Roman Beslik
Yes, if you do not use high-level concepts and optimize everything by hand, it requires a lot of testing. :)
There are probably more constructive, jibe-free ways to frame this suggestion... Regarding testing: my preference for using a preexisting solution is a product of 18 years of programming in Scheme without a large base of shared infrastructure -- I've seen way too much "roll your own X" leading to trouble. Regarding high-performance data-structures in Haskell: I wish high-level concepts were sufficient for their optimization. But if you look at all the tricks played by, for example, Johan Tibell and Greg Collins in their excellent hashmaps and hashtables libraries, that, alas, seems not to be the case yet. GHC is in a good position to do inlining and specialization (making the world safe for type classes), but it can't add unpack and strictness annotations, nor can it change data representations themselves. For example, to answer Yves question: I fail to understand. Why not just:
data BitList b = Nil | BitList Int b (BitList b) ??
That was a "data structure unrolling" to optimize the memory representation in the common case (<64 bits). Starting with: type I = Int64 -- or whatever data BitList = Nil | BL Int I BitList The recursive datatype can be inlined (once): data BitList = Nil | BL Int I (Nil | BL Int I BitList) *-- not real syntax * data BitList = Nil | BL2 Int I Nil | BL3 Int I Int I BitList* -- distribute * data BitList = Nil | BL2 Int I | BL3 Int I Int I BitList *-- prune Nil* This unrolled data structure has two advantages. It can directly represent the common case <64 bits with one object, and it can use half the tail pointers for longer lists. GHC could conceivably transform code automatically to enable this unrolling (but it can't now). However, there are some further observations that really require a human. Because we are using that extra Int to track the bit position inside the "I" the Nil case is redundant -- "BL2 0 0" can represent empty. Further one of the Ints in the BL3 case is always 64 (sizeof I) and needn't be represented. That gives us: data BitList = BL2 Int I | BL3 Int I I BitList *-- prune Nil* Which is pretty much what I used. Actually, I skipped the "double wide" second case because I was really only worried about simplifying the representation for shorter lists and that would indeed complicate the code.
data *(Bits b) =>* BitList b
FYI, in the bit of code I sent I didn't generalize over the Bits class because it's really an implementation detail what size chunk is used (just like in Lazy ByteStrings). I didn't want to pollute the interface. That said, the code should certainly be "CSE"d to make the "64/Int64" choice swappable. Best regards, -Ryan

Hi, Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton:
What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors:
http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#se... (I don't know if it's possible to do better. From quick googling GMP seems to use an array of "limbs" rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...)
interesting idea. Should this be considered a bug in ghc? (Not that it cannot represent the result, but that it crashes even out of ghci): $ ghci GHCi, version 7.0.4: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude> :m + Data.Bits Prelude Data.Bits> setBit 0 (2^63-1::Int) gmp: overflow in mpz type Abgebrochen Greetings, Joachim -- Joachim "nomeata" Breitner mail@joachim-breitner.de | nomeata@debian.org | GPG: 0x4743206C xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/

On Sunday 09 October 2011, 15:54:14, Joachim Breitner wrote:
Hi,
Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton:
What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors:
http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.h tml#setBit (I don't know if it's possible to do better. From quick googling GMP seems to use an array of "limbs" rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...)
interesting idea. Should this be considered a bug in ghc? (Not that it cannot represent the result, but that it crashes even out of ghci):
$ ghci GHCi, version 7.0.4: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude> :m + Data.Bits Prelude Data.Bits> setBit 0 (2^63-1::Int) gmp: overflow in mpz type Abgebrochen
says info gmp: `_mp_size' and `_mp_alloc' are `int', although `mp_size_t' is usually a `long'. This is done to make the fields just 32 bits on some 64 bits systems, thereby saving a few bytes of data space but still providing plenty of range. So it seems to be GMP itself.

On 9 October 2011 14:54, Joachim Breitner
Hi,
Am Freitag, den 07.10.2011, 10:52 -0400 schrieb Ryan Newton:
What about just using the Data.Bits instance of Integer? Well, presently, the setBit instance for very large integers creates a whole new integer, shifts, and xors:
http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Bits.html#se... (I don't know if it's possible to do better. From quick googling GMP seems to use an array of "limbs" rather than a chunked list, so maybe there's no way to treat large Integers as a list and update only the front...)
interesting idea. Should this be considered a bug in ghc? (Not that it cannot represent the result, but that it crashes even out of ghci):
$ ghci GHCi, version 7.0.4: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Prelude> :m + Data.Bits Prelude Data.Bits> setBit 0 (2^63-1::Int) gmp: overflow in mpz type Abgebrochen
Yes, that's a bug. GMP shouldn't call abort(), but it should be turned into a Haskell exception. It probably doesn't make much of a difference in practise, but "safe" could should never crash GHC.
participants (8)
-
Daniel Fischer
-
Joachim Breitner
-
KC
-
Roman Beslik
-
Ryan Ingram
-
Ryan Newton
-
Thomas Schilling
-
Yves Parès