Lazy lists simulated by unboxed mutable arrays

I wonder whether the following idea has been investigated or implemented somewhere: We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?

Hello Henning, Wednesday, May 28, 2008, 9:51:28 AM, you wrote:
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?
looks like lazy.bytestring generalized to any a -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Wed, 28 May 2008, Bulat Ziganshin wrote:
Hello Henning,
Wednesday, May 28, 2008, 9:51:28 AM, you wrote:
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?
looks like lazy.bytestring generalized to any a
As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.

Hello Henning, Wednesday, May 28, 2008, 12:22:54 PM, you wrote:
Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced.
As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.
well, i don't understand difference between your idea and lazybs implementation -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin
well, i don't understand difference between your idea and lazybs implementation
HT said earlier that:
This would still allow the nice tricks for recursive Fibonacci sequence definition.
Which I guess refers to something like: fibs = 1 : 1 : zipWith (+) fibs (tail fibs) I don't think you can do that with LBS, since you'd need to calculate a whole chunk at a time, and for any chunk size > 1, each chunk depends on itself. -k -- If I haven't seen further, it is by standing in the footprints of giants

On Wed, 28 May 2008, Ketil Malde wrote:
Bulat Ziganshin
writes: well, i don't understand difference between your idea and lazybs implementation
HT said earlier that:
This would still allow the nice tricks for recursive Fibonacci sequence definition.
Which I guess refers to something like:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
I don't think you can do that with LBS, since you'd need to calculate a whole chunk at a time, and for any chunk size > 1, each chunk depends on itself.
Right, that's what I meant.

On Wed, 28 May 2008, Ketil Malde wrote:
Bulat Ziganshin
writes: well, i don't understand difference between your idea and lazybs implementation
HT said earlier that:
This would still allow the nice tricks for recursive Fibonacci sequence definition.
Which I guess refers to something like:
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
I don't think you can do that with LBS, since you'd need to calculate a whole chunk at a time, and for any chunk size > 1, each chunk depends on itself.
I have now implemented a small prototype: http://code.haskell.org/storablevector/Data/StorableVector/Cursor.hs Actually you can run the Fibonacci example but it allocates three arrays: let f2 = zipNWith 15 (+) f0 f1; f1 = cons 1 f2; f0 = cons (0::Int) f1 in f0 I'm afraid the compiler cannot optimize the IORefs to unboxed values, even in registers, because in principle they can be modified from everywhere in the program. Is there a better way than using IORefs hidden by unsafePerformIO? {-# OPTIONS_GHC -cpp -fglasgow-exts #-} {- | Simulate a list with strict elements by a more efficient array structure. -} module Data.StorableVector.Cursor where import Control.Exception (assert, ) import Control.Monad.State (StateT(StateT), runStateT, ) import Data.IORef (IORef, newIORef, readIORef, writeIORef, ) import Foreign.Storable (Storable(peekElemOff, pokeElemOff)) import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr, ) import Control.Monad (when) import Data.Maybe (isNothing) import System.IO.Unsafe (unsafePerformIO) import Data.StorableVector.Utility (viewListL, mapSnd, ) import Prelude hiding (length, foldr, zipWith, ) -- | Cf. StreamFusion Data.Stream data Generator a = forall s. -- Seq s => Generator { next :: {-# UNPACK #-} !(StateT s Maybe a), -- compute next value state :: {-# UNPACK #-} !(IORef (Maybe s)) -- current state } {- | This simulates a @ data StrictList a = Elem !a (StrictList a) | End @ by an array and some unsafe hacks. -} data Buffer a = Buffer { memory :: {-# UNPACK #-} !(ForeignPtr a), size :: {-# UNPACK #-} !Int, -- size of allocated memory gen :: {-# UNPACK #-} !(Generator a), cursor :: {-# UNPACK #-} !(IORef Int) } {- | Vector is a part of a buffer. -} data Vector a = Vector { buffer :: {-# UNPACK #-} !(Buffer a), start :: {-# UNPACK #-} !Int, -- invariant: start <= cursor maxLen :: {-# UNPACK #-} !Int -- invariant: start+maxLen <= size } -- * construction {-# INLINE create #-} create :: (Storable a) => Int -> Generator a -> Buffer a create l g = unsafePerformIO (createIO l g) -- | Wrapper of mallocForeignPtrArray. createIO :: (Storable a) => Int -> Generator a -> IO (Buffer a) createIO l g = do fp <- mallocForeignPtrArray l cur <- newIORef 0 return $! Buffer fp l g cur {- | @ unfoldrNTerm 20 (\n -> Just (n, succ n)) 'a' @ -} unfoldrNTerm :: (Storable b) => Int -> (a -> Maybe (b, a)) -> a -> Vector b unfoldrNTerm i f x0 = unsafePerformIO (unfoldrNTermIO i f x0) unfoldrNTermIO :: (Storable b) => Int -> (a -> Maybe (b, a)) -> a -> IO (Vector b) unfoldrNTermIO i f x0 = do ref <- newIORef (Just x0) buf <- createIO i (Generator (StateT f) ref) return (Vector buf 0 i) {-# INLINE pack #-} pack :: (Storable a) => Int -> [a] -> Vector a pack n = unfoldrNTerm n viewListL {-# INLINE cons #-} {- | This is expensive and should not be used to construct lists iteratively! -} cons :: Storable a => a -> Vector a -> Vector a cons x xs = unfoldrNTerm (succ (maxLen xs)) (\(mx0,xs0) -> fmap (mapSnd ((,) Nothing)) $ maybe (viewL xs0) (\x0 -> Just (x0, xs0)) mx0) $ (Just x, xs) {-# INLINE zipWith #-} zipWith :: (Storable a, Storable b, Storable c) => (a -> b -> c) -> Vector a -> Vector b -> Vector c zipWith f ps0 qs0 = zipNWith (min (maxLen ps0) (maxLen qs0)) f ps0 qs0 {-# INLINE zipNWith #-} zipNWith :: (Storable a, Storable b, Storable c) => Int -> (a -> b -> c) -> Vector a -> Vector b -> Vector c zipNWith n f ps0 qs0 = unfoldrNTerm n (\(ps,qs) -> do (ph,pt) <- viewL ps (qh,qt) <- viewL qs return (f ph qh, (pt,qt))) (ps0,qs0) -- * inspection -- | evaluate next value in a buffer advanceIO :: Storable a => Buffer a -> IO () advanceIO (Buffer p sz (Generator n s) cr) = do c <- readIORef cr assert (c < sz) $ do writeIORef cr (succ c) ms <- readIORef s case ms of Nothing -> return () Just s0 -> case runStateT n s0 of Nothing -> writeIORef s Nothing Just (a,s1) -> writeIORef s (Just s1) >> withForeignPtr p (\q -> pokeElemOff q c a) {-# INLINE switchL #-} switchL :: Storable a => b -> (a -> Vector a -> b) -> Vector a -> b switchL n j v = unsafePerformIO (switchLIO n j v) switchLIO :: Storable a => b -> (a -> Vector a -> b) -> Vector a -> IO b switchLIO n j v@(Vector buf st ml) = nullIO v >>= \ isNull -> if isNull then return n else do c <- readIORef (cursor buf) assert (st <= c) $ when (st == c) (advanceIO buf) x <- withForeignPtr (memory buf) (\p -> peekElemOff p st) let tl = assert (ml>0) $ Vector buf (succ st) (pred ml) return (j x tl) {-# INLINE viewL #-} viewL :: Storable a => Vector a -> Maybe (a, Vector a) viewL = switchL Nothing (curry Just) {-# INLINE foldr #-} foldr :: (Storable a) => (a -> b -> b) -> b -> Vector a -> b foldr k z = let recurse = switchL z (\h t -> k h (recurse t)) in recurse {-# INLINE unpack #-} unpack :: (Storable a) => Vector a -> [a] unpack = foldr (:) [] instance (Show a, Storable a) => Show (Vector a) where showsPrec p x = showsPrec p (unpack x) {-# INLINE null #-} null :: Vector a -> Bool null = unsafePerformIO . nullIO nullIO :: Vector a -> IO Bool nullIO (Vector (Buffer _ sz (Generator _ s) _) st _) = do b <- readIORef s return (st >= sz || isNothing b)

Henning Thielemann
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. [...]
looks like lazy.bytestring generalized to any a
As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.
You've lost me at least. Presumably, you don't want single-element UArrays, so you want a chunky data structure, but fine-grained laziness. As far as I can tell, this implies that the UArray is split (at the position indicated by the cursor?) into an evaluated part, and an unevaluated part. Now, if you evaluate one more element, you need to calculate its value and create a new UArray (and cursor) to replace the old one(s). Unless you do the whole thing in the ST or IO monads, I can't see how you can implement this efficiently. Where did I misunderstand you? Perhaps you can sketch some code? -k -- If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde
You've lost me at least.
...but perhaps I can find my way back on my own? Today, you can choose between Array, with lazy elements, or UArray, with strict elements. Lazy arrays have the elements defined in advance, strict ones have them calculated in advance - with the tremendous advantage of being able to eliminate the pointer for each element. Otherwise a pointer is needed to point to a potentially unevaluated thunk. Perhaps there is a middle ground here, if you add the restriction that the elements are generated in order? This way, you only need one pointer to an unevaluated thunk (which will yield all remaining elements as needed), and an unboxed array can contain the calculated values. This would be very nice for e.g. sequence alignment, where the alignment matrix is self-referencing, but the pointers represent a very real cost to an already expensive (resource-intesive) solution. -k -- If I haven't seen further, it is by standing in the footprints of giants

On Wed, 28 May 2008, Ketil Malde wrote:
Ketil Malde
writes: You've lost me at least.
...but perhaps I can find my way back on my own?
Today, you can choose between Array, with lazy elements, or UArray, with strict elements.
... and ByteStrings, where in principle the elements could be initialized in any order, but actually the ByteString functions prefer a left-to-right order. They are clearly intended as list replacement, so my proposed "cursor arrays" would do as well.
Lazy arrays have the elements defined in advance, strict ones have them calculated in advance - with the tremendous advantage of being able to eliminate the pointer for each element. Otherwise a pointer is needed to point to a potentially unevaluated thunk.
Perhaps there is a middle ground here, if you add the restriction that the elements are generated in order?
Exactly. Thus I compared my suggestion with element-strict lists.
This way, you only need one pointer to an unevaluated thunk (which will yield all remaining elements as needed), and an unboxed array can contain the calculated values.
That's it!
This would be very nice for e.g. sequence alignment, where the alignment matrix is self-referencing, but the pointers represent a very real cost to an already expensive (resource-intesive) solution.
I'm thinking about signal processing, where data is processed in time order in many cases. Thank you for clarification!

On Wed, 2008-05-28 at 10:22 +0200, Henning Thielemann wrote:
On Wed, 28 May 2008, Bulat Ziganshin wrote:
Hello Henning,
Wednesday, May 28, 2008, 9:51:28 AM, you wrote:
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?
looks like lazy.bytestring generalized to any a
As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.
How about a chunky lazy array? The problem with lazy bytestring is that each chunk is strict but if you made each chunk a H98 lazy array that might work nicely. Duncan

On Wed, 28 May 2008, Duncan Coutts wrote:
On Wed, 2008-05-28 at 10:22 +0200, Henning Thielemann wrote:
On Wed, 28 May 2008, Bulat Ziganshin wrote:
Hello Henning,
Wednesday, May 28, 2008, 9:51:28 AM, you wrote:
We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?
looks like lazy.bytestring generalized to any a
As far as I know, ByteString.Lazy is chunky, that is laziness occurs only every 1000th byte or so. My suggestion aims at laziness at element level but still more efficiency than a list.
How about a chunky lazy array? The problem with lazy bytestring is that each chunk is strict but if you made each chunk a H98 lazy array that might work nicely.
This would actually yield the intended result, but I'm afraid there are still too much pointers and allocations around. The nice thing about my suggestion is, that no allocation would be needed to construct new elements in the list, just running the function to evaluate the next elements and advance the cursor. It's a hack of course, like ByteString internally is, it's entirely for efficiency reasons. I interpret your answers as "what you suggest is not obviuous and has probably not been proposed before" so I'll take some time for a demonstration implementation.

Hello Andrew, Wednesday, May 28, 2008, 10:37:47 PM, you wrote:
looks like lazy.bytestring generalized to any a That sounds like a darn useful thing to have...
well, support on only Word8 as base type isn't some fundamental limit, just creators of bytestring package was not very interested in support of other base types
[OTOH, currently unboxed arrays are available only for a select few types, so good luck implementing this in a clean way!]
a few years ago, storable array was slower than special primitives created for UArray implementation. now they should have equal performance, so it's possible to make UArray implementation that will work with any Storable type. Simon, can you please confirm or refute this? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:
Hello Andrew,
Wednesday, May 28, 2008, 10:37:47 PM, you wrote:
looks like lazy.bytestring generalized to any a That sounds like a darn useful thing to have...
well, support on only Word8 as base type isn't some fundamental limit, just creators of bytestring package was not very interested in support of other base types
[OTOH, currently unboxed arrays are available only for a select few types, so good luck implementing this in a clean way!]
a few years ago, storable array was slower than special primitives created for UArray implementation. now they should have equal performance, so it's possible to make UArray implementation that will work with any Storable type. Simon, can you please confirm or refute this?
Sure. I've been working on a general vector library based on a similar type to STUArray, but pure, with a bytestring-like interface. http://code.haskell.org/~dons/uvector Generalised unlifted vectors. Performance looks rather good (typically loops unfold into register variables). This is a restricted form of the general data parallel arrays library. -- Don

Don Stewart
http://code.haskell.org/~dons/code/uvector (I presume? The other URL gives a 404) -k -- If I haven't seen further, it is by standing in the footprints of giants

On Thu, 29 May 2008, Bulat Ziganshin wrote:
Hello Andrew,
Wednesday, May 28, 2008, 10:37:47 PM, you wrote:
looks like lazy.bytestring generalized to any a That sounds like a darn useful thing to have...
well, support on only Word8 as base type isn't some fundamental limit, just creators of bytestring package was not very interested in support of other base types
Just in order to remind you, ByteString for any Storable types does already exist: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/storablevector Current repository is here: http://code.haskell.org/storablevector/

lemming:
I wonder whether the following idea has been investigated or implemented somewhere: We could simulate a list with strict elements, i.e. data StrictList a = Elem !a (StrictList a) | End
I've used the above structure itself, as a useful alternative to fully lazy lists.
by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?
That sounds a lot like the semi-eager structure, Lazy ByteStrings, which do cache sized chunks of evaluation before suspending. With the cursor in place, it would behave more like the buffer abstraction to lazy bytestrings in Data.Binary? -- Don

On Wed, 28 May 2008, Don Stewart wrote:
by an unboxed array with a cursor to the next element to be evaluated and a function that generates the next element. Whenever an element with an index beyond the cursor is requested, sufficiently many new elements are written to the array and the cursor is advanced. This would still allow the nice tricks for recursive Fibonacci sequence definition. This will obviously save memory, but can we also expect that it is noticeably faster than (StrictList a) ?
That sounds a lot like the semi-eager structure, Lazy ByteStrings, which do cache sized chunks of evaluation before suspending. With the cursor in place, it would behave more like the buffer abstraction to lazy bytestrings in Data.Binary?
Can you code fibs = 0 : 1 : zipWith (+) fibs (tail fibs) with hte 'buffer'? I'm afraid you cannot simultaneously read and write from 'buffer', cannot 'drop' and so on, right? What I have in mind is some combination of a 'Data.Stream' for generating the data (playing the role of the unevaluated thunk), a memory chunk for buffering calculated data and a wrapper which provides a view on a sub-array. Ideally 'fibs' would be translated to something like int *p = malloc ...; int *p0 = p; *p = 0; p++; int *p1 = p; *p = 1; p++; int *p2 = p; int i=n; while (i>0) { *p2 = *p0 + *p1; p0++; p1++; p2++; i--; } I'm not sure, whether the compiler can eliminate the last bit of laziness that would remain in a 'cursor array'.
participants (6)
-
Andrew Coppin
-
Bulat Ziganshin
-
Don Stewart
-
Duncan Coutts
-
Henning Thielemann
-
Ketil Malde