RFC: general sequences

Ross Paterson

23 May 2005 23 May '05

11:13 a.m.

A general implementation of sequences (based on work with Ralf Hinze) can be found at http://www.soi.city.ac.uk/~ross/software/html/Data.Sequence.html http://www.soi.city.ac.uk/~ross/software/Data/Sequence.hs Our experiments indicate that its performance is comparable to (and sometimes better than) the best known persistent implementations(*). Non-persistent implementations are typically faster, but you need to be more careful when using them. I'd like to propose this for base. Comments welcome. (*) or at least it will be when GHC's SPECIALIZE pragma is fixed for polymorphic specializations (SF bug #1019758).

Show replies by date

Henning Thielemann

23 May 23 May

11:38 a.m.

On Mon, 23 May 2005, Ross Paterson wrote:

...

A general implementation of sequences (based on work with Ralf Hinze) can be found at

http://www.soi.city.ac.uk/~ross/software/html/Data.Sequence.html http://www.soi.city.ac.uk/~ross/software/Data/Sequence.hs

Our experiments indicate that its performance is comparable to (and sometimes better than) the best known persistent implementations(*). Non-persistent implementations are typically faster, but you need to be more careful when using them.

I'd like to propose this for base.

Comments welcome.

I like the prefix-free function names and the argument order, although I'd like even more if also the main type does not replicate the modules name, but is simply T instead of Seq. Then we could write import qualified Data.Sequence as Seq and use Seq.T as type. It would be great if we had a class for finite sequences with instances Data.List, Data.Array, Data.Sequence. Maybe the new identifiers (<|), (|>) and so on should be names of methods of a such a class instead of separate functions.

Ross Paterson

3:47 p.m.

On Mon, May 23, 2005 at 01:38:59PM +0200, Henning Thielemann wrote:

...

It would be great if we had a class for finite sequences with instances Data.List, Data.Array, Data.Sequence. Maybe the new identifiers (<|), (|>) and so on should be names of methods of a such a class instead of separate functions.

What would that buy us? I can't think of many useful functions that would use that interface, what with (<|) lazy for lists and strict for the other two, and (|>) cheap for sequences and expensive for the other two.

Henning Thielemann

4:14 p.m.

On Mon, 23 May 2005, Ross Paterson wrote:

...

On Mon, May 23, 2005 at 01:38:59PM +0200, Henning Thielemann wrote:

...
It would be great if we had a class for finite sequences with instances Data.List, Data.Array, Data.Sequence. Maybe the new identifiers (<|), (|>) and so on should be names of methods of a such a class instead of separate functions.

What would that buy us? I can't think of many useful functions that would use that interface, what with (<|) lazy for lists and strict for the other two, and (|>) cheap for sequences and expensive for the other two.

and more importantly (><) :-) I remember there was some discussion on a generalised sequence type and some people wanted different implementations for String. So if the functions shared by Data.List, Data.Array, Data.Sequence are methods of one class everyone can easily switch between the types to choose the one which performs best.

Bulat Ziganshin

4:41 p.m.

New subject: Re[2]: RFC: general sequences

Hello Ross, Monday, May 23, 2005, 7:47:37 PM, you wrote: RP> On Mon, May 23, 2005 at 01:38:59PM +0200, Henning Thielemann wrote:

...

...
It would be great if we had a class for finite sequences with instances Data.List, Data.Array, Data.Sequence. Maybe the new identifiers (<|), (|>) and so on should be names of methods of a such a class instead of separate functions.

RP> What would that buy us? I can't think of many useful functions that RP> would use that interface, what with (<|) lazy for lists and strict for the other two, and (|>>) cheap for sequences and expensive for the other two. at least, it will be great to have analogues of length, map, filter, partition and a number of other operations defined in classes and supported for variety of data structures. i think it's omission in H98 standard that this names belongs only to lists and not defined in some classes like (+) and (>) -- Best regards, Bulat mailto:bulatz@HotPOP.com

Benjamin Franksen

8:22 p.m.

New subject: Re[2]: RFC: general sequences

On Monday 23 May 2005 18:41, Bulat Ziganshin wrote:

...

RP> On Mon, May 23, 2005 at 01:38:59PM +0200, Henning Thielemann wrote:

...
...
It would be great if we had a class for finite sequences with instances Data.List, Data.Array, Data.Sequence. Maybe the new identifiers (<|), (|>) and so on should be names of methods of a such a class instead of separate functions.

RP> What would that buy us? I can't think of many useful functions that RP> would use that interface, what with (<|) lazy for lists and strict for the other two, and (|>>) cheap for sequences and expensive for the other two.

at least, it will be great to have analogues of length, map, filter, partition and a number of other operations defined in classes and supported for variety of data structures. i think it's omission in H98 standard that this names belongs only to lists and not defined in some classes like (+) and (>)

A agree completely. Almost all of the functions in Data.Seq have the same name and type signature (modulo the data type) and provide the same functionality as the corresponding ones for lists (in the Prelude). A type class Sequence to capture these common features would be great. It would also mean we can import module Sequence unqualified. A problem is that the names conflict with the ones from the Prelude, so that this style can only be used when hiding all the list stuff from Prelude. Ben

John Meacham

8:29 p.m.

New subject: Re[2]: RFC: general sequences

On Mon, May 23, 2005 at 10:22:38PM +0200, Benjamin Franksen wrote:

...

On Monday 23 May 2005 18:41, Bulat Ziganshin wrote:

...
RP> On Mon, May 23, 2005 at 01:38:59PM +0200, Henning Thielemann wrote:

...
...
It would be great if we had a class for finite sequences with instances Data.List, Data.Array, Data.Sequence. Maybe the new identifiers (<|), (|>) and so on should be names of methods of a such a class instead of separate functions.

RP> What would that buy us? I can't think of many useful functions that RP> would use that interface, what with (<|) lazy for lists and strict for the other two, and (|>>) cheap for sequences and expensive for the other two.

at least, it will be great to have analogues of length, map, filter, partition and a number of other operations defined in classes and supported for variety of data structures. i think it's omission in H98 standard that this names belongs only to lists and not defined in some classes like (+) and (>)

A agree completely. Almost all of the functions in Data.Seq have the same name and type signature (modulo the data type) and provide the same functionality as the corresponding ones for lists (in the Prelude). A type class Sequence to capture these common features would be great. It would also mean we can import module Sequence unqualified. A problem is that the names conflict with the ones from the Prelude, so that this style can only be used when hiding all the list stuff from Prelude.

I would much prefer if all of the data structures had their own interface in addition to being instances of a common class. so we could have import Data.GenSeq -- get the general interface import Data.Sequence(Seq) -- all we need is the type name (and the implicitly imported instance) This would nicely decouple the development of the general class from particular data structures, since they can stand on their own if needed. John -- John Meacham - ⑆repetae.net⑆john⑈

Ross Paterson

24 May 24 May

12:10 a.m.

New subject: Re[2]: RFC: general sequences

On Mon, May 23, 2005 at 01:29:34PM -0700, John Meacham wrote:

...

I would much prefer if all of the data structures had their own interface in addition to being instances of a common class. so we could have

import Data.GenSeq -- get the general interface import Data.Sequence(Seq) -- all we need is the type name (and the implicitly imported instance)

This would nicely decouple the development of the general class from particular data structures, since they can stand on their own if needed.

This double-barrelled approach (if I've understood you correctly) is what Edison does. Actually, the type name isn't quite enough: if you're using a sequence locally in a function, you need something more to select the instance you want. You can't use a type signature, because you want to specify the type constructor, not its argument. One possibility is to add a specialized identity: idSeq :: Seq a -> Seq a I proposed a similar large class (though with lots of defaults), last time this was discussed: http://www.haskell.org//pipermail/libraries/2004-April/001978.html At the time, I thought the class was necessary, despite the extra complications, because none of the available sequence implementations dominated all the others. The question is whether we want more than one instance, now that we have finger trees. Finger trees seem to dominate the persistent queue and deque implementations. They're a little slower at lookup than skew binary random access lists, but not by much. Their append is theoretically slower than the O(1) implementations, but it's very hard to construct an example that shows the difference, since you have to consume the sequence to make the appends happen. Non-persistent implementations (e.g. the old 2-list queue) can be a bit faster, but you have to be very careful how you use them to preserve their performance.

Ross Paterson

midnight

New subject: Re[2]: RFC: general sequences

On Mon, May 23, 2005 at 10:22:38PM +0200, Benjamin Franksen wrote:

...

On Monday 23 May 2005 18:41, Bulat Ziganshin wrote:

...
at least, it will be great to have analogues of length, map, filter, partition and a number of other operations defined in classes and supported for variety of data structures. i think it's omission in H98 standard that this names belongs only to lists and not defined in some classes like (+) and (>)

A agree completely. Almost all of the functions in Data.Seq have the same name and type signature (modulo the data type) and provide the same functionality as the corresponding ones for lists (in the Prelude). A type class Sequence to capture these common features would be great. It would also mean we can import module Sequence unqualified. A problem is that the names conflict with the ones from the Prelude, so that this style can only be used when hiding all the list stuff from Prelude.

This can't be fixed by adding a new class. We can't change the type of Prelude.length, and that includes generalizing it. The only alterative to hiding/qualification is to give our functions different names.

Bulat Ziganshin

6:36 a.m.

New subject: Re[4]: RFC: general sequences

Hello Ross, Tuesday, May 24, 2005, 4:00:20 AM, you wrote: RP> This can't be fixed by adding a new class. We can't change the type of RP> Prelude.length, and that includes generalizing it. The only alterative RP> to hiding/qualification is to give our functions different names. of course there is old trick: import Prelude (IO) import FancyNewPrelude or: import Prelude hiding (length,head...) import GeneralContainer we already have old-style `catch` in Prelude which must be hided when importing Concureent.Exception, so this usage of "import Prelude..." will be not new -- Best regards, Bulat mailto:bulatz@HotPOP.com

Henning Thielemann

9 a.m.

On Tue, 24 May 2005, Ross Paterson wrote:

...

On Mon, May 23, 2005 at 10:22:38PM +0200, Benjamin Franksen wrote:

...
A agree completely. Almost all of the functions in Data.Seq have the same name and type signature (modulo the data type) and provide the same functionality as the corresponding ones for lists (in the Prelude). A type class Sequence to capture these common features would be great. It would also mean we can import module Sequence unqualified. A problem is that the names conflict with the ones from the Prelude, so that this style can only be used when hiding all the list stuff from Prelude.

This can't be fixed by adding a new class. We can't change the type of Prelude.length, and that includes generalizing it. The only alterative to hiding/qualification is to give our functions different names.

What is so bad about qualification? Although it might be cumbersome for everyday functions like 'map' and infix operators it's good style for any other functions. With qualification a user module would look like import qualified SomeGoodNameForASequenceClassModule as GenSeq import qualified Data.Sequence as Seq filterEven :: GenSeq.C s => s Int -> s Int filterEven = GenSeq.filter even filterEven' :: Seq.T Int -> Seq.T Int filterEven' = GenSeq.filter even

Benjamin Franksen

9:35 a.m.

On Tuesday 24 May 2005 11:00, Henning Thielemann wrote:

...

On Tue, 24 May 2005, Ross Paterson wrote:

...
On Mon, May 23, 2005 at 10:22:38PM +0200, Benjamin Franksen wrote:

...
A agree completely. Almost all of the functions in Data.Seq have the same name and type signature (modulo the data type) and provide the same functionality as the corresponding ones for lists (in the Prelude). A type class Sequence to capture these common features would be great. It would also mean we can import module Sequence unqualified. A problem is that the names conflict with the ones from the Prelude, so that this style can only be used when hiding all the list stuff from Prelude.

This can't be fixed by adding a new class. We can't change the type of Prelude.length, and that includes generalizing it. The only alterative to hiding/qualification is to give our functions different names.

What is so bad about qualification? Although it might be cumbersome for everyday functions like 'map' and infix operators it's good style for any other functions. With qualification a user module would look like

import qualified SomeGoodNameForASequenceClassModule as GenSeq import qualified Data.Sequence as Seq

filterEven :: GenSeq.C s => s Int -> s Int filterEven = GenSeq.filter even

filterEven' :: Seq.T Int -> Seq.T Int filterEven' = GenSeq.filter even

Do you want to write such wrappers for each 'everyday' function from -- say -- Set, Map, Seq or similar standard collection modules? Surely not. You have answered your question yourself: Qualified import is bad for everyday functions like 'map' and also for operators. The problem with 'map' is that it is restricted to lists and not available for other collections. I think 'filter' is an everyday function like 'map'. All these should be members of appropriate type classes. That doesn't mean I am against qualified imports. I personnally think qualified import is good for libraries that serve a special purpose, for instance a GUI library or a /special/ implementation of some data structure. I think it is less appropriate for the standard collection ADTs that should be available for everyday use in the most easy-to-use manner. Type classes are a lot easier to use, because the compiler selects the correct instance, not the programmer. Ben

Henning Thielemann

10:17 a.m.

On Tue, 24 May 2005, Benjamin Franksen wrote:

...

You have answered your question yourself: Qualified import is bad for everyday functions like 'map' and also for operators. The problem with 'map' is that it is restricted to lists and not available for other collections. I think 'filter' is an everyday function like 'map'.

Agreed

...

All these should be members of appropriate type classes.

Dito. Since I would call the functions of Data.Sequence in a qualified way it would be no difference whether they are functions for Data.Sequence or methods of a type class. In the case of (++) and (!!) I would hide the prelude functions. This effort is only necessary for modules actually using Data.Sequence and only as long as the Prelude is as it is today. (How long will this be? :-)

...

That doesn't mean I am against qualified imports. I personnally think qualified import is good for libraries that serve a special purpose, for instance a GUI library or a /special/ implementation of some data structure. I think it is less appropriate for the standard collection ADTs that should be available for everyday use in the most easy-to-use manner. Type classes are a lot easier to use, because the compiler selects the correct instance, not the programmer.

me too

Benjamin Franksen

9:21 a.m.

On Tuesday 24 May 2005 02:00, Ross Paterson wrote:

...

On Mon, May 23, 2005 at 10:22:38PM +0200, Benjamin Franksen wrote:

...
On Monday 23 May 2005 18:41, Bulat Ziganshin wrote:

...
at least, it will be great to have analogues of length, map, filter, partition and a number of other operations defined in classes and supported for variety of data structures. i think it's omission in H98 standard that this names belongs only to lists and not defined in some classes like (+) and (>)

A agree completely. Almost all of the functions in Data.Seq have the same name and type signature (modulo the data type) and provide the same functionality as the corresponding ones for lists (in the Prelude). A type class Sequence to capture these common features would be great. It would also mean we can import module Sequence unqualified. A problem is that the names conflict with the ones from the Prelude, so that this style can only be used when hiding all the list stuff from Prelude.

This can't be fixed by adding a new class.

That is what I meant to say ;)

...

We can't change the type of Prelude.length, and that includes generalizing it.

Yes. Prelude an the H98 standard lib are in in need of a large overhaul.

...

The only alterative to hiding/qualification is to give our functions different names.

...or use qualified import as Henning suggested. Ben

Adrian Hey

23 May 23 May

12:48 p.m.

Hello, I'd like to know about the space behaviour of the folds and whether or not you need more fold variants. I found that for AVL trees you really need a plethora of different folds to give users proper control. http://homepages.nildram.co.uk/~ahey/HLibs/Data.Tree.AVL/Data.Tree.AVL.List.... The API looks a bit simplistic at the moment in this respect. But maybe I was unduly anal about this for AVL trees, or maybe there's something different about your finger tree implementation. Regards -- Adrian Hey On Monday 23 May 2005 12:13 pm, Ross Paterson wrote:

...

A general implementation of sequences (based on work with Ralf Hinze) can be found at

http://www.soi.city.ac.uk/~ross/software/html/Data.Sequence.html http://www.soi.city.ac.uk/~ross/software/Data/Sequence.hs

Our experiments indicate that its performance is comparable to (and sometimes better than) the best known persistent implementations(*). Non-persistent implementations are typically faster, but you need to be more careful when using them.

I'd like to propose this for base.

Comments welcome.

(*) or at least it will be when GHC's SPECIALIZE pragma is fixed for polymorphic specializations (SF bug #1019758). _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Ross Paterson

3:35 p.m.

On Mon, May 23, 2005 at 01:48:14PM +0100, Adrian Hey wrote:

...

I'd like to know about the space behaviour of the folds and whether or not you need more fold variants.

Hmm, there are a number of choices just for foldr: foldr_l :: (a -> b -> b) -> b -> Seq a -> b foldr_l f z xs = case viewL xs of EmptyL -> z x :< xs' -> x `f` foldr_l f z xs' -- same result as foldr_l, but different performance foldr_r :: (a -> b -> b) -> b -> Seq a -> b foldr_r f z xs = case viewR xs of EmptyR -> z xs' :> x -> foldr_r f (x `f` z) xs' -- strict version foldr_r' :: (a -> b -> b) -> b -> Seq a -> b foldr_r' f z xs = case viewR xs of EmptyR -> z xs' :> x -> let z' = x `f` z in z' `seq` foldr_r' f z' xs' The current definition is equivalent to the first (but operates directly on the internal structure). I can't think of a situation where I'd want the second. The strict version would be useful, and probably a monadic version too (of which strictness could be a special case).

Benjamin Franksen

5:24 p.m.

On Monday 23 May 2005 17:35, Ross Paterson wrote:

...

On Mon, May 23, 2005 at 01:48:14PM +0100, Adrian Hey wrote:

...
I'd like to know about the space behaviour of the folds and whether or not you need more fold variants.

Hmm, there are a number of choices just for foldr:

foldr_l :: (a -> b -> b) -> b -> Seq a -> b foldr_l f z xs = case viewL xs of EmptyL -> z x :< xs' -> x `f` foldr_l f z xs'

-- same result as foldr_l, but different performance foldr_r :: (a -> b -> b) -> b -> Seq a -> b foldr_r f z xs = case viewR xs of EmptyR -> z xs' :> x -> foldr_r f (x `f` z) xs'

-- strict version foldr_r' :: (a -> b -> b) -> b -> Seq a -> b foldr_r' f z xs = case viewR xs of EmptyR -> z xs' :> x -> let z' = x `f` z in z' `seq` foldr_r' f z' xs'

The current definition is equivalent to the first (but operates directly on the internal structure). I can't think of a situation where I'd want the second. The strict version would be useful, and probably a monadic version too (of which strictness could be a special case).

I wonder: Can a compiler optimize the above so that it is as efficient as the 'real' version? If not, why? If yes, everybody could easily create their own folds. BTW, this module (or at least something similar) should definitely be in the standard libs. Ben

Ross Paterson

24 May 24 May

10:44 a.m.

On Mon, May 23, 2005 at 07:24:50PM +0200, Benjamin Franksen wrote:

...

On Monday 23 May 2005 17:35, Ross Paterson wrote:

...
foldr_l :: (a -> b -> b) -> b -> Seq a -> b foldr_l f z xs = case viewL xs of EmptyL -> z x :< xs' -> x `f` foldr_l f z xs'

The current definition is equivalent to [this] (but operates directly on the internal structure).

I wonder: Can a compiler optimize the above so that it is as efficient as the 'real' version? If not, why? If yes, everybody could easily create their own folds.

No, they're too different in structure for a compiler to work out the transformation. People can still create their own folds -- they just lose a constant factor over the insiders.

Tomasz Zielonka

23 May 23 May

2:19 p.m.

On Mon, May 23, 2005 at 12:13:48PM +0100, Ross Paterson wrote:

...

I'd like to propose this for base.

That would be great!

...

Comments welcome.

Are these operations really O(1) : (<|), (|>), viewL, viewR? Best regards Tomasz

Ross Paterson

3:43 p.m.

On Mon, May 23, 2005 at 04:19:12PM +0200, Tomasz Zielonka wrote:

...

Are these operations really O(1) : (<|), (|>), viewL, viewR?

Yes, the amortized time of these operations is independent of the size of the sequence. Not worst case, but Haskell's laziness means most things are amortized anyway. Chris Okasaki's book "Purely Functional Data Structures" contains a number of implementations of deques with such bounds on these operations. There are even implementations of deques with O(1) worst case bounds, but these are quite a bit more complex.

Tomasz Zielonka

2:26 p.m.

On Mon, May 23, 2005 at 12:13:48PM +0100, Ross Paterson wrote:

...

A general implementation of sequences (based on work with Ralf Hinze) can be found at

http://www.soi.city.ac.uk/~ross/software/html/Data.Sequence.html http://www.soi.city.ac.uk/~ross/software/Data/Sequence.hs

Our experiments indicate that its performance is comparable to (and sometimes better than) the best known persistent implementations(*). Non-persistent implementations are typically faster, but you need to be more careful when using them.

I'd like to propose this for base.

Comments welcome.

Another thing: O(1) reverse would be very nice if it could be implemented with no overhead for other operations ;-) Best regards Tomasz

Josef Svenningsson

2:52 p.m.

On 5/23/05, Tomasz Zielonka wrote:

...

On Mon, May 23, 2005 at 12:13:48PM +0100, Ross Paterson wrote:

...
A general implementation of sequences (based on work with Ralf Hinze) can be found at

http://www.soi.city.ac.uk/~ross/software/html/Data.Sequence.html http://www.soi.city.ac.uk/~ross/software/Data/Sequence.hs

Our experiments indicate that its performance is comparable to (and sometimes better than) the best known persistent implementations(*). Non-persistent implementations are typically faster, but you need to be more careful when using them.

I'd like to propose this for base.

Comments welcome.

Another thing: O(1) reverse would be very nice if it could be implemented with no overhead for other operations ;-)

That should be trivial to implement. Just add a bit in the Seq data type which indicates whether the sequence should be treated backwards or forwards. The reverse operation then just toggles that bit. Or am I missing something? Also, I have some comments about the complexities in the documentation. First of all I think it is useful to say what exactly indices like n and i refer to even if it might not be that hard to figure out. Secondly it seems that the complexity of the indexing functions is wrong. Shouldn't they be O(log(min(i,n-i)))? Otherwise I think it looks like a really useful library! /Josef

Ross Paterson

3:07 p.m.

On Mon, May 23, 2005 at 04:52:04PM +0200, Josef Svenningsson wrote:

...

On 5/23/05, Tomasz Zielonka wrote:

...
Another thing: O(1) reverse would be very nice if it could be implemented with no overhead for other operations ;-)

That should be trivial to implement. Just add a bit in the Seq data type which indicates whether the sequence should be treated backwards or forwards. The reverse operation then just toggles that bit. Or am I missing something?

True -- it would mean a bit more code, though.

...

Also, I have some comments about the complexities in the documentation. First of all I think it is useful to say what exactly indices like n and i refer to even if it might not be that hard to figure out. Secondly it seems that the complexity of the indexing functions is wrong. Shouldn't they be O(log(min(i,n-i)))?

Ouch! thanks.

John Meacham

8:09 p.m.

I like it a lot! needs (the obvious) instances for: Data.Monoid Data.FunctorM Typeable and maybe a specialized instance for Data? (that hides the internal structure, but lets you generically traverse the elements in a sequence) And can you compare quickly it with Data.Queue? It also appears to be constant time, but has a lot less operations, does it win you anything over your sequences or is Data.Queue strictly inferior. John -- John Meacham - ⑆repetae.net⑆john⑈

Benjamin Franksen

8:28 p.m.

On Monday 23 May 2005 22:09, John Meacham wrote:

...

I like it a lot!

Me, too!

...

And can you compare quickly it with Data.Queue? It also appears to be constant time, but has a lot less operations, does it win you anything over your sequences or is Data.Queue strictly inferior.

I think the main difference is that Data.Queue is asymmetric, i.e. constant time enQueueing on one side, constant time deQueueing on the other side, whereas Data.Seq is symmetric, i.e. elements can be added to and taken from at both ends in constant time. Ben

Tomasz Zielonka

8:40 p.m.

On Mon, May 23, 2005 at 10:28:29PM +0200, Benjamin Franksen wrote:

...

...
And can you compare quickly it with Data.Queue? It also appears to be constant time, but has a lot less operations, does it win you anything over your sequences or is Data.Queue strictly inferior.

I think the main difference is that Data.Queue is asymmetric, i.e. constant time enQueueing on one side, constant time deQueueing on the other side, whereas Data.Seq is symmetric, i.e. elements can be added to and taken from at both ends in constant time.

I think what John wants to say is that Data.Queue is unneccesary unless it is more efficient than Data.Seq used as a queue. Best regards Tomasz

Ross Paterson

24 May 24 May

12:02 a.m.

On Mon, May 23, 2005 at 01:09:11PM -0700, John Meacham wrote:

...

And can you compare quickly it with Data.Queue? It also appears to be constant time, but has a lot less operations, does it win you anything over your sequences or is Data.Queue strictly inferior.

Thanks for reminding me -- I also think Data.Queue should be phased out in favour of this module. Data.Queue is implemented using Chris Okasaki's real-time queues. These are unspeakably cute, but they're also much slower than say Okasaki's bankers queues, and the real-time feature is pretty much unavailable in a lazy language like Haskell. As queues, finger trees seem to be at least as fast as bankers queues, and sometimes faster, so it seems they dominate bankers queues (and bankers deques) too.

Malcolm Wallace

10:23 a.m.

Ross Paterson writes:

...

A general implementation of sequences (based on work with Ralf Hinze)

http://www.soi.city.ac.uk/~ross/software/html/Data.Sequence.html

Is Sequence intended to be a symmetric structure, or is it intended to be left-biased? The constructors seem to go for symmetry, but many of the other operations are left-biased. For instance: index adjust update take drop splitAt all count from the left end of the Sequence. Wouldn't it be useful to have the equivalent right-biased operations too? indexL indexR adjustL adjustR updateL updateR takeL takeR dropL dropR splitAtL splitAtR Also, there was debate at the time of the Haskell'98 committee about whether the list operations with an Int argument would be better taking an Integer. I believe the consensus was positive towards unbounded Integer, but it was important for the standard to be backwards compatible, so Int remained. However, with a new library, you have the opportunity to get it right from the start. Regards, Malcolm

Duncan Coutts

10:38 a.m.

On Tue, 2005-05-24 at 11:23 +0100, Malcolm Wallace wrote:

...

Ross Paterson writes:

...

Also, there was debate at the time of the Haskell'98 committee about whether the list operations with an Int argument would be better taking an Integer. I believe the consensus was positive towards unbounded Integer, but it was important for the standard to be backwards compatible, so Int remained. However, with a new library, you have the opportunity to get it right from the start.

There is also the argument that an Int is always big enough to be used as an index into a data structure that can fit in memory. This also applies to 64 bit machines because the Int just gets bigger. And so the argument goes, if Int is sufficient then there is the performance advantage of simple machine integers. Duncan

Henning Thielemann

1:01 p.m.

On Tue, 24 May 2005, Duncan Coutts wrote:

...

On Tue, 2005-05-24 at 11:23 +0100, Malcolm Wallace wrote:

...
Ross Paterson writes:

...
Also, there was debate at the time of the Haskell'98 committee about whether the list operations with an Int argument would be better taking an Integer. I believe the consensus was positive towards unbounded Integer, but it was important for the standard to be backwards compatible, so Int remained. However, with a new library, you have the opportunity to get it right from the start.

There is also the argument that an Int is always big enough to be used as an index into a data structure that can fit in memory. This also applies to 64 bit machines because the Int just gets bigger. And so the argument goes, if Int is sufficient then there is the performance advantage of simple machine integers.

A list can be infinite, the garbage collector can throw away nodes with small indices. So it is possible to access nodes above INT_MAX. E.g. x!!(1e10) will generate 1e10 nodes but not all of them are hold in memory.

Ross Paterson

1:07 p.m.

On Tue, May 24, 2005 at 03:01:46PM +0200, Henning Thielemann wrote:

...

On Tue, 24 May 2005, Duncan Coutts wrote:

...
There is also the argument that an Int is always big enough to be used as an index into a data structure that can fit in memory. This also applies to 64 bit machines because the Int just gets bigger. And so the argument goes, if Int is sufficient then there is the performance advantage of simple machine integers.

A list can be infinite, the garbage collector can throw away nodes with small indices. So it is possible to access nodes above INT_MAX. E.g. x!!(1e10) will generate 1e10 nodes but not all of them are hold in memory.

Lists can be infinite, but not sequences, as the operations are strict.

Jon Fairbairn

1:47 p.m.

On 2005-05-24 at 11:38BST Duncan Coutts wrote:

...

On Tue, 2005-05-24 at 11:23 +0100, Malcolm Wallace wrote:

...
Also, there was debate at the time of the Haskell'98 committee about whether the list operations with an Int argument would be better taking an Integer. I believe the consensus was positive towards unbounded Integer, [...]

There is also the argument that an Int is always big enough to be used as an index into a data structure that can fit in memory. This also applies to 64 bit machines because the Int just gets bigger. And so the argument goes, if Int is sufficient then there is the performance advantage of simple machine integers.

But that's not the real problem. Doing arithmetic on indexes for these things forces the other variables in the computation to be inferred as Int -- so instead of defaulting to the safer Integer, other computations end up being done in finite Ints. Best choice is Integral, then when nothing is specified you get Integer, but if you care about performance and know that either the other computations involved also fit in Int or you've decorated your expressions with fromIntegral, you can get the performance advantage of Int. Jón -- Jón Fairbairn Jon.Fairbairn at cl.cam.ac.uk

Ross Paterson

10:39 a.m.

On Tue, May 24, 2005 at 11:23:40AM +0100, Malcolm Wallace wrote:

...

Is Sequence intended to be a symmetric structure, or is it intended to be left-biased? The constructors seem to go for symmetry, but many of the other operations are left-biased. For instance:

index adjust update take drop splitAt

all count from the left end of the Sequence. Wouldn't it be useful to have the equivalent right-biased operations too?

indexL indexR adjustL adjustR updateL updateR takeL takeR dropL dropR splitAtL splitAtR

Fair point. They wouldn't cost much, and the structure should be symmetrical.

...

Also, there was debate at the time of the Haskell'98 committee about whether the list operations with an Int argument would be better taking an Integer. I believe the consensus was positive towards unbounded Integer, but it was important for the standard to be backwards compatible, so Int remained. However, with a new library, you have the opportunity to get it right from the start.

Perhaps Integral, like List.generic*, would be a good idea. The structure uses Ints internally, though.

7346

Age (days ago)

7347

Last active (days ago)

List overview

Download

32 comments

11 participants

participants (11)

Adrian Hey
Benjamin Franksen
Bulat Ziganshin
Duncan Coutts
Henning Thielemann
John Meacham
Jon Fairbairn
Josef Svenningsson
Malcolm Wallace
Ross Paterson
Tomasz Zielonka