The difficulty of designing a sequence class

newer
Fw: [Haskell-cafe] The difficulty...

Brian Hulley

30 Jul 2006 30 Jul '06

11:47 a.m.

Hi - Part 1 of 2 - Monoid versus MonadPlus =========================== I've just run into a troublesome question when trying to design a sequence class: class ISeq c a | c -> a where empty :: c single :: a -> c append :: c -> c -> c However I've noticed that people sometimes separate the empty and append operations out as sequences with these ops form a Monoid therefore: class Monoid c => ISeq c a | c -> a where single :: a -> c -- now outside the class append :: ISeq c a => c -> c -> c append = mappend empty :: ISeq c a => c empty = mempty Another option, is the Edison library which uses: class (Functor s, MonadPlus s) => Sequence s where so here MonadPlus is used instead of Monoid to provide empty and append. So I've got three main questions: 1) Did Edison choose MonadPlus just because this fitted in with the lack of multi-parameter typeclasses in H98? 2) Are there any reasons to prefer the Edison design over the MPTC design (apart from H98 compatibility) or vice versa? 3) Is it worth bothering to derive ISeq from Monoid (with the possible extra inefficiency of the indirection through the definitions for append = mappend etc or does the compiler completely optimize this out)? and a fourth more long term question: 4) Would it be worth reconsidering the rules for top level names so that class methods could always be local to their class (ditto for value constructors and field names being local to their type constructor). For example it would be nice imho to be able to write: class Monoid c => ISeq c a | c -> a where length :: c -> Int f x y = Monoid/append x y -- or ISeq/append x y g x = ISeq/length x instead of having all names collide in the top level of a module, since at the moment it is difficult to think of method names that don't collide with the Prelude, and it's not nice to have to write "mempty" in place of "empty". Part 2 of 2 - Monad versus Ancillary result type ================================ Another issue relates to left and right views of a sequence. If a sequence is non-empty, the left view is just the leftmost element and the rest of the sequence. The problem arises when the sequence is empty. In the Edison library, this is solved by returning a monadic value ie: lview :: Monad m => s a -> m (a, s a) whereas from the paper "Finger trees: a simple general purpose data structure" by Ralf Hinze and Ross Paterson they use an ancillary data type to store the result of a view: data ViewL s a = NilL | ConsL a (s a) viewL :: FingerTree a -> ViewL FingerTree a So my question here is: what's the best choice? I can see that the monadic version has the advantage that it could be used in do notation in cases where you expect the sequence to be non-empty, but has the disadvantage that it treats the empty sequence as something special (resulting in Monad/fail) and an extra indirection to find the components when the sequence is non-empty. Anyway these are my main questions for now - any feedback appreciated ;-) Thanks, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

Show replies by date

Robert Dockins

30 Jul 30 Jul

6:36 p.m.

On Sunday 30 July 2006 07:47, Brian Hulley wrote:

...

Hi -

Part 1 of 2 - Monoid versus MonadPlus ===========================

I've just run into a troublesome question when trying to design a sequence class:

class ISeq c a | c -> a where empty :: c single :: a -> c append :: c -> c -> c

However I've noticed that people sometimes separate the empty and append operations out as sequences with these ops form a Monoid therefore:

class Monoid c => ISeq c a | c -> a where single :: a -> c

-- now outside the class append :: ISeq c a => c -> c -> c append = mappend

empty :: ISeq c a => c empty = mempty

Another option, is the Edison library which uses:

class (Functor s, MonadPlus s) => Sequence s where

so here MonadPlus is used instead of Monoid to provide empty and append. So I've got three main questions:

...

1) Did Edison choose MonadPlus just because this fitted in with the lack of multi-parameter typeclasses in H98?

Edison's design hails from a time when MPTCs were not only non-standard (as they still are), but also not widely used, and before fundeps were avaliable (I think). So the answer to this one is pretty much "yes". I've considered reformulating the Sequence class to be more similar to the Collection classes (which use MPTCs, fundeps and mention the element type), but for the 1.2 version I wanted to make as few changes as I thought I could to the overall design decisions. In fact, I will likely make this change at some point. It would allow, eg, making Don's ByteString (or whatever it's called now, I forget) an instance of Sequence, which is currently impossible. OTOH, it would require sacrificing the Functor, Monad and MonadPlus instances...

...

2) Are there any reasons to prefer the Edison design over the MPTC design (apart from H98 compatibility) or vice versa?

H98 is probably the big one. I'm currently in wait-and-see mode concerning MPTCs and especially fundeps. As Edison maintainer, I've tried to use them sparingly in the hope that Edison can be made Haskell' compliant (crosses fingers). Aside: I hope the Haskell' committee makes some sort of decision here soonish.

...

3) Is it worth bothering to derive ISeq from Monoid (with the possible extra inefficiency of the indirection through the definitions for append = mappend etc or does the compiler completely optimize this out)?

Not sure about this one. I suspect, however, that the appropriate SPECIALIZE pragmas could cover any cases that you really care about.

...

and a fourth more long term question:

4) Would it be worth reconsidering the rules for top level names so that class methods could always be local to their class (ditto for value constructors and field names being local to their type constructor).

[snip more question] No comment.

...

Part 2 of 2 - Monad versus Ancillary result type ================================

Another issue relates to left and right views of a sequence. If a sequence is non-empty, the left view is just the leftmost element and the rest of the sequence. The problem arises when the sequence is empty. In the Edison library, this is solved by returning a monadic value ie:

lview :: Monad m => s a -> m (a, s a)

whereas from the paper "Finger trees: a simple general purpose data structure" by Ralf Hinze and Ross Paterson they use an ancillary data type to store the result of a view:

data ViewL s a = NilL | ConsL a (s a)

viewL :: FingerTree a -> ViewL FingerTree a

So my question here is: what's the best choice? I can see that the monadic version has the advantage that it could be used in do notation in cases where you expect the sequence to be non-empty, but has the disadvantage that it treats the empty sequence as something special (resulting in Monad/fail) and an extra indirection to find the components when the sequence is non-empty.

Well, the empty sequence IS special, when it comes to looking the leftmost (resp. righmost) element. You have to deal somehow with the fact that the operation is a partial function. I think the arbitrary monad option is better. It gives the user more flexibility about how to use the operation. Really the only way to use ViewL is to put it inside a "case of". With a monad you can use any of the plethora of monadic operations and, as you mentioned, the do notation. In addition, if you want the use case of ViewL, you can always use the Maybe monad. I'm not inclined to worry too much about the extra indirection, which seems like a viable target for being erased by the compiler (I've not tested if this happens, however).

...

Anyway these are my main questions for now - any feedback appreciated ;-)

BTW, for what purpose are you desiging a new sequence class? You are clearly aware of other efforts in this area; in what ways to they not meet your needs?

...

Thanks, Brian.

-- Rob Dockins Talk softly and drive a Sherman tank. Laugh hard, it's a long way to the bank. -- TMBG

Brian Hulley

9:28 p.m.

Robert Dockins wrote:

...

On Sunday 30 July 2006 07:47, Brian Hulley wrote:

...
Another option, is the Edison library which uses:

class (Functor s, MonadPlus s) => Sequence s where

so here MonadPlus is used instead of Monoid to provide empty and append. So I've got three main questions:

...
1) Did Edison choose MonadPlus just because this fitted in with the lack of multi-parameter typeclasses in H98? Edison's design hails from a time when MPTCs were not only non-standard (as they still are), but also not widely used, and before fundeps were avaliable (I think). So the answer to this one is pretty much "yes". [snip]

Hi - Thanks for the answers to this and my other questions. One thing I just realised is that there doesn't seem to be any instance declarations anywhere in the standard libs relating Monoid to MonadPlus so it's a bit unsettling to have to make a "random" choice on the question of what kind of object a Sequence is... I tried: class (forall a. Monoid s a) => Sequence s where ... but of course that doesn't work, so I suppose MonadPlus is the only option when 'a' doesn't appear as a type variable arg of the class being defined.

...

BTW, for what purpose are you desiging a new sequence class? You are clearly aware of other efforts in this area; in what ways to they not meet your needs?

The existing sequence and collection classes I've looked at don't do enough. For example, when I tried to represent the text in an edit widget, I realised I needed a sequence of characters that could also be considered to be a sequence of lines, and it is necessary to be able to index the sequence by character position as well as by line position, as well as keeping track of the total number of characters, the total number of lines, and the maximum number of characters on any one line (so as to be able to calculate the x,y extents when laying out the widget, assuming a fixed width font (tabs ignored!)), with very efficient split and append operations. I managed to get a good representation by using a FingerTree of lines where each line uses a ByteString. I made my own FingerTree class based on the one referenced in the paper at http://www.soi.city.ac.uk/~ross/papers/FingerTree.html but without the symbolic names which I find totally unreadable and confusing, and also so I could get full control of the strictness of the implementation, and also as a way of understanding them since I'd never come across such a complicated data structure before. (I highly recommend this paper to anyone who wants to learn about FingerTrees, Monoids and other very useful concepts.) So one thing existing sequence classes don't have (apart from FingerTree) is the concept of measurement which is essential when you want efficient updates. Eg in my text buffer, the measurement maintained for a sequence is the number of chars and number of lines and maximum line length. Then I needed a structure for a Trie widget a bit like (details omitted): data Node = Expanded Value T | Collapsed Value T | Leaf Value newtype T = T (FingerTree (Key, Node)) where objects of type T could be regarded as a finite map (eg from hierarchical module names to modules) as well as a flattened linear sequence indexed by line number (for display on the screen in a widget given the current scroll bar position), and which also needed to keep track of the total horizontal and vertical extent of the Trie as it would appear in the widget's font. There are several different kinds of measurement going on in this data structure, as well as the complexity of the extra recursion through the leaf to a new level. Existing sequence abstractions don't seem to provide the operations needed to treat a nested data structure as a single sequence. In summary: 1) Often a complex data structure must be able to be simultaneously regarded as a single flattened sequence 2) Measurements are needed for efficient updates (may need to keep track of several at once) 3) Indexing and size are sometimes needed relative to the flattened sequence not just the top level 4) It is useful to have a finite map that can also be regarded as a linear sequence 5) Such finite maps may also be nested (when the keys are hierarchical) but this nesting should be hidden from the user... 6) I want a design that can allow complex data structures to be built up easily and instanced to the appropriate interfaces 7) Also naming conventions in the existing libs are a bit irregular and burdened with old fashioned lisp-isms eg in Data.Edison.Seq there are functions "lview" and "reducel" but I'd argue that there must be one and only one way of forming any identifier in any program namely that the function should appear first followed by qualifiers (so that related functionality always appears together in a lexicographical listing of functions) and it must use camel case with no exceptions at all, thus "viewL" and "reduceL" (and "foldL"). 8) More factoring needs to be done since not all sequences need to be indexed or measured or to be "flattened through the leaf" (eg the FingerTree paper already has a separate class for Reduce and I believe their implementation also referred to a class for Foldable) rather than bundling everything in a single Sequence class. Anyway apologies for my very rambling answer - I'm still a long way from finding a good set of classes to address the above issues :-) Regards, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

Robert Dockins

31 Jul 31 Jul

1:16 p.m.

On Jul 30, 2006, at 5:28 PM, Brian Hulley wrote:

...

Robert Dockins wrote:

...
On Sunday 30 July 2006 07:47, Brian Hulley wrote:

...
Another option, is the Edison library which uses:

class (Functor s, MonadPlus s) => Sequence s where

so here MonadPlus is used instead of Monoid to provide empty and append. So I've got three main questions:

...
1) Did Edison choose MonadPlus just because this fitted in with the lack of multi-parameter typeclasses in H98? Edison's design hails from a time when MPTCs were not only non-standard (as they still are), but also not widely used, and before fundeps were avaliable (I think). So the answer to this one is pretty much "yes". [snip]

Hi - Thanks for the answers to this and my other questions. One thing I just realised is that there doesn't seem to be any instance declarations anywhere in the standard libs relating Monoid to MonadPlus so it's a bit unsettling to have to make a "random" choice on the question of what kind of object a Sequence is...

I tried:

class (forall a. Monoid s a) => Sequence s where ...

but of course that doesn't work, so I suppose MonadPlus is the only option when 'a' doesn't appear as a type variable arg of the class being defined.

...
BTW, for what purpose are you desiging a new sequence class? You are clearly aware of other efforts in this area; in what ways to they not meet your needs?

The existing sequence and collection classes I've looked at don't do enough.

For example, when I tried to represent the text in an edit widget, I realised I needed a sequence of characters that could also be considered to be a sequence of lines, and it is necessary to be able to index the sequence by character position as well as by line position, as well as keeping track of the total number of characters, the total number of lines, and the maximum number of characters on any one line (so as to be able to calculate the x,y extents when laying out the widget, assuming a fixed width font (tabs ignored!)), with very efficient split and append operations.

So, what you want is a sequence of sequences that can be transparently converted to a "flattened" sequence and vice versa? And you also want to keep track of the total number of lines and characters within each line. Additionally, you want to keep track of the maximum number of characters in any one line.

...

I managed to get a good representation by using a FingerTree of lines where each line uses a ByteString. I made my own FingerTree class based on the one referenced in the paper at http://www.soi.city.ac.uk/~ross/papers/FingerTree.html but without the symbolic names which I find totally unreadable and confusing, and also so I could get full control of the strictness of the implementation, and also as a way of understanding them since I'd never come across such a complicated data structure before. (I highly recommend this paper to anyone who wants to learn about FingerTrees, Monoids and other very useful concepts.)

So one thing existing sequence classes don't have (apart from FingerTree) is the concept of measurement which is essential when you want efficient updates. Eg in my text buffer, the measurement maintained for a sequence is the number of chars and number of lines and maximum line length.

Edison has support for transparently keeping track of the size of a sequence. http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data-Edison-Seq- SizedSeq.html It may well be possible to create a slightly generalized wrapper that keeps track of arbitrary "measures". (If they can be computed by a function which is associative, commutative and has a unit). Humm, sort of an incremental fold.... I like it.

...

Then I needed a structure for a Trie widget a bit like (details omitted):

data Node = Expanded Value T | Collapsed Value T | Leaf Value newtype T = T (FingerTree (Key, Node))

where objects of type T could be regarded as a finite map (eg from hierarchical module names to modules) as well as a flattened linear sequence indexed by line number (for display on the screen in a widget given the current scroll bar position), and which also needed to keep track of the total horizontal and vertical extent of the Trie as it would appear in the widget's font.

There are several different kinds of measurement going on in this data structure, as well as the complexity of the extra recursion through the leaf to a new level. Existing sequence abstractions don't seem to provide the operations needed to treat a nested data structure as a single sequence.

In summary:

1) Often a complex data structure must be able to be simultaneously regarded as a single flattened sequence 2) Measurements are needed for efficient updates (may need to keep track of several at once) 3) Indexing and size are sometimes needed relative to the flattened sequence not just the top level 4) It is useful to have a finite map that can also be regarded as a linear sequence 5) Such finite maps may also be nested (when the keys are hierarchical) but this nesting should be hidden from the user... 6) I want a design that can allow complex data structures to be built up easily and instanced to the appropriate interfaces 7) Also naming conventions in the existing libs are a bit irregular and burdened with old fashioned lisp-isms eg in Data.Edison.Seq there are functions "lview" and "reducel" but I'd argue that there must be one and only one way of forming any identifier in any program namely that the function should appear first followed by qualifiers (so that related functionality always appears together in a lexicographical listing of functions) and it must use camel case with no exceptions at all, thus "viewL" and "reduceL" (and "foldL").

OK. Point taken. I'm not sure I agree with the no-exceptions camel- case, but the lexicographical-listing-groups-functionality holds strong appeal for me.

...

8) More factoring needs to be done since not all sequences need to be indexed or measured or to be "flattened through the leaf" (eg the FingerTree paper already has a separate class for Reduce and I believe their implementation also referred to a class for Foldable) rather than bundling everything in a single Sequence class.

Anyway apologies for my very rambling answer - I'm still a long way from finding a good set of classes to address the above issues :-)

Well, I guess I'd suggest you attempt to identify specific problems with already existing packages and attempt to work with those who maintain such packages before reinventing something as basic (and difficult to get right!) as data structure abstractions. Such maintainers may be willing to accept patches and/or implement requested features in order to reduce fragmentation in this space *hint, hint* :-) <soapbox type="Edison plug"> I personally think that Edison is a great piece of work, and I took up maintainership because I felt it was a great shame that no one was using it. My ultimate goal is to make Edison the package that everyone thinks of first when they discover they need a Haskell datastructure for some purpose. Even if Edison does not fill that need, I want every Haskeller to compare his needs against what Edison provides before striking out on his own, and I want that to be a decision made with some hesitation. Over time I hope to make the cases where Edison doesn't cut the mustard fewer and further between. So, if you've ever looked at Edison, or ever do so in the future, and decide it isn't what you need, please let me know why so I can make it better for the next time. After all, squeaky wheels get the grease, but only if I can hear the squeaking! </soapbox> Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Brian Hulley

2:50 p.m.

Robert Dockins wrote:

...

...
Robert Dockins wrote: So, what you want is a sequence of sequences that can be

On Jul 30, 2006, at 5:28 PM, Brian Hulley wrote: transparently converted to a "flattened" sequence and vice versa?

Yes as long as the conversion between them takes no time at all - the sequence of sequences and flattened sequence must coexist simultaneously. The concrete data structure is a sequence of sequences and the flattened sequence is a particular view of it.

...

And you also want to keep track of the total number of lines and characters within each line. Additionally, you want to keep track of the maximum number of characters in any one line.

Edison has support for transparently keeping track of the size of a sequence.

http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data-Edison-Seq- SizedSeq.html

I used this in an initial version of an edit buffer when I just used a SizedSeq wrapping a BinaryRandList to store the text as a sequence of chars. But the lack of ability to also index by line number and keep track of max line length was the problem that led me to use a finger tree instead. Of course I could have used a sequence of chars, a sequence of line lengths, and a bag of line lengths to keep track of everything, and kept them in sync, but after reading the FingerTree paper I was seduced by the idea of being able to represent all this stuff at once in a single data structure.

...

It may well be possible to create a slightly generalized wrapper that keeps track of arbitrary "measures". (If they can be computed by a function which is associative, commutative and has a unit). Humm, sort of an incremental fold.... I like it.

I got this from the FingerTree paper. A finger tree supports any measurement that is a Monoid (so it needs to be associative but not commutative (if it had to be commutative it would be impossible to use a sequence as a set or map, which I needed for my Trie structure)).

...

Well, I guess I'd suggest you attempt to identify specific problems with already existing packages and attempt to work with those who maintain such packages before reinventing something as basic (and difficult to get right!) as data structure abstractions.

The problem is that some people will be using Data.Edison.Seq at the moment and will naturally not want it to change. However I'd suggest that all the common operations be factored out into separate classes eg: class Foldable f where fold :: (a -> b -> b) -> b -> f a -> b foldL :: ... class Reduce f where -- based on FingerTree paper reduceR :: (a -> b -> b) -> (f a -> b -> b) reduceL :: (b -> a -> b) -> (b -> f a -> b) class TakeDrop f where take :: Int -> f a -> f a takeWhile :: (a -> Bool) -> f a -> f a drop ... class Filter f where filter :: (a -> Bool) -> f a -> f a partition :: (a -> Bool) -> f a -> (f a, f a) class Indexable f where length :: f a -> Int at :: Int -> f a -> f a -- (*) splitAt :: Int -> f a -> (f a, f a) (*) Data.ByteString.index puts the Int arg second. It's not at all clear to me which is best, because I often wish that the Int arg of take and drop was second also so I could write take g $! x+1 instead of (take $! x + 1) g though it's consistent with the arg order for takeWhile etc. I know you don't agree with the no-exception-camel-case idea, but I still would argue that this is essential if you want to have a consistent naming convention. I find it extremely confusing that a word like "reducer" is supposed to be read as "reduceR" because the word "reducer" means to me "something which reduces". It seems to me that a restructuring of the usual fold, reduce ops into classes is a great opportunity to perfect the naming of these functions to make life easier for generations to come... :-)

...

Such maintainers may be willing to accept patches and/or implement requested features in order to reduce fragmentation in this space *hint, hint* :-)

Point taken, although in the case of the above refactoring idea, I think it really is a Haskell-wide task because although there appears to be a defacto standard use of names like take, drop, splitAt etc, it's not nearly so clear which ops belong together and which should be separated out, and I personally don't have enough experience of Haskell yet to be able to recommend a solution.

...

<soapbox type="Edison plug"> I personally think that Edison is a great piece of work, and I took up maintainership because I felt it was a great shame that no one was using it. My ultimate goal is to make Edison the package that everyone thinks of first when they discover they need a Haskell datastructure for some purpose. Even if Edison does not fill that need, I want every Haskeller to compare his needs against what Edison provides before striking out on his own, and I want that to be a decision made with some hesitation. Over time I hope to make the cases where Edison doesn't cut the mustard fewer and further between.

So, if you've ever looked at Edison, or ever do so in the future, and decide it isn't what you need, please let me know why so I can make it better for the next time. After all, squeaky wheels get the grease, but only if I can hear the squeaking! </soapbox>

Best regards, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

Brian Hulley

3:27 p.m.

Brian Hulley wrote:

...

class Foldable f where fold :: (a -> b -> b) -> b -> f a -> b foldL :: ...

class Reduce f where -- based on FingerTree paper reduceR :: (a -> b -> b) -> (f a -> b -> b) reduceL :: (b -> a -> b) -> (b -> f a -> b)

class TakeDrop f where take :: Int -> f a -> f a takeWhile :: (a -> Bool) -> f a -> f a drop ...

class Filter f where filter :: (a -> Bool) -> f a -> f a partition :: (a -> Bool) -> f a -> (f a, f a)

class Indexable f where length :: f a -> Int at :: Int -> f a -> f a -- (*) splitAt :: Int -> f a -> (f a, f a)

[snip]

...

I personally don't have enough experience of Haskell yet to be able to recommend a solution.

None of the above type classes would be compatible with Data.ByteString! (You mentioned this issue before wrt Data.Edison.Seq but it just clicked with me now for the above refactoring.) For compatibility, the element type would need to appear also thus: class Foldable f_a a | f_a -> a where fold :: (a -> b -> b) -> b -> f_a -> b class Reduce f_a a | f_a -> a where reduceR :: (a -> b -> b) -> (f_a -> b -> b) reduceL :: (b -> a -> b) -> (b -> f_a -> b) instance Reduce [a] a where reduceR f l x = foldr f x l -- etc Thus Data.ByteString, as well as being extremely useful to store strings (!), is a central lynchpin in the entire re-factoring question since it's an example of a non-polymorphic collection type (or a collection with restricted polymorphism), and supporting it would require a redesign of existing ideas in this direction eg http://cvs.haskell.org/Hugs/pages/libraries/base/Data-Foldable.html is no use either. Regards, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

Duncan Coutts

3:53 p.m.

On Mon, 2006-07-31 at 16:27 +0100, Brian Hulley wrote:

...

None of the above type classes would be compatible with Data.ByteString! (You mentioned this issue before wrt Data.Edison.Seq but it just clicked with me now for the above refactoring.) For compatibility, the element type would need to appear also thus:

class Foldable f_a a | f_a -> a where fold :: (a -> b -> b) -> b -> f_a -> b

class Reduce f_a a | f_a -> a where reduceR :: (a -> b -> b) -> (f_a -> b -> b) reduceL :: (b -> a -> b) -> (b -> f_a -> b)

instance Reduce [a] a where reduceR f l x = foldr f x l -- etc

Thus Data.ByteString, as well as being extremely useful to store strings (!), is a central lynchpin in the entire re-factoring question since it's an example of a non-polymorphic collection type (or a collection with restricted polymorphism), and supporting it would require a redesign of existing ideas in this direction eg http://cvs.haskell.org/Hugs/pages/libraries/base/Data-Foldable.html is no use either.

Indeed. There seem to be several new classes appearing in GHC 6.6 which we will not be able to make ByteString an instance of for just these reasons. If these are seeking to eventually replace the prelude versions of map, fold etc then it would be nice if they were general enough to cover specialised containers like ByteStrings, unboxed arrays or bitmaps of pixels. You'll note that the array type classes solve this, as you suggest, by adding the element type as a parameter to the class. This allows for instances that are polymorphic in the element, like Array and instances which only support specific element types like UArray. Duncan

Einar Karttunen

1 Aug 1 Aug

9:58 a.m.

On 31.07 16:27, Brian Hulley wrote:

...

None of the above type classes would be compatible with Data.ByteString! (You mentioned this issue before wrt Data.Edison.Seq but it just clicked with me now for the above refactoring.) For compatibility, the element type would need to appear also thus:

class Foldable f_a a | f_a -> a where fold :: (a -> b -> b) -> b -> f_a -> b

With the new System FC (when it is merged) we could make these classes nicer. class ElementType c a | c -> a instance ElementType [a] a instance ElementType ByteString Char instance IArray a e => ElementType (a i e) e class Foldable c where fold :: ElementType c a => (a -> b -> b) -> b -> c -> b This won't work at the moment due to limitations in GHC, but seems like a cleaner solution. - Einar Karttunen

Bulat Ziganshin

11:01 a.m.

New subject: Re[2]: The difficulty of designing a sequence class

Hello Einar, Tuesday, August 1, 2006, 1:58:30 PM, you wrote:

...

class ElementType c a | c -> a class Foldable c where fold :: ElementType c a => (a -> b -> b) -> b -> c -> b

i love it! will it be possible to write smth like this: class Stream m h | h->m data T h = (Stream m h) => C (m Int) ? currently, i need to pass 'm' parameter to T type constructor too, because GHC can't guess what 'm' is determined by 'h' -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

ajb＠spamcop.net

31 Jul 31 Jul

9:33 p.m.

G'day all. Quoting Brian Hulley :

...

The problem is that some people will be using Data.Edison.Seq at the moment and will naturally not want it to change. However I'd suggest that all the common operations be factored out into separate classes eg:

While I think the huge typeclass is unfortunate, one of Edison's greatest strengths is that every sequence supports every sequence operation. (The catch, of course, is that the operation may be inefficient.) This was a deliberate design decision, and I'd be sorry to see it go. Many is the time in C++ when I started, say, with a std::stack, then discovered soon after that I needed to peer at the top few elements on the stack, only to find that std::stack doesn't support that. Supporting all operations supports exploratory/agile programming. You don't have to decide up front what operations you need to be fast. You can discover this as you go. Yes, this is orthogonal to breaking up the huge typeclass, but I thought I'd just mention it. Cheers, Andrew Bromage

Brian Hulley

1 Aug 1 Aug

12:43 a.m.

ajb@spamcop.net wrote:

...

G'day all.

Quoting Brian Hulley :

...
The problem is that some people will be using Data.Edison.Seq at the moment and will naturally not want it to change. However I'd suggest that all the common operations be factored out into separate classes eg:

While I think the huge typeclass is unfortunate, one of Edison's greatest strengths is that every sequence supports every sequence operation. (The catch, of course, is that the operation may be inefficient.)

This was a deliberate design decision, and I'd be sorry to see it go. Many is the time in C++ when I started, say, with a std::stack, then discovered soon after that I needed to peer at the top few elements on the stack, only to find that std::stack doesn't support that.

As an aside, if I was needing any kind of sequence in C++ I'd use a std::vector because it supplies all the ops you need (and is usually fast enough for exploratory programming). I've never seen any point in stack or deque etc because they're far too limited.

...

Supporting all operations supports exploratory/agile programming. You don't have to decide up front what operations you need to be fast. You can discover this as you go.

Yes, this is orthogonal to breaking up the huge typeclass, but I thought I'd just mention it.

As you've pointed out, there are 2 separate issues that are in danger of being confused: 1) Forcing all sequence instances to support all operations 2) Bundling all the ops into a single huge class I'd suggest that while 1) may be useful for the classes that are there at present, there are many ops that they don't yet support and also some ops that are never needed. Also, surely as long as there is one concrete type that supports everything that should be good enough for exploratory programming (I'm thinking of FingerTrees which seem to be able to do absolutely anything in logarithmic time!!! :-) ) For 2), you could still have a Sequence class to gather all the separate functionality together but I'd make it inherit from all the separate pieces of functionality rather than being the place where all the functionality is defined eg: class Viewable c a | c -> a where viewL :: Monad m => c -> m (a, c) viewR :: Monad m => c -> m (c, a) atL :: c -> a -- must be called on non-empty sequence atR :: c -> a class Indexable c a | c -> a where length :: c -> Int at :: Int -> c -> a -- index must be in range splitAt :: Int -> c -> (c, c) -- in same module as Indexable take :: Indexable c a => Int -> c -> c take i c = let (l, _) = splitAt i c in l class (Monoid c, Foldable c a, Indexable c a, Filterable c a, Viewable c a) => Sequence c a This way, we'd get the advantages of being able to write (Sequence c a) as well as the advantages of being able to supply a sequence to a function that needed a Foldable - at the moment the fold methods of sequence are invisible to the rest of Haskell because they're trapped inside the Sequence class. Regards, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

Bulat Ziganshin

10:39 a.m.

New subject: Re[2]: The difficulty of designing a sequence class

Hello Brian, Tuesday, August 1, 2006, 4:43:23 AM, you wrote:

...

As you've pointed out, there are 2 separate issues that are in danger of being confused: 1) Forcing all sequence instances to support all operations 2) Bundling all the ops into a single huge class

Collections library (darcs get --partial http://darcs.haskell.org/packages/collections/) defines good hierarchy of collection classes -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Brian Hulley

3:55 p.m.

New subject: Re[2]: The difficulty of designing a sequence class

Bulat Ziganshin wrote:

...

Hello Brian,

Tuesday, August 1, 2006, 4:43:23 AM, you wrote:

...
As you've pointed out, there are 2 separate issues that are in danger of being confused: 1) Forcing all sequence instances to support all operations 2) Bundling all the ops into a single huge class

Collections library (darcs get --partial http://darcs.haskell.org/packages/collections/) defines good hierarchy of collection classes

Hi Bulat - Thanks for the link to the collections repository. I've at last taken the plunge and installed darcs and managed to get this onto my computer (although with a strange warning from darcs: 'plink: unknown option "-O" '). I'll have to have a proper look into it. On superficial inspection, there are some unusual choices - for example putting (size) and (null) into Foldable instead of Collection, and calling "null" "null" instead of "isEmpty" (considering that Collection has a method called "isSingleton" which follows the usual convention of starting unary predicates with "is"). In any case it's interesting to see another possible factoring of the concept of collections to compare with Edison and the existing base collections. Regards, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

ajb＠spamcop.net

31 Jul 31 Jul

9:17 p.m.

G'day all. Quoting Robert Dockins :

...

Edison's design hails from a time when MPTCs were not only non-standard (as they still are), but also not widely used, and before fundeps were avaliable (I think).

Yes. Chris Okasaki's original version of Edison was standard H98.

...

I've considered reformulating the Sequence class to be more similar to the Collection classes (which use MPTCs, fundeps and mention the element type),

The redesign of the Collection hierarchy was from my tree. The main reason why I changed it was that ternary tries couldn't really be typed properly. (Chris' implementation of Patricia trees used a phantom key type along with a stern warning to only define the Int instance. That didn't work for ternary tries, since the key type is polymorphic.) I didn't get around to fixing Sequence because there wasn't a need for it yet, but yes, it should be done. Cheers, Andrew Bromage

David Menendez

10:35 p.m.

ajb@spamcop.net writes:

...

G'day all.

Quoting Robert Dockins :

...
I've considered reformulating the Sequence class to be more similar to the Collection classes (which use MPTCs, fundeps and mention the element type),

The redesign of the Collection hierarchy was from my tree. The main reason why I changed it was that ternary tries couldn't really be typed properly. (Chris' implementation of Patricia trees used a phantom key type along with a stern warning to only define the Int instance. That didn't work for ternary tries, since the key type is polymorphic.)

I didn't get around to fixing Sequence because there wasn't a need for it yet, but yes, it should be done.

That's a tough call to make. Changing the kind of Sequence to * from * -> * means losing the Functor, Monad, and MonadPlus superclasses and all the various maps and zips. I guess you could separate those into an auxiliary class, class (Functor s, MonadPlus s) => SeqFunctor s where mapWithIndex :: (Int -> a -> b) -> s a -> s b zip :: s a -> s b -> s (a,b) ... and require that any instance of SeqFunctor also be an instance of Sequence. A pity we can't do something like, class (Functor s, MonadPlus s, forall a. Sequence (s a) a) => SeqFunctor s where ... -- David Menendez | "In this house, we obey the laws http://www.eyrie.org/~zednenem | of thermodynamics!"

ajb＠spamcop.net

11:55 p.m.

G'day all. Quoting David Menendez :

...

That's a tough call to make. Changing the kind of Sequence to * from * -> * means losing the Functor, Monad, and MonadPlus superclasses and all the various maps and zips.

And on the other hand, containers that need extra constraints (e.g. sets, which need their members to be Eq at the very least) can't be Functors or Monads anyway. Perhaps Functor/Monad/etc are the culprits here. Cheers, Andrew Bromage

Jim Apple

1 Aug 1 Aug

2:25 a.m.

On 7/31/06, ajb@spamcop.net wrote:

...

G'day all.

Quoting David Menendez :

...
That's a tough call to make. Changing the kind of Sequence to * from * -> * means losing the Functor, Monad, and MonadPlus superclasses and all the various maps and zips.

Perhaps Functor/Monad/etc are the culprits here.

Indeed. See Oleg's message from a few months back where he shows that we can get John Hughes Restricted Data Types (Set is a Monad) by adding parameters to type classes: http://www.haskell.org//pipermail/haskell-prime/2006-February/000498.html http://hackage.haskell.org/trac/haskell-prime/ticket/98 Jim

Brian Hulley

12:23 a.m.

David Menendez wrote:

...

ajb@spamcop.net writes:

...
I didn't get around to fixing Sequence because there wasn't a need for it yet, but yes, it should be done.

That's a tough call to make. Changing the kind of Sequence to * from * -> * means losing the Functor, Monad, and MonadPlus superclasses and all the various maps and zips.

But there's no option if you want to be able to support non-polymorphic sequences like Data.ByteString etc. I think the Functor class is just fundamentally too limited - it assumes the whole world is polymorphic and it isn't. Also, MPTC's mean we would gain Monoid. Regards, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

Bulat Ziganshin

10:37 a.m.

New subject: Re[2]: The difficulty of designing a sequence class

Hello Brian, Tuesday, August 1, 2006, 4:23:53 AM, you wrote:

...

...
That's a tough call to make. Changing the kind of Sequence to * from * -> * means losing the Functor, Monad, and MonadPlus superclasses and all the various maps and zips.

...

But there's no option if you want to be able to support non-polymorphic sequences like Data.ByteString etc. I think the Functor class is just fundamentally too limited - it assumes the whole world is polymorphic and it isn't.

it's possible, at least in principle, to make ByteString parametric class: data PlainSequence a = ... type ByteString = PlainSequence Word8 and then rewrite all ByteString functions so that they will work with elements of any type, not just Word8 -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Duncan Coutts

5:36 p.m.

New subject: Re[2]: The difficulty of designing a sequence class

On Tue, 2006-08-01 at 14:37 +0400, Bulat Ziganshin wrote:

...

Hello Brian,

Tuesday, August 1, 2006, 4:23:53 AM, you wrote:

...
...
That's a tough call to make. Changing the kind of Sequence to * from * -> * means losing the Functor, Monad, and MonadPlus superclasses and all the various maps and zips.

...
But there's no option if you want to be able to support non-polymorphic sequences like Data.ByteString etc. I think the Functor class is just fundamentally too limited - it assumes the whole world is polymorphic and it isn't.

it's possible, at least in principle, to make ByteString parametric class:

data PlainSequence a = ...

type ByteString = PlainSequence Word8

and then rewrite all ByteString functions so that they will work with elements of any type, not just Word8

Much of the performance advantages that we can get are due to the special representation and knowing the types involved. This would not work for arbitrary types. For one thing all the performance advantages of using a packed representation would be lost as we would have to use pointers to boxed objects rather than flat arrays of bytes. We can use low level standard C functions like memchr because we know the representation. There is certainly a place for a general strict sequence type but there is also a need for efficient handling of big chunks of bytes. It'd be even better if the special case representation collections could fit into a general framework. Duncan

6910

Age (days ago)

6912

Last active (days ago)

List overview

Download

19 comments

8 participants

participants (8)

ajb＠spamcop.net
Brian Hulley
Bulat Ziganshin
David Menendez
Duncan Coutts
Einar Karttunen
Jim Apple
Robert Dockins