Re: [Haskell] ANN: random-access-list-0.1

Stephan Friedrichs wrote:
Hello,
I've implemented Chris Okasaki's random-access list[1] which provides typical list operations (cons, head, tail) in O(1) and yet offers indexed random-access in O(log n). It's uploaded on hackage[2].
It's still an early version which I'll extend, but especially at this eary stage I'd appreciate your feedback concerning what's still missing / to be fixed / to be improved.
Great to see it, it deserved implementing, IIRC! I don't remember enough about it.. (and don't have Okasaki anywhere handy). Can it be lazy or infinitely long? (Data.Sequence can't, but it's fast on both ends and has fast concatenation.) Anyway, document that. If it can't be lazy/infinite, does it have any advantage over using Data.Sequence?(constant factor of speed?, possible operations?, something I'm forgetting?) Is "RandomAccessList" the best name for something that's not O(1), just O(log n)? Or just that "RandomAccessList" is just much longer than "[]"? don't use those unorthodox infix function names.. `cons` is hardly worse than .:. , `append` or `mappend` than .+. , and .!. than, hmm.. . Export a ++ and ! (!! ?) if you're really dedicated. But I'd prefer an `at` that's the same partial indexing operation, rather than the name .!. (I think this "at" was a new name being put somewhere else? partly because "!" is trying to be gradually used only to refer to strictness?) "extractHead" is an ugly name for a nevertheless standardish-meaning function... what is it usually called? uncons? headTail? (Data.Sequence, which is meant to be left-right symmetric, calls it "viewr"... except your version doesn't have the Maybe, it's partial instead, fails on empty lists) For "index", don't use Monad, use Maybe (I think that's what the recent libraries@haskell.org discussion concluded, in the context of switching Data.Map back to Maybe). Also, Data.List has genericLength etc, to support. Isn't "index" (like Data.List.genericIndex) supposed to be a name for a partial operation, not one that returns a Maybe? Shouldn't "size" be named "length" (or exported as both names, since e.g. Data.Map.size, .List.length) (why is it O(1) not O(log n)? Is it possible for these RandomAccessLists to be longer than maxBound::Int?)? for e.g. toList, is the O(n) cost spread over traversing/demanding the items of the generated list, or all up-front, or somewhere in between? Why is zip slow with unbalanced lists? Obviously, it could be implemented O(min(m,n)*(log m + log n)) just indexing each one, which would be faster for really unbalanced-size lists... Obviously, I don't understand the implementation. BTW, looking at the file, data RandomAccessList a = RandomAccessList {-# UNPACK #-} !Int ![(Int, CBTree a)] Marking a list strict like that doesn't have much effect because it only makes the first (:) or [] strict. Did you mean for an element-strict or spine-strict(which would necessarily be non-infinite) list? -Isaac
Regards, Stephan
[1] Chris Okasaki: "Purely Functional Data Structures" [2] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/random-access-lis...

On Wed, 11 Jun 2008, Isaac Dupree wrote:
"extractHead" is an ugly name for a nevertheless standardish-meaning function... what is it usually called? uncons? headTail? (Data.Sequence, which is meant to be left-right symmetric, calls it "viewr"... except your version doesn't have the Maybe, it's partial instead, fails on empty lists)
I like the 'viewL' and 'viewR' kind of functions, they are safer than 'head' and 'tail', 'init' and 'last'. But since in most cases I used 'viewL' in connection with 'maybe', the continuation style functions switchL :: b -> (a -> Seq a -> b) -> Seq a -> b switchR :: b -> (Seq a -> a -> b) -> Seq a -> b are even more convenient. They replace 'case' on those structures where you do not have access to the constructors.

Henning Thielemann wrote:
On Wed, 11 Jun 2008, Isaac Dupree wrote:
"extractHead" is an ugly name for a nevertheless standardish-meaning function... what is it usually called? uncons? headTail? (Data.Sequence, which is meant to be left-right symmetric, calls it "viewr"... except your version doesn't have the Maybe, it's partial instead, fails on empty lists)
I like the 'viewL' and 'viewR' kind of functions, they are safer than 'head' and 'tail', 'init' and 'last'. But since in most cases I used 'viewL' in connection with 'maybe', the continuation style functions
switchL :: b -> (a -> Seq a -> b) -> Seq a -> b switchR :: b -> (Seq a -> a -> b) -> Seq a -> b
I think you got L and R backwards? it's a little confusing to me, but following Data.Sequence I think they're meant to match whether repeated L or R makes a foldl or a foldr.
are even more convenient. They replace 'case' on those structures where you do not have access to the constructors.
on the other hand, they look harder to use with the upcoming (in GHC 6.9/6.10) "view patterns", a simple example being foo (viewr -> Nothing) = ... foo (viewr -> Just (a, as)) = ... equivalent to foo s = case viewr s of Nothing -> ... Just (a, as) -> ... (for view patterns, it's also proposed but not implemented to have special syntax for Maybe and/or tuples to make it even more convenient for these purposes... I'm not sure if it'd be a good idea either) I also mentioned the continuation style functions for Data.Map's maybe-returning in the discussion... they're most convenient if you want to destruct them right away, but harder to apply combinators to (e.g. Maybe monad. which doesn't work for Data.Sequence because they use their own custom view datatype -- was that a bad choice for them? "Maybe" of a tuple has an extra laziness-spot that shouldn't really be there, though) Anyway, people didn't seem to respond to that idea of continuation-style returning, not sure why. On the other hand, also, the only thing defending the function from using the arguments in a way it shouldn't, is parametricity... less obvious than with algrebraic data returns -Isaac

On Thu, 12 Jun 2008, Isaac Dupree wrote:
Henning Thielemann wrote:
On Wed, 11 Jun 2008, Isaac Dupree wrote:
"extractHead" is an ugly name for a nevertheless standardish-meaning function... what is it usually called? uncons? headTail? (Data.Sequence, which is meant to be left-right symmetric, calls it "viewr"... except your version doesn't have the Maybe, it's partial instead, fails on empty lists)
I like the 'viewL' and 'viewR' kind of functions, they are safer than 'head' and 'tail', 'init' and 'last'. But since in most cases I used 'viewL' in connection with 'maybe', the continuation style functions
switchL :: b -> (a -> Seq a -> b) -> Seq a -> b switchR :: b -> (Seq a -> a -> b) -> Seq a -> b
I think you got L and R backwards?
Why do you think so? http://cvs.haskell.org/Hugs/pages/libraries/base/Data-Sequence.html#v%3Aview...
it's a little confusing to me, but following Data.Sequence I think they're meant to match whether repeated L or R makes a foldl or a foldr.
viewl is like a 'case' for distinction of [] and (:) and thus can be used to implement both foldl and foldr.

Henning Thielemann wrote:
On Thu, 12 Jun 2008, Isaac Dupree wrote:
Henning Thielemann wrote:
On Wed, 11 Jun 2008, Isaac Dupree wrote:
"extractHead" is an ugly name for a nevertheless standardish-meaning function... what is it usually called? uncons? headTail? (Data.Sequence, which is meant to be left-right symmetric, calls it "viewr"... except your version doesn't have the Maybe, it's partial instead, fails on empty lists)
I like the 'viewL' and 'viewR' kind of functions, they are safer than 'head' and 'tail', 'init' and 'last'. But since in most cases I used 'viewL' in connection with 'maybe', the continuation style functions
switchL :: b -> (a -> Seq a -> b) -> Seq a -> b switchR :: b -> (Seq a -> a -> b) -> Seq a -> b
I think you got L and R backwards?
Why do you think so?
http://cvs.haskell.org/Hugs/pages/libraries/base/Data-Sequence.html#v%3Aview...
Because I was confused. Looking again, you seem to be right...
it's a little confusing to me, but following Data.Sequence I think they're meant to match whether repeated L or R makes a foldl or a foldr.
viewl is like a 'case' for distinction of [] and (:) and thus can be used to implement both foldl and foldr.
yes, but in the sense that foldr is more "natural" for lists than foldl, requiring just replacing the constructors "directly"

On Wed, Jun 11, 2008 at 08:56:17PM -0400, Isaac Dupree wrote:
Stephan Friedrichs wrote:
I've implemented Chris Okasaki's random-access list[1] which provides typical list operations (cons, head, tail) in O(1) and yet offers indexed random-access in O(log n). It's uploaded on hackage[2].
It's still an early version which I'll extend, but especially at this eary stage I'd appreciate your feedback concerning what's still missing / to be fixed / to be improved.
Great to see it, it deserved implementing, IIRC! I don't remember enough about it.. (and don't have Okasaki anywhere handy). Can it be lazy or infinitely long? (Data.Sequence can't, but it's fast on both ends and has fast concatenation.) Anyway, document that. If it can't be lazy/infinite, does it have any advantage over using Data.Sequence?(constant factor of speed?, possible operations?, something I'm forgetting?)
The O(1) size function settles the finiteness question. In tests we did for the finger tree paper, we found that skew binary random access lists are 1.6-1.7 times faster than Data.Sequence for stack operations and indexing, but not much faster for updates. foldr is defined using foldr' on lists, introducing strictness without the user asking for it. In any case, it would be better to define the Foldable instance directly over the internal structure. Then you wouldn't need toList. A similar Traversable instance would also be a good idea.
Why is zip slow with unbalanced lists? Obviously, it could be implemented O(min(m,n)*(log m + log n)) just indexing each one, which would be faster for really unbalanced-size lists...
It could be made O(min(m,n)) by toListing the longer one and traversing the shorter one adding corresponding elements from the list. I agree about using Maybe instead of Monad, and would generally prefer view functions to null/head/tail combinations.

Isaac Dupree wrote:
[...]
Great to see it, it deserved implementing, IIRC! I don't remember enough about it.. (and don't have Okasaki anywhere handy). Can it be lazy or infinitely long?
No, it has to be finite as it's actually a list of complete binary trees whose size depend on the skew binary representation of the list's size. I'll document this.
[...]
Is "RandomAccessList" the best name for something that's not O(1), just O(log n)? Or just that "RandomAccessList" is just much longer than "[]"?
Well Chris Okasaki called them "Skew Binary Random-Access Lists", which is even longer :)
don't use those unorthodox infix function names.. `cons` is hardly
worse than .:. , `append` or `mappend` than .+. , and .!. than, hmm.. . Export a ++ and ! (!! ?) if you're really dedicated. But I'd prefer an `at` that's the same partial indexing operation, rather than the name .!. (I think this "at" was a new name being put somewhere else? partly because "!" is trying to be gradually used only to refer to strictness?) Good point!
"extractHead" is an ugly name for a nevertheless standardish-meaning
function... what is it usually called? uncons? headTail? (Data.Sequence, which is meant to be left-right symmetric, calls it "viewr"... except your version doesn't have the Maybe, it's partial instead, fails on empty lists) Yes, I wasn't happy with that one either. The view-concept of Data.Sequence is a good idea.
For "index", don't use Monad, use Maybe (I think that's what the
recent libraries@haskell.org discussion concluded, in the context of switching Data.Map back to Maybe). I was just copying the idea from Data.Map and it's usually a good thing to have functions as general as possible, or why is it not?
Also, Data.List has genericLength etc, to
support. Isn't "index" (like Data.List.genericIndex) supposed to be a name for a partial operation, not one that returns a Maybe? Shouldn't "size" be named "length" (or exported as both names, since e.g. Data.Map.size, .List.length) (why is it O(1) not O(log n)? Is it
At the moment, I'm using the Int type for size and indexing only for one reason: I haven't found a proper way to generalize it. I'd really like to use the Ix class, but it doesn't provide enough functionality, it only works on fixed-size intervals (i. e. for arrays, which don't change their size, but a list does). Maybe someone has an idea of how to realize lists with a variable starting index and size? possible for these RandomAccessLists to be longer than maxBound::Int?)? The size function is in O(1) because I cache it, otherwise it would be size (RandomAccessList xs) = sum (map fst xs) which is O(log n). I consider the caching useful, as most applications will check 0 <= i < size quite often.
for e.g. toList, is the O(n) cost spread over traversing/demanding
the items of the generated list, or all up-front, or somewhere in between?
Why is zip slow with unbalanced lists? Obviously, it could be
understand the implementation. BTW, looking at the file, data RandomAccessList a = RandomAccessList {-# UNPACK #-} !Int ![(Int, CBTree a)] Marking a list strict like that doesn't have much effect because it only makes the first (:) or [] strict. Did you mean for an element-strict or spine-strict(which would necessarily be non-infinite)
implemented O(min(m,n)*(log m + log n)) just indexing each one, which would be faster for really unbalanced-size lists... Obviously, I don't If two lists have exactly the same size, all the complete binary trees holding the data have the same size as well. This can be zipped directly and is a bit (~5% in my tests) faster. list? Oh, you're right, I just meant to mark the Int strict. It obviously was too late yesterday! Thanks for this feedback! //Stephan -- Früher hieß es ja: Ich denke, also bin ich. Heute weiß man: Es geht auch so. - Dieter Nuhr

2008/6/12 Stephan Friedrichs
For "index", don't use Monad, use Maybe (I think that's what the recent libraries@haskell.org discussion concluded, in the context of switching Data.Map back to Maybe).
I was just copying the idea from Data.Map and it's usually a good thing to have functions as general as possible, or why is it not?
Mainly because it's too easy to use them in a Monad that does not have a meaningful fail(), like IO, many beginners do this error and are surprised when their program is less robust that it should be. On the other hand it's pretty easy to get an error out of a Maybe (for example you can use fromMaybe with error() to get a meaningful error output).
Also, Data.List has genericLength etc, to
At the moment, I'm using the Int type for size and indexing only for one reason: I haven't found a proper way to generalize it. I'd really like to use the Ix class, but it doesn't provide enough functionality, it only works on fixed-size intervals (i. e. for arrays, which don't change their size, but a list does). Maybe someone has an idea of how to realize lists with a variable starting index and size?
Given that this structure isn't lazy enough, I really don't see a problem with using Int (any random access list with a size that needs an Integer would blow the memory anyway...). -- Jedaï

Chaddaï Fouché wrote:
Given that this structure isn't lazy enough, I really don't see a problem with using Int (any random access list with a size that needs an Integer would blow the memory anyway...).
Bad way to think about things. The implications of using Int as the result type of a function extend far beyond just storing the result type. They also include doing other computations that depend on the result type, and with enough type inference, the broken Int type propogates throughout the program. -- Chris

Chris Smith wrote:
Chaddaï Fouché wrote:
Given that this structure isn't lazy enough, I really don't see a problem with using Int (any random access list with a size that needs an Integer would blow the memory anyway...).
Bad way to think about things. The implications of using Int as the result type of a function extend far beyond just storing the result type. They also include doing other computations that depend on the result type, and with enough type inference, the broken Int type propogates throughout the program.
I'd like not to be restricted to Int, but what's the proper way to do so? Just adding "genericFun = fromInteger . toInteger . fun" functions is just as bad as using Ints directly. What I need is something like a more general Ix class that is not limited to a fixed interval: class Ix2 i where -- convert starting index and given index to an Int for internal -- representation toIndex :: i -> i -> Int -- given an internal Int index and a starting index, return the -- representing index fromIndex :: Int -> i -> i -- translate a starting index and a number of elements into a -- size size :: i -> Int -> i This is meant to work with a starting-index and a given index, so that a list does not have to start at index 0. Is there something like this? Or is it somehow possible to use the existing Ix class to do that? //Stephan -- Früher hieß es ja: Ich denke, also bin ich. Heute weiß man: Es geht auch so. - Dieter Nuhr

Isaac Dupree wrote:
"extractHead" is an ugly name for a nevertheless standardish-meaning function... what is it usually called? uncons? headTail? (Data.Sequence, which is meant to be left-right symmetric, calls it "viewr"... except your version doesn't have the Maybe, it's partial instead, fails on empty lists)
Views are nice, but these other functions are needed too. As for "extractHead", what about "split"? (cf. Control.Monad.Logic.msplit[1]) [1] http://tinyurl.com/37f4ga http://hackage.haskell.org/packages/archive/logict/0.2.3/doc/html/Control-Mo... -- Live well, ~wren
participants (7)
-
Chaddaï Fouché
-
Chris Smith
-
Henning Thielemann
-
Isaac Dupree
-
Ross Paterson
-
Stephan Friedrichs
-
wren ng thornton