Re: [Haskell] ANN: random-access-list-0.1

Stephan Friedrichs wrote:
Isaac Dupree wrote:
[...]
Great to see it, it deserved implementing, IIRC! I don't remember enough about it.. (and don't have Okasaki anywhere handy). Can it be lazy or infinitely long?
No, it has to be finite as it's actually a list of complete binary trees whose sizes depend on the skew binary representation of the list's size. I'll document this.
okay then
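For reference, a minimal sketch of the structure Stephan describes, following Okasaki's skew binary random-access lists; the names here (CBTree, RAList) are illustrative, not necessarily the package's actual API:

-- A list of complete binary trees, each paired with its size; the
-- sizes (each of the form 2^k - 1) follow the skew binary
-- representation of the total length.
data CBTree a = Leaf a | Node a (CBTree a) (CBTree a)

newtype RAList a = RAList [(Int, CBTree a)]

-- cons is O(1): in skew binary, at most the two leading digits can
-- have equal weight, and those two trees merge in a single step.
cons :: a -> RAList a -> RAList a
cons x (RAList ((s1, t1) : (s2, t2) : rest))
  | s1 == s2 = RAList ((1 + s1 + s2, Node x t1 t2) : rest)
cons x (RAList ts) = RAList ((1, Leaf x) : ts)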
[...]
Is "RandomAccessList" the best name for something that's not O(1), just O(log n)? Or just that "RandomAccessList" is just much longer than "[]"?
Well Chris Okasaki called them "Skew Binary Random-Access Lists", which is even longer :)
:) hmm.. IndexableList? (just a thought, not sure whether I like it any better)
don't use those unorthodox infix function names.. `cons` is hardly worse than .:. , `append` or `mappend` than .+. , and, hmm, what would be better than .!. ? Export a ++ and ! (!! ?) if you're really dedicated. But I'd prefer an `at` for the same partial indexing operation, rather than the name .!. (I think this "at" was a new name being proposed somewhere else? partly because "!" is gradually being reserved to refer to strictness?)
Good point!
"extractHead" is an ugly name for a nevertheless standardish-meaning function... what is it usually called? uncons? headTail? (Data.Sequence, which is meant to be left-right symmetric, calls it "viewr"... except your version doesn't have the Maybe, it's partial instead, fails on empty lists)
Yes, I wasn't happy with that one either. The view-concept of Data.Sequence is a good idea.
yeah, it's a good idea, although I'm not sure how much I like the particular syntax of how it's done in Data.Sequence (the view-types' constructor names, mostly)
For "index", don't use Monad, use Maybe (I think that's what the recent libraries@haskell.org discussion concluded, in the context of switching Data.Map back to Maybe).
I was just copying the idea from Data.Map, and isn't it usually a good thing to have functions as general as possible? Or why is it not?
To summarize: Monad isn't the proper abstraction for failable/Maybe. Maybe is an algebraic data type that *exactly* represents the spirit of what you're trying to do. E.g. Conor McBride said: "Maybe is the most general abstraction. Requiring (>>=), or even (<*>), seems excessive. What we need is 'any f with zero and return', so why not pick the canonical, initial, inductively defined such thing?" In this case the typeclass adds no generality to the function's behaviour (Maybe can be trivially converted to any other such type, with a combinator even), and there's the confusion that results when the function is almost always used at type Maybe anyway. If you want to read the whole discussion and you haven't been subscribed to libraries@haskell.org, it's archived: http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9082
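The conversion combinator alluded to above is a one-liner; a sketch (the name maybeToM is hypothetical), relying on fail living in the Monad class as it did at the time:

-- Lift a Maybe into an arbitrary monad, supplying a failure message
-- for the Nothing case.
maybeToM :: Monad m => String -> Maybe a -> m a
maybeToM msg Nothing  = fail msg
maybeToM _   (Just x) = return x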
Also, Data.List has genericLength etc., to support sizes beyond Int.
At the moment, I'm using the Int type for size and indexing for one reason only: I haven't found a proper way to generalize it. I'd really like to use the Ix class, but it doesn't provide enough functionality: it only works on fixed-size intervals (i.e. for arrays, which don't change their size, but a list does). Maybe someone has an idea of how to realize lists with a variable starting index and size?
fair enough. If your implementation only supports sizes up to that of Int, that's reasonable for a strict finite type (whereas something like ([1..2^34] `genericIndex` (2^33)) can probably complete in a small amount of memory and only a moderate amount of time on a modern machine, even a 32-bit one, due to laziness and garbage collection).
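That point can be made concrete with Data.List.genericIndex, runnable as-is:

import Data.List (genericIndex)

-- Indexing past 2^33 into a lazily generated list: the spine is
-- consumed as it is produced and collected behind the traversal, so
-- memory stays small even though the index exceeds maxBound :: Int
-- on a 32-bit machine.
example :: Integer
example = [1 .. 2 ^ 34] `genericIndex` (2 ^ 33)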
Isn't "index" (like Data.List.genericIndex) supposed to be a name for a partial operation, not one that returns a Maybe? Shouldn't "size" be named "length" (or exported under both names, cf. Data.Map.size and Data.List.length)? Why is it O(1) and not O(log n)? Is it possible for these RandomAccessLists to be longer than maxBound::Int?
The size function is in O(1) because I cache it, otherwise it would be
size (RandomAccessList xs) = sum (map fst xs)
which is O(log n). I consider the caching useful, as most applications will check 0 <= i < size quite often.
sounds good
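A sketch of the cached-size layout being described (the constructor matches the definition quoted later in this thread; CBTree as in the sketch further up):

-- The length is kept in a strict field alongside the tree list.
data RandomAccessList a =
  RandomAccessList {-# UNPACK #-} !Int [(Int, CBTree a)]

-- O(1): read the cache instead of summing the tree sizes.
size :: RandomAccessList a -> Int
size (RandomAccessList n _) = n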
for e.g. toList, is the O(n) cost spread over traversing/demanding the items of the generated list, or all up-front, or somewhere in between?
Why is zip slow with unbalanced lists? Obviously, it could be implemented in O(min(m,n)*(log m + log n)) by just indexing into each one, which would be faster for really unbalanced-size lists... (Obviously, I don't understand the implementation.)
If two lists have exactly the same size, all the complete binary trees holding the data have the same size as well. This can be zipped directly and is a bit (~5% in my tests) faster.
okay, that sounds like a fair optimization, since zipping same-size lists is a nice thing to do anyway. But the asymptotic speed ideally should still be O(min(m,n)), if possible?
BTW, looking at the file:

data RandomAccessList a = RandomAccessList {-# UNPACK #-} !Int ![(Int, CBTree a)]

Marking a list strict like that doesn't have much effect, because it only makes the first (:) or [] strict. Did you mean an element-strict or spine-strict (which would necessarily be non-infinite) list?
Oh, you're right, I just meant to mark the Int strict. It obviously was too late yesterday!
Thanks for this feedback! //Stephan
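Isaac's point about the bang on the list field, in miniature (hypothetical names, runnable as-is):

-- A strict field is forced only to weak head normal form, i.e. to
-- the outermost (:) or [], never into the spine or the elements.
data T = T ![Int]

ok, boom :: T
ok   = T (1 : undefined)  -- fine: (1 : undefined) is already in WHNF
boom = T undefined        -- undefined is forced when boom is evaluated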

Isaac Dupree wrote:
To summarize: Monad isn't the proper abstraction for failable/Maybe. [...] If you want to read the whole discussion, it's archived: http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9082
Thanks for the summary. I had been wondering about this change of mood, and I disagree with the suggestion that Maybe's Nothing is the right replacement for Monad's fail. Whether fail should be in Monad, or whether we really want MonadZero, MonadPlus, MonadError, or something else entirely, has been open for discussion, but it is easily shown that Maybe is not the most general abstraction: it loses information w.r.t. (Either String), for instance:

Prelude> let {f [] = fail "empty"; f [_] = fail "singleton"; f l = return l }
Prelude> f [] :: Maybe [Bool]
Nothing
Prelude> f [True] :: Maybe [Bool]
Nothing
Prelude> f [True,False] :: Maybe [Bool]
Just [True,False]
Prelude> :m +Control.Monad.Error
Prelude Control.Monad.Error> f [] :: Either String [Bool]
Left "empty"
Prelude Control.Monad.Error> f [True] :: Either String [Bool]
Left "singleton"
Prelude Control.Monad.Error> f [True,False] :: Either String [Bool]
Right [True,False]

You can specialise Monad to Maybe, but you can't get back to the general handling of failure without losing the failure messages! Choosing Maybe over (Either String) means: "I don't care about the failure messages" (not that String is necessarily the best way to represent failure conditions, but that is another story again). As anyone who has ever tried to use a parser based on that choice can attest ("Compilation failed. There were errors."), that choice should not be taken lightly.

Claus

Claus Reinke wrote:
[Monad vs. Maybe]
Thanks for the summary. I had been wondering about this change of mood, and I disagree with the suggestion that Maybe's Nothing is the right replacement for Monad's fail. Whether fail should be in Monad, or whether we really want MonadZero, MonadPlus, MonadError, or something else entirely, has been open for discussion, but it is easily shown that Maybe is not the most general abstraction: it loses information w.r.t. (Either String), for instance:
Prelude> let {f [] = fail "empty"; f [_] = fail "singleton"; f l = return l }
okay, I see, it's just that most partial functions in Data.* / container libraries don't really have multiple failure modes that need distinguishing. You could say, in a type/information-theoretic mindset, that even your "f [_]" above loses information, because it doesn't vary based on the _ (and so it isn't reversible). That is very common and normal, especially for error messages, but there's a large design space for where messages are needed, depending on whether a machine needs to understand them, etc. I think we didn't conclude much about e.g. parser-library return types, because we (thankfully) weren't trying to. -Isaac

http://thread.gmane.org/gmane.comp.lang.haskell.libraries/9082 .. that Maybe is not the most general abstraction: it loses information w.r.t. (Either String), for instance:
Prelude> let {f [] = fail "empty"; f [_] = fail "singleton"; f l = return l }
okay, I see, it's just that most partial functions in Data.* / container libraries don't really have multiple failure modes that need distinguishing.
Yes, I noticed that Ross was careful to note that in the message that started that thread. I was more concerned with that specific case being generalized to an argument against not-just-Maybe (I think that was the title on the old wiki?). And single failure mode in the components doesn't imply single failure mode in combinations, either. Consider:

import Control.Monad
import Control.Monad.Error

f 1 = return 1
f _ = fail "f"

g 2 = return 2
g _ = fail "g"

If I call 'f', and it fails, I know that it was 'f' that failed, and it could only fail in a single way, so there's really no point in not replacing 'fail "f"' with 'Nothing', right (and the same goes for 'g')? Wrong (or Left;-)! Combining functions with single failure modes gives combinations with multiple failure modes (think of different branches in a parser/type system/.., or of different phases in a compiler). Compare the two outputs of 'test' below:

forg n = f n `mplus` g n
gorf n = g n `mplus` f n
fandg n = f n >>= g
gandf n = g n >>= f

test = do
  print ([forg 3, gorf 3, fandg 1, fandg 2] :: [Maybe Int])
  print ([forg 3, gorf 3, fandg 1, fandg 2] :: [Either String Int])

I don't know whether that is an immediate concern for the particular functions under discussion from the 'containers' package, apart from all the extra lifting (and re-adding of sensible failure messages) when wanting to use those functions in a non-Maybe Monad. But it is a concern in general monadic programming (and it often requires extra work to ensure that failure paths combine as well as success paths).

Claus

PS: There was also the argument that there are cases where local failure handling is just not sensible (what the Erlangers call "don't program defensively", or "let it crash"), and where trying to return and handle those 'Nothing's would only make the code less readable as the problem gets passed from Pontius to Pilatus. The "let someone else clean up" approach is also supported by the not-just-Maybe pattern, although not as well in Haskell as in Erlang (with its supervisor trees and heart-beat monitoring).

Whether fail should be in Monad, or whether we really want MonadZero, MonadPlus, MonadError, or something else entirely, has been open for discussion, but it is easily shown that Maybe is not the most general abstraction: it loses information w.r.t. (Either String), for instance:
Prelude> let {f [] = fail "empty"; f [_] = fail "singleton"; f l = return l }
Yes. But that's not what we're talking about. We're talking about lookup and index, which both have one and exactly one failure mode: not found. For these functions, Maybe a is both the most general and the most precise type. It is trivial to upgrade a Maybe a by decorating it with an error, should you choose to do so:

maybe (throwError "better error message here") return

which I sometimes define as 'withError' or similar:

d <- M.lookup "foo" `withError` "Variable foo not in symbol table"

Jules
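Written out, the helper Jules mentions could look like this (a sketch against the Control.Monad.Error API of the time):

import Control.Monad.Error

-- Upgrade a Maybe by attaching an error message to the Nothing case.
withError :: MonadError e m => Maybe a -> e -> m a
withError m e = maybe (throwError e) return m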

Isaac Dupree wrote:
[...]
Well Chris Okasaki called them "Skew Binary Random-Access Lists", which is even longer :)
:)
hmm.. IndexableList? (just a thought, not sure whether I like it any better)
RAList? IList? <- (will I be sued by a large computer company for that?)
[...]
Yes, I wasn't happy with that one either. The view-concept of Data.Sequence is a good idea.
yeah, it's a good idea, although I'm not sure how much I like the particular syntax of how it's done in Data.Sequence (the view-types' constructor names, mostly)
I now have

data View a = Empty | Cons a (RandomAccessList a)

and

view :: RandomAccessList a -> View a

Additionally, I renamed "extractHead" to "uncons" (which is OK, because I also have "cons"), but still left "head" and "tail" with the typical list-like behaviour (causing an error on empty lists).
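A hypothetical use of that View type, recovering a total head function:

-- Pattern matching on the view replaces the partial "head".
safeHead :: RandomAccessList a -> Maybe a
safeHead xs = case view xs of
  Empty    -> Nothing
  Cons x _ -> Just x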
[Monad vs. Maybe]
That's quite convincing, most of all that "fail" has rather strange definitions for many Monads (because it doesn't really belong to monads in the first place, does it?).
[...]
The size function is in O(1) because I cache it, otherwise it would be
size (RandomAccessList xs) = sum (map fst xs)
which is O(log n). I consider the caching useful, as most applications will check 0 <= i < size quite often.
sounds good
[...]
If two lists have exactly the same size, all the complete binary trees holding the data have the same size as well. This can be zipped directly and is a bit (~5% in my tests) faster.
okay, that sounds like a fair optimization, since zipping same-size lists is a nice thing to do anyway. But the asymptotic speed ideally should still be O(min(m,n)), if possible?
Well... I guess that's possible by converting the shorter one into a list and using a fold for zipping. I'll have a close look at this!

--
It used to be: I think, therefore I am. Today we know: it also works without. - Dieter Nuhr
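A sketch of that closing idea, assuming a toList that yields its elements lazily (the question raised earlier in this thread) and an O(n) fromList; zip then demands only min(m, n) elements of each argument:

-- Convert both lists lazily; zip stops at the shorter one, and
-- fromList rebuilds the result in a single pass.
zipRAL :: RandomAccessList a -> RandomAccessList b -> RandomAccessList (a, b)
zipRAL xs ys = fromList (zip (toList xs) (toList ys))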
participants (4): Claus Reinke, Isaac Dupree, Jules Bean, Stephan Friedrichs