RE: [Haskell-cafe] Re: OCaml list sees abysmal Language Shootoutresults

On 07 October 2004 18:23, Ketil Malde wrote:
Couldn't readFile et al. provide the standard interface, but use hGetBuf tricks (e.g. from your 'wc' entry) behind the curtains?
readFile does do buffering behind the scenes, that's not the problem. The problem is doing the computation on a [Char] instead of a raw buffer of bytes. There are various known techniques that could be used to speed up GHC's implementation of lists, none of which we've ever tried. This might be a good area for experimentation, if anyone's looking for something to do in GHC. Cheers, Simon

Actually, I've been wondering about this. If my understanding is correct, Haskell lists are basicly singly-linked lists of cons cells (is that correct?) A simple (I think) thing to do would be to make the lists doubly-linked and circular. That would let us do nice things like have O(1) primops for reverse, tail, (++) and other things that access lists at the end. However, I'm still pretty new to FP in general, so I don't know if there are any theoretical reasons why something like this couldn't work. Obviously lazy and infinite lists add some wrinkles, but I think it could be worked through. BTW can you give some references to these known techniques? Robert Simon Marlow wrote:
On 07 October 2004 18:23, Ketil Malde wrote:
Couldn't readFile et al. provide the standard interface, but use hGetBuf tricks (e.g. from your 'wc' entry) behind the curtains?
readFile does do buffering behind the scenes, that's not the problem. The problem is doing the computation on a [Char] instead of a raw buffer of bytes.
There are various known techniques that could be used to speed up GHC's implementation of lists, none of which we've ever tried. This might be a good area for experimentation, if anyone's looking for something to do in GHC.
Cheers, Simon _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, Oct 08, 2004 at 08:35:40AM -0400, Robert Dockins wrote:
Actually, I've been wondering about this. If my understanding is correct, Haskell lists are basicly singly-linked lists of cons cells (is that correct?) A simple (I think) thing to do would be to make the lists doubly-linked and circular. That would let us do nice things like have O(1) primops for reverse, tail, (++) and other things that access lists at the end. However, I'm still pretty new to FP in general, so I don't know if there are any theoretical reasons why something like this couldn't work. Obviously lazy and infinite lists add some wrinkles, but I think it could be worked through. BTW can you give some references to these known techniques?
Ugh, lousy cache properties... try rank-ordered B+ trees. There are probably better choices than that even. It's probably best Simon point us to references to what's actually useful here. -- wli

William Lee Irwin III
Actually, I've been wondering about this. If my understanding is correct, Haskell lists are basicly singly-linked lists of cons cells (is that correct?) A simple (I think) thing to do would be to make the lists doubly-linked and circular.
Uh, I think one of the main problems with the usual IO functions is that it adds the overhead of cons cells and optionally 32bit chars (although I think GHC packs them for char values <256) - when you really want an (unboxed) array of Word8.
BTW can you give some references to these known techniques?
Ugh, lousy cache properties... try rank-ordered B+ trees. There are probably better choices than that even. It's probably best Simon point us to references to what's actually useful here.
I'm as dumb as anybody, but it seems to me that one could read a lazy chain (list) of buffers as UArray Int Word8, and tack a list-type interface (head, tail, cons...) interface on top. -kzm -- If I haven't seen further, it is by standing in the footprints of giants

William Lee Irwin III
Ugh, lousy cache properties... try rank-ordered B+ trees. There are probably better choices than that even. It's probably best Simon point us to references to what's actually useful here.
On Fri, Oct 08, 2004 at 03:55:05PM +0200, Ketil Malde wrote:
I'm as dumb as anybody, but it seems to me that one could read a lazy chain (list) of buffers as UArray Int Word8, and tack a list-type interface (head, tail, cons...) interface on top.
I had in mind a more general attack on things like e.g. list concatenation, reversal, indexing (!!), and so on in addition to cache locality, packing, and reducing space overhead of list linkage. As Keith Wansbrough mentioned in another post, these kinds of arrangements tend to spoil other useful properties of the more naive-looking lists. -- wli

Actually, I've been wondering about this. If my understanding is correct, Haskell lists are basicly singly-linked lists of cons cells (is that correct?) A simple (I think) thing to do would be to make the lists doubly-linked and circular. That would let us do nice things like have O(1) primops for reverse, tail, (++) and other things that access lists at the end. However, I'm still pretty new to FP in general, so I don't know if there are any theoretical reasons why something like this couldn't work.
x = [3,5,7] primes = 2:x odds = 1:x You can't do sharing like this if your lists are doubly-linked; lots of cool algorithms depend on this sharing. -- first-arg-reversed-then-args-flipped append revApp :: [a] -> [a] -> [a] revApp = foldr (flip (.) . (:)) id -- all insertions of a into ys, with prefix (rev xs) allinserts :: [a] -> a -> [a] -> [[a]] -> [[a]] allinserts xs a [] = (:) (revApp xs [a] ) allinserts xs a ys0@(y:ys) = (:) (revApp xs (a:ys0)) . allinserts (y:xs) a ys -- all permutations of a list allperms :: [a] -> [[a]] allperms = foldr (\ x -> foldr (allinserts [] x) []) [[]]
allperms "abcd" ["abcd","bacd","bcad","bcda","acbd","cabd","cbad","cbda","acdb","cadb","cdab","cdba","abdc","badc","bdac","bdca","adbc","dabc","dbac","dbca","adcb","dacb","dcab","dcba"]
In this list, all the common tails are *shared*, recursively; this is not 24x4 list (cons) cells in memory, but rather less. --KW 8-)

At some point in the past, someone wrote:
Actually, I've been wondering about this. If my understanding is correct, Haskell lists are basicly singly-linked lists of cons cells (is that correct?) A simple (I think) thing to do would be to make the lists doubly-linked and circular. That would let us do nice things like have O(1) primops for reverse, tail, (++) and other things that access lists at the end. However, I'm still pretty new to FP in general, so I don't know if there are any theoretical reasons why something like this couldn't work.
On Fri, Oct 08, 2004 at 02:42:28PM +0100, Keith Wansbrough wrote:
x = [3,5,7] primes = 2:x odds = 1:x You can't do sharing like this if your lists are doubly-linked; lots of cool algorithms depend on this sharing.
That constraint makes various other things painful. I suppose there is no one-size-fits-all solution. -- wli

x = [3,5,7] primes = 2:x odds = 1:x You can't do sharing like this if your lists are doubly-linked; lots of cool algorithms depend on this sharing.
That constraint makes various other things painful. I suppose there is no one-size-fits-all solution.
Then perhaps it is worth considering having multiple implementations and choosing between them with pragmas and/or command line switches (with a sensible default naturally). Maybe doubly linked lists are not a great idea, but if we had a good implementation with, eg. O(1) access to both ends of the list but poor sharing, we can choose to use it only in cases where queue semantics are important and sharing is not. It would be nice to be able to monkey about with that kind of "under the hood" functionality w/o having to make any code changes. You could also do fun things like have chained-buffer list implementations for [Word8], [Char] etc.

On Fri, Oct 08, 2004 at 12:43:45PM -0400, Robert Dockins wrote:
x = [3,5,7] primes = 2:x odds = 1:x You can't do sharing like this if your lists are doubly-linked; lots of cool algorithms depend on this sharing.
That constraint makes various other things painful. I suppose there is no one-size-fits-all solution.
Then perhaps it is worth considering having multiple implementations and choosing between them with pragmas and/or command line switches (with a sensible default naturally). Maybe doubly linked lists are not a great idea, but if we had a good implementation with, eg. O(1) access to both ends of the list but poor sharing, we can choose to use it only in cases where queue semantics are important and sharing is not. It would be nice to be able to monkey about with that kind of "under the hood" functionality w/o having to make any code changes. You could also do fun things like have chained-buffer list implementations for [Word8], [Char] etc.
What I thought would be interesting to implement (and should be possible with ghc) would be as lists were evaluated, if they were packed into UArrays by the thunk-evaluation code. basically, the update mechanism, rather than replace a node with a constant Char thunk, would write the char to a UArray-like structure (of a certain block size) and replace the code pointer to one that knows how to read the list out of said UArray structure. That way, lists would behave like normal, but once evaluated, would be optimized into efficient arrays. also, the work of the update-analysis phase could be leveraged to avoid this when we know the list is not shared. A variation would be to not do this in the update code, but rather the GC scavaging code, if it notices chains of unboxable values in HNF. This is more advantagous as we would avoid the work if the lists are GCed and has no run-time cost. At the time of the GC, we might also have more information, like the full size of the list (if it is fully evaluated) which could help us choose buffer sizes more efficiently or pack it into 8 bit chars if we find out it is all ascii. John -- John Meacham - ⑆repetae.net⑆john⑈

--- Robert Dockins
Then perhaps it is worth considering having multiple implementations and choosing between them with pragmas and/or command line switches (with a sensible default naturally). Maybe doubly linked lists are not a great idea, but if we had a good implementation with, eg. O(1) access to both ends of the list but poor sharing, we can choose to use it only in cases where queue semantics are important and sharing is not. It would be nice to be able to monkey about with that kind of "under the hood" functionality w/o having to make any code changes. You could also do fun things like have chained-buffer list implementations for [Word8], [Char] etc.
Lists are an integral part of the Haskell language, and in fact most languages have some version of list at a fundamental level. Here's an interesting (not necessarily useful!) shift of viewpoint: What if List were a type class? Shawn _______________________________ Do you Yahoo!? Declare Yourself - Register online to vote today! http://vote.yahoo.com

On Wed, 13 Oct 2004, Shawn Garbett wrote:
Lists are an integral part of the Haskell language, and in fact most languages have some version of list at a fundamental level. Here's an interesting (not necessarily useful!) shift of viewpoint: What if List were a type class?
Then we'd need defaulting ala arithmetic - but yeah, otherwise I would go with that. -- flippa@flippac.org

Shawn Garbett
viewpoint: What if List were a type class?
Or, what if String were one? Could we have painless read/show with arrays of Char, as well as lists, for instance? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

Robert Dockins wrote:
Actually, I've been wondering about this. If my understanding is correct, Haskell lists are basicly singly-linked lists of cons cells (is that correct?) A simple (I think) thing to do would be to make the lists doubly-linked and circular. That would let us do nice things like have O(1) primops for reverse, tail, (++) and other things that access lists at the end. However, I'm still pretty new to FP in general, so I don't know if there are any theoretical reasons why something like this couldn't work. Obviously lazy and infinite lists add some wrinkles, but I think it could be worked through.
BTW can you give some references to these known techniques?
Don't know if this is exactly what you are looking for, but I found these articles quite interesting. http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/ http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/democratic/ Greg Buchholz

Robert Dockins wrote:
BTW can you give some references to these known techniques?
See also, "Purely Functional Data Structures" by Chris Okasaki for functional implementations of queues, dequeues, etc. www-2.cs.cmu.edu/~rwh/theses/okasaki.pdf Greg Buchholz
participants (10)
-
Greg Buchholz
-
Greg Buchholz
-
John Meacham
-
Keith Wansbrough
-
Ketil Malde
-
Philippa Cowderoy
-
Robert Dockins
-
Shawn Garbett
-
Simon Marlow
-
William Lee Irwin III