RE: [Haskell-cafe] Re: OCaml list sees abysmalLanguage Shootoutresults

- take this further and have list cells with 2 (or more) unboxed characters, or even a full buffer.
This sounds like the best idea to me... with each list cell being a full buffer you could effectively write nieve [Char] code and have it implemented in about as fast a way as possible... A little improvement on this would be to do a bit of read ahead, so that the filesystem latency can be hidden. (ie one thread reads the next buffer, whilst the application thread processes the previous buffer)... This would also benefit string processing... Imagine: test = "aaaa" ++ "bbbb" This could be implented as two list cells, one for each string, anf the cost of the "++" becomes the same as the cost of ":" test = 'a' : "bbbb" could be implemented in exactly the same way (a cell of size one and a cell of size 4) In this way with the cell size dependant on the source, lists generated by prepending characters would remain exactly as they are. The only slight complexity would be allowing 'next' pointers to point halfway through a cell, so that if you start with "dddd" remove the first character and prepend 'c' the original list "dddd" could still be in scope. Keean.

MR K P SCHUPKE
This sounds like the best idea to me... with each list cell being a full buffer you could effectively write nieve [Char] code and have it implemented in about as fast a way as possible...
If the representation of some lists was changed, it would complicate all code which works on lists. Or maybe only polymorphic code, but it's still much. I don't believe it would be practical. In general only specific *code* can be compiled more efficiently. Data can't have special cases represented differently without inducing a cost to some other code which doesn't know statically whether the special case is used and must deal with the complexity at runtime. It can't be transparent. A different type for semi-packed strings, with some tricky implementation - fine. But without making good old lists harder. OCaml represents arrays of floats differently than others. It makes all the generic array code much slower than it would be without that decision, and slower than the same code copied-and-pasted but specialized to some concrete type. IMHO it was a bad decision (but I don't have any concrete benchmark data to confirm this). It would be better to have a separate float array type. Of course it would be problematic to have a convenient syntax for them, as OCaml doesn't like overloading... -- __("< Marcin Kowalczyk \__/ qrczak@knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/

On Fri, 8 Oct 2004, Marcin 'Qrczak' Kowalczyk wrote:
If the representation of some lists was changed, it would complicate all code which works on lists. Or maybe only polymorphic code, but it's still much. I don't believe it would be practical.
That's true in OCaml but not in the STG-machine, where heap objects are accessed through a method-call interface. Adding new representations is easy because the caller doesn't know what the representation is anyway. GHC already uses a special packed representation for static Strings, and I see no reason why the garbage collector couldn't in principle compact lists as it copied them. I don't even see any reason why there couldn't be a packing-aware (++), or a listGetBuf :: [a] -> Int -> (Array Int a, [a]) which noticed when a list was packed and used memcpy() as appropriate. None of these optimizations would complicate or slow down other code which used lists, though they would complicate the run-time system of course. -- Ben

This would also benefit string processing... Imagine:
test = "aaaa" ++ "bbbb"
This could be implented as two list cells, one for each string, anf the cost of the "++" becomes the same as the cost of ":"
No, you still have to copy "aaaa" so you can change the tail pointer - but at least it's a single memcpy() rather than a tree-walk-and-allocate. --KW 8-)
participants (4)
-
Ben Rudiak-Gould
-
Keith Wansbrough
-
Marcin 'Qrczak' Kowalczyk
-
MR K P SCHUPKE