Re: [Haskell-cafe] Re: Go Haskell! -> array libraries

Henning Thielemann wrote:
I suspect that this particular function is less useful than you think. It safes one allocation and might be faster since it uses less cache, but on the other hand, it cannot be fused.
If the array is "seriously large", you don't want to have five or six versions of it sitting in memory until the GC comes along to remove it. (OTOH, presumably an unboxed array can be allocated or freed quite quickly...) At a minimum, you've going to end up with two copies in memory - the existing one, and the one being built.
I think in-place array updates are only sensible for writing array elements in really random order. As long as you can formulate your algorithm the way "read from random indices, but write a complete array from left to right", there is almost no need for mutable arrays.
Does quick-sort fit that pattern? (I'm guessing yes, since all you need to do is filtering and merging...) Yeah, generally you only need arrays if you really want random access. Except in Haskell, where an array also happens to be the only way of storing large numbers of values unboxed. So sometimes you'll use an array because you want to save space. (E.g., parsing text is usually a sequential process with no random access, but you probably want to use ByteString all the same!) I sometimes also use unboxed arrays for the increase in strictness. I guess the thing to do would be to measure the difference...

On 30/11/2008, at 08:32, Andrew Coppin wrote:
Henning Thielemann wrote:
I suspect that this particular function is less useful than you think. It safes one allocation and might be faster since it uses less cache, but on the other hand, it cannot be fused.
Hmm, I haven't seen your original message but I suspect you are talking about in-place map. In that case, this is not entirely true. Shameless plug: http://www.cse.unsw.edu.au/~rl/publications/recycling.html
I think in-place array updates are only sensible for writing array elements in really random order. As long as you can formulate your algorithm the way "read from random indices, but write a complete array from left to right", there is almost no need for mutable arrays.
Many array algorithms cannot really be written in this way. I think we do need mutable arrays and they should provide much more than just read/write. How to integrate them nicely with immutable arrays is not really clear, though. Roman

rl:
On 30/11/2008, at 08:32, Andrew Coppin wrote:
Henning Thielemann wrote:
I suspect that this particular function is less useful than you think. It safes one allocation and might be faster since it uses less cache, but on the other hand, it cannot be fused.
Hmm, I haven't seen your original message but I suspect you are talking about in-place map. In that case, this is not entirely true. Shameless plug:
http://www.cse.unsw.edu.au/~rl/publications/recycling.html
I think in-place array updates are only sensible for writing array elements in really random order. As long as you can formulate your algorithm the way "read from random indices, but write a complete array from left to right", there is almost no need for mutable arrays.
Many array algorithms cannot really be written in this way. I think we do need mutable arrays and they should provide much more than just read/write. How to integrate them nicely with immutable arrays is not really clear, though.
Should mutable arrays have list-like APIs? All the usual operations, just in-place and destructive where appropriate? -- Don

On 30/11/2008, at 11:36, Don Stewart wrote:
Should mutable arrays have list-like APIs? All the usual operations, just in-place and destructive where appropriate?
I don't know. To be honest, I don't think that the term "mutable array" describes a single data structure. For instance, one of the central questions which unveils a whole bunch of design possibilities is: can mutable arrays be concatenated and how does that work if yes? Roman

Should mutable arrays have list-like APIs? All the usual operations, just in-place and destructive where appropriate?
I don't know. To be honest, I don't think that the term "mutable array" describes a single data structure. For instance, one of the central questions which unveils a whole bunch of design possibilities is: can mutable arrays be concatenated and how does that work if yes?
wrt the first part: within a pure functional language, mutable arrays are something I (am forced to) choose when the implementation is unable to see some essential optimization opportunities in the way I am using immutable arrays; whenever I am forced to go down that route, I'd at least like to have an equivalent set of operations. The move from immutable arrays to mutable arrays is painful enough as it is. So the suggested design criterion would be: as uniform an API as possible, just that immutable arrays cannot make some essential performance guarantees and some operations sit a little uncomfortably in a librart where those guarantees are important. wrt the second part: I'm not sure whether you'll be able to choose a single one of those design possibilities as the only one (perhaps the concatenation is about simpler code, so virtual concatenation would be sufficient and copying possible, but perhaps the goal was locality, so copying would be needed, or perhaps a mixture of both, and the limited disruption in locality is less expensive than the cost of copying, ..), so you might want to expose several of them, with a separate way of selecting a default or I-don't-care choice (imilar to why you have several array libs at present, just that there should be a common API and documentation of design alternatives/rationales;-). -- Btw, it would really be nice if packages took a hint from academic publishing: good papers are expected not only (a) to provide new content, but also (b) to position that content wrt related work and (c) to make explicit what the new contributions/goals are. As forks have become more common on hackage, perhaps .cabal files could be extended with two fields, pointing to related packages (this is more specific than category, linking to packages that might be used instead or in combination with the current package) and giving the rationale/high-lights for the package in text form. In the meantime, it would be great if all packages came with good old README files, which should be linked from the .cabal-based package descriptions on hackage, covering all those useful bits of information that are not yet covered in .cabal files, and giving an overview of what the package is about if the package does not have its own webspace (as is often the case). That would improve on the current situation, where sometimes all one has to go on is the package name and a one-liner description. Random example: looking at hackage package categories, I'd look for array libraries under 'data structures', but they are actually spread over 'data structures', 'data', and 'maths' (perhaps others?). Once I've found one, say 'vector' or 'uvector', what do the package descriptions tell me about the packages and their relation, or their relative advantages? The descriptions are brief, the "home pages" are actually "home repositories", the haddocks seem to have details only, no overview, and from the references/authors/uvector README, one might think these are actually the same library, sequential libs spun off from the same data-parallelism project; only that the haddock details suggest that these libraries are quite different. How? Why? What other alternatives are there? Note that I'm not attacking any specific packages, I would just encourage all package authors to try and see their packages from the perspective of a hackage user: what information does the user look for, what information does the package description offer? -- While we're on the topic: hackage has grown so much and so rapidly, that it might be worthwhile to have a hackage "editor", not in the sense of accepting/rejecting packages (not wanted at present), nor in the sense of objective quality measurements (that is in the process of being automated, according to Duncan), but in the sense of trying to guarantee a subjectively useful overall presentation (category organisation, finding and putting side-by-side related packages despite different names/categories, suitability of package descriptions, getting package authors to talk, etc.). And no, I'm not volunteering!-) But I would imagine that someone might find this a useful way of contributing. Such a hackage editor would also be able to give the Haskell Community Report editor a hand by summarizing hackage activity/trends and highlighting interesting projects that the HCAR editor might want to solicit reports for. From my own time as HCAR editor, I recall finding and chasing interesting projects as well as deciding how to structure the tools&libraries sections as difficult parts of the job. To take another hint from paper publishing: once the volume and variety of publications passes certain thresholds, survey papers become very valuable contributions. We do have time based surveys (biannual, weekly, even daily, if you count unedited RSS feeds;-), but topical surveys would help a lot as well (what are all the options, relative advantages and status of hackage packages for arrays, or databases, or web programming, or..). Such mini-surveys could appear in the Monad.Reader wiki magazine, for instance, and be written by anyone looking for that information the first time ("my experiences in looking for database libraries", etc.), then updated on the wiki as the package authors respond and improve the experience. Obviously, I can't speak for the current hackage maintainers or HCAR and Monad.Reader editors, but I assume they are busy enough that they might welcome this kind of help. Claus

On Sunday 30 November 2008 6:28:29 am Roman Leshchinskiy wrote:
On 30/11/2008, at 11:36, Don Stewart wrote:
Should mutable arrays have list-like APIs? All the usual operations, just in-place and destructive where appropriate?
I don't know. To be honest, I don't think that the term "mutable array" describes a single data structure. For instance, one of the central questions which unveils a whole bunch of design possibilities is: can mutable arrays be concatenated and how does that work if yes?
I don't know about concatenation, but it is useful for them to be able to be zipped. For instance: schwartz f arr = do keys <- cloneWith f arr pairs <- zip keys arr sortBy (comparing fst) pairs (cloneWith should construct a new array where new[i] = f (old[i])) is a mutable sort of arr using the comparison compare e e' = compare (f e) (f e') caching the results of f for each element. This, obviously, takes advantage of the fact that sorting the zipped array mutably updates the underlying (original) arrays. On the other hand, I could see such an operation easily leading to bugs if it slips your mind that changing one array can effect another (maybe it should be called unsafeZip :)). -- Dan
participants (5)
-
Andrew Coppin
-
Claus Reinke
-
Dan Doel
-
Don Stewart
-
Roman Leshchinskiy