
Jon Fairbairn wrote:
Roman Leshchinskiy
writes: No, arrays were not considered to be bad, they were designed with parallelism in mind.
I'm not sure how this can be the case if, as discussed below, most array operations have to go through lists, an inherently sequential data structure.
It's rather that some considered the IArray API to be inadequate most of the time. Really, H98 arrays aren't very good at anything they do. For collective operations, you are supposed to convert the array to a list, work on the list and then convert it back to an array which just seems wrong.
I am unconvinded that this is any more wrong than using a for loop in an imperative language.
Oh, definitely. In fact, I consider using a for loop even more wrong, you really want to use collective operations instead whenever possible. But I don't think for loops are the right benchmark for ease of use.
Remember that the lists are lazy, so itâs misleading to say âconvert the array to a listâ since what happens most of the time is that elements are taken out of the array and passed to the processing function and then thrown away before the next element is processed.
Efficiency isn't even the biggest problem here. Whenever you want to perform a collective operation on an array, you have to do the actual work on an entirely different data structure. So I, as a programmer, have to say: "Convert this array to a list, then do something with the list and then convert the resulting list back to an array". To me, at least, this is very inconvenient and requires a context switch. No other widely used container type requires you to do that.
Multidimensional arrays can't be sliced and diced in the style of Repa or NumPy.
Iâm not familiar with Repa or NumPy, but what can they do that cannot be done with judicious use of ixmap, which is a very powerful mechanism.
Yes, it is quite powerful but not very convenient. Just as an example, if A is a matrix, then A[3,:] gives you the 4th row and A[:,3] the 4th column in NumPy. You can do that with ixmap but it's far more involved. It also isn't powerful enough since it doesn't support certain highly useful uses of shape polymorphism (an example is a generic concat which decreases the dimensionality of any array by 1). This is discussed in detail in http://www.cse.unsw.edu.au/~chak/papers/repa.pdf.
In general, H98 arrays seem to have been designed with the goal of providing a container with O(1) indexing. They do that, I suppose, although they aren't very well integrated with the rest of the language
Can you give examples?
My favourite is the interaction between arrays and Enum. If I want to implement a generic enumFromTo m n that produces an array, I have to create the list [m .. n], take its length (which forces the entire list into memory) and then create an array of that length and fill it from the list. There doesn't seem to be any other way of doing it. This is a deficiency of Enum, really, but that's what I mean by not being well integrated. There are other examples like this.
and they have conceptual problems (such as requiring two bounds checks for one array access).
Assuming that you mean that for safe array access where nothing is known about the index at compile time, since any sort of array has at least a beginning and an end, they all require two bounds checks. Once you do know something about the index, itâs a question of implementation.
That's not what I meant. You really need to do two full bounds checks, one against inRange and one against the actual length of the array. Here is the relevant comment from the GHC libraries: Note [Double bounds-checking of index values] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When you index an array, a!x, there are two possible bounds checks we might make: (A) Check that (inRange (bounds a) x) holds. (A) is checked in the method for 'index' (B) Check that (index (bounds a) x) lies in the range 0..n, where n is the size of the underlying array (B) is checked in the top-level function (!), in safeIndex. Of course it *should* be the case that (A) holds iff (B) holds, but that is a property of the particular instances of index, bounds, and inRange, so GHC cannot guarantee it. * If you do (A) and not (B), then you might get a seg-fault, by indexing at some bizarre location. Trac #1610 * If you do (B) but not (A), you may get no complaint when you index an array out of its semantic bounds. Trac #2120 At various times we have had (A) and not (B), or (B) and not (A); both led to complaints. So now we implement *both* checks (Trac #2669). Roman