
I never liked Data.IArray. Recently I had to deal with it again and it kind of touched off a bit of a rant:

Array bounds are an inclusive range, which runs counter to almost all other uses of ranges, and for good reason. The length of an array is (high - low + 1). If you want to convert a list to an array it's 'IArray.listArray (0, length xs - 1) xs'. And if you forget to subtract one or add one it's likely to be a runtime crash. Off-by-one errors and runtime crashes... I've gotten accustomed to not having those in haskell!

The vast majority of data types have their own set of monomorphic functions, and yet array all of a sudden wants to be an interface. That's nice and all, but who actually uses that interface? I recall there used to be a DiffArray and some other variants, but they were decried as being slow and appear to be gone now.

So it appears that there's a complexity price paid to support different array types under the same interface, namely giant class contexts that can't be abstracted away and dependence on MPTCs... and the interface appears to be unused. And then another price paid to support arbitrary indices... which I at least have never used. Maybe there is a field where arrays with strange index ranges like 5--20 are useful, but I always just want 0--length-1.

Meanwhile, they lack basic features like concatenation. No Monoid instance. In fact, hardly any useful instances.

The result is that my first contact with haskell arrays left me with the impression that they were complicated, hard to use, and designed for someone with different priorities than me. Of course, Data.Vector didn't exist back then, but if someone were new to haskell now I would recommend they skip Data.IArray and head straight for vector. If they wanted 2 dimensional arrays then I know there are some matrix libraries.

Of course, sometimes IArray is hard to avoid, e.g. Data.Graph is just a type synonym for an array. I wrote a small library of graph utilities, and dealing with the arrays was back to awkward land.

So... am I alone in this dislike of IArray? Or maybe someone can explain why it's actually a very useful interface? Admittedly vector is a mere 'cabal install' away, but array has a blessed status as a ghc bundled library and is likely to be the first place people will look, as well as the natural base for other data types that want an array, like Data.Graph.

I know it's not as simple as "just debundle IArray" because it's a bootlib and presumably used in ghc itself, though I know there is support for reducing the set of bootlibs. There's also the whole mutable interface, which I haven't used, but maybe I'd like it more. I know the array stuff winds up integrated fairly deeply with ghc's unboxed arrays and whatnot, so there's a lot of detail I don't understand. I just know I dread going back to array-using code while I tend to enjoy working with vector, bytestring, text, storablevector, etc.
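A small sketch of the off-by-one trap being described (the names here are my own):

```haskell
import Data.Array (Array, listArray, elems)

-- Correct: the inclusive upper bound is length - 1.
good :: Array Int Char
good = listArray (0, 2) "abc"

-- Forgetting the subtraction typechecks fine but fails at runtime:
-- bad = listArray (0, 3) "abc"
-- accessing the missing slot raises "(Array.!): undefined array element"
```

Nothing in the types distinguishes the two; the mismatch only surfaces when the array is used.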

On 3 September 2011 11:38, Evan Laforge
I never liked Data.IArray. Recently I had to deal with it again and it kind of touched off a bit of a rant:
Array bounds are an inclusive range, which runs counter to almost all other uses of ranges, and for good reason. Length of an array is (high - low + 1). If you want to convert a list to an array it's 'IArray.listArray (0, length xs - 1) xs'. And if you forget to subtract one or add one it's likely to be a runtime crash. Off-by-one errors and runtime crashes... I've gotten accustomed to not having those in haskell!
The vast majority of data types have their own set of monomorphic functions, and yet array all of a sudden wants to be an interface. That's nice and all, but who actually uses that interface? I recall there used to be a DiffArray and some other variants, but they were decried as being slow and appear to be gone now.
So it appears that there's a complexity price paid to support different array types under the same interface, namely giant class contexts that can't be abstracted away and dependence on MPTCs... and appears to be unused. And then another price paid to support arbitrary indices... which I at least have never used. Maybe there is a field where arrays with strange index ranges like 5--20 are useful, but I always just want 0--length-1.
I've used this a fair amount: e.g. 2D grids going from (-n,-n) to (n,n). This is one of the things I liked about Haskell arrays, in that you _weren't_ limited to the [0,n-1] bounds found in C-based languages (and that annoyed me to no end when I was using them). Even _Fortran_ had arrays where you can specify the index bounds you want to use rather than being forced to translate from the bounds that make sense to your code to the bounds imposed by the array implementation. Admittedly, I don't use arrays much anymore, but that's because I tend to play with graphs nowadays...
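A sketch of such a centred grid (the size and the Bool payload are just illustrative):

```haskell
import Data.Array (Array, listArray, (!))

-- A 2D grid indexed from (-n,-n) to (n,n), as described above;
-- no translation to zero-based indices is needed at use sites.
centredGrid :: Int -> Array (Int, Int) Bool
centredGrid n = listArray ((-n, -n), (n, n))
                          (replicate ((2 * n + 1) * (2 * n + 1)) False)
```

With this, `centredGrid 3 ! (0, 0)` addresses the centre cell directly.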
Meanwhile, they lack basic features like concatenation.
I obviously don't know what you're doing with arrays, but this sounds a bit like using the wrong datatype for the job: if you're concatenating arrays, wouldn't you have to explicitly create or extend one of those arrays to get the contiguous memory (unless you're using a DiffArray)?
No Monoid instance. In fact, hardly any useful instances.
Maybe someone needs to suggest + implement such instances then.
The result is that my first contact with haskell arrays left me with the impression that they were complicated, hard to use, and designed for someone with different priorities than me. Of course, Data.Vector didn't exist back then, but if someone were new to haskell now I would recommend they skip Data.IArray and head straight for vector.
To an extent, I wonder how much of this has been that arrays were considered to be bad in Haskell, so no-one used them and no-one bothered to try and improve the API much (and instead went and created Vector, etc.).
If they wanted 2 dimensional arrays then I know there are some matrix libraries.
We now have repa, but hmatrix (the only other matrix library that I know of on Hackage) limits you to only 2D matrices.
Of course, sometimes IArray is hard to avoid, e.g. Data.Graph is just a type synonym for an array. I wrote a small library of graph utilities, and dealing with the arrays was back to awkward land.
Any particular reason you used Data.Graph rather than fgl, etc.?
So... am I alone in this dislike of IArray? Or maybe someone can explain why it's actually a very useful interface? Admittedly vector is a mere 'cabal install' away, but array has a blessed status as a ghc bundled library and is likely to be the first place people will look, as well as the natural base for other data types that want an array, like Data.Graph.
I know it's not as simple as "just debundle IArray" because it's a bootlib and presumably used in ghc itself, though I know there is support for reducing the set of bootlibs. There's also the whole mutable interface, which I haven't used, but maybe I'd like it more. I know the array stuff winds up integrated fairly deeply with ghc's unboxed arrays and whatnot, so there's a lot of detail I don't understand. I just know I dread going back to array using code while I tend to enjoy working with vector, bytestring, text, storablevector, etc.
I see there are two possible sources of your "problems" here:

1. The underlying data structure (index types and bounds, etc.)
2. The API

You mainly seem to have an issue with the API itself. My guess is that the API for array hasn't changed much because it was already there and a boot library (so incremental changes, etc.) whereas when vector, etc. was being written the opportunity was being taken to create an API better suited for the task, taking into account people's usage of array.

I don't disagree that IArray's API isn't that great and could be improved, but I quite like the ability to have custom index types and bounds. That said, if I was going to be doing anything using arrays, I would probably be looking at vector or repa (for the touted efficiency if nothing else) first, and using mappings between my bounds and the bounds on the array.

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

I've used this a fair amount: e.g. 2D grids going from (-n,-n) to (n,n). This is one of the things I liked about Haskell arrays, in that you _weren't_ limited to the [0,n-1] bounds found in C-based languages (and that annoyed me to no end when I was using them). Even
And to be honest, it's fairly easy to wrap the array functions to specialize them for (0..n-1). It just took me a while to figure out that I would have to make my own array module to stop getting those off-by-one errors.
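Such a wrapper module might contain something like this (a sketch; the names are mine, not from any library):

```haskell
import Data.Array (Array, listArray, bounds, (!))
import Data.Ix (inRange)

-- Zero-based construction, so the off-by-one lives in exactly one place.
fromList :: [a] -> Array Int a
fromList xs = listArray (0, length xs - 1) xs

-- Total lookup instead of a runtime crash on a bad index.
(!?) :: Array Int a -> Int -> Maybe a
a !? i
  | inRange (bounds a) i = Just (a ! i)
  | otherwise            = Nothing
```

Once everything goes through `fromList` and `(!?)`, the inclusive-bounds arithmetic never leaks into calling code.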
Meanwhile, they lack basic features like concatenation.
I obviously don't know what you're doing with arrays, but this sounds a bit like using the wrong datatype for the job: if you're concatenating arrays, wouldn't you have to explicitly create or extend one of those arrays to get the contiguous memory (unless you're using a DiffArray)?
Yes, that's also the case for vector, or bytestring, or any of the other array libraries. Concatenating arrays is not a crazy thing to do if it's uncommon and the arrays are not large. You have to copy them in imperative languages too, so it's hardly unique to haskell. Admittedly, I could have used an IntMap, but it's a small dense array that's very infrequently changed, and I mostly iterate forwards and backwards, so a plain array seemed simpler than an IntMap with the invariant that the keys be consecutive.
To an extent, I wonder how much of this has been that arrays were considered to be bad in Haskell, so no-one used them and no-one bothered to try and improve the API much (and instead went and created Vector, etc.).
Right, so we have arrays with nicer APIs, it's just that IArray has been kind of left behind. I don't think arrays are bad in haskell, if that were true all those array libraries wouldn't be so popular! They might not be quite the go-to type as in other languages, but they still have their uses.
Any particular reason you used Data.Graph rather than fgl, etc.?
Well... it's built-in... I needed something simple... so I just hit the ghc stdlib doc and searched for "graph". Is this another "left behind" module? And if so... well, maybe it's in the same situation as IArray.
I see there are two possible sources of your "problems" here:
1. The underlying data structure (index types and bounds, etc.) 2. The API
You mainly seem to have an issue with the API itself. My guess is that the API for array hasn't changed much because it was already there and a boot library (so incremental changes, etc.) whereas when vector, etc. was being written the opportunity was being taken to create an API better suited for the task, taking into account people's usage of array.
Right, it's hard to dislike the underlying data structure, arrays are about as simple as you can get :) My guess was the same as yours, i.e. that it's been sort of neglected but it's a bootlib and H98 so nothing can happen to it quickly.

I can't realistically suggest a course of action here, other than "gosh someone should update these modules" to which everyone is likely to say "yes, it would be nice if you did that" :) Or maybe in 10 years vector will be mature and standard and the old Array stuff can be deprecated and removed. So that's why this was more of a little rant than actual constructive comment.

Maybe at the least I could submit a patch to the IArray haddock saying "there are other array interfaces out there on hackage if you're looking for the 'default array' and this one isn't doing it for you". Maybe the same with Data.Graph and fgl, if that's so much nicer. But then, you know, it feels strange for a stdlib to come with such disclaimers :)

On 3 September 2011 12:53, Evan Laforge
To an extent, I wonder how much of this has been that arrays were considered to be bad in Haskell, so no-one used them and no-one bothered to try and improve the API much (and instead went and created Vector, etc.).
Right, so we have arrays with nicer APIs, it's just that IArray has been kind of left behind. I don't think arrays are bad in haskell, if that were true all those array libraries wouldn't be so popular! They might not be quite the go-to type as in other languages, but they still have their uses.
By "bad" I was referring to the API, not the data structure itself.
Any particular reason you used Data.Graph rather than fgl, etc.?
Well... it's built-in... I needed something simple... so I just hit the ghc stdlib doc and searched for "graph". Is this another "left behind" module? And if so... well, maybe it's in the same situation as IArray.
Pretty much; most people don't even think much of FGL. I _really_ should get around to expanding and improving Edward Kmett's graph library so that it covers more of a variety of common graph types and operations, but I've been busy on specialised graph libraries that I need lately.
Right, it's hard to dislike the underlying data structure, arrays are about as simple as you can get :) My guess was the same as yours, i.e. that it's been sort of neglected but it's a bootlib and H98 so nothing can happen to it quickly.
I can't realistically suggest a course of action here, other than "gosh someone should update these modules" to which everyone is likely to say "yes, it would be nice if you did that" :) Or maybe in 10 years vector will be mature and standard and the old Array stuff can be deprecated and removed. So that's why this was more of a little rant than actual constructive comment.
Maybe as a stop-gap some more useful/constructive examples/documentation on how to use Data.Array.IArray compared to vector, etc.? That said, apart from Data.Graph I don't know of many uses of it nowadays, as everyone seems to have migrated to vector for 1-dimensional and hmatrix/repa for multi-dimensional.

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com

On 03/09/2011, at 03:04, Ivan Lazar Miljenovic wrote:
On 3 September 2011 11:38, Evan Laforge
wrote: The result is that my first contact with haskell arrays left me with the impression that they were complicated, hard to use, and designed for someone with different priorities than me. Of course, Data.Vector didn't exist back then, but if someone were new to haskell now I would recommend they skip Data.IArray and head straight for vector.
To an extent, I wonder how much of this has been that arrays were considered to be bad in Haskell, so no-one used them and no-one bothered to try and improve the API much (and instead went and created Vector, etc.).
It's rather that some considered the IArray API to be inadequate most of the time. Really, H98 arrays aren't very good at anything they do. For collective operations, you are supposed to convert the array to a list, work on the list and then convert it back to an array, which just seems wrong. Multidimensional arrays can't be sliced and diced in the style of Repa or NumPy.

In general, H98 arrays seem to have been designed with the goal of providing a container with O(1) indexing. They do that, I suppose, although they aren't very well integrated with the rest of the language and they have conceptual problems (such as requiring two bounds checks for one array access). But requirements have shifted quite a bit since then. Now, people want to write real array programs and they want those to be fast. Personally, I don't know how to improve the H98 array API to provide this. You basically need to create a completely new API based on different principles.

Roman
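The list round-trip being described, sketched for a filter (the helper name is hypothetical; IArray has no such operation):

```haskell
import Data.Array (Array, listArray, elems)

-- To filter an array you leave the array world entirely:
-- array -> list -> filtered list -> array.
filterArray :: (a -> Bool) -> Array Int a -> Array Int a
filterArray p a = listArray (0, length kept - 1) kept
  where kept = filter p (elems a)
```

Every collective operation that isn't a plain map ends up with this shape.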

Roman Leshchinskiy
On 03/09/2011, at 03:04, Ivan Lazar Miljenovic wrote:
On 3 September 2011 11:38, Evan Laforge
wrote: The result is that my first contact with haskell arrays left me with the impression that they were complicated, hard to use, and designed for someone with different priorities than me. Of course, Data.Vector didn't exist back then, but if someone were new to haskell now I would recommend they skip Data.IArray and head straight for vector.
To an extent, I wonder how much of this has been that arrays were considered to be bad in Haskell, so no-one used them and no-one bothered to try and improve the API much (and instead went and created Vector, etc.).
No, arrays were not considered to be bad, they were designed with parallelism in mind. It’s just that no-one has really put much effort into the implementation (partly because during the first few years no-one wrote anything that used arrays much, and then there’s a feedback loop: no usage => no implementation effort => no usage). Also there seem to be some misunderstandings.
It's rather that some considered the IArray API to be inadequate most of the time. Really, H98 arrays aren't very good at anything they do. For collective operations, you are supposed to convert the array to a list, work on the list and then convert it back to an array which just seems wrong.
I am unconvinced that this is any more wrong than using a for loop in an imperative language. Remember that the lists are lazy, so it’s misleading to say “convert the array to a list” since what happens most of the time is that elements are taken out of the array and passed to the processing function and then thrown away before the next element is processed.
Multidimensional arrays can't be sliced and diced in the style of Repa or NumPy.
I’m not familiar with Repa or NumPy, but what can they do that cannot be done with judicious use of ixmap, which is a very powerful mechanism?
In general, H98 arrays seem to have been designed with the goal of providing a container with O(1) indexing. They do that, I suppose, although they aren't very well integrated with the rest of the language
Can you give examples?
and they have conceptual problems (such as requiring two bounds checks for one array access).
Assuming that you mean that for safe array access where nothing is known about the index at compile time: since any sort of array has at least a beginning and an end, they all require two bounds checks. Once you do know something about the index, it’s a question of implementation.

-- 
Jón Fairbairn
Jon.Fairbairn@cl.cam.ac.uk

Jon Fairbairn wrote:
Roman Leshchinskiy
writes: No, arrays were not considered to be bad, they were designed with parallelism in mind.
I'm not sure how this can be the case if, as discussed below, most array operations have to go through lists, an inherently sequential data structure.
It's rather that some considered the IArray API to be inadequate most of the time. Really, H98 arrays aren't very good at anything they do. For collective operations, you are supposed to convert the array to a list, work on the list and then convert it back to an array which just seems wrong.
I am unconvinced that this is any more wrong than using a for loop in an imperative language.
Oh, definitely. In fact, I consider using a for loop even more wrong; you really want to use collective operations instead whenever possible. But I don't think for loops are the right benchmark for ease of use.
Remember that the lists are lazy, so it’s misleading to say “convert the array to a list” since what happens most of the time is that elements are taken out of the array and passed to the processing function and then thrown away before the next element is processed.
Efficiency isn't even the biggest problem here. Whenever you want to perform a collective operation on an array, you have to do the actual work on an entirely different data structure. So I, as a programmer, have to say: "Convert this array to a list, then do something with the list and then convert the resulting list back to an array". To me, at least, this is very inconvenient and requires a context switch. No other widely used container type requires you to do that.
Multidimensional arrays can't be sliced and diced in the style of Repa or NumPy.
I’m not familiar with Repa or NumPy, but what can they do that cannot be done with judicious use of ixmap, which is a very powerful mechanism?
Yes, it is quite powerful but not very convenient. Just as an example, if A is a matrix, then A[3,:] gives you the 4th row and A[:,3] the 4th column in NumPy. You can do that with ixmap but it's far more involved. It also isn't powerful enough since it doesn't support certain highly useful uses of shape polymorphism (an example is a generic concat which decreases the dimensionality of any array by 1). This is discussed in detail in http://www.cse.unsw.edu.au/~chak/papers/repa.pdf.
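For comparison, a rough sketch of those two slices with ixmap, assuming a (row, column) index pair (the helper names are mine):

```haskell
import Data.Array (Array, ixmap, bounds)

-- NumPy's A[3,:]: fix the row, let the column vary.
row :: Int -> Array (Int, Int) e -> Array Int e
row i a = ixmap (c0, c1) (\j -> (i, j)) a
  where ((_, c0), (_, c1)) = bounds a

-- NumPy's A[:,3]: fix the column, let the row vary.
col :: Int -> Array (Int, Int) e -> Array Int e
col j a = ixmap (r0, r1) (\i -> (i, j)) a
  where ((r0, _), (r1, _)) = bounds a
```

Workable, but each slice needs explicit bound plumbing and an index function, versus a single subscript expression in NumPy.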
In general, H98 arrays seem to have been designed with the goal of providing a container with O(1) indexing. They do that, I suppose, although they aren't very well integrated with the rest of the language
Can you give examples?
My favourite is the interaction between arrays and Enum. If I want to implement a generic enumFromTo m n that produces an array, I have to create the list [m .. n], take its length (which forces the entire list into memory) and then create an array of that length and fill it from the list. There doesn't seem to be any other way of doing it. This is a deficiency of Enum, really, but that's what I mean by not being well integrated. There are other examples like this.
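A sketch of the dance being described (the helper name is mine):

```haskell
import Data.Array (Array, listArray)

-- A generic enumFromTo producing an array: the list must be fully
-- materialised first, because its length fixes the array bounds.
enumArray :: Enum a => a -> a -> Array Int a
enumArray m n = listArray (0, length xs - 1) xs
  where xs = [m .. n]   -- forced into memory just to learn its length
```

Enum offers no way to compute the count without walking the list, which is the integration gap complained about above.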
and they have conceptual problems (such as requiring two bounds checks for one array access).
Assuming that you mean that for safe array access where nothing is known about the index at compile time: since any sort of array has at least a beginning and an end, they all require two bounds checks. Once you do know something about the index, it’s a question of implementation.
That's not what I meant. You really need to do two full bounds checks, one against inRange and one against the actual length of the array. Here is the relevant comment from the GHC libraries:

    Note [Double bounds-checking of index values]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    When you index an array, a!x, there are two possible bounds checks
    we might make:

      (A) Check that (inRange (bounds a) x) holds.
          (A) is checked in the method for 'index'

      (B) Check that (index (bounds a) x) lies in the range 0..n,
          where n is the size of the underlying array.
          (B) is checked in the top-level function (!), in safeIndex.

    Of course it *should* be the case that (A) holds iff (B) holds, but
    that is a property of the particular instances of index, bounds,
    and inRange, so GHC cannot guarantee it.

      * If you do (A) and not (B), then you might get a seg-fault, by
        indexing at some bizarre location.  Trac #1610

      * If you do (B) but not (A), you may get no complaint when you
        index an array out of its semantic bounds.  Trac #2120

    At various times we have had (A) and not (B), or (B) and not (A);
    both led to complaints.  So now we implement *both* checks (Trac #2669).

Roman

It's rather that some considered the IArray API to be inadequate most of the time. Really, H98 arrays aren't very good at anything they do. For collective operations, you are supposed to convert the array to a list, work on the list and then convert it back to an array which just seems wrong.
I am unconvinced that this is any more wrong than using a for loop in an imperative language. Remember that the lists are lazy, so it’s misleading to say “convert the array to a list” since what happens most of the time is that elements are taken out of the array and passed to the processing function and then thrown away before the next element is processed.
You're still needlessly allocating and then garbage collecting memory though. My personal feeling is that Haskell programmers use lists a lot when what they really want are "streams" (in the sense of the stream-fusion library). I don't want you to actually *make* a physical list, I just want to abstract over something that sequentially generates data.

But then, lists are in the standard Prelude, and hard-wired into the language syntax. Streams aren't exposed by any moderately standard library. Even the stream-fusion library treats streams as a sort of "under the covers" thing to deforest lists. My humble opinion is that half the time when people create physical lists, what they really mean is a data pipeline that doesn't need to physically materialise at all.

But anyway... Haskell '98 arrays may have fancy indexing, but I've always wished that there had been a plain ordinary integer-indexed array construct, with the fancy indexing implemented on top of that, so your use of fancy indexing is /optional/. I get the distinct impression that the default array libraries are so barren just because nobody really used them much, and that was then and this is now, and it's a major pain to break something that so much code is written against... I just hope we don't end up with a bazillion different array implementations popping up. But we'll see.
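The kind of stream being described, roughly in the style of the stream-fusion library (a simplified sketch; details elided):

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE BangPatterns #-}

-- A producer is just a step function plus a seed: no list cells
-- are ever allocated when producer and consumer are composed.
data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

-- Generate a range without materialising it.
enumStream :: Int -> Int -> Stream Int
enumStream lo hi = Stream step lo
  where step i | i > hi    = Done
               | otherwise = Yield i (i + 1)

-- Consume it in constant space.
sumS :: Stream Int -> Int
sumS (Stream step s0) = go 0 s0
  where go !acc s = case step s of
          Done       -> acc
          Skip s'    -> go acc s'
          Yield x s' -> go (acc + x) s'
```

Here `sumS (enumStream 1 n)` runs as a tight loop over the seed, which is exactly the "data pipeline that doesn't need to physically materialise" point above.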

On Tue, Sep 6, 2011 at 13:40, Andrew Coppin
But then, lists are in the standard Prelude, and hard-wired into the language syntax. Streams aren't exposed by any moderately standard library. Even the stream-fusion library treats streams as a sort of "under the covers" thing to deforest lists.
These two statements taken together are slightly ironic: stream-fusion's goal is to make streams wired-in syntax by borrowing the syntax from lists.

-- 
brandon s allbery
allbery.b@gmail.com
wandering unix systems administrator (available)
(412) 475-9364 vm/sms

On 7/09/2011, at 5:40 AM, Andrew Coppin wrote:
But anyway... Haskell '98 arrays may have fancy indexing, but I've always wished that there had been a plain ordinary integer-indexed array construct, with the fancy indexing implemented on top of that, so your use of fancy indexing is /optional/
Just like Clean, in fact, although Clean programmers seem to feel no pressure to implement fancy indexing. If the Clean team hadn't decided to focus their attention on an operating system made by that company that was convicted of software piracy (http://www.diaplous.org/nanterre.htm), I'd still be using it...

I am unconvinced that this is any more wrong than using a for loop in an imperative language. Remember that the lists are lazy, so it’s misleading to say “convert the array to a list” since what happens most of the time is that elements are taken out of the array and passed to the processing function and then thrown away before the next element is processed.
I don't think it's misleading to say "convert to a list". Without the presence of some fusion-esque optimization, the entire list is created and destroyed. It's nice if it can be done incrementally, but that's still a lot of garbage. And for a concrete example, consider concatenating a bunch of arrays. Unless it gets deforested, you have to copy each element a couple of times, since there is intermediate list concatenation. And then this kind of API is incompatible with unboxing... if you later decide to unbox the data type then I don't think you could ever get the intermediate list approach to work, since it's boxed by nature.
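Concretely, concatenation through the list interface might be sketched as (the helper name is mine):

```haskell
import Data.Array (Array, listArray, elems)

-- Concatenate by round-tripping through lists: without deforestation,
-- every element is copied into a list cell and then into the result.
concatArrays :: [Array Int a] -> Array Int a
concatArrays as = listArray (0, length xs - 1) xs
  where xs = concatMap elems as
```

This is the intermediate-list copy described above, and the list step is inherently boxed even if the arrays themselves are unboxed.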
I’m not familiar with Repa or NumPy, but what can they do that cannot be done with judicious use of ixmap, which is a very powerful mechanism.
I spent hours messing with ixmap before getting something that seemed to work right. Maybe I just have more trouble than most with it, but though it may be powerful it's certainly not easy to use! After spending too much time on it I learned to just convert to a list and back.
participants (7)

- Andrew Coppin
- Brandon Allbery
- Evan Laforge
- Ivan Lazar Miljenovic
- Jon Fairbairn
- Richard O'Keefe
- Roman Leshchinskiy