The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

Spencer Janssen

2 Dec 2006 2 Dec '06

3:23 a.m.

I've discovered a problem in the array libraries that allows one to read arbitrary memory locations without the use of any unsafeFoo functions. Both GHC and Hugs are affected.

...

import Data.Array.IArray import Data.Array.Unboxed

Here is a poorly behaved instance of Ix: inRange and index ignore the bounds supplied.

...

newtype EvilIx = E Int deriving (Eq, Ord) instance Ix EvilIx where inRange _ _ = True index _ (E i) = i range (E x, E y) = map E $ range (x, y)

One can read arbitrary memory locations, eventually indexing far enough to cause a segmentation fault.

...

main = print [a ! E (2^i) | i <- [0..]] where a :: UArray EvilIx Int a = array (E 0, E 0) []

This problem is not specific to UArrays:

...

main' = print [a ! E (2^i) | i <- [0..]] where a :: Array EvilIx String a = array (E 0, E 0) []

The issue is that the array operators trust the Ix instance to manage boundaries correctly. The solution is to double check the value returned by index with the actual length of the array. I volunteer to write the fix, if I can extract some hints from more knowledgeable folk. There's sizeOfByteArray#, is there an analog for an Array#? I need to know how to find the length of Hugs array primitives too. Cheers, Spencer Janssen

Show replies by date

Simon Marlow

4 Dec 4 Dec

3:02 p.m.

New subject: The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

Spencer Janssen wrote:

...

I've discovered a problem in the array libraries that allows one to read arbitrary memory locations without the use of any unsafeFoo functions. Both GHC and Hugs are affected.

...
import Data.Array.IArray import Data.Array.Unboxed

Here is a poorly behaved instance of Ix: inRange and index ignore the bounds supplied.

...
newtype EvilIx = E Int deriving (Eq, Ord) instance Ix EvilIx where inRange _ _ = True index _ (E i) = i range (E x, E y) = map E $ range (x, y)

One can read arbitrary memory locations, eventually indexing far enough to cause a segmentation fault.

See Chapter 15 of the Haskell Report, in particular the paragraph An implementation is entitlesd to assume the following laws about these operations: range (l,u) !! index (l,u) i == i -- when i is in scope inRange (l,u) i == i `elem` range (l,u) map index (range (l,u)) == [0..rangeSize (l,u)] Your inRange operation doesn't satisfy the second law. You might well argue that having this assumption is undesirable (and I'd probably agree), but haskell-prime@haskell.org is probably a better place to start that discussion. Cheers, Simon

Nils Anders Danielsson

9 p.m.

New subject: The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

On Mon, 04 Dec 2006, Simon Marlow wrote:

...

An implementation is entitlesd to assume the following laws about these operations:

range (l,u) !! index (l,u) i == i -- when i is in scope inRange (l,u) i == i `elem` range (l,u) map index (range (l,u)) == [0..rangeSize (l,u)]

Even if these laws are not satisfied, is the implementation really allowed to segfault? I would have guessed that the array operations should still be equivalent to some pure Haskell program (e.g. undefined). If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;) -- /NAD

Samuel Bronson

10:21 p.m.

On 12/4/06, Nils Anders Danielsson wrote:

...

If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)

Yeah, but that would just be asking for it to happen to *you*! Besides, it is too much work!

Simon Marlow

5 Dec 5 Dec

10:06 a.m.

New subject: The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

Nils Anders Danielsson wrote:

...

On Mon, 04 Dec 2006, Simon Marlow wrote:

...
An implementation is entitlesd to assume the following laws about these operations:

range (l,u) !! index (l,u) i == i -- when i is in scope inRange (l,u) i == i `elem` range (l,u) map index (range (l,u)) == [0..rangeSize (l,u)]

Even if these laws are not satisfied, is the implementation really allowed to segfault? I would have guessed that the array operations should still be equivalent to some pure Haskell program (e.g. undefined).

To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.

...

If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)

Yup. That's not quite as bad as in C, where it's ok for an implementation to wipe all your files if you overflow the int type... Cheers, Simon

Iavor Diatchki

6 p.m.

On 12/5/06, Simon Marlow wrote:

...

Nils Anders Danielsson wrote:

...
On Mon, 04 Dec 2006, Simon Marlow wrote:

...
An implementation is entitlesd to assume the following laws about these operations:

range (l,u) !! index (l,u) i == i -- when i is in scope inRange (l,u) i == i `elem` range (l,u) map index (range (l,u)) == [0..rangeSize (l,u)]

Even if these laws are not satisfied, is the implementation really allowed to segfault? I would have guessed that the array operations should still be equivalent to some pure Haskell program (e.g. undefined).

To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.

...
If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)

Yup. That's not quite as bad as in C, where it's ok for an implementation to wipe all your files if you overflow the int type...

Cheers, Simon

Still, this is pretty bad, and raises questions about the safety of Haskell programs in general. It seems unsatisfactory that if a programmer makes a mistake in the definition of an 'Ix' instance, then there are no guarantees about the behavior of their program at all... I though that implementations did not assume anything about the user-specified instances. Are there other classes for which GHC assumes something about the instances? If so this should be documented in bold letters with lots of flashing red lights :-) -Iavor

Simon Peyton-Jones

6 Dec 6 Dec

7:50 a.m.

| > To me, the wording "An implementation is entitled to assume..." implies that | > there are no obligations on the implementation should the assumption not hold - | > no obligation to yield _|_ or any other behaviour. | > | > > If "laws not satisfied => any behaviour OK" were the correct | > > interpretation, then it would be OK for the Array implementation to | > > wipe all your files at the first encounter of a broken Ix law... ;) | > | > Yup. That's not quite as bad as in C, where it's ok for an implementation to | > wipe all your files if you overflow the int type... | > | > Cheers, | > Simon | | Still, this is pretty bad, and raises questions about the safety of | Haskell programs in general. It seems unsatisfactory that if a | programmer makes a mistake in the definition of an 'Ix' instance, then | there are no guarantees about the behavior of their program at all... I rather agree with Iavor here. If a program makes no use of unsafeX functions, and has no foreign calls, and passes the typechecker, then it should not crash. However, I don't see how to achieve this for array indexing, without adding another test to every array access. Simon

Simon Marlow

11:17 a.m.

Simon Peyton-Jones wrote:

...

...
...
To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.

...
If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)

Yup. That's not quite as bad as in C, where it's ok for an implementation to wipe all your files if you overflow the int type...

Cheers, Simon

Still, this is pretty bad, and raises questions about the safety of Haskell programs in general. It seems unsatisfactory that if a programmer makes a mistake in the definition of an 'Ix' instance, then there are no guarantees about the behavior of their program at all...

I rather agree with Iavor here. If a program makes no use of unsafeX functions, and has no foreign calls, and passes the typechecker, then it should not crash.

Yes, I agree too! I'm just pointing out that the problem is already in the Haskell 98 specification, which we follow. If possible we should fix this in Haskell'.

...

However, I don't see how to achieve this for array indexing, without adding another test to every array access.

If (!) was changed to be a method of IArray, then for certain arrays we could use unsafeIndex instead of index, and check the index against the physical array size instead. Eg. (!) is currently defined as arr ! i = case bounds arr of (l,u) -> unsafeAt arr (index (l,u) i) This would be the default method for (!), but for some arrays we could replace it by arr ! i = case bounds arr of (l,u) -> safeAt (unsafeIndex (l,u) i) Where safeAt is implemented by looking at the physical array size. One slight problem is that the size stored in GHC's byte arrays is rounded up to the nearest word. Also we don't have a sizeOfArray# primitive, as Spencer pointed out in the original post in the thread. Similar changes would be required in the MArray class too: readArray/writeArray would need to become methods. Cheers, Simon

Iavor Diatchki

5:33 p.m.

Hello, another option (perhaps too draconian?) would be to disallow user-defined instances for Ix, and to ensure that the predefined instances satisfy the required laws. We can do this in one of the following ways: * Do not export 'Ix': not good because we cannot write type signatures * Export 'Ix' but not its methods: ensures that programmers can define only trivial (i.e. undefined) instances. * Add a super class to 'Ix' (e.g., 'PrivateIx') which is not exported. This would disallow user-defined instances because they can never satisfy the 'PrivateIx' constraint. Perhaps such restrictions are not in the spirit of Haskell but this is a valid point in the design space that we might want to consider. -Iavor On 12/6/06, Simon Marlow wrote:

...

Simon Peyton-Jones wrote:

...
...
...
To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.

...
If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)

Yup. That's not quite as bad as in C, where it's ok for an implementation to wipe all your files if you overflow the int type...

Cheers, Simon

Still, this is pretty bad, and raises questions about the safety of Haskell programs in general. It seems unsatisfactory that if a programmer makes a mistake in the definition of an 'Ix' instance, then there are no guarantees about the behavior of their program at all...

I rather agree with Iavor here. If a program makes no use of unsafeX functions, and has no foreign calls, and passes the typechecker, then it should not crash.

Yes, I agree too! I'm just pointing out that the problem is already in the Haskell 98 specification, which we follow. If possible we should fix this in Haskell'.

...
However, I don't see how to achieve this for array indexing, without adding another test to every array access.

If (!) was changed to be a method of IArray, then for certain arrays we could use unsafeIndex instead of index, and check the index against the physical array size instead. Eg. (!) is currently defined as

arr ! i = case bounds arr of (l,u) -> unsafeAt arr (index (l,u) i)

This would be the default method for (!), but for some arrays we could replace it by

arr ! i = case bounds arr of (l,u) -> safeAt (unsafeIndex (l,u) i)

Where safeAt is implemented by looking at the physical array size. One slight problem is that the size stored in GHC's byte arrays is rounded up to the nearest word. Also we don't have a sizeOfArray# primitive, as Spencer pointed out in the original post in the thread. Similar changes would be required in the MArray class too: readArray/writeArray would need to become methods.

Cheers, Simon

Samuel Bronson

11 Dec 11 Dec

11:44 p.m.

On 12/6/06, Iavor Diatchki wrote:

...

Hello,

another option (perhaps too draconian?) would be to disallow user-defined instances for Ix, and to ensure that the predefined instances satisfy the required laws. We can do this in one of the following ways: * Do not export 'Ix': not good because we cannot write type signatures * Export 'Ix' but not its methods: ensures that programmers can define only trivial (i.e. undefined) instances. * Add a super class to 'Ix' (e.g., 'PrivateIx') which is not exported. This would disallow user-defined instances because they can never satisfy the 'PrivateIx' constraint.

We could put one of these things in e.g. Data.Ix.IAmNotAnIdiot, so that users can define instances of Ix if they need to, but not without being warned that this might be dangerous, and that it is usually possible to just use derived one. (Such warnings would appear in the documentation of this module.)

Stefan O'Rear

6 Dec 6 Dec

1:39 p.m.

On Wed, Dec 06, 2006 at 07:50:30AM +0000, Simon Peyton-Jones wrote:

...

I rather agree with Iavor here. If a program makes no use of unsafeX functions, and has no foreign calls, and passes the typechecker, then it should not crash.

However, I don't see how to achieve this for array indexing, without adding another test to every array access.

If we allow inRange to return anything at all if the result of index is out-of-bounds, then the standard Ix instances lose one test for each access; e.g. the Int instance can return constant True for inRange because any invalid index would give an invalid result. Obviously, this makes inRange into a rather low level operation that should be renamed and wrapped (with a function that always gives the right answer).

Nils Anders Danielsson

5 Dec 5 Dec

6:50 p.m.

New subject: The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

On Tue, 05 Dec 2006, Simon Marlow wrote:

...

To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.

I guess that's a valid interpretation. As an aside, it is interesting to note that the Ix instance for Integer does not satisfy the first law (if a wraparound semantics is used for Int; this is allowed by the report). With l = 0, u = toInteger (maxBound :: Int) + 1 and i = u we have inRange (l,u) i = True but range (l,u) !! index (l,u) i = ⊥ ≠ i. Furthermore, more seriously, the third law isn't type correct, and a corrected version, map (index (l,u)) (range (l,u)) == [0..rangeSize (l,u)], isn't satisfied by the Ix instance for Int, since (with b = (0, 0 :: Int)) map (index b) (range b) = [0] while [0..rangeSize b] = [0, 1]. I think the law should really be map (index (l,u)) (range (l,u)) == [0..rangeSize (l,u) - 1]. This law is also prone to overflow errors, by the way. -- /NAD

Samuel Bronson

11:37 p.m.

On 12/5/06, Nils Anders Danielsson wrote:

...

[Stuff about overflow and Ix laws]

But, will you ever encounter such overflows in using an array? I suppose you might if you tried to make one larger than the address space...

Nils Anders Danielsson

6 Dec 6 Dec

10:16 a.m.

New subject: The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

On Wed, 06 Dec 2006, "Samuel Bronson" wrote:

...

But, will you ever encounter such overflows in using an array? I suppose you might if you tried to make one larger than the address space...

Are you assuming that Ints can always be used to address the entire address space? Ints are only guaranteed to have 30 bits. -- /NAD

Simon Marlow

11:25 a.m.

New subject: The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

Samuel Bronson wrote:

...

On 12/5/06, Nils Anders Danielsson wrote:

...
[Stuff about overflow and Ix laws]

But, will you ever encounter such overflows in using an array? I suppose you might if you tried to make one larger than the address space...

Problems in this area have been reported as GHC bugs several times, see for example: http://hackage.haskell.org/trac/ghc/ticket/229 Cheers, Simon

Simon Marlow

11:27 a.m.

New subject: The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

Nils Anders Danielsson wrote:

...

On Tue, 05 Dec 2006, Simon Marlow wrote:

...
To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.

I guess that's a valid interpretation.

As an aside, it is interesting to note that the Ix instance for Integer does not satisfy the first law (if a wraparound semantics is used for Int; this is allowed by the report). With l = 0, u = toInteger (maxBound :: Int) + 1 and i = u we have inRange (l,u) i = True but range (l,u) !! index (l,u) i = ⊥ ≠ i.

Furthermore, more seriously, the third law isn't type correct, and a corrected version,

map (index (l,u)) (range (l,u)) == [0..rangeSize (l,u)],

isn't satisfied by the Ix instance for Int, since (with b = (0, 0 :: Int)) map (index b) (range b) = [0] while [0..rangeSize b] = [0, 1]. I think the law should really be

map (index (l,u)) (range (l,u)) == [0..rangeSize (l,u) - 1].

This law is also prone to overflow errors, by the way.

Thanks Nils, I'll make sure these points get addressed during the Haskell' process. Cheers, Simon

6784

Age (days ago)

6793

Last active (days ago)

List overview

Download

15 comments

8 participants

participants (8)

Iavor Diatchki
Nils Anders Danielsson
Samuel Bronson
Simon Marlow
Simon Marlow
Simon Peyton-Jones
Spencer Janssen
Stefan O'Rear