The Data.Array.* hierarchy is unsafe (or, Segfaulting for fun and profit)

I've discovered a problem in the array libraries that allows one to read arbitrary memory locations without the use of any unsafeFoo functions. Both GHC and Hugs are affected.
import Data.Array.IArray import Data.Array.Unboxed
Here is a poorly behaved instance of Ix: inRange and index ignore the bounds supplied.
newtype EvilIx = E Int deriving (Eq, Ord) instance Ix EvilIx where inRange _ _ = True index _ (E i) = i range (E x, E y) = map E $ range (x, y)
One can read arbitrary memory locations, eventually indexing far enough to cause a segmentation fault.
main = print [a ! E (2^i) | i <- [0..]] where a :: UArray EvilIx Int a = array (E 0, E 0) []
This problem is not specific to UArrays:
main' = print [a ! E (2^i) | i <- [0..]] where a :: Array EvilIx String a = array (E 0, E 0) []
The issue is that the array operators trust the Ix instance to manage boundaries correctly. The solution is to double check the value returned by index with the actual length of the array. I volunteer to write the fix, if I can extract some hints from more knowledgeable folk. There's sizeOfByteArray#, is there an analog for an Array#? I need to know how to find the length of Hugs array primitives too. Cheers, Spencer Janssen

Spencer Janssen wrote:
I've discovered a problem in the array libraries that allows one to read arbitrary memory locations without the use of any unsafeFoo functions. Both GHC and Hugs are affected.
import Data.Array.IArray import Data.Array.Unboxed
Here is a poorly behaved instance of Ix: inRange and index ignore the bounds supplied.
newtype EvilIx = E Int deriving (Eq, Ord) instance Ix EvilIx where inRange _ _ = True index _ (E i) = i range (E x, E y) = map E $ range (x, y)
One can read arbitrary memory locations, eventually indexing far enough to cause a segmentation fault.
See Chapter 15 of the Haskell Report, in particular the paragraph An implementation is entitlesd to assume the following laws about these operations: range (l,u) !! index (l,u) i == i -- when i is in scope inRange (l,u) i == i `elem` range (l,u) map index (range (l,u)) == [0..rangeSize (l,u)] Your inRange operation doesn't satisfy the second law. You might well argue that having this assumption is undesirable (and I'd probably agree), but haskell-prime@haskell.org is probably a better place to start that discussion. Cheers, Simon

On Mon, 04 Dec 2006, Simon Marlow
An implementation is entitlesd to assume the following laws about these operations:
range (l,u) !! index (l,u) i == i -- when i is in scope inRange (l,u) i == i `elem` range (l,u) map index (range (l,u)) == [0..rangeSize (l,u)]
Even if these laws are not satisfied, is the implementation really allowed to segfault? I would have guessed that the array operations should still be equivalent to some pure Haskell program (e.g. undefined). If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;) -- /NAD

On 12/4/06, Nils Anders Danielsson
If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)
Yeah, but that would just be asking for it to happen to *you*! Besides, it is too much work!

Nils Anders Danielsson wrote:
On Mon, 04 Dec 2006, Simon Marlow
wrote: An implementation is entitlesd to assume the following laws about these operations:
range (l,u) !! index (l,u) i == i -- when i is in scope inRange (l,u) i == i `elem` range (l,u) map index (range (l,u)) == [0..rangeSize (l,u)]
Even if these laws are not satisfied, is the implementation really allowed to segfault? I would have guessed that the array operations should still be equivalent to some pure Haskell program (e.g. undefined).
To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.
If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)
Yup. That's not quite as bad as in C, where it's ok for an implementation to wipe all your files if you overflow the int type... Cheers, Simon

On 12/5/06, Simon Marlow
Nils Anders Danielsson wrote:
On Mon, 04 Dec 2006, Simon Marlow
wrote: An implementation is entitlesd to assume the following laws about these operations:
range (l,u) !! index (l,u) i == i -- when i is in scope inRange (l,u) i == i `elem` range (l,u) map index (range (l,u)) == [0..rangeSize (l,u)]
Even if these laws are not satisfied, is the implementation really allowed to segfault? I would have guessed that the array operations should still be equivalent to some pure Haskell program (e.g. undefined).
To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.
If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)
Yup. That's not quite as bad as in C, where it's ok for an implementation to wipe all your files if you overflow the int type...
Cheers, Simon
Still, this is pretty bad, and raises questions about the safety of Haskell programs in general. It seems unsatisfactory that if a programmer makes a mistake in the definition of an 'Ix' instance, then there are no guarantees about the behavior of their program at all... I though that implementations did not assume anything about the user-specified instances. Are there other classes for which GHC assumes something about the instances? If so this should be documented in bold letters with lots of flashing red lights :-) -Iavor

| > To me, the wording "An implementation is entitled to assume..." implies that | > there are no obligations on the implementation should the assumption not hold - | > no obligation to yield _|_ or any other behaviour. | > | > > If "laws not satisfied => any behaviour OK" were the correct | > > interpretation, then it would be OK for the Array implementation to | > > wipe all your files at the first encounter of a broken Ix law... ;) | > | > Yup. That's not quite as bad as in C, where it's ok for an implementation to | > wipe all your files if you overflow the int type... | > | > Cheers, | > Simon | | Still, this is pretty bad, and raises questions about the safety of | Haskell programs in general. It seems unsatisfactory that if a | programmer makes a mistake in the definition of an 'Ix' instance, then | there are no guarantees about the behavior of their program at all... I rather agree with Iavor here. If a program makes no use of unsafeX functions, and has no foreign calls, and passes the typechecker, then it should not crash. However, I don't see how to achieve this for array indexing, without adding another test to every array access. Simon

Simon Peyton-Jones wrote:
To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.
If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)
Yup. That's not quite as bad as in C, where it's ok for an implementation to wipe all your files if you overflow the int type...
Cheers, Simon
Still, this is pretty bad, and raises questions about the safety of Haskell programs in general. It seems unsatisfactory that if a programmer makes a mistake in the definition of an 'Ix' instance, then there are no guarantees about the behavior of their program at all...
I rather agree with Iavor here. If a program makes no use of unsafeX functions, and has no foreign calls, and passes the typechecker, then it should not crash.
Yes, I agree too! I'm just pointing out that the problem is already in the Haskell 98 specification, which we follow. If possible we should fix this in Haskell'.
However, I don't see how to achieve this for array indexing, without adding another test to every array access.
If (!) was changed to be a method of IArray, then for certain arrays we could use unsafeIndex instead of index, and check the index against the physical array size instead. Eg. (!) is currently defined as arr ! i = case bounds arr of (l,u) -> unsafeAt arr (index (l,u) i) This would be the default method for (!), but for some arrays we could replace it by arr ! i = case bounds arr of (l,u) -> safeAt (unsafeIndex (l,u) i) Where safeAt is implemented by looking at the physical array size. One slight problem is that the size stored in GHC's byte arrays is rounded up to the nearest word. Also we don't have a sizeOfArray# primitive, as Spencer pointed out in the original post in the thread. Similar changes would be required in the MArray class too: readArray/writeArray would need to become methods. Cheers, Simon

Hello,
another option (perhaps too draconian?) would be to disallow
user-defined instances for Ix, and to ensure that the predefined
instances satisfy the required laws. We can do this in one of the
following ways:
* Do not export 'Ix': not good because we cannot write type signatures
* Export 'Ix' but not its methods: ensures that programmers can define
only trivial (i.e. undefined) instances.
* Add a super class to 'Ix' (e.g., 'PrivateIx') which is not exported.
This would disallow user-defined instances because they can never
satisfy the 'PrivateIx' constraint.
Perhaps such restrictions are not in the spirit of Haskell but this is
a valid point in the design space that we might want to consider.
-Iavor
On 12/6/06, Simon Marlow
Simon Peyton-Jones wrote:
To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.
If "laws not satisfied => any behaviour OK" were the correct interpretation, then it would be OK for the Array implementation to wipe all your files at the first encounter of a broken Ix law... ;)
Yup. That's not quite as bad as in C, where it's ok for an implementation to wipe all your files if you overflow the int type...
Cheers, Simon
Still, this is pretty bad, and raises questions about the safety of Haskell programs in general. It seems unsatisfactory that if a programmer makes a mistake in the definition of an 'Ix' instance, then there are no guarantees about the behavior of their program at all...
I rather agree with Iavor here. If a program makes no use of unsafeX functions, and has no foreign calls, and passes the typechecker, then it should not crash.
Yes, I agree too! I'm just pointing out that the problem is already in the Haskell 98 specification, which we follow. If possible we should fix this in Haskell'.
However, I don't see how to achieve this for array indexing, without adding another test to every array access.
If (!) was changed to be a method of IArray, then for certain arrays we could use unsafeIndex instead of index, and check the index against the physical array size instead. Eg. (!) is currently defined as
arr ! i = case bounds arr of (l,u) -> unsafeAt arr (index (l,u) i)
This would be the default method for (!), but for some arrays we could replace it by
arr ! i = case bounds arr of (l,u) -> safeAt (unsafeIndex (l,u) i)
Where safeAt is implemented by looking at the physical array size. One slight problem is that the size stored in GHC's byte arrays is rounded up to the nearest word. Also we don't have a sizeOfArray# primitive, as Spencer pointed out in the original post in the thread. Similar changes would be required in the MArray class too: readArray/writeArray would need to become methods.
Cheers, Simon

On 12/6/06, Iavor Diatchki
Hello,
another option (perhaps too draconian?) would be to disallow user-defined instances for Ix, and to ensure that the predefined instances satisfy the required laws. We can do this in one of the following ways: * Do not export 'Ix': not good because we cannot write type signatures * Export 'Ix' but not its methods: ensures that programmers can define only trivial (i.e. undefined) instances. * Add a super class to 'Ix' (e.g., 'PrivateIx') which is not exported. This would disallow user-defined instances because they can never satisfy the 'PrivateIx' constraint.
We could put one of these things in e.g. Data.Ix.IAmNotAnIdiot, so that users can define instances of Ix if they need to, but not without being warned that this might be dangerous, and that it is usually possible to just use derived one. (Such warnings would appear in the documentation of this module.)

On Wed, Dec 06, 2006 at 07:50:30AM +0000, Simon Peyton-Jones wrote:
I rather agree with Iavor here. If a program makes no use of unsafeX functions, and has no foreign calls, and passes the typechecker, then it should not crash.
However, I don't see how to achieve this for array indexing, without adding another test to every array access.
If we allow inRange to return anything at all if the result of index is out-of-bounds, then the standard Ix instances lose one test for each access; e.g. the Int instance can return constant True for inRange because any invalid index would give an invalid result. Obviously, this makes inRange into a rather low level operation that should be renamed and wrapped (with a function that always gives the right answer).

On Tue, 05 Dec 2006, Simon Marlow
To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.
I guess that's a valid interpretation. As an aside, it is interesting to note that the Ix instance for Integer does not satisfy the first law (if a wraparound semantics is used for Int; this is allowed by the report). With l = 0, u = toInteger (maxBound :: Int) + 1 and i = u we have inRange (l,u) i = True but range (l,u) !! index (l,u) i = ⊥ ≠ i. Furthermore, more seriously, the third law isn't type correct, and a corrected version, map (index (l,u)) (range (l,u)) == [0..rangeSize (l,u)], isn't satisfied by the Ix instance for Int, since (with b = (0, 0 :: Int)) map (index b) (range b) = [0] while [0..rangeSize b] = [0, 1]. I think the law should really be map (index (l,u)) (range (l,u)) == [0..rangeSize (l,u) - 1]. This law is also prone to overflow errors, by the way. -- /NAD

On Wed, 06 Dec 2006, "Samuel Bronson"
But, will you ever encounter such overflows in using an array? I suppose you might if you tried to make one larger than the address space...
Are you assuming that Ints can always be used to address the entire address space? Ints are only guaranteed to have 30 bits. -- /NAD

Samuel Bronson wrote:
On 12/5/06, Nils Anders Danielsson
wrote: [Stuff about overflow and Ix laws]
But, will you ever encounter such overflows in using an array? I suppose you might if you tried to make one larger than the address space...
Problems in this area have been reported as GHC bugs several times, see for example: http://hackage.haskell.org/trac/ghc/ticket/229 Cheers, Simon

Nils Anders Danielsson wrote:
On Tue, 05 Dec 2006, Simon Marlow
wrote: To me, the wording "An implementation is entitled to assume..." implies that there are no obligations on the implementation should the assumption not hold - no obligation to yield _|_ or any other behaviour.
I guess that's a valid interpretation.
As an aside, it is interesting to note that the Ix instance for Integer does not satisfy the first law (if a wraparound semantics is used for Int; this is allowed by the report). With l = 0, u = toInteger (maxBound :: Int) + 1 and i = u we have inRange (l,u) i = True but range (l,u) !! index (l,u) i = ⊥ ≠ i.
Furthermore, more seriously, the third law isn't type correct, and a corrected version,
map (index (l,u)) (range (l,u)) == [0..rangeSize (l,u)],
isn't satisfied by the Ix instance for Int, since (with b = (0, 0 :: Int)) map (index b) (range b) = [0] while [0..rangeSize b] = [0, 1]. I think the law should really be
map (index (l,u)) (range (l,u)) == [0..rangeSize (l,u) - 1].
This law is also prone to overflow errors, by the way.
Thanks Nils, I'll make sure these points get addressed during the Haskell' process. Cheers, Simon
participants (8)
-
Iavor Diatchki
-
Nils Anders Danielsson
-
Samuel Bronson
-
Simon Marlow
-
Simon Marlow
-
Simon Peyton-Jones
-
Spencer Janssen
-
Stefan O'Rear