Haskell Platform proposal: Add the vector package

Hi,

I am, with Roman's support, making a formal proposal to have the vector package included in the Haskell Platform:

http://trac.haskell.org/haskell-platform/wiki/Proposals/vector

See the wiki page for details, and a list of open issues for discussion. The vector package itself can be found on Hackage:

http://hackage.haskell.org/package/vector

I have set the deadline to 13 August 2012 (~2 months' time), with comments due by 13 July 2012 (~one month). There's plenty of time before the next platform release anyway (scheduled for 12 November 2012).

-- Johan

On Fri, 15 Jun 2012, Johan Tibell wrote:
I am, with Roman's support, making a formal proposal to have the vector package included in the Haskell Platform:
http://trac.haskell.org/haskell-platform/wiki/Proposals/vector
See the wiki page for details, and a list of open issues for discussion.
The vector package itself can be found on Hackage:
I thought about migrating storablevector to vector, but it seems that the vector package needs some GHC-only extensions like type families. I do not plan to make the storablevector package obsolete, but I think it would be useful if both packages would use the same datatype. Is it possible to put the Storable part of 'vector' into a separate package? Would this one be more portable?

On Sun, Jun 17, 2012 at 4:14 AM, Henning Thielemann
I thought about migrating storablevector to vector, but it seems that the vector package needs some GHC-only extensions like type families. I do not plan to make the storablevector package obsolete, but I think it would be useful if both packages would use the same datatype. Is it possible to put the Storable part of 'vector' into a separate package? Would this one be more portable?
I'll let Roman address this. I suggest we consider this separately from whether we should add vector to the platform (perhaps we should start a new thread for that). -- Johan

On 17/06/2012, at 12:14, Henning Thielemann wrote:
On Fri, 15 Jun 2012, Johan Tibell wrote:
I am, with Roman's support, making a formal proposal to have the vector package included in the Haskell Platform:
http://trac.haskell.org/haskell-platform/wiki/Proposals/vector
See the wiki page for details, and a list of open issues for discussion.
The vector package itself can be found on Hackage:
I thought about migrating storablevector to vector, but it seems that the vector package needs some GHC-only extensions like type families. I do not plan to make the storablevector package obsolete, but I think it would be useful if both packages would use the same datatype. Is it possible to put the Storable part of 'vector' into a separate package? Would this one be more portable?
There are type families, rank-n types, unboxed types and other goodies deep in the guts of vector so the Storable part is very much GHC-specific. To be honest, I don't think being portable is feasible for high-performance code at the moment, the language standard simply doesn't have enough tools for this. Which is a shame, really. FWIW, Storable vectors are fundamentally broken, anyway, since a Storable instance can perform arbitrary I/O in its methods but a pure vector based on Storable will necessarily have to unsafePerformIO these operations. Storable should *really* live in ST but it's too late for that now. Which reminds me, I should dig up and finish my ST-based Storable alternative one of these days and provide a safe vector type for interoperating with C. Roman

On Mon, 18 Jun 2012, Roman Leshchinskiy wrote:
There are type families, rank-n types, unboxed types and other goodies deep in the guts of vector so the Storable part is very much GHC-specific. To be honest, I don't think being portable is feasible for high-performance code at the moment, the language standard simply doesn't have enough tools for this. Which is a shame, really.
I am not mainly interested in the efficient implementation. I am completely ok with having the definition of (Vector a) in a separate package, such that it can be used by vector (GHC only) and storablevector (portable). However, I have just looked into Vector.Storable and it looks like

data Vector a = Vector Int (ForeignPtr a)

I thought it was

data Vector a = Vector {len :: Int, allocated :: ForeignPtr a, start :: Ptr a}

ByteString looks like:

data ByteString = PS {allocated :: ForeignPtr Word8, start, length :: Int}

Both forms allow efficient slicing. How do you perform efficient 'take' and 'drop'?
FWIW, Storable vectors are fundamentally broken, anyway, since a Storable instance can perform arbitrary I/O in its methods but a pure vector based on Storable will necessarily have to unsafePerformIO these operations.
That's unfortunately true.
Storable should *really* live in ST but it's too late for that now.
How would this prevent broken pointer arithmetic?

On Jun 19, 2012 12:16 AM, "Henning Thielemann" <lemming@henning-thielemann.de> wrote:
On Mon, 18 Jun 2012, Roman Leshchinskiy wrote:
There are type families, rank-n types, unboxed types and other goodies
deep in the guts of vector so the Storable part is very much GHC-specific. To be honest, I don't think being portable is feasible for high-performance code at the moment, the language standard simply doesn't have enough tools for this. Which is a shame, really.
I am not mainly interested in the efficient implementation. I am
completely ok with having the definition of (Vector a) in a separate package, such that it can be used by vector (GHC only) and storablevector (portable).
However, I have just looked into Vector.Storable and it looks like
data Vector a = Vector Int (ForeignPtr a)
I thought it was
data Vector a = Vector {len :: Int, allocated :: ForeignPtr a, start :: Ptr a}
ByteString looks like:
data ByteString = PS {allocated :: ForeignPtr Word8, start, length :: Int}
Both forms allow efficient slicing. How do you perform efficient 'take' and 'drop'?
Slicing is done by directly updating the pointer in the ForeignPtr:

{-# INLINE basicUnsafeSlice #-}
basicUnsafeSlice i n (Vector _ fp) = Vector n (updPtr (`advancePtr` i) fp)

{-# INLINE updPtr #-}
updPtr :: (Ptr a -> Ptr a) -> ForeignPtr a -> ForeignPtr a
updPtr f (ForeignPtr p c) = case f (Ptr p) of { Ptr q -> ForeignPtr q c }

This saves an Int.

Regards,

Bas
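For illustration, here is a minimal sketch of how 'take' and 'drop' fall out of such a slicing primitive, using the public Data.Vector.Storable API rather than the internal constructor. The names take' and drop' and the index clamping are illustrative assumptions, not vector's actual implementation:

import qualified Data.Vector.Storable as VS
import Foreign.Storable (Storable)

-- Take at most n elements from the front, without copying.
take' :: Storable a => Int -> VS.Vector a -> VS.Vector a
take' n v = VS.unsafeSlice 0 (max 0 (min n (VS.length v))) v

-- Drop at most n elements from the front, without copying.
drop' :: Storable a => Int -> VS.Vector a -> VS.Vector a
drop' n v = let i = max 0 (min n (VS.length v))
            in VS.unsafeSlice i (VS.length v - i) v

Both run in constant time because the slice only adjusts the embedded pointer and the stored length; no elements are copied.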

On Mon, Jun 18, 2012 at 3:53 PM, Bas van Dijk
Slicing is done by directly updating the pointer in the ForeignPtr:
{-# INLINE basicUnsafeSlice #-}
basicUnsafeSlice i n (Vector _ fp) = Vector n (updPtr (`advancePtr` i) fp)

{-# INLINE updPtr #-}
updPtr :: (Ptr a -> Ptr a) -> ForeignPtr a -> ForeignPtr a
updPtr f (ForeignPtr p c) = case f (Ptr p) of { Ptr q -> ForeignPtr q c }
This saves an Int.
(This is off-topic as far as the proposal.) ByteString has an extra Ptr field so that accessing the data is fast, given that ForeignPtrs can't be unpacked. -- Johan

On 19/06/2012, at 00:03, Johan Tibell wrote:
On Mon, Jun 18, 2012 at 3:53 PM, Bas van Dijk
wrote: Slicing is done by directly updating the pointer in the ForeignPtr:
{-# INLINE basicUnsafeSlice #-}
basicUnsafeSlice i n (Vector _ fp) = Vector n (updPtr (`advancePtr` i) fp)

{-# INLINE updPtr #-}
updPtr :: (Ptr a -> Ptr a) -> ForeignPtr a -> ForeignPtr a
updPtr f (ForeignPtr p c) = case f (Ptr p) of { Ptr q -> ForeignPtr q c }
This saves an Int.
(This is off-topic as far as the proposal.)
ByteString has an extra Ptr field so that accessing the data is fast, given that ForeignPtrs can't be unpacked.
ForeignPtrs can be unpacked, just not manually (which is a GHC bug, IMO, I should report it):

data Vector a = Vector {-# UNPACK #-} !Int {-# UNPACK #-} !(ForeignPtr a)

ByteString just has some room for optimisation here ;-)

Roman

On Mon, Jun 18, 2012 at 4:09 PM, Roman Leshchinskiy
ForeignPtrs can be unpacked, just not manually (which is a GHC bug, IMO, I should report it):
data Vector a = Vector {-# UNPACK #-} !Int {-# UNPACK #-} !(ForeignPtr a)
ByteString just has some room for optimisation here ;-)
I probably just remembered the ByteString optimization incorrectly. My apologies. I think the Addr# field was moved to the top of ForeignPtr (not to the PS constructor itself) to support unpacking. -- Johan

| ForeignPtrs can be unpacked, just not manually (which is a GHC bug, IMO,
| I should report it):

Yes, do!

On 18/06/2012, at 23:16, Henning Thielemann wrote:
On Mon, 18 Jun 2012, Roman Leshchinskiy wrote:
There are type families, rank-n types, unboxed types and other goodies deep in the guts of vector so the Storable part is very much GHC-specific. To be honest, I don't think being portable is feasible for high-performance code at the moment, the language standard simply doesn't have enough tools for this. Which is a shame, really.
I am not mainly interested in the efficient implementation. I am completely ok with having the definition of (Vector a) in a separate package, such that it can be used by vector (GHC only) and storablevector (portable).
By Vector a you mean just the data type, not the type classes, right? What would the package contain apart from the type definition?
However, I have just looked into Vector.Storable and it looks like
data Vector a = Vector Int (ForeignPtr a)
I thought it was
data Vector a = Vector {len :: Int, allocated :: ForeignPtr a, start :: Ptr a}
The ForeignPtr already stores an Addr#:

data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents

I just manipulate it directly which saves a pointer per vector. This might not seem like a big deal but sometimes this pointer will be threaded through a loop clobbering registers which *is* a big deal. Simon Marlow says that there is no requirement that the Addr# in the ForeignPtr must point to the start of the block. I would make this more explicit by manually unpacking the ForeignPtr but alas, ForeignPtrContents (the actual type name) isn't exported from GHC.ForeignPtr so I can't. Incidentally, this is another portability issue.
FWIW, Storable vectors are fundamentally broken, anyway, since a Storable instance can perform arbitrary I/O in its methods but a pure vector based on Storable will necessarily have to unsafePerformIO these operations.
That's unfortunately true.
Storable should *really* live in ST but it's too late for that now.
How would this prevent from broken pointer arithmetic?
It wouldn't but it would rule out this:

data T = T

instance Storable T where
  peek p = do print "Peeking"
              spam ghcBugTracker
              return T

Which, unfortunately, is a perfectly valid implementation of peek.

Roman

On Tue, 19 Jun 2012, Roman Leshchinskiy wrote:
On 18/06/2012, at 23:16, Henning Thielemann wrote:
On Mon, 18 Jun 2012, Roman Leshchinskiy wrote:
There are type families, rank-n types, unboxed types and other goodies deep in the guts of vector so the Storable part is very much GHC-specific. To be honest, I don't think being portable is feasible for high-performance code at the moment, the language standard simply doesn't have enough tools for this. Which is a shame, really.
I am not mainly interested in the efficient implementation. I am completely ok with having the definition of (Vector a) in a separate package, such that it can be used by vector (GHC only) and storablevector (portable).
By Vector a you mean just the data type, not the type classes, right?
yes
What would the package contain apart from the type definition?
If the implementation of Vector functions requires GHC extensions then the pure Vector data type definition would be the only definition. However, if Vector is defined as it is and these direct manipulations of ForeignPtr are not portable, then there is no benefit at all in putting the Vector definition in a separate package. We should then leave the 'vector' and 'storablevector' packages as they are and will have to convert explicitly between these types.
Storable should *really* live in ST but it's too late for that now.
How would this prevent from broken pointer arithmetic?
It wouldn't but it would rule out this:
data T = T
instance Storable T where
  peek p = do print "Peeking"
              spam ghcBugTracker
              return T
I see. But how would I define an ST-Storable instance for a new type, say LLVM.Vector (the CPU vector type in 'llvm'), without unsafeIOToST? I could still lift the 'spam' command into ST. However, ST might make people think more thoroughly about whether the lifted operations are appropriate for ST.
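For context, a minimal sketch of what such an ST-based Storable alternative might look like. This is a hypothetical class, not an existing API, and, as noted above, instances for foreign types would still need unsafeIOToST internally:

import Control.Monad.ST (ST)
import Foreign.Ptr (Ptr)

-- Hypothetical replacement for Storable whose methods live in ST,
-- so that a pure vector could run them without unsafePerformIO.
class StorableST a where
  sizeOfST    :: a -> Int
  alignmentST :: a -> Int
  peekElemST  :: Ptr a -> Int -> ST s a
  pokeElemST  :: Ptr a -> Int -> a -> ST s ()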

On 15 June 2012 23:45, Johan Tibell
Hi,
I am, with Roman's support, making a formal proposal to have the vector package included in the Haskell Platform:
http://trac.haskell.org/haskell-platform/wiki/Proposals/vector
See the wiki page for details, and a list of open issues for discussion.
The vector package itself can be found on Hackage:
http://hackage.haskell.org/package/vector
I have set the deadline to 13 August 2012 (~2 months' time), with comments due by 13 July 2012 (~one month). There's plenty of time before the next platform release anyway (scheduled for 12 November 2012).
+1

I like the idea of the vector-safe package. Are you also proposing to add this package to the HP? (I would also be +1 on that)

I see that the trustworthiness of the .Safe modules is conditional on whether bounds checking is enabled in vector:

#if defined(VECTOR_BOUNDS_CHECKS)
{-# LANGUAGE Trustworthy #-}
#endif

The VECTOR_BOUNDS_CHECKS symbol would not be directly available in vector-safe. But I guess, by using the install-includes cabal field, vector can export a header file that exports this symbol when bounds checking is enabled.

On Mon, Jun 18, 2012 at 9:54 AM, Bas van Dijk
I like the idea of the vector-safe package. Are you also proposing to add this package to the HP? (I would also be +1 on that)
I think it makes sense as a separate package, but I don't think it makes sense to add to the HP. SafeHaskell isn't used enough to warrant that.
I see that the trustworthiness of the .Safe modules is conditional on whether bounds checking is enabled in vector:

#if defined(VECTOR_BOUNDS_CHECKS)
{-# LANGUAGE Trustworthy #-}
#endif

The VECTOR_BOUNDS_CHECKS symbol would not be directly available in vector-safe. But I guess, by using the install-includes cabal field, vector can export a header file that exports this symbol when bounds checking is enabled.
That sounds like a reasonable solution. -- Johan

+1
This is one of those packages that I periodically forget is not in HP.
On Mon, Jun 18, 2012 at 2:39 PM, Johan Tibell
On Mon, Jun 18, 2012 at 9:54 AM, Bas van Dijk
wrote: I like the idea of the vector-safe package. Are you also proposing to add this package to the HP? (I would also be +1 on that)
I think it makes sense as a separate package, but I don't think it makes sense to add to the HP. SafeHaskell isn't used enough to warrant that.
I see that the trustworthiness of the .Safe modules is conditional on whether bounds checking is enabled in vector:

#if defined(VECTOR_BOUNDS_CHECKS)
{-# LANGUAGE Trustworthy #-}
#endif

The VECTOR_BOUNDS_CHECKS symbol would not be directly available in vector-safe. But I guess, by using the install-includes cabal field, vector can export a header file that exports this symbol when bounds checking is enabled.
That sounds like a reasonable solution.
-- Johan

On 18/06/2012, at 19:39, Johan Tibell wrote:
On Mon, Jun 18, 2012 at 9:54 AM, Bas van Dijk
wrote: I like the idea of the vector-safe package. Are you also proposing to add this package to the HP? (I would also be +1 on that)
I think it makes sense as a separate package, but I don't think it makes sense to add to the HP. SafeHaskell isn't used enough to warrant that.
I fully agree with Johan and I wouldn't even really want to maintain this separate package. It is a lot of work for something I don't use and don't entirely understand. The *.Safe modules in vector are currently bitrotted since I forget to update them when I add new operations and I'm not really sure what is and isn't "safe" anyway. Is anybody interested in this code at all?
I see that the trustworthiness of the .Safe modules is conditional on whether bounds checking is enabled in vector:

#if defined(VECTOR_BOUNDS_CHECKS)
{-# LANGUAGE Trustworthy #-}
#endif

The VECTOR_BOUNDS_CHECKS symbol would not be directly available in vector-safe. But I guess, by using the install-includes cabal field, vector can export a header file that exports this symbol when bounds checking is enabled.
That sounds like a reasonable solution.
VECTOR_BOUNDS_CHECKS is defined in vector.cabal as follows:

if flag(BoundsChecks)
  cpp-options: -DVECTOR_BOUNDS_CHECKS

Doesn't Cabal provide access to the flags that a package has been compiled with? It seems like it should, and depending on the BoundsChecks flag would be cleaner than depending on a CPP symbol.

Roman

On Mon, 18 Jun 2012, Roman Leshchinskiy wrote:
VECTOR_BOUNDS_CHECKS is defined in vector.cabal as follows:
if flag(BoundsChecks)
  cpp-options: -DVECTOR_BOUNDS_CHECKS
Doesn't Cabal provide access to the flags that a package has been compiled with?
Do you mean this way: http://hackage.haskell.org/trac/hackage/ticket/836 ?

On 18/06/2012, at 23:20, Henning Thielemann wrote:
On Mon, 18 Jun 2012, Roman Leshchinskiy wrote:
VECTOR_BOUNDS_CHECKS is defined in vector.cabal as follows:
if flag(BoundsChecks)
  cpp-options: -DVECTOR_BOUNDS_CHECKS
Doesn't Cabal provide access to the flags that a package has been compiled with?
Do you mean this way: http://hackage.haskell.org/trac/hackage/ticket/836 ?
Indeed. I'm surprised Cabal doesn't provide this. But now I'm confused. How do I export a CPP symbol that depends on a flag to other packages? Do I have to generate a header file during configuration for that? Roman
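For what it's worth, one shape the install-includes idea mentioned earlier could take without a configure step, as a hedged and untested sketch: ship two copies of a small header (one defining VECTOR_BOUNDS_CHECKS, one empty) in hypothetical directories include/bounds-on and include/bounds-off, and let the flag pick which directory the installed header is taken from. All names below are illustrative:

flag BoundsChecks
  default: True

library
  -- vector-bounds.h is resolved against include-dirs and installed,
  -- so dependent packages can #include it.
  install-includes: vector-bounds.h
  if flag(BoundsChecks)
    cpp-options:  -DVECTOR_BOUNDS_CHECKS
    include-dirs: include/bounds-on
  else
    include-dirs: include/bounds-off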

On 18/06/2012 23:06, Roman Leshchinskiy wrote:
On 18/06/2012, at 19:39, Johan Tibell wrote:
On Mon, Jun 18, 2012 at 9:54 AM, Bas van Dijk
wrote: I like the idea of the vector-safe package. Are you also proposing to add this package to the HP? (I would also be +1 on that)
I think it makes sense as a separate package, but I don't think it makes sense to add to the HP. SafeHaskell isn't used enough to warrant that.
I fully agree with Johan and I wouldn't even really want to maintain this separate package. It is a lot of work for something I don't use and don't entirely understand. The *.Safe modules in vector are currently bitrotted since I forget to update them when I add new operations and I'm not really sure what is and isn't "safe" anyway. Is anybody interested in this code at all?
I respectfully disagree with this approach, I think it's heading in the wrong direction. We should be moving towards safe APIs by default, and separating out unsafe APIs into separate modules.

That is what SafeHaskell is about: it's not an obscure feature that is only used by things like "Try Haskell", the boundary between safety and unsafety is something we should all be thinking about. In that sense, we are all users of SafeHaskell. We should think of it as "good style" and best practice to separate safe APIs from unsafe ones.

I would argue against adding any unsafe APIs to the Haskell Platform that aren't in a .Unsafe module. (To what extent that applies to vector I don't know, so it may be that I'm causing trouble for the proposal here.)

Cheers,

Simon
I see that the trustworthiness of the .Safe modules is conditional on whether bounds checking is enabled in vector:

#if defined(VECTOR_BOUNDS_CHECKS)
{-# LANGUAGE Trustworthy #-}
#endif

The VECTOR_BOUNDS_CHECKS symbol would not be directly available in vector-safe. But I guess, by using the install-includes cabal field, vector can export a header file that exports this symbol when bounds checking is enabled.
That sounds like a reasonable solution.
VECTOR_BOUNDS_CHECKS is defined in vector.cabal as follows:
if flag(BoundsChecks)
  cpp-options: -DVECTOR_BOUNDS_CHECKS
Doesn't Cabal provide access to the flags that a package has been compiled with? It seems like it should and depending on the BoundsChecks flag would be cleaner than depending on a CPP symbol.
Roman

Simon Marlow wrote:
On 18/06/2012 23:06, Roman Leshchinskiy wrote:
On 18/06/2012, at 19:39, Johan Tibell wrote:
On Mon, Jun 18, 2012 at 9:54 AM, Bas van Dijk
wrote: I like the idea of the vector-safe package. Are you also proposing to add this package to the HP? (I would also be +1 on that)
I think it makes sense as a separate package, but I don't think it makes sense to add to the HP. SafeHaskell isn't used enough to warrant that.
I fully agree with Johan and I wouldn't even really want to maintain this separate package. It is a lot of work for something I don't use and don't entirely understand. The *.Safe modules in vector are currently bitrotted since I forget to update them when I add new operations and I'm not really sure what is and isn't "safe" anyway. Is anybody interested in this code at all?
I respectfully disagree with this approach, I think it's heading in the wrong direction.
We should be moving towards safe APIs by default, and separating out unsafe APIs into separate modules.
I completely agree with separating out unsafe APIs but I don't understand why modules are the right granularity for this, especially given Haskell's rather rudimentary module system. As I said, the module-based approach results in a significant maintenance burden for vector.
That is what SafeHaskell is about: it's not an obscure feature that is only used by things like "Try Haskell", the boundary between safety and unsafety is something we should all be thinking about. In that sense, we are all users of SafeHaskell. We should think of it as "good style" and best practice to separate safe APIs from unsafe ones.
At the risk of being blunt, I do find SafeHaskell's notion of safety somewhat obscure. In vector, all unsafe functions have the string "unsafe" in their name. Here are two examples of functions that don't do bounds checking:

unsafeIndex :: Vector a -> Int -> a
unsafeRead :: IOVector a -> Int -> IO a

Unless I'm mistaken, SafeHaskell considers the first one unsafe and the second one safe. Personally, I find vector's current notion of safety much more useful and wouldn't want to weaken it.
I would argue against adding any unsafe APIs to the Haskell Platform that aren't in a .Unsafe module. (to what extent that applies to vector I don't know, so it may be that I'm causing trouble for the proposal here).
To avoid confusion, let's first agree on what an "unsafe API" is. For vector, "unsafe" basically means no bounds checking and my understanding is that this is quite different from SafeHaskell's notion of safety. As I said, such functions have the string "unsafe" in their name. Additionally, Data.Vector.Storable is entirely unsafe even in the SafeHaskell sense (as in, it unsafePerformIOs essentially arbitrary code) due to the design of the Storable class - there are no safe bits there at all. It still uses "unsafe" to distinguish between functions that do bounds checking and those that don't. What would be the benefit of moving functions like unsafeIndex into a separate module (and would it be called Unsafe.unsafeIndex then? or would it be Unsafe.index?)? Would you advocate renaming Data.Vector.Storable to Data.Vector.Storable.Unsafe? Also, you seem to be arguing for both using SafeHaskell and having a special naming convention for modules with unsafe stuff. Wouldn't one of those be sufficient? Roman
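To make vector's notion of safety concrete, here is a minimal sketch of how a bounds-checked index is just a thin wrapper over the unchecked primitive (checkedIndex is an illustrative name, not part of vector's API):

import qualified Data.Vector as V

-- Bounds-checked access, built on top of the unchecked primitive.
checkedIndex :: V.Vector a -> Int -> a
checkedIndex v i
  | i >= 0 && i < V.length v = V.unsafeIndex v i
  | otherwise                = error ("checkedIndex: index " ++ show i ++ " out of bounds")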

On 04/07/2012 16:33, Roman Leshchinskiy wrote:
Simon Marlow wrote:
We should be moving towards safe APIs by default, and separating out unsafe APIs into separate modules.
I completely agree with separating out unsafe APIs but I don't understand why modules are the right granularity for this, especially given Haskell's rather rudimentary module system. As I said, the module-based approach results in a significant maintenance burden for vector.
The choice to use the module boundary was made for pragmatic reasons - it reduces complexity in the implementation, but also it makes things much simpler from the programmer's point of view. The programmer has a clear idea where the boundary lies: in a Safe module, they can only import other Safe/Trustworthy modules. The Safe subset is a collection of modules, not some slice of the contents of all modules. The Haddock docs for a module only have to say in one place whether the module is considered safe or not. This is certainly a debatable part of the design, and we went back and forth on it once or twice already. Conceivably it could change in the future. But I don't think this is the right place to discuss the design of SafeHaskell, and at least in our experience the current design seems to work quite well. Could you say something more about the maintenance burden? I imagined that you would just separate the unsafe (in the SafeHaskell sense) operations into separate modules.
That is what SafeHaskell is about: it's not an obscure feature that is only used by things like "Try Haskell", the boundary between safety and unsafety is something we should all be thinking about. In that sense, we are all users of SafeHaskell. We should think of it as "good style" and best practice to separate safe APIs from unsafe ones.
At the risk of being blunt, I do find SafeHaskell's notion of safety somewhat obscure. In vector, all unsafe functions have the string "unsafe" in their name. Here are two examples of functions that don't do bounds checking:
unsafeIndex :: Vector a -> Int -> a
unsafeRead :: IOVector a -> Int -> IO a
Unless I'm mistaken, SafeHaskell considers the first one unsafe and the second one safe. Personally, I find vector's current notion of safety much more useful and wouldn't want to weaken it.
SafeHaskell's notion of safety is very clear: it is essentially just type safety and referential transparency. It would be impossible to have a clear notion of safety that considers some IO operations unsafe and others safe: e.g. do you consider reading a file to be unsafe? Some applications would, and others wouldn't. Sticking strictly to clearly-defined properties like type safety (and a couple of other things, including module abstraction) as the definition of safety is the only sensible thing you can do. But this is beside the point. Since unsafeRead is considered safe by SafeHaskell, you have the option of either putting it in the safe API or the unsafe API; it's up to you.
I would argue against adding any unsafe APIs to the Haskell Platform that aren't in a .Unsafe module. (to what extent that applies to vector I don't know, so it may be that I'm causing trouble for the proposal here).
To avoid confusion, let's first agree on what an "unsafe API" is. For vector, "unsafe" basically means no bounds checking and my understanding is that this is quite different from SafeHaskell's notion of safety. As I said, such functions have the string "unsafe" in their name. Additionally, Data.Vector.Storable is entirely unsafe even in the SafeHaskell sense (as in, it unsafePerformIOs essentially arbitrary code) due to the design of the Storable class - there are no safe bits there at all. It still uses "unsafe" to distinguish between functions that do bounds checking and those that don't. What would be the benefit of moving functions like unsafeIndex into a separate module (and would it be called Unsafe.unsafeIndex then? or would it be Unsafe.index?)? Would you advocate renaming Data.Vector.Storable to Data.Vector.Storable.Unsafe?
Since it's the entire module in this case, I think it would be fine to just remark in the documentation for the module that the API is unsafe, and briefly explain why.
Also, you seem to be arguing for both using SafeHaskell and having a special naming convention for modules with unsafe stuff. Wouldn't one of those be sufficient?
Ok, let me relax that a little. I don't care nearly as much about the .Unsafe naming convention as I do about separating the unsafe parts of the API from the safe parts. When we have a mostly-type-safe API like vector, it is a shame if we can't have the compiler easily check that clients are using only the safe subset. Cheers, Simon

On Thu, 5 Jul 2012, Simon Marlow wrote:
The choice to use the module boundary was made for pragmatic reasons - it reduces complexity in the implementation, but also it makes things much simpler from the programmer's point of view. The programmer has a clear idea where the boundary lies: in a Safe module, they can only import other Safe/Trustworthy modules. The Safe subset is a collection of modules, not some slice of the contents of all modules. The Haddock docs for a module only have to say in one place whether the module is considered safe or not.
I found it quite natural to have the safety property per module. Maybe I am too much used to Modula-3.

Simon Marlow wrote:
On 04/07/2012 16:33, Roman Leshchinskiy wrote:
Simon Marlow wrote:
We should be moving towards safe APIs by default, and separating out unsafe APIs into separate modules.
I completely agree with separating out unsafe APIs but I don't understand why modules are the right granularity for this, especially given Haskell's rather rudimentary module system. As I said, the module-based approach results in a significant maintenance burden for vector.
The choice to use the module boundary was made for pragmatic reasons - it reduces complexity in the implementation, but also it makes things much simpler from the programmer's point of view. The programmer has a clear idea where the boundary lies: in a Safe module, they can only import other Safe/Trustworthy modules. The Safe subset is a collection of modules, not some slice of the contents of all modules. The Haddock docs for a module only have to say in one place whether the module is considered safe or not.
This is certainly a debatable part of the design, and we went back and forth on it once or twice already. Conceivably it could change in the future. But I don't think this is the right place to discuss the design of SafeHaskell, and at least in our experience the current design seems to work quite well.
I think we're misunderstanding each other slightly here. You seem to be using "separating out unsafe APIs" and SafeHaskell as synonyms whereas I'm only talking about how to do the former in the vector package, not necessarily using SafeHaskell. So to clarify my position: I'm all for distinguishing between safe and unsafe APIs and vector already mostly does that. But I don't want to support SafeHaskell in vector because SafeHaskell's notion of safety doesn't coincide with the one prevalent in vector and because SafeHaskell imposes requirements on the module structure which I consider too heavy-weight for what would only be an additional and, in this particular library, less useful guarantee.
Could you say something more about the maintenance burden? I imagined that you would just separate the unsafe (in the SafeHaskell sense) operations into separate modules.
At the moment vector has *.Safe modules which reexport the SafeHaskell-safe functions from other modules. This means that whenever I add new functions, I have to remember to reexport them from the *.Safe modules. Adding a new operation to vector already requires touching 4 modules; having to update the *.Safe modules as well is impractical. Which is why I'd like to drop them.

From the maintenance point of view, this would become easier if I had *.Unsafe modules rather than the *.Safe ones. But this is a significant restructuring and the only reason to do it would be to support SafeHaskell. Moreover, I believe (though I haven't checked) that there are calls from safe to unsafe functions and vice versa. So now I would have to have a common base module with both safe and unsafe functions and reexport those from the right top-level module. No, this just isn't feasible.
At the risk of being blunt, I do find SafeHaskell's notion of safety somewhat obscure. In vector, all unsafe functions have the string "unsafe" in their name. Here are two examples of functions that don't do bounds checking:
unsafeIndex :: Vector a -> Int -> a
unsafeRead :: IOVector a -> Int -> IO a
Unless I'm mistaken, SafeHaskell considers the first one unsafe and the second one safe. Personally, I find vector's current notion of safety much more useful and wouldn't want to weaken it.
SafeHaskell's notion of safety is very clear: it is essentially just type safety and referential transparency. It would be impossible to have a clear notion of safety that considers some IO operations unsafe and others safe: e.g. do you consider reading a file to be unsafe? Some applications would, and others wouldn't.
IO is certainly problematic. However, it is quite possible to have a clear notion (or, rather, notions) of safety for IO in a particular problem domain, such as arrays. For vector, "natural" safety includes bounds checking.
Sticking strictly to clearly-defined properties like type safety (and a couple of other things, including module abstraction) as the definition of safety is the only sensible thing you can do.
As I said, I would like to be able to have multiple notions of safety. What SafeHaskell provides is essentially the lowest common denominator. I agree that it is useful to have but only in addition to other, tighter and perhaps more domain-specific concepts which I consider more useful. But the module-based approach requires me to structure the library around SafeHaskell, essentially making it the "main" concept of safety in vector and that's not a design I would be comfortable with.
But this is beside the point. Since unsafeRead is considered safe by SafeHaskell, you have the option of either putting it in the safe API or the unsafe API; it's up to you.
But wouldn't putting it in the unsafe API essentially be "abusing" SafeHaskell to express a notion of safety different from type safety + referential transparency? For me, the only sensible structure would be putting unsafeIndex in the *.Unsafe module and unsafeRead in the safe one. I strongly dislike this.
Additionally, Data.Vector.Storable is entirely unsafe even in the SafeHaskell sense (as in, it unsafePerformIOs essentially arbitrary code) due to the design of the Storable class - there are no safe bits there at all. It still uses "unsafe" to distinguish between functions that do bounds checking and those that don't. What would be the benefit of moving functions like unsafeIndex into a separate module (and would it be called Unsafe.unsafeIndex then? or would it be Unsafe.index?)? Would you advocate renaming Data.Vector.Storable to Data.Vector.Storable.Unsafe?
Since it's the entire module in this case, I think it would be fine to just remark in the documentation for the module that the API is unsafe, and briefly explain why.
Well, it's not that simple. Data.Vector.Storable exports exactly the same interface as Data.Vector.Unboxed and Data.Vector (modulo operations it doesn't support). Now, if I move some functions from Data.Vector.Unboxed to Data.Vector.Unboxed.Unsafe, I would also have to move the corresponding functions from Data.Vector.Storable to Data.Vector.Storable.Unsafe even though neither of these last two modules would be safe. This just seems wrong to me. Roman

On Thu, 5 Jul 2012, Roman Leshchinskiy wrote:
From the maintenance point of view, this would become easier if I had *.Unsafe modules rather than the *.Safe ones. But this is a significant restructuring and the only reason to do it would be to support SafeHaskell.
I would prefer calling unsafe modules Unsafe over calling safe modules Safe. Safe functionality should be the default. I also see a divergence in usage of the term "safe". It is sometimes used where "total (function)" is meant. I prefer the meaning of "safe" in the sense of SafeHaskell and unsafePerformIO.

On 5 July 2012 15:28, Henning Thielemann
I also see a divergence of usages of the term "safe". It is sometimes used where "total (function)" is meant. I prefer the meaning of "safe" in the sense of SafeHaskell and unsafePerformIO.
Where would you classify functions that don't perform bounds checking? They can be used to read (or write!) arbitrary heap locations. -- Push the envelope. Watch it bend.

On Thu, Jul 5, 2012 at 4:58 PM, Thomas Schilling
On 5 July 2012 15:28, Henning Thielemann
wrote: Safe functionality should be the default. I also see a divergence in usage of the term "safe". It is sometimes used where "total (function)" is meant. I prefer the meaning of "safe" in the sense of SafeHaskell and unsafePerformIO.
Where would you classify functions that don't perform bounds checking? They can be used to read (or write!) arbitrary heap locations.
Right. Even if the IO monad can theoretically allow anything, I feel like there's a qualitative difference between putStrLn and readIORef/writeIORef on the one hand, and raw pointer arithmetic (and unchecked array access, which is equivalent) on the other, and I wouldn't mind putting the latter in the Unsafe module together with the functions which break referential transparency. Even in C some things are bad times, and you shouldn't be allowed to do those things in Haskell without having the word "unsafe" somewhere.

Perhaps the distinction I'm looking for is that IO should be allowed to break referential transparency, but not memory safety? There are a number of languages which have no concept of referential transparency, but still don't allow raw pointering (and are quite proud of it).

WRT Data.Vector.Storable and the Storable class, I wonder if this isn't analogous to Data.Typeable. Data.Typeable is safe (Trustworthy), but it depends on Typeable instances being correct, otherwise it can do any number of bad things. Can we formulate rules which Storable instances are required to obey, such that if they do they are safe? Then we put the methods of Storable in a separate module which is Unsafe, which means that any modules declaring instances must also be Trustworthy (if not Unsafe). Data.Vector.Storable can then also be Trustworthy, and the property is preserved that if all Trustworthy things the user trusts are actually safe (which in this case encompasses following the Storable rules), then unsafe things won't happen. See also this thread: https://groups.google.com/d/topic/haskell-cafe/_Qy2Z65H_xA/discussion

-- Your ship was caught in a monadic eruption.

On 05/07/2012 14:20, Roman Leshchinskiy wrote:
From the maintainance point of view, this would become easier if I had *.Unsafe modules rather than the *.Safe ones. But this is a signficant restructuring and the only reason to do it would be to support SafeHaskell. Moreover, I believe (though I haven't checked) that there are calls from safe to unsafe functions and vice versa. So now I would have to have a common base module with both safe and unsafe functions and reexport those from the right top-level module. No, this just isn't feasible.
It looks pretty straightforward to me. For each M:

- rename M to M.Internal (or suitable alternative)
- rename M.Safe to M
- add a (small) M.Unsafe where necessary

Of course, I can't force you to make this change, I can only say that I think it would be worthwhile, and now is a good time to make the change - it will be more difficult to do it later when the package is already in the platform. I understand your objections, but I don't think any of them is a showstopper.

Standing back for a minute, one argument against doing this was that "SafeHaskell isn't widely used". In order for Safe Haskell to be widely used, we have to make safe APIs available - if we don't, then clients of the library cannot use {-# LANGUAGE Safe #-}, and the chain is broken. Arguably we should be moving towards Safe being something that we routinely put at the top of our modules, and my motivation is just to nudge us in that direction.

HOWEVER, let me be clear for the purposes of this discussion about adding vector to the platform: given the choice between vector as it is (or without the .Safe modules) and no vector at all, I'll take vector every time. It's a great package and we should have it in the platform.

Cheers,

Simon
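For concreteness, a hedged sketch of what the facade step of such a restructuring could look like. The module name Data.Vector.SafeFacade is purely illustrative, the hiding list is abbreviated, and in the real restructuring the implementation module would be the one renamed to *.Internal:

{-# LANGUAGE Trustworthy #-}
-- The public module re-exports everything from the implementation module
-- except the unchecked operations, which would live in a sibling *.Unsafe module.
module Data.Vector.SafeFacade
  ( module Data.Vector
  ) where

-- The hiding list here is abbreviated; a real facade would hide all unsafe* operations.
import Data.Vector hiding (unsafeIndex, unsafeHead, unsafeTail)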

On Wed, Jul 4, 2012 at 5:33 PM, Roman Leshchinskiy
[...] It still uses "unsafe" to distinguish between functions that do bounds checking and those that don't. What would be the benefit of moving functions like unsafeIndex into a separate module (and would it be called Unsafe.unsafeIndex then? or would it be Unsafe.index?)? [...]
Just to pick out this small tidbit, FWIW the Vector.index and Vector.Unsafe.index scheme has the advantage that you could simply swap out a module import to switch between bounds-checked and unchecked implementations. I agree with vector going into the platform, but have no opinion on whether the Safe Haskell issues need to be resolved first. -- Your ship was caught in a monadic eruption.

Gábor Lehel wrote:
On Wed, Jul 4, 2012 at 5:33 PM, Roman Leshchinskiy
wrote: [...] It still uses "unsafe" to distinguish between functions that do bounds checking and those that don't. What would be the benefit of moving functions like unsafeIndex into a separate module (and would it be called Unsafe.unsafeIndex then? or would it be Unsafe.index?)? [...]
Just to pick out this small tidbit, FWIW the Vector.index and Vector.Unsafe.index scheme has the advantage that you could simply swap out a module import to switch between bounds-checked and unchecked implementations.
Alas, that wouldn't really work because you would still have to import the safe module for functions like map. I've wanted to add *.Unchecked modules which would export exactly the same interface as the normal ones but without bounds checking. However, it just never seemed to be worth the extra work. Roman
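For illustration, the import-swapping scheme under discussion would look roughly like this; Data.Vector.Unchecked is the hypothetical module described above, not something that exists:

-- Switching a whole module between checked and unchecked access would be a
-- one-line change, provided both modules export the same interface:
import qualified Data.Vector as V              -- bounds-checked
-- import qualified Data.Vector.Unchecked as V -- hypothetical drop-in replacement

sumFirstTwo :: V.Vector Int -> Int
sumFirstTwo v = (v V.! 0) + (v V.! 1)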

On 16.07.2012 01:41, Gábor Lehel wrote:
On Wed, Jul 4, 2012 at 5:33 PM, Roman Leshchinskiy
wrote: [...] It still uses "unsafe" to distinguish between functions that do bounds checking and those that don't. What would be the benefit of moving functions like unsafeIndex into a separate module (and would it be called Unsafe.unsafeIndex then? or would it be Unsafe.index?)? [...]
Just to pick out this small tidbit, FWIW the Vector.index and Vector.Unsafe.index scheme has the advantage that you could simply swap out a module import to switch between bounds-checked and unchecked implementations.
Since I use qualified imports wherever possible, I would strongly prefer this.

Hi Simon,
Sorry for the late reply, I was on vacation.
Let me preface my response below by saying that I think SafeHaskell
(SH) is an interesting and worthwhile research project and this isn't
meant as a diss of SH as whole. Also, my arguments below looks at SH
from the perspective of the average Haskell user i.e. someone who's
not trying to run untrusted code ala "Try Haskell" or Google
AppEngine.
On Wed, Jul 4, 2012 at 3:59 AM, Simon Marlow
I respectfully disagree with this approach, I think it's heading in the wrong direction.
We should be moving towards safe APIs by default, and separating out unsafe APIs into separate modules. That is what SafeHaskell is about: it's not an obscure feature that is only used by things like "Try Haskell", the boundary between safety and unsafety is something we should all be thinking about. In that sense, we are all users of SafeHaskell. We should think of it as "good style" and best practice to separate safe APIs from unsafe ones.
I would argue against adding any unsafe APIs to the Haskell Platform that aren't in a .Unsafe module. (to what extent that applies to vector I don't know, so it may be that I'm causing trouble for the proposal here).
It's hard to argue against "moving towards safe APIs by default." What I'm going to argue is that SH's extension to the type system (i.e. its definition of safe and unsafe) doesn't exclude many bad programs that aren't already excluded by the type system and excludes many good ones. This is a traditional type system power vs cost argument in other words.

The bad programs ruled out by SH are those that have bugs in code rejected by SH (e.g. unsafePerformIO, FFI.) I can't even come up with an example of such a program. Does anyone have an example of a program that's accepted by the current type system but rejected by SH? Another way to ask the same question: in adding all those extra Trustworthy, Safe, and Unsafe language pragmas to e.g. base, were any bugs found?

The good programs that are ruled out by SH are any programs that use a binding to a C library (all FFI imports are unsafe) and any programs that use unsafePerformIO or other unsafe functions in their implementation. The latter group includes most of our widely used libraries, including bytestring, vector, network (almost all functions are bindings to C), binary, base, etc. For example, text can't be used as its UTF-8 decoder is written in C for speed. We'd have to maintain a separate implementation of the UTF-8 decoder for use by SH users.

If you want to write real code and use SH, you need to either avoid all these libraries or to trust (as in ghc-pkg trust) all these libraries. Despite none of them having had a thorough security review. In practice, Trustworthy language pragmas get slapped on top of modules just to make code compile. There are 139 modules in base marked as Trustworthy. I seriously doubt that they all are. Furthermore, you will have to trust that the maintainers of those libraries, who often don't care about SH, keep the code in Trusted modules secure, or else whatever benefit you gained from SH is lost (but you keep the costs.) You will have to review the implementation of any new library that you want to use, unless you only use libraries that are completely safe. SH also prevents optimizations, like rewriting a particularly hot function in C, as it would change its type.

SH is much more invasive than your typical language extension. It requires maintainers of libraries who don't care about SH to change their APIs (!) just to support a yet unproven language extension. The reason is that SH is an all or nothing approach. Either all your dependencies are SH aware or your code won't compile. This is the opposite of how we handle any other language extensions, where people can slowly try out new language features in their own code and only have people who chose to use that code also have to use that feature, e.g. if I use type families in a library that depends on containers, that doesn't imply that the containers package has to use type families. This is however true for SH.

If we start using TH in our core libraries we risk making those libraries more complicated for users to understand and use, without much perceivable benefit.

Cheers,

Johan

On 10 July 2012 22:58, Johan Tibell
Hi Simon,
Sorry for the late reply, I was on vacation.
Let me preface my response below by saying that I think SafeHaskell (SH) is an interesting and worthwhile research project and this isn't meant as a diss of SH as whole. Also, my arguments below looks at SH from the perspective of the average Haskell user i.e. someone who's not trying to run untrusted code ala "Try Haskell" or Google AppEngine.
On Wed, Jul 4, 2012 at 3:59 AM, Simon Marlow
wrote: I respectfully disagree with this approach, I think it's heading in the wrong direction.
We should be moving towards safe APIs by default, and separating out unsafe APIs into separate modules. That is what SafeHaskell is about: it's not an obscure feature that is only used by things like "Try Haskell", the boundary between safety and unsafety is something we should all be thinking about. In that sense, we are all users of SafeHaskell. We should think of it as "good style" and best practice to separate safe APIs from unsafe ones.
I would argue against adding any unsafe APIs to the Haskell Platform that aren't in a .Unsafe module. (to what extent that applies to vector I don't know, so it may be that I'm causing trouble for the proposal here).
It's hard to argue against "moving towards safe APIs by default." What I'm going to argue is that SH's extension to the type system (i.e. its definition of safe and unsafe) doesn't exclude many bad programs that aren't already excluded by the type system and excludes many good ones. This is a traditional type system power vs cost argument in other words.
The bad programs ruled out by SH are those that have bugs in code rejected by SH (e.g. unsafePerformIO, FFI.) I can't even come up with an example of such a program. Does anyone have an example of a program that's accepted by the current type system but rejected by SH? Another way to ask the same question: in adding all those extra Trustworthy, Safe, and Unsafe language pragmas to e.g. base, were any bugs found?
I don't think the goal of using SH in base was to catch bugs in base. I think the goal was to mark which parts of base are safe and, most importantly, which parts are unsafe.
The good programs that are ruled out by SH are any programs that use a binding to a C library (all FFI imports are unsafe) and any programs that use unsafePerformIO or other unsafe functions in their implementation. The latter group includes most of our widely used libraries, including bytestring, vector, network (almost all functions are bindings to C), binary, base, etc. For example, text can't be used as its UTF-8 decoder is written in C for speed. We'd have to maintain a separate implementation of the UTF-8 decoder for use by SH users.
No we don't. SH users can just import text's decoder in a Safe module if they mark the text package as trusted.
If you want to write real code and use SH, you need to either avoid all these libraries or to trust (as in ghc-pkg trust) all these libraries.
Correct
Despite none of them having had a thorough security review. In practice, Trustworthy language pragmas get slapped on top of modules just to make code compile. There are 139 modules in base marked as Trustworthy. I seriously doubt that they all are.
Trustworthy obviously doesn't mean no-bugs. It just means that the module author claims that the API of the module can't be used to violate certain guarantees. Whether you trust his claim and how to establish this trust is up to you. For some applications it will be enough to know that the author is a Haskell hacker with a good track-record, for other applications a complete formal-proof of the module is required. Personally I don't see the point of the Trustworthy extension. I think Safe and Unsafe are enough. Importing some module M into a Safe module is allowed if M is inferred to be Safe or if M's package is trusted. I think that's the only rule you need.
Furthermore, you will have to trust that the maintainers of those libraries, who often don't care about SH, keeps the code in Trusted modules secure or else whatever benefit you gained from SH is lost (but you keep the costs.) You will have to review the implementation of any new library that you want to unless you only use libraries that are completely safe.
I don't see a way around that. The cost of establishing trust in a package is always on you. You can hire someone to do a safety analysis but then you still have to trust the result of the analysis. It will always come back to you. What would be very useful (but costly) is to have a group of well known, Haskell hackers (call them the trustees) which perform safety analysis on packages. The results can than be published on Hackage.
SH also prevents optimizations, like rewriting a particularly hot function in C, as it would change its type.
SH does not prevent the optimization. It just doesn't allow it to be used in a Safe module unless it's trusted.
SH is much more invasive than your typical language extension. It requires maintainers of libraries who don't care about SH to change their APIs (!) just to support a yet unproven language extension. The reason is that SH is an all or nothing approach. Either all your dependencies are SH aware or your code won't compile. This is the opposite of how we handle any other language extensions, where people can slowly try out new language features in their own code and only have people who chose to use that code also have to use that feature, e.g. if I use type families in a library that depends on containers, that doesn't imply that the containers package has to use type families. This is however true for SH.
If you don't want to use SH you don't have any restrictions on which modules you import or export. Just don't mark your module as Safe!
If we start using TH in our core libraries we risk making those libraries more complicated for users to understand and use, without much perceivable benefit.
I guess you mean SH instead of TH. Then it's a valid point. Regards, Bas

Hi Bas,
On Wed, Jul 11, 2012 at 10:54 AM, Bas van Dijk
I don't think the goal of using SH in base was to catch bugs in base. I think the goal was to mark which parts of base are safe and, most importantly, which parts are unsafe.
But what is the goal then? Simon suggested that SH is useful for people who don't work with potentially malicious code (but perhaps you don't agree.)
The good programs that are ruled out by SH are any programs that use a binding to a C library (all FFI imports are unsafe) and any programs that use unsafePerformIO or other unsafe functions in their implementation. The latter group includes most of our widely used libraries, including bytestring, vector, network (almost all functions are bindings to C), binary, base, etc. For example, text can't be used as its UTF-8 decoder is written in C for speed. We'd have to maintain a separate implementation of the UTF-8 decoder for use by SH users.
No we don't. SH users can just import text's decoder in a Safe module if they mark the text package as trusted.
I should have been more explicit here. Simon's suggestion seems to be that we should reorganize all of our core libraries so that users don't have to trust (very many) libraries. More to the point, we don't need all these .Safe modules if users are fine just trusting the whole vector package.
Trustworthy obviously doesn't mean no-bugs. It just means that the module author claims that the API of the module can't be used to violate certain guarantees. Whether you trust his claim and how to establish this trust is up to you. For some applications it will be enough to know that the author is a Haskell hacker with a good track-record, for other applications a complete formal-proof of the module is required.
I claim this is completely orthogonal to SH. I already today have to trust that e.g. Bryan didn't make unsafe use of unsafePerformIO in text such that my application is susceptible to e.g. buffer overflows.
Furthermore, you will have to trust that the maintainers of those libraries, who often don't care about SH, keeps the code in Trusted modules secure or else whatever benefit you gained from SH is lost (but you keep the costs.) You will have to review the implementation of any new library that you want to unless you only use libraries that are completely safe.
I don't see a way around that. The cost of establishing trust in a package is always on you. You can hire someone to do a safety analysis but then you still have to trust the result of the analysis. It will always come back to you.
Agree. But see above. SH forces me to use some particular system to explain my trust to GHC.
SH also prevents optimizations, like rewriting a particularly hot function in C, as it would change its type.
SH does not prevent the optimization. It just doesn't allow it to be used in a Safe module unless it's trusted.
I guess I wasn't explicit enough here. I believe Simon's argument is that we should have .Safe modules so people don't have to trust code. However, rewriting a function in C would require it to be moved from the .Safe module to some other module, breaking clients.
If you don't want to use SH you don't have any restrictions on which modules you import or export. Just don't mark your module as Safe!
The problem is that users of my package want to use SH and thus send me patches that rewrite my API to support their preference. Cheers, Johan

On Wed, 11 Jul 2012, Johan Tibell wrote:
Trustworthy obviously doesn't mean no-bugs. It just means that the module author claims that the API of the module can't be used to violate certain guarantees. Whether you trust his claim and how to establish this trust is up to you. For some applications it will be enough to know that the author is a Haskell hacker with a good track-record, for other applications a complete formal-proof of the module is required.
I claim this is completely orthogonal to SH. I already today have to trust that e.g. Bryan didn't make unsafe use of unsafePerformIO in text such that my application is susceptible to e.g. buffer overflows.
I think the difference is that currently we have to know the set of unsafe functions like unsafePerformIO and search for them in an imported package. This work is now done by the compiler, which tells me if there is something suspicious in the package. If a package does not call any unsafe function, the compiler can tell me; by searching for unsafePerformIO and friends I could miss something. I also think the SafeHaskell extension is worthwhile because some programmers do not seem to care about the use of unsafePerformIO. I hope that compiler checks on the use of unsafe functions will further discourage the use of unsafePerformIO.
I guess I wasn't explicit enough here. I believe Simon's argument is that we should have .Safe modules so people don't have to trust code. However, rewriting a function in C would require it to be moved from the .Safe module to some other module, breaking clients.
I think the idea was to have Unsafe modules and move the unsafe functions there. :-) Are there really so many unsafe functions that must be moved? I mean, a function like

unsafePerformIO :: IO a -> a

is unsafe and should be in an Unsafe module. However, a function like

gamma :: Double -> Double
gamma x = unsafePerformIO (GSL.gamma x)

should not be unsafe, but trustworthy. The function is safe to use, but the compiler cannot check it. (I hope I do not mix up the terms here.) Are there so many functions like unsafePerformIO, inlinePerformIO, unsafeInterleaveIO in packages on Hackage?
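[For reference, a minimal, self-contained sketch in the spirit of the gamma example above, binding C's tgamma from math.h rather than a hypothetical GSL function; the module can only claim Trustworthy, not Safe, because the purity of gamma is asserted rather than checked by the compiler:

{-# LANGUAGE Trustworthy, ForeignFunctionInterface #-}
module Numeric.Gamma (gamma) where

import Foreign.C.Types (CDouble (..))
import System.IO.Unsafe (unsafePerformIO)

-- Binding to the C99 function tgamma (the true gamma function).
foreign import ccall "math.h tgamma"
  c_tgamma :: CDouble -> IO CDouble

-- Safe to use but unverifiable by the compiler: the author asserts that
-- tgamma behaves as a pure function, so the module claims Trustworthy.
gamma :: Double -> Double
gamma x = realToFrac (unsafePerformIO (c_tgamma (realToFrac x)))
]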

On 11 July 2012 20:49, Henning Thielemann
I think the idea was to have Unsafe modules and move the unsafe functions there. :-)
Indeed. I don't see the point of having .Safe modules. Modules should be safe by default as you mentioned before. I guess the reason we have .Safe modules in base and vector is for backwards compatibility. The ideal, but currently impossible, way of dealing with this is to mark the _export_ of unsafe functions in a module as DEPRECATED and in a later version remove the unsafe functions and mark the module as Trustworthy. However, this requires support for deprecating exports: http://hackage.haskell.org/trac/ghc/ticket/4879 Bas

On Wed, Jul 11, 2012 at 2:11 PM, Bas van Dijk
On 11 July 2012 20:49, Henning Thielemann
wrote: I think the idea was to have Unsafe modules and move the unsafe functions there. :-)
Indeed. I don't see the point about having .Safe modules. Modules should be safe by default as you mentioned before. I guess the reason we have .Safe modules in base and vector is for backwards compatibility.
The ideal, but currently, impossible way of dealing with this is to mark the _export_ of unsafe functions in a module as DEPRECATED and in a later version remove the unsafe functions and mark the module as Trustworthy. However this requires support for deprecating exports:
But why? The number of people who will benefit from this mass deprecation/mass migration is tiny. -- Johan

On 11 July 2012 23:45, Johan Tibell
On Wed, Jul 11, 2012 at 2:11 PM, Bas van Dijk
wrote: On 11 July 2012 20:49, Henning Thielemann
wrote: I think the idea was to have Unsafe modules and move the unsafe functions there. :-)
Indeed. I don't see the point about having .Safe modules. Modules should be safe by default as you mentioned before. I guess the reason we have .Safe modules in base and vector is for backwards compatibility.
The ideal, but currently, impossible way of dealing with this is to mark the _export_ of unsafe functions in a module as DEPRECATED and in a later version remove the unsafe functions and mark the module as Trustworthy. However this requires support for deprecating exports:
But why? The number of people who will benefit from this mass deprecation/mass migration is tiny.
While I agree that the number of people who will benefit is tiny, I don't think it's a "mass" deprecation. In base there are only 7 functions that need to be deprecated:

* Control.Monad.ST.Lazy.unsafeInterleaveST
* Control.Monad.ST.Lazy.unsafeIOToST
* Control.Monad.ST.unsafeInterleaveST
* Control.Monad.ST.unsafeIOToST
* Control.Monad.ST.unsafeSTToIO
* Foreign.ForeignPtr.unsafeForeignPtrToPtr
* Foreign.Marshal.unsafeLocalState

And because of their unsafe nature they are probably not used a lot, so the impact would not be big. However, I do think it's important to have the ability to first DEPRECATE the exports so users are warned well in advance that they have to change their code. Bas

On Wed, Jul 11, 2012 at 3:08 PM, Bas van Dijk
But why? The number of people who will benefit from this mass deprecation/mass migration is tiny.
While I agree that the number of people who will benefit is tiny I don't think it's a "mass" deprecation. In base there are only 7 functions that need to be deprecated:
* Control.Monad.ST.Lazy.unsafeInterleaveST * Control.Monad.ST.Lazy.unsafeIOToST
* Control.Monad.ST.unsafeInterleaveST * Control.Monad.ST.unsafeIOToST * Control.Monad.ST.unsafeSTToIO
* Foreign.ForeignPtr.unsafeForeignPtrToPtr
* Foreign.Marshal.unsafeLocalState
And because of their unsafe nature are probably not used a lot, so the impact would not be big.
However, I do think it's important to have the ability to first DEPRECATE the exports so users are warned well in advanced that they have to change their code.
I'm sorry but I no longer know what you're talking about. What Simon is arguing for is that functions that are unsafe be moved to their own module (e.g. .Unsafe). Slightly simplified, a function is unsafe (according to SH) if

* it's one of the "basic" unsafe functions (e.g. unsafePerformIO), or
* it calls one of these functions.

For example, most functions in bytestring are unsafe because their implementation uses unsafePerformIO. To be very concrete, if 'map' on ByteStrings is unsafe it needs to be moved from Data.ByteString to Data.ByteString.Unsafe. In order to do so we need to deprecate the 'map' version in Data.ByteString in order to give users a chance to upgrade their code. Do you now see why loads of functions need to be deprecated? -- Johan

On Wed, Jul 11, 2012 at 7:27 PM, Johan Tibell
For example, most functions in bytestring are unsafe because their implementation uses unsafePerformIO. To be very concrete, if 'map' on ByteStrings is unsafe it needs to be moved from
While I share your distrust of the whole Safe Haskell movement as being a lot of effort for an unproven benefit from a definition of "safe" that is not demonstrated to be of practical usefulness or practical concern, I think you're wrong here. unsafePerformIO is unsafe. Data.ByteString.map is only unsafe if it allows unsafePerformIO to be abused. If it can verify that nothing actually unsafe takes place — which it does, by dint of the promise inherent in it being exposed as pure — Data.ByteString.map is *not* unsafe. The mechanical application of "oh, it uses unsafePerformIO, we don't care whether it proves it has used it safely: it must by definition be unsafe" just complicates things even more. If indeed it's not simply a strawman. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

On Wed, Jul 11, 2012 at 4:38 PM, Brandon Allbery
unsafePerformIO is unsafe.
Data.ByteString.map is only unsafe if it allows unsafePerformIO to be abused.
If it can verify that nothing actually unsafe takes place — which it does, by dint of the promise inherent in it being exposed as pure — Data.ByteString.map is *not* unsafe. The mechanical application of "oh, it uses unsafePerformIO, we don't care whether it proves it has used it safely: it must by definition be unsafe" just complicates things even more. If indeed it's not simply a strawman.
It's unsafe in the sense that any module containing it cannot be marked as Safe (only Trustworthy) and thus won't fit the scheme with modules containing only Safe functions that Simon described. -- Johan

On 12 July 2012 05:52, Johan Tibell
It's unsafe in the sense that any module containing it cannot be marked as Safe (only Trustworthy) and thus won't fit the scheme with modules containing only Safe functions that Simon described.
I think what Simon described was the current situation in the base library where we have: * .Unsafe modules marked as Unsafe that export an API which should be considered unsafe. * .Safe modules that export a safe API. These modules don't necessarily need to be marked as Safe (most of them can't because they themselves import unsafe modules, as in your Data.ByteString example) but they do need to be Trustworthy. Look at [1] for an example. Bas [1] http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.5.1.0/Control-M...
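[For reference, a minimal sketch of the layout Bas describes, modelled on base's Control.Monad.ST / Control.Monad.ST.Safe split; the module name here is made up:

{-# LANGUAGE Trustworthy #-}
-- A Trustworthy facade that re-exports only the safe subset of an Unsafe
-- module. Safe code can import this even though Control.Monad.ST itself
-- (which also exports unsafeIOToST and friends) cannot be imported safely.
module MyProject.ST.Safe
  ( ST
  , runST
  , fixST
  ) where

import Control.Monad.ST (ST, runST, fixST)
]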

I'm still very confused about this Safe Haskell business. I read
through the responses and it seems like there are several definitions
of what is meant by safe/unsafe. Is there a document describing
this thoroughly? In particular I am interested in:
- What definition of "safe" are we using and how can other
definitions be incorporated? As I understand it, "safe" currently
means "does not subvert the type system". This excludes everything
based on the FFI, or any of the unsafe* primitives (and therefore
anything IO). Obviously, that would put a "safe" Haskell into the
"glorified calculator/useless" department.
- So I assume that "trusting" is the way to say that code
written on top of unsafe primitives is indeed safe. How does that work?
Who decides whether something is trustworthy? Looking at [1], the
module is marked trustworthy. Shouldn't there be some sort of signing
or some sort of authority associated with the trust? Just because
someone declares his/her own module as trustworthy doesn't mean *I*
consider it trustworthy. But I might say, that I trust a package if
Johan or Bryan decided it's trustworthy. And even if it's the GHC team
that declares something as trustworthy, that doesn't mean that I want to
consider the module trustworthy for my particular use case.
- How intrusive is this system? As Johan points out, it doesn't seem
to be an opt-in extension, but has to be considered by package authors
whether they want to or not. The other concern that Johan pointed out
is how much does it violate abstraction? If the package author changes
the implementation (e.g., use the FFI), does that require an API
change? Obviously, it affects trust, but I don't see a way around
that.
So, is there a document that discusses this?
[1]: http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.5.1.0/Control-M...
On 12 July 2012 08:39, Bas van Dijk
On 12 July 2012 05:52, Johan Tibell
wrote: It's unsafe in the sense that any module containing it cannot be marked as Safe (only Trustworthy) and thus won't fit the scheme with modules containing only Safe functions that Simon described.
I think what Simon described was the current situation in the base library where we have:
* .Unsafe modules marked as Unsafe that export an API which should be considered unsafe. * .Safe modules that export a safe API. These modules don't necessarily need to be marked as Safe (most of them can't because they themselves import unsafe modules, as in your Data.ByteString example) but they do need to be Trustworthy. Look at [1] for an example.
Bas
[1] http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.5.1.0/Control-M...
-- Push the envelope. Watch it bend.

Hi Thomas, All these questions are answered by the Haskell Symposium paper, which we'll post very shortly. FYI, the FFI is mostly safe, as long as you declare foreign imports to have an IO result type (otherwise it's unsafePerformIO, and hence unsafe). Unsafety is not viral: as soon as you have a safe API, you can declare its implementation to be Trustworthy, and then it is usable from safe code. Cheers, Simon

On 12/07/12 19:21, Thomas Schilling wrote:
I'm still very confused about this Safe Haskell business. I read through the responses and it seems like there are several definitions and of what is meant by safe/unsafe. Is there a document describing this thoroughly? In particular I am interested in:
- What definition of "safe" are we using and how can other definitions be incorporated? As I understand it, "safe" currently means "does not subvert the type system". This excludes everything based on the FFI, or any of the unsafe* primitives (and therefore anything IO). Obviously, that would put a "safe" Haskell into the "glorified calculator/useless" department.
- So I assume that "trusting" is the way to say, that an code written on top unsafe primitives is indeed safe. How does that work? Who decides whether something is trustworthy? Looking at [1], the module is marked trustworthy. Shouldn't there be some sort of signing or some sort of associated authority with the trust. Just because someone declares his/her own module as trustworthy doesn't mean *I* consider it trustworthy. But I might say, that I trust a package if Johan or Bryan decided it's trustworthy. And even if it's the GHC team declares something as trustworthy, doesn't mean that I want to consider the module trustworthy for my particular use case.
- How intrusive is this system? As Johan points out, it doesn't seem to be an opt-in extension, but has to be considered by package authors whether they want to or not. The other concern that Johan pointed out is how much does it violate abstraction? If the package author changes the implementation (e.g., use the FFI), does that require an API change? Obviously, it affects trust, but I don't see a way around that.
So, is there a document that discusses this?
[1]: http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.5.1.0/Control-M...
On 12 July 2012 08:39, Bas van Dijk
wrote: On 12 July 2012 05:52, Johan Tibell
wrote: It's unsafe in the sense that any module containing it cannot be marked as Safe (only Trustworthy) and thus won't fit the scheme with modules containing only Safe functions that Simon described.
I think what Simon described was the current situation in the base library where we have:
* .Unsafe modules marked as Unsafe that export an API which should be considered unsafe. * .Safe modules that export a safe API. These modules don't necessarily need to be marked as Safe (most of them can't because they themselves import unsafe modules, as in your Data.ByteString example) but they do need to be Trustworthy. Look at [1] for an example.
Bas
[1] http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.5.1.0/Control-M...

On Wed, 11 Jul 2012, Bas van Dijk wrote:
The ideal, but currently, impossible way of dealing with this is to mark the _export_ of unsafe functions in a module as DEPRECATED and in a later version remove the unsafe functions and mark the module as Trustworthy. However this requires support for deprecating exports:
We can easily re-define functions, like

module Old where

import MyMod.Unsafe as Unsafe

{-# DEPRECATED unsafeDoSomething "use MyMod.Unsafe.unsafeDoSomething instead" #-}
unsafeDoSomething :: a -> b
unsafeDoSomething = Unsafe.unsafeDoSomething

Do we really need deprecating exports?

On 11 July 2012 23:46, Henning Thielemann
On Wed, 11 Jul 2012, Bas van Dijk wrote:
The ideal, but currently, impossible way of dealing with this is to mark the _export_ of unsafe functions in a module as DEPRECATED and in a later version remove the unsafe functions and mark the module as Trustworthy. However this requires support for deprecating exports:
We can easily re-define functions, like
module Old where
import MyMod.Unsafe as Unsafe
{-# DEPRECATED unsafeDoSomething "use MyMod.Unsafe.unsafeDoSomething instead" #-}
unsafeDoSomething :: a -> b
unsafeDoSomething = Unsafe.unsafeDoSomething
Do we really need deprecating exports?
As explained in ticket #4879, this will probably lead to lots of "ambiguous occurrence of unsafeDoSomething" errors since users will probably import both Old and MyMod.Unsafe. The only way to guard against this is to use qualified imports. Lots of people don't use them, however. Bas

On Wed, Jul 11, 2012 at 11:46:42PM +0200, Henning Thielemann wrote:
On Wed, 11 Jul 2012, Bas van Dijk wrote:
However this requires support for deprecating exports:
We can easily re-define functions, like
module Old where
import MyMod.Unsafe as Unsafe
{-# DEPRECATED unsafeDoSomething "use MyMod.Unsafe.unsafeDoSomething instead" #-}
unsafeDoSomething :: a -> b
unsafeDoSomething = Unsafe.unsafeDoSomething
If you do that, then you can't do

import Old
import MyMod.Unsafe

f = ... unsafeDoSomething ...

as you will get an ambiguity error. (You can work around it, e.g. by using a qualified name to refer to unsafeDoSomething, but different people will have differing opinions about whether that's OK.) Thanks Ian
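[The same ambiguity, and the qualified-name workaround Ian mentions, can be seen with real modules; a minimal sketch using Data.Map, whose filter clashes with the Prelude's:

module Example (keepPositive) where

-- An unqualified import with an alias: every name is in scope both bare and
-- qualified. Bare 'filter' would now be ambiguous with Prelude's filter,
-- but the qualified name resolves it, which is the workaround in question.
import Data.Map as Map

keepPositive :: Map String Int -> Map String Int
keepPositive = Map.filter (> 0)
]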

On Wed, 11 Jul 2012, Ian Lynagh wrote:
If you do that, then you can't do
import Old import MyMod.Unsafe
f = ... unsafeDoSomething ...
as you will get an ambiguity error.
(you can work around it, e.g. by using a qualified name to refer to unsafeDoSomething, but different people will have differing opinions about whether that's OK)
Names of unsafe functions cannot be long enough. :-) I think using qualified names is much easier than adding another extension to GHC.

On Wed, Jul 11, 2012 at 11:49 AM, Henning Thielemann
I think the difference is that currently we have to know the set of unsafe functions like unsafePerformIO and search for them in an imported package. This work is now done by the compiler which tells me if there is something suspicious in the package. If a package does not call any unsafe function the compiler can tell me. By searching for unsafePerformIO and friends I could miss something.
I think that the SafeHaskell extension is also worth because some programmers do not seem to care about the use of unsafePerformIO. I hope that compiler checks about the use unsafe functions will more discourage the use of unsafePerformIO.
I don't find trusting the authors of the packages I use to be a burden. I already have to trust them with writing correct code and fix bugs when they fail to do so. If they use unsafe functions I trust they do so for good reasons. What would I do with the information that text uses unsafePerformIO somewhere? Most likely nothing.
I think the idea was to have Unsafe modules and move the unsafe functions there. :-)
This would break everything. Every single user of the vector library would break. Same goes for many other libraries. I cannot stress enough how dangerous an idea this is from a software engineering and Haskell adoption perspective. You don't break widely used APIs. Look at Python 3. They broke a few things (not all libraries!) and several years down the road most people still don't use Python 3 because of it.
Are there really so many unsafe functions that must be moved? I mean, a function like
unsafePerformIO :: IO a -> a
is unsafe and should be in an Unsafe module. However, a function like
gamma :: Double -> Double gamma x = unsafePerformIO (GSL.gamma x)
should not be unsafe, but trustworthy. The function is safe to use, but the compiler cannot check it. (I hope I do not mix up the terms here.)
Are there so many functions like unsafePerformIO, inlinePerformIO, unsafeInterleaveIO in packages on Hackage?
No, all functions that use unsafe functions need to be moved to an .Unsafe module if the original module is to be marked as Safe (which is what's being proposed). Every time a function changes to/from using an unsafe function (even if indirectly) it needs to be juggled back and forth between the two different modules. Some libraries use IO in their core implementation (e.g. bytestring) and cannot be marked as Safe. Packages depending on bytestring cannot be marked as Safe either. -- Johan

On 11/07/12 22:43, Johan Tibell wrote:
No, all functions that use unsafe functions need to be moved to an .Unsafe module if the former is to be marked as Safe (which is what's being proposed.) Every time a function changes to/from using an unsafe function (even if indirectly) it needs to be juggled back and forth between the two different modules.
In case it isn't clear yet, I just want to state for the record that this is *not* the case. If there's some documentation or something that gives a misleading impression here, please let me know and I'll try to fix it. The whole of Data.ByteString is safe. Its implementation is marked Trustworthy, because it uses unsafe APIs internally. Data.ByteString can be used from Safe code just fine. Cheers, Simon

Stepping back a bit - This thread seems like a poor place to discuss the objectives and merits of Safe Haskell. It is also a poor place to discuss Safe Haskell in the context of the Haskell Platform. That is a good discussion we should have, so let's start a new thread for that... See my next post. Can we bring this back to the issue of adding vector, as it is today, to the Platform? For the record, I say "yes, add it". - Mark

On Wed, Jul 11, 2012 at 7:02 PM, Mark Lentczner
Can we bring this back to the issue of adding vector, as it is today, to the Platform.
For the record, I say "yes, add it".
Funny, I was about to send the same message and then found this. So. I'd like to see vector as it stands now go into the Platform too. If consensus arrives on changes that ought to happen to help with Safe Haskell, that's great, but it should be obvious that gating vector's acceptance on that means that it will be years before any progress can be made.

Hi Johan, Thanks for replying. I think there are one or two slight misconceptions about Safe Haskell - that's understandable, I think the documentation could be improved, and the paper that explains it properly isn't available yet (we only just finished the camera-ready copy, so it'll be available soon). Anyway, let me try to explain. It's not nearly as bad or onerous as you think.

Safe Haskell isn't about catching bugs. It's about making it possible to program with stronger guarantees than we currently have. The problem is that Haskell provides a bunch of loopholes by which you can violate things like type-safety and referential transparency, and the purpose of Safe Haskell is to make it possible for GHC to check that you aren't using any unsafe features, so your code is guaranteed type-safe and referentially transparent.

Myth #1 seems to be that Safe Haskell excludes large amounts of Haskell code that people want to write. Not at all! Normally when you use an unsafe feature, the purpose is to use it to implement a safe API - if that's the case, all you have to do is add Trustworthy to your language pragma, and the API is available to use from Safe code. 99% of Hackage should be either Safe or Trustworthy. We know that 27% is already inferred Safe (see the paper), and a lot of the rest is just waiting for other libraries to add Trustworthy where necessary.

The 1% is code that really exports unsafe APIs, like vector. These are in the minority - ask yourself how many packages that you maintain contain an unsafe API (zero, I think). Furthermore, even if there's an unsafe API, so long as the unsafe API isn't part of the same module as a safe API that you might want access to from Safe code, then you have nothing to do - the API is just Unsafe.

For typical Haskell programmers, using {-# LANGUAGE Safe #-} will be like -Wall: something that is considered good practice from a hygiene point of view. If you don't *need* access to unsafe features, then it's better to write in the safe subset, where you have stronger guarantees. Just like -Wall, you get to choose whether to use {-# LANGUAGE Safe #-} or not.
We'd have to maintain a separate implementation of the UTF-8 decoder for use by SH users.
Not at all - we have one UTF-8 decoder, marked Trustworthy.
If you want to write real code and use SH, you need to either avoid all these libraries or to trust (as in ghc-pkg trust) all these libraries.
No -the need to trust packages was turned off by default in 7.4.1, see the -fpackage-trust flag.
Despite none of them having had a thorough security review. In practice, Trustworthy language pragmas are put on top of modules just to make code compile. There are 139 modules in base marked as Trustworthy. I seriously doubt that they all are.
Whether you believe those claims or not is entirely up to you. Someone who is relying on Haskell for security has to trust a lot of things - GHC itself, the RTS, not to mention the hardware. Safe Haskell just makes it a bit clearer what you have to trust, and allows GHC to automate some of the checking.
SH also prevents optimizations, like rewriting a particularly hot function in C, as it would change its type.
Again, the API would be Trustworthy.
SH is much more invasive than your typical language extension. It requires maintainers of libraries who don't care about SH to change their APIs (!)
It will be very rare for this to happen. The vast majority of packages do not expose unsafe APIs, so at most all that will be needed is some Trustworthy pragmas, and in many cases no changes at all are needed. In the very few cases where there is a mixed safe/unsafe API, and we really care about Safe access to the safe parts, then the unsafe parts should be moved into a .Unsafe module. I don't think adding a few Trustworthy pragmas is very onerous, and arguably it's good documentation anyway.
just to support a yet unproven language extension. The reason is that SH is an all-or-nothing approach. Either all your dependencies are SH aware or your code won't compile. This is the opposite of how we handle any other language extension, where people can slowly try out new language features in their own code and only those who choose to use that code have to use that feature; e.g. if I use type families in a library that depends on containers, that doesn't imply that the containers package has to use type families. This is however true for SH.
Of course it's your prerogative as a library maintainer to decide whether to use Trustworthy or not. For the platform I think the bar is a little higher, and cleanly separating safe from unsafe APIs is warranted.
If we start using TH in our core libraries we risk making those libraries more complicated for users to understand and use, without much perceivable benefit.
I don't think separating safe from unsafe APIs makes things more complicated. We were doing that even before Safe Haskell - see e.g. System.IO.Unsafe, and Unsafe.Coerce. Cheers, Simon
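[To illustrate the -Wall analogy above, a minimal sketch of an ordinary application module opting in to the safe subset; it assumes, as Simon notes for Data.ByteString elsewhere in the thread, that the installed bytestring marks Data.ByteString.Char8 as Trustworthy:

{-# LANGUAGE Safe #-}
-- An application module written in the safe subset. It may import
-- Trustworthy library modules freely; what it may not do is import
-- anything marked Unsafe or use unsafe features itself.
module MyApp (shout) where

import qualified Data.ByteString.Char8 as B
import Data.Char (toUpper)

shout :: B.ByteString -> B.ByteString
shout = B.map toUpper
]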

Hi Simon,
On Thu, Jul 12, 2012 at 10:43 PM, Simon Marlow
Safe Haskell isn't about catching bugs. It's about making it possible to program with stronger guarantees than we currently have.
... Normally when you use an unsafe feature, the purpose is to use it to
implement a safe API - if that's the case, all you have to do is add Trustworthy to your language pragma, and the API is available to use from Safe code.
The issue I think Johan is complaining about is that this is a very weak
sauce. If some muppet can upload a package on Hackage that dereferences
nullPtr and just slap a "{-# LANGUAGE Trustworthy #-}" on the top, then
we're back exactly where we were before: library users must trust library
maintainers and/or carefully security audit the code they rely on. If
you're asking library authors to do a lot of work to rearchitect their
module namespaces, and increasing their maintenance overhead for the 6-12
months a deprecation cycle would take, I think you have to have a
compelling story to offer about how life will be better in the end.
Now, if functions could be cryptographically *signed*, meaning that "user X
asserts that he's audited this code and it's actually safe", then you could
start building the web of trust necessary for this feature to be useful.
(Of course, the code would have to be re-signed every time code it depends
on changed..... I don't actually think this would work!).
As it stands, one miscreant can cause a lot of damage, especially when you
consider that right now anyone can upload any version of any package to
Hackage --- Safe Haskell or not.
G
--
Gregory Collins

On 13/07/2012 05:35, Gregory Collins wrote:
Hi Simon,
On Thu, Jul 12, 2012 at 10:43 PM, Simon Marlow
mailto:marlowsd@gmail.com> wrote: Safe Haskell isn't about catching bugs. It's about making it possible to program with stronger guarantees than we currently have.
...
Normally when you use an unsafe feature, the purpose is to use it to implement a safe API - if that's the case, all you have to do is add Trustworthy to your language pragma, and the API is available to use from Safe code.
The issue I think Johan is complaining about is that this is a very weak sauce. If some muppet can upload a package on Hackage that dereferences nullPtr and just slap a "{-# LANGUAGE Trustworthy #-}" on the top, then we're back exactly where we were before: library users must trust library maintainers and/or carefully security audit the code they rely on.
We're still better off than before: currently you have to trust *all* the code, whereas with Safe Haskell you only have to trust the Trustworthy code. Furthermore, nobody is saying that you as a library maintainer or a user have to audit anything. Only a user that wants to run untrusted code has to worry about what they're trusting, and Safe Haskell gives them three things they didn't have before: (a) automatic checking for Safe code, (b) a way to see which modules have to be trusted, and (c) a mechanism for telling the system which packages are trusted.
If you're asking library authors to do a lot of work to rearchitect their module namespaces, and increasing their maintenance overhead for the 6-12 months a deprecation cycle would take, I think you have to have a compelling story to offer about how life will be better in the end.
Well, first of all I think vector is very much a special case. It is very rare to have a mixed safe/unsafe API. The majority of libraries will need either no changes or a few Trustworthy pragmas. For evidence of this you can see what changes had to be made to the packages that come with GHC. If vector doesn't change to accommodate Safe Haskell, it's not a disaster - it just means that clients of vector cannot use Safe. They can still use Trustworthy, so the net result is we'll end up with a few more Trustworthy pragmas than before, and users who care about security have a bit more trusting to do. As for whether the story is compelling, apparently I'm not doing a very good job of selling it :-) But I know that at least some people are very excited about Safe Haskell already. One of the anonymous reviewers on our paper called it (paraphrased) the most important thing to happen in Haskell for a good few years. Maybe that's overstating it a little, but it's clearly important to some users.
Now, if functions could be cryptographically *signed*, meaning that "user X asserts that he's audited this code and it's actually safe", then you could start building the web of trust necessary for this feature to be useful. (Of course, the code would have to be re-signed every time code it depends on changed..... I don't actually think this would work!).
As it stands, one miscreant can cause a lot of damage, especially when you consider that right now anyone can upload any version of any package to Hackage --- Safe Haskell or not.
It makes more sense if you think about it the other way around: there is a great deal of Haskell code that we can statically detect does not use any unsafe features, and that we can guarantee is type-safe and referentially transparent. The purpose of Safe Haskell is to let GHC do that checking for you - so with Safe Haskell you have to trust *less* code than before. The rest of the design falls out from this simple idea. Cheers, Simon

On Fri, Jul 13, 2012 at 10:21 AM, Simon Marlow
We're still better off than before: currently you have to trust *all* the code, whereas with Safe Haskell you only have to trust the Trustworthy code.
Furthermore, nobody is saying that you as a library maintainer or a user have to audit anything. Only a user that wants to run untrusted code has to worry about what they're trusting, and Safe Haskell gives them three things they didn't have before: (a) automatic checking for Safe code, (b) a way to see which modules have to be trusted, and (c) a mechanism for telling the system which packages are trusted.
Yes, I see your point now, after reading the docs more carefully. 27% of
the packages on Hackage being inferred safe, however, means you might have
an uphill battle on your hands :)
G
--
Gregory Collins

Simon, I'm still trying to figure out if there is a sane way to support Safe Haskell's module structure in vector. I'll post my thoughts later. Here are a couple of quick questions and observations, though. Simon Marlow wrote:
Myth #1 seems to be that Safe Haskell excludes large amounts of Haskell code that people want to write. Not at all! Normally when you use an unsafe feature, the purpose is to use it to implement a safe API - if that's the case, all you have to do is add Trustworthy to your language pragma, and the API is available to use from Safe code. 99% of Hackage should be either Safe or Trustworthy. We know that 27% is already inferred Safe (see the paper), and a lot of the rest is just waiting for other libraries to add Trustworthy where necessary.
Is "inferred Safe" the same as being marked "Safe-Infered" on Hackage? It does say that for Data.Vector.Unboxed, for instance, but that module certainly contains unsafe functions.
Despite none of them having had a thorough security review. In practice Trustworthy language pragmas on top of modules to make code compile. There are 139 modules in base marked as Trustworthy. I seriously doubt that they all are.
Whether you believe those claims or not is entirely up to you. Someone who is relying on Haskell for security has to trust a lot of things - GHC itself, the RTS, not to mention the hardware. Safe Haskell just makes it a bit clearer what you have to trust, and allows GHC to automate some of the checking.
FWIW, I went looking through the libraries and GHC does indeed come with a Trustworthy unsafePerformIO out of the box (it's in Data.Binary.Builder.Internal). Now, I'm not trying to knock the binary package. I think it just shows that unsafe functions *will* slip through the net, even in code written by absolute experts, so Johan's concerns are quite warranted.
SH is much more invasive than your typical language extension. It requires maintainers of libraries who don't care about SH to change their APIs (!)
It will be very rare for this to happen. The vast majority of packages do not expose unsafe APIs, so at most all that will be needed is some Trustworthy pragmas, and in many cases no changes at all are needed. In the very few cases where there is a mixed safe/unsafe API, and we really care about Safe access to the safe parts, then the unsafe parts should be moved into a .Unsafe module.
It seems that anything array-related would have to be split up in this way, just like it was in base. FWIW, I think Debug.Trace should be split up, too, since traceIO and especially traceEventIO are perfectly safe and useful. What about floating point arithmetic? Is it safe if I turn on FP exceptions?
I don't think adding a few Trustworthy pragmas is very onerous, and arguably it's good documentation anyway.
One problem I noticed here is Haskell's treatment of instances. It just isn't always obvious what exactly your module exports. It's easy to say that a particular function is safe. It is much harder to say the same for a whole module because you might be importing and hence exporting unsafe instances without realising. I think more language support would be needed here.
Of course it's your prerogative as a library maintainer to decide whether to use Trustworthy or not. For the platform I think the bar is a little higher, and cleanly separating safe from unsafe APIs is warranted.
Independent of vector, I'm against making Safe Haskell compliance a prerequisite for anything at this point. If it proves itself and if enough people start using it and relying on it - sure. But right now, it is an experimental language extension with a very small user base. Roman

On Fri, Jul 13, 2012 at 11:10 AM, Roman Leshchinskiy
Is "inferred Safe" the same as being marked "Safe-Infered" on Hackage? It does say that for Data.Vector.Unboxed, for instance, but that module certainly contains unsafe functions.
I think this is a haddock bug: ghc gets the inference right, haddock prints the wrong thing.
FWIW, I went looking through the libraries and GHC does indeed come with a Trustworthy unsafePerformIO out of the box (it's in Data.Binary.Builder.Internal). Now, I'm not trying to knock the binary
That's a bit worrisome, and comes closer to *my* fears about this feature, second to people expecting "safe" to mean something more meaningful to them than "doesn't subvert the type system".

Independent of vector, I'm against making Safe Haskell compliance a
prerequisite for anything at this point. If it proves itself and if enough people start using it and relying on it - sure. But right now, it an experimental language extension with a very small user base.
Indeed; especially in light of the above about Data.Binary, it can only be considered unproven, and a Platform which took it on now would be making false assurances. -- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms

At the risk of throwing more wood on the fire here - I went back and looked at vector and now I see that there is a large set of ".Safe" variants that are no more than re-exports of the exact same functions from the non-.Safe versions of the modules, with an extra Safe Haskell declaration added. What is the point of this? Shouldn't the declaration just be on the normal module (Safe or Trustworthy)? Do we really need .Safe versions of all the modules? If we don't, please, let's take these out of the package before committing it to the platform (where we have to make a commitment to the API). If we do need these modules to support Safe Haskell --- then something is seriously wrong with either the construct or the way vector is using it. I think the state of affairs stinks. It will do nothing but confuse the heck out of users - and present exactly what the platform is there to remove: uncertainty and instability. - Mark

On Sat, Jul 14, 2012 at 11:03 PM, Mark Lentczner
At the risk of throwing more wood on the fire here - I went back and looked at Vector and now I see that there are large set of ".Safe" variants that are no more than re-exports of the exact same functions from the non .Safe versions of the modules with an extra safe haskell declaration added. What is the point of this? Shouldn't the declaration just be on the normal module (safe or trustworthy). Do we really need .Safe versions of all the modules? If we don't, please, let's take these out of the package before committing it to the platform (where we have to make a commitment to the API). If we do need these modules to support Safe Haskell --- then something is seriously wrong with the either construct, or the way vector is using it.
I was going to use the Data.Vector.Fusion.Stream module to explain why this was done, but that's a bad example because I don't see anything unsafe (from a SafeHaskell perspective) in the non-Safe module. A better example would be "Data.Vector.Generic.Mutable" - it exports the function "unsafeWrite", which can poke data into arbitrary memory offsets. So the module shouldn't be marked as Trustworthy, but the majority of the symbols are things that should be usable from SafeHaskell. Hence the "Safe" module. The better solution would have been to break the API and move dangerous functions into "Unsafe" modules.
I think the state of affairs stinks. It will do nothing but confuse the heck out of users - and present exactly what the platform is there it remove: uncertainty and instability.
- Mark

On Sun, Jul 15, 2012 at 7:03 AM, Mark Lentczner
...I went back and looked at Vector and now I see that there are large set of ".Safe" variants that are no more than re-exports of the exact same functions from the non .Safe versions of the modules with an extra safe haskell declaration added.... I think the state of affairs stinks. It will do nothing but confuse the heck out of users
Simon already pointed out that, in his opinion, the
correct way to support SH would be:
- rename M to M.Internal (or suitable alternative)
- rename M.Safe to M
- add a (small) M.Unsafe where necessary
But that would break backwards compatibility for the
unsafe parts of the API. It's up to the package maintainers
whether or not they want to do that. If not, I would say revert
to no Safe Haskell support and accept it in the platform.
However, add a haddock comment something like this:
"Safe Haskell: If you do not use any functions in this
module whose name contains the word 'unsafe', you
can mark your module as 'Trustworthy'. Otherwise,
please consult

On 15 July 2012 14:53, Yitzchak Gale
On Sun, Jul 15, 2012 at 7:03 AM, Mark Lentczner
wrote: ...I went back and looked at Vector and now I see that there are large set of ".Safe" variants that are no more than re-exports of the exact same functions from the non .Safe versions of the modules with an extra safe haskell declaration added.... I think the state of affairs stinks. It will do nothing but confuse the heck out of users
Simon already pointed out that, in his opinion, the correct way to support SH would be:
- rename M to M.Internal (or suitable alternative) - rename M.Safe to M - add a (small) M.Unsafe where necessary
But that would break backwards compatibility for the unsafe parts of the API. It's up to the package maintainers whether or not they want to do that. If not, I would say revert to no Safe Haskell support and accept it in the platform. However, add a haddock comment something like this:
"Safe Haskell: If you do not use any functions in this module whose name contains the word 'unsafe', you can mark your module as 'Trustworthy'. Otherwise, please consult
."
To be fair, regardless of SH, I'd consider it good API design to put unsafe things into a separate module. I'd be interested to know what exactly the problem is with moving these functions into a separate module. If the only argument for not making this change is to avoid breaking the API then we should do it *before* including vector into the platform. P.S.: I really wish we had a tool like "gofix" that automates trivial API updates. That way simple API changes could be automated. There are tools like this for Java and C, too.

Thomas Schilling wrote:
To be fair, regardless of SH, I'd consider it good API design to put unsafe things into a separate module.
I'll ask again: why is putting unsafe* functions into a separate module preferable to just following the unsafe* naming convention? I'm honestly interested in an answer - it seems to me that for someone who doesn't want to use unsafe functions, the two approaches are essentially equivalent and for someone who does, a separate module is more cumbersome.
I'd be interested to know what exactly the problem is with moving these functions into a separate module. If the only argument for not making this change is to avoid breaking the API then we should do it *before* including vector into the platform.
I tried to explain the problems elsewhere in the thread. Roman

On 16/07/2012 11:01, Roman Leshchinskiy wrote:
Thomas Schilling wrote:
To be fair, regardless of SH, I'd consider it good API design to put unsafe things into a separate module.
I'll ask again: why is putting unsafe* functions into a separate module preferable to just following the unsafe* naming convention? I'm honestly interested in an answer - it seems to me that for someone who doesn't want to use unsafe functions, the two approaches are essentially equivalent and for someone who does, a separate module is more cumbersome.
Well, for one thing you can tell whether a module has access to unsafe stuff by just looking at its imports. This argument applies both to users (it's clearer when things are separated by module) and to the implementation (we don't have to maintain a per-identifier safe flag, which would complicate the implementation and bloat interface files). The real problem here seems to be the clash between the meaning of "unsafe" in the context of Vector, and the meaning of "unsafe" in Safe Haskell. I don't see a good way to resolve that conflict, though of course I think the definition of unsafe in Safe Haskell is sensible and I'd like that to become the accepted meaning of the term. So far in Haskell there has been no consistent definition of unsafe, which is confusing for users. Just to repeat what I said earlier, I don't see there being any objection to putting unsafeRead with the other unsafe functions in vector, even though technically it is safe. Cheers, Simon

On Mon, Jul 16, 2012 at 1:44 PM, Simon Marlow
On 16/07/2012 11:01, Roman Leshchinskiy wrote:
Thomas Schilling wrote:
To be fair, regardless of SH, I'd consider it good API design to put unsafe things into a separate module.
I'll ask again: why is putting unsafe* functions into a separate module preferable to just following the unsafe* naming convention? I'm honestly interested in an answer - it seems to me that for someone who doesn't want to use unsafe functions, the two approaches are essentially equivalent and for someone who does, a separate module is more cumbersome.
Well, for one thing you can tell whether a module has access to unsafe stuff by just looking at its imports. This argument applies both to users (it's clearer when things are separated by module) and to the implementation (we don't have to maintain a per-identifier safe flag, which would complicate the implementation and bloat interface files).
The real problem here seems to be the clash between the meaning of "unsafe" in the context of Vector, and the meaning of "unsafe" in Safe Haskell. I don't see a good way to resolve that conflict, though of course I think the definition of unsafe in Safe Haskell is sensible and I'd like that to become the accepted meaning of the term. So far in Haskell there has been no consistent definition of unsafe, which is confusing for users.
Just to repeat what I said earlier, I don't see there being any objection to putting unsafeRead with the other unsafe functions in vector, even though technically it is safe.
With apologies for repeating myself, isn't the fact that unsafeRead and unsafeWrite can access arbitrary memory locations a problem? Does memory safety not matter? Shouldn't it be represented at some level? I recognize that if we consider them unsafe, it means that FFI code that deals with pointers would all have to be Trustworthy or Unsafe, but I'm not sure that's a bad thing. Heck, with a raw pointer (or unchecked array access) you could use it to write into GHC's heap and change the contents of some immutable object, thereby violating referential transparency, which is what Safe Haskell is supposed to prevent (that is, if you don't just crash the program outright). -- Your ship was caught in a monadic eruption.

On 16/07/2012 13:18, Gábor Lehel wrote:
With apologies for repeating myself, isn't the fact that unsafeRead and unsafeWrite can access arbitrary memory locations a problem? Does memory safety not matter?
The definition of safety in Safe Haskell requires type safety, it does not impose any extra restrictions on what you can do in the IO monad. In the terminology we use in the paper, the latter is called a "security" requirement, as distinct from safety. Since security requirements tend to be application-specific, it wouldn't make sense to build one into Safe Haskell itself. Safe Haskell is the mechanism on which you can implement whatever security policy you need - there's an example in the paper of defining a restricted IO monad for use by untrusted code. Cheers, Simon
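[For readers without the paper to hand, a minimal sketch of the kind of restricted IO monad Simon refers to; all names here are made up and the paper's actual example may differ:

{-# LANGUAGE Trustworthy #-}
-- Untrusted Safe code can only build RIO actions from the operations this
-- module chooses to export (the data constructor is hidden), so the host
-- application controls which IO effects are reachable.
module RIO (RIO, runRIO, rioPutStrLn) where

newtype RIO a = RIO { unRIO :: IO a }

instance Functor RIO where
  fmap f (RIO m) = RIO (fmap f m)

instance Applicative RIO where
  pure = RIO . pure
  RIO f <*> RIO x = RIO (f <*> x)

instance Monad RIO where
  RIO m >>= k = RIO (m >>= unRIO . k)

-- Only the host (trusted) code runs RIO actions.
runRIO :: RIO a -> IO a
runRIO = unRIO

-- A whitelisted operation; anything not exported here is unavailable to
-- the untrusted code.
rioPutStrLn :: String -> RIO ()
rioPutStrLn = RIO . putStrLn
]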

On Mon, Jul 16, 2012 at 3:58 PM, Simon Marlow
On 16/07/2012 13:18, Gábor Lehel wrote:
With apologies for repeating myself, isn't the fact that unsafeRead and unsafeWrite can access arbitrary memory locations a problem? Does memory safety not matter?
The definition of safety in Safe Haskell requires type safety, it does not impose any extra restrictions on what you can do in the IO monad. In the terminology we use in the paper, the latter is called a "security" requirement, as distinct from safety. Since security requirements tend to be application-specific, it wouldn't make sense to build one into Safe Haskell itself. Safe Haskell is the mechanism on which you can implement whatever security policy you need - there's an example in the paper of defining a restricted IO monad for use by untrusted code.
Cheers, Simon
I see, thanks. I'll have to read the paper. -- Your ship was caught in a monadic eruption.

Simon Marlow wrote:
Just to repeat what I said earlier, I don't see there being any objection to putting unsafeRead with the other unsafe functions in vector, even though technically it is safe.
Actually, this particular bit probably isn't a problem. I actually simplified the example slightly. The real type of unsafeRead is:

unsafeRead :: (PrimMonad m, MVector v a) => v (PrimState m) a -> Int -> m a

Here, m is either IO or ST. What I didn't realise was that runST is marked as Trustworthy. This means that for ST-based code to be Trustworthy, it must really be safe when executed. This is different from IO, where Safe Haskell doesn't care what happens when it's executed. I'm not sure if this is documented anywhere; it certainly wasn't obvious to me. The end effect is that while the IO instantiation of unsafeRead is safe, the ST one isn't. Hence, it can't be marked as Trustworthy anyway. All functions on mutable vectors are overloaded in this way, so it seems that the presence of ST makes Safe Haskell's notion of safety much closer to the one vector uses. Roman
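[A small sketch of why this matters, using vector's unboxed mutable API; the out-of-bounds read is deliberate, so it is for illustration only and should not actually be run:

module BadRead (oops) where

import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed.Mutable as M

-- Because runST is available to Safe code, an ST computation that uses
-- unsafeRead past the end of a vector leaks undefined memory contents
-- into what looks like a pure Int.
oops :: Int
oops = runST $ do
  v <- M.new 1        -- a one-element vector of Ints
  M.unsafeRead v 1000 -- no bounds check: reads far past the end
]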

On 16/07/2012 13:55, Roman Leshchinskiy wrote:
Simon Marlow wrote:
Just to repeat what I said earlier, I don't see there being any objection to putting unsafeRead with the other unsafe functions in vector, even though technically it is safe.
Actually, this particular bit probably isn't a problem. I actually simplified the example slightly. The real type of unsafeRead is:
unsafeRead :: (PrimMonad m, MVector v a) => v (PrimState m) a -> Int -> m a
Hier, m is either IO or ST. What I didn't realise was that runST is marked as Trustworthy. This means that for ST-based code to be Trustworthy, it must really be safe when executed. This is different from IO where Safe Haskell doesn't care what happens when it's executed. I'm not sure if this is documented anywhere, it certainly wasn't obvious to me. The end effect is that while the IO instantiation of unsafeRead is safe, the ST one isn't. Hence, it can't be marked as Trustworthy anyway. All functions on mutable vectors are overloaded in this way, so it seems that the presence of ST makes Safe Haskell's notion of safety much closer to the one vector uses.
Ok, that's good then. The point about IO is made in the paper, but should probably be more clear in the documentation. Cheers, Simon

Simon Marlow wrote:
On 16/07/2012 13:55, Roman Leshchinskiy wrote:
Simon Marlow wrote:
Just to repeat what I said earlier, I don't see there being any objection to putting unsafeRead with the other unsafe functions in vector, even though technically it is safe.
Actually, this particular bit probably isn't a problem. I actually simplified the example slightly. The real type of unsafeRead is:
unsafeRead :: (PrimMonad m, MVector v a) => v (PrimState m) a -> Int -> m a
Hier, m is either IO or ST. What I didn't realise was that runST is marked as Trustworthy. This means that for ST-based code to be Trustworthy, it must really be safe when executed. This is different from IO where Safe Haskell doesn't care what happens when it's executed. I'm not sure if this is documented anywhere, it certainly wasn't obvious to me. The end effect is that while the IO instantiation of unsafeRead is safe, the ST one isn't. Hence, it can't be marked as Trustworthy anyway. All functions on mutable vectors are overloaded in this way, so it seems that the presence of ST makes Safe Haskell's notion of safety much closer to the one vector uses.
Ok, that's good then.
The point about IO is made in the paper, but should probably be more clear in the documentation.
It's not IO I was confused about, it's ST which isn't mentioned anywhere in the paper AFAICS. There is a choice here. Either runST is declared Trustworthy and then the semantics of all ST code when executed affects safety. Or runST is *not* declared Trustworthy and then the semantics of ST code when executed doesn't matter because there is no Safe way to execute it, just like IO. I agree that the former is the right choice but it wasn't obvious to me that this is what has been implemented. In fact, the only way for me to find out was to look at the modules that export runST and friends and see if any of them is marked Trustworthy. Roman

On 16/07/2012 15:25, Roman Leshchinskiy wrote:
Simon Marlow wrote:
On 16/07/2012 13:55, Roman Leshchinskiy wrote:
Simon Marlow wrote:
Just to repeat what I said earlier, I don't see there being any objection to putting unsafeRead with the other unsafe functions in vector, even though technically it is safe.
Actually, this particular bit probably isn't a problem. I actually simplified the example slightly. The real type of unsafeRead is:
unsafeRead :: (PrimMonad m, MVector v a) => v (PrimState m) a -> Int -> m a
Hier, m is either IO or ST. What I didn't realise was that runST is marked as Trustworthy. This means that for ST-based code to be Trustworthy, it must really be safe when executed. This is different from IO where Safe Haskell doesn't care what happens when it's executed. I'm not sure if this is documented anywhere, it certainly wasn't obvious to me. The end effect is that while the IO instantiation of unsafeRead is safe, the ST one isn't. Hence, it can't be marked as Trustworthy anyway. All functions on mutable vectors are overloaded in this way, so it seems that the presence of ST makes Safe Haskell's notion of safety much closer to the one vector uses.
Ok, that's good then.
The point about IO is made in the paper, but should probably be more clear in the documentation.
It's not IO I was confused about, it's ST which isn't mentioned anywhere in the paper AFAICS. There is a choice here. Either runST is declared Trustworthy and then the semantics of all ST code when executed affects safety. Or runST is *not* declared Trustworthy and then the semantics of ST code when executed doesn't matter because there is no Safe way to execute it, just like IO. I agree that the former is the right choice but it wasn't obvious to me that this is what has been implemented. In fact, the only way for me to find out was to look at the modules that export runST and friends and see if any of them is marked Trustworthy.
Ah ok, so your concern was that you couldn't easily find out whether runST was safe or not? If you look at the library docs: http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.5.1.0/Control-M... you'll see that Control.Monad.ST is Unsafe (because it exports three unsafe functions). But there's a Trustworthy API: http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.5.1.0/Control-M... I wasn't aware that we'd split things this way, it was probably for backwards compatibility reasons. Cheers, Simon

Simon Marlow wrote:
Ah ok, so your concern was that you couldn't easily find out whether runST was safe or not? If you look at the library docs:
Whether runST is safe or not has a huge impact on what ST code I can declare Trustworthy even if I don't use runST at all. IMO, the fact that ST code that doesn't do bounds checking must not be declared Trustworthy should be stated somewhere prominent especially since this does *not* apply to similar monads like IO and (presumably) STM. The library docs are not really a good place for this - who would ever look at the documentation of ST when figuring out Safe Haskell? Roman

On 16/07/2012 16:17, Roman Leshchinskiy wrote:
Simon Marlow wrote:
Ah ok, so your concern was that you couldn't easily find out whether runST was safe or not? If you look at the library docs:
Whether runST is safe or not has a huge impact on what ST code I can declare Trustworthy even if I don't use runST at all. IMO, the fact that ST code that doesn't do bounds checking must not be declared Trustworthy should be stated somewhere prominent especially since this does *not* apply to similar monads like IO and (presumably) STM. The library docs are not really a good place for this - who would ever look at the documentation of ST when figuring out Safe Haskell?
Ok, we can put something in the docs about ST. The fact that you can't do arbitrary side effects in ST follows from the definition of safety and the fact that runST injects ST computations into pure computations. So there's really no design choice here. The same applies to the Par monad, and any monad that injects into pure computations. Cheers, Simon

Simon Marlow wrote:
The fact that you can't do arbitrary side effects in ST follows from the definition of safety and the fact that runST injects ST computations into pure computations. So there's really no design choice here. The same applies to the Par monad, and any monad that injects into pure computations.
I disagree about there not being a choice. As I wrote, if the injection function is Unsafe (i.e., not Trustworthy), then there is no safe way to inject the computations and then those computations can do what they want (as far as Safe Haskell is concerned) because they can't be safely executed anyway. So it really depends on whether or not the injection function has been declared Trustworthy. Roman

On 17/07/12 10:28, Roman Leshchinskiy wrote:
Simon Marlow wrote:
The fact that you can't do arbitrary side effects in ST follows from the definition of safety and the fact that runST injects ST computations into pure computations. So there's really no design choice here. The same applies to the Par monad, and any monad that injects into pure computations.
I disagree about there not being a choice. As I wrote, if the injection function is Unsafe (i.e., not Trustworthy), then there is no safe way to inject the computations and then those computations can do what they want (as far as Safe Haskell is concerned) because they can't be safely executed anyway. So it really depends on whether or not the injection function has been declared Trustworthy.
Well I suppose you *could* imagine having an Unsafe runST and then it would be fine to mark your unsafe ST operations Trustworthy, but that would be strange to say the least! Anyway, what I'm taking away from all this is that we need to revise the Safe Haskell docs in the GHC user's guide, with lots of examples and guidance. Cheers, Simon
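As a hedged sketch of the counterfactual discussed above (the module name is made up; unsafeIOToST is exported from Control.Monad.ST in base-4.5 and from Control.Monad.ST.Unsafe in later releases):

{-# LANGUAGE Trustworthy #-}
module HypotheticalST (chatty) where

import Control.Monad.ST (ST, unsafeIOToST)

-- Arbitrary I/O smuggled into ST. Under the actual design this Trustworthy
-- claim is wrong, because Safe code can run 'chatty' via runST and observe
-- the effect from a pure context. It would only be defensible under the
-- alternative design in which runST itself were exported as Unsafe.
chatty :: ST s Int
chatty = do
  unsafeIOToST (putStrLn "a side effect behind a pure interface")
  return 0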

Am 15.07.2012 17:11, schrieb Thomas Schilling:
On 15 July 2012 14:53, Yitzchak Gale wrote:
Simon already pointed out that, in his opinion, the correct way to support SH would be:
- rename M to M.Internal (or suitable alternative)
- rename M.Safe to M
- add a (small) M.Unsafe where necessary
But that would break backwards compatibility for the unsafe parts of the API. It's up to the package maintainers whether or not they want to do that. If not, I would say revert to no Safe Haskell support and accept it in the platform. However, add a haddock comment something like this:
"Safe Haskell: If you do not use any functions in this module whose name contains the word 'unsafe', you can mark your module as 'Trustworthy'. Otherwise, please consult
." To be fair, regardless of SH, I'd consider it good API design to put unsafe things into a separate module. I'd be interested to know what exactly the problem is with moving these functions into a separate module. If the only argument for not making this change is to avoid breaking the API then we should do it *before* including vector into the platform.
+1
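For concreteness, a hedged sketch of the module split proposed above, shown as two separate files; the module names here are hypothetical, and the re-exported operations are assumed to come from vector's Data.Vector.Unboxed.Mutable:

-- File 1: the unsuffixed module exposes only bounds-checked operations and
-- can therefore be marked Trustworthy.
{-# LANGUAGE Trustworthy #-}
module MVec (M.MVector, M.new, M.read, M.write) where
import qualified Data.Vector.Unboxed.Mutable as M

-- File 2: everything that can break memory safety is quarantined in an
-- explicitly Unsafe module, so its use is visible in an import list.
{-# LANGUAGE Unsafe #-}
module MVec.Unsafe (M.unsafeRead, M.unsafeWrite) where
import qualified Data.Vector.Unboxed.Mutable as M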

Mark Lentczner wrote:
If we don't, please, let's take these out of the package before committing it to the platform (where we have to make a commitment to the API).
I know I'm woefully uninformed about the platform requirements but this caught my eye. What exactly is the commitment to the API that the platform requires? Roman

Roman Leshchinskiy wrote:
Simon,
I'm still trying to figure out if there is a sane way to support Safe Haskell's module structure in vector. I'll post my thoughts later.
From a technical point of view, I would have to write a tool which autogenerates various modules in vector. This is something that I've wanted to do for a while anyway, but finding the time is tricky. At the moment, I have to manually update 4 modules whenever I add a new function to vector, which is right at the "nuisance threshold". With Safe Haskell support, I would have to touch more modules, which would put the whole thing well above the threshold. With autogeneration, I would hopefully only have to touch one module in either case.
However, I'm still rather unconvinced by the whole thing for several reasons. The main one really is that in the entire discussion so far nobody has said that Safe Haskell support in vector is something they actually need. When I asked if someone wanted to maintain the *.Safe modules in vector, nobody was interested. This leads me to believe that the demand for this is really low and that what time I have would be better spent working on other features. I'm quite open to being proved wrong here, though. Roman

Am 16.07.2012 13:10, schrieb Roman Leshchinskiy:
However, I'm still rather unconvinced by the whole thing for several reasons. The main one really is that in the entire discussion so far nobody has said that Safe Haskell support in vector is something they actually need.
Since GHC now has the SafeHaskell feature, I will certainly check the packages that I use for proper use of SafeHaskell in the future. If I have the choice between a package that exports all modules as Unsafe or Trustworthy and a package with mostly Safe modules, I will certainly choose the latter.
When I asked if someone wanted to maintain the *.Safe modules in vector, nobody was interested. This leads me to believe that the demand for this is really low and that what time I have would be better spent working on other features. I'm quite open to being proved wrong here, though.
I remember that the state of the discussion was that there should be an Unsafe module exporting the few functions that are unsafe in the Safe Haskell sense, and that the large set of safe and trustworthy functions should be exported from modules without any suffix like Internal or Safe. Thus there would be no extra maintenance necessary to support SafeHaskell.

On Fri, Jun 15, 2012 at 2:45 PM, Johan Tibell
I am, with Roman's support, making a formal proposal to have the vector package included in the Haskell Platform:
We've had six weeks of intermittent discussions now, but I haven't seen anything potentially dangerous to the proposal except for some hand-wringing and crossed wires over Safe Haskell. I'd like to continue to manage the discussion process fairly actively, namely by trying to draw it to a close. So. On the "pro" side, vector is already widely used and liked. On the "con" side, there's the Safe Haskell question. Simon is quite reasonably in favour of SH, and also very reasonably not treating this question as a blocker. I'll admit to not having found a compelling reason to care about SH yet, and I would similarly be sad to see a good library get hung up on a compiler feature that is not yet widely accepted. Upon looking through the threads of discussion around this proposal, I haven't been able to find anything else. Are there other Big Questions we need to resolve before Aug 13?

On Wed, 25 Jul 2012, Bryan O'Sullivan wrote:
So. On the "pro" side, vector is already widely used and liked.
On the "con" side, there's the Safe Haskell question. Simon is quite reasonably in favour of SH, and also very reasonably not treating this question as a blocker. I'll admit to not having found a compelling reason to care about SH yet, and I would similarly be sad to see a good library get hung up on a compiler feature that is not yet widely accepted.
I expect that API changes to vector will become more difficult once 'vector' is in the platform. Thus I would prefer to use the opportunity of platform addition for bigger API changes; otherwise they might be delayed for years. And I don't think we are talking about a not-yet-widely-adopted new GHC feature but about good style in general, namely the clear distinction between safe and unsafe functions (which now happens to be formalized by a new GHC feature).

Hi all,

After discussing this proposal at ICFP and with the Haskell Platform committee, we've decided that there's a rough consensus for adding vector to the platform.

We will leave the following open issue for the future:

* Using SafeHaskell in the platform in general, and in vector in particular. This would be a large commitment for the platform, as we'd implicitly be telling our users that we've deemed the packages in the platform trustworthy. This is not something we should do without committing to making sure they are, which is something we are not willing to do just yet (as it requires a large amount of work, now and later). The vector package will be added without the .Safe modules (which no one wants).

P.S. If you haven't watched Mark's state of the platform talk already, I recommend that you do so: http://www.youtube.com/watch?v=st22QE-g0uo

Cheers,
Johan, with his HP committee hat on

On 25/09/2012 15:29, Johan Tibell wrote:
Hi all,
After discussing this proposal at ICFP and with the Haskell Platform committee, we've decided that there's a rough consensus for adding vector to the platform.
We will leave the following open issue for the future:
* Using SafeHaskell in the platform in general, and in vector in particular. This would be a large commitment for the platform, as we'd implicitly be telling our users that we've deemed the packages in the platform trustworthy. This is not something we should do without committing to making sure they are, which is something we are not willing to do just yet (as it requires a large amount of work, now and later). The vector package will be added without the .Safe modules (which no one wants).
With my "Safe Haskell pedantry" hat on, and at the risk of reviving that long thread, I would just like to point out that this is not quite right. You would not be guaranteeing anything about Trustworthy modules: the point of Trustworthy is to tell the user which modules they need to trust in order to get the Safe Haskell guarantees. The user gets to choose whether to actually trust the modules or not. Note that GHC already comes with a lot of libraries that are marked Trustworthy, but we don't consider ourselves to have made any absolute guarantees about anything. Trustworthy is badly named, it should really be called "TrustNeeded", or something. Now of course we should go to reasonable efforts to make sure that those modules *are* trustworthy, but there's no need to do it all at once. I think we should gradually move in the direction of safety, where it makes sense. Cheers, Simon
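To make "the user gets to choose" concrete: package trust is only checked when the user compiles with -fpackage-trust, and trust is granted per package with "ghc-pkg trust <pkg>" or the -trust flag. A hedged sketch, assuming a hypothetical vector release whose Data.Vector is marked Trustworthy:

{-# LANGUAGE Safe #-}
module UsesVector (firstElem) where

-- With -fpackage-trust, this import is accepted only if the user has trusted
-- the vector package (ghc-pkg trust vector, or -trust vector on the command
-- line); without -fpackage-trust, Trustworthy modules are accepted as-is.
import qualified Data.Vector as V

firstElem :: V.Vector Int -> Int
firstElem = V.head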
participants (16):
- Antoine Latter
- Bas van Dijk
- Brandon Allbery
- Bryan O'Sullivan
- Gregory Collins
- Gábor Lehel
- Henning Thielemann
- Ian Lynagh
- Johan Tibell
- Mark Lentczner
- Roman Leshchinskiy
- Ryan Newton
- Simon Marlow
- Simon Peyton-Jones
- Thomas Schilling
- Yitzchak Gale