
Hello,

The "packCString" function (and other similar functions) in the ByteString library break referential transparency, which is one of the big selling points of Haskell (and its libraries). Here is an example:

    main = do
        x <- newCString "Hello"
        let s  = packCString x
            h1 = B.head s
        print s
        -- print h1
        poke x (toEnum 97)
        print s
        let h2 = B.head s
        print h1
        print h2

Output:

    "Hello"
    "aello"
    97
    97

This is already confusing because the "pure" value 's' has magically changed. Also notice that the evaluation order of the program affects the output. If we include the commented out statement (which forces h1 to be evaluated earlier) the output becomes:

    "Hello"
    72
    "aello"
    72
    97

I think that because of this we should either remove these functions, or at least follow the convention of other libraries and give them "unsafe" names.

-Iavor

Iavor Diatchki wrote:
I think that because of this we should either remove these functions, or at least follow the convention of other libraries and give them "unsafe" names.
-Iavor
The Data.ByteString type is really a fancy interface to ForeignPtr, which leads to these issues. I mostly agree. Perhaps the pack* functions should have unsafe* names. Or at least very large warnings in the documentation.

From looking at the Haddock documentation:

The "safe" versions already exist as copyCString and copyCStringLen, which look like (copy . packCString) and (copy . packCStringLen). And Data.ByteString.useAsCStringLen also looks "unsafe", but Data.ByteString.useAsCString is safe. Even more oddly, Data.ByteString.useAsCStringLen looks identical in the documentation to Data.ByteString.Base.unsafeUseAsCStringLen but does not call itself "unsafe", and neither has the same dire warnings as Data.ByteString.Base.unsafeUseAsCString.

-- Chris
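The reading of copyCString as "copy after a zero-copy pack" can be sketched against the present-day bytestring package. This is only an illustration under assumptions: it uses Data.ByteString.Unsafe.unsafePackCString as the zero-copy wrapper (the thread's later renaming), and the names copyingPack and survivesMutation are hypothetical. Note that the copy has to be forced inside IO; a lazy `fmap copy` would leave a thunk that could still observe later writes.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Unsafe as BU
import Foreign.C.String (CString, newCString)
import Foreign.Marshal.Alloc (free)
import Foreign.Storable (poke)

-- | Hypothetical helper: wrap the C buffer without copying, then force
-- a private copy before returning, so later writes cannot be observed.
copyingPack :: CString -> IO B.ByteString
copyingPack cstr = do
  alias <- BU.unsafePackCString cstr  -- shares the C buffer
  evaluate (B.copy alias)             -- copy happens here, in IO order

-- | Pack, then overwrite and free the C buffer; the copy is unaffected.
survivesMutation :: IO String
survivesMutation = do
  x <- newCString "Hello"
  s <- copyingPack x
  poke x 97                           -- overwrite 'H' with 'a'
  free x                              -- the copy does not depend on x
  pure (BC.unpack s)
```

Running survivesMutation in GHCi should give back the original text, since the ByteString owns its own bytes by the time the buffer is modified.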

On Sun, Jan 28, 2007 at 07:16:16PM +0000, Chris Kuklewicz wrote:
Iavor Diatchki wrote:
This is already confusing because the "pure" value 's' has magically changed. Also notice that the evaluation order of the program affects the output. If we include the commented out statement (which forces h1 to be evaluated earlier) the output becomes:

    "Hello"
    72
    "aello"
    72
    97
It gets more fun:

    import Foreign.C.String
    import Data.ByteString
    import Control.Exception

    shouldBePure :: CString -> ()
    shouldBePure str = packMallocCString str `seq` ()

    main = do
        x <- newCString "Hello"
        evaluate $ shouldBePure x
        evaluate $ shouldBePure x
        -- force a GC
        evaluate $ Prelude.reverse [0..]

-->

    stefan@stefans:/tmp$ ./X
    *** glibc detected *** double free or corruption (fasttop): 0x08086628 ***
    Aborted

Now what if GHC did more CSE... <shudder>

On Sun, 2007-01-28 at 09:58 -0800, Iavor Diatchki wrote:
Hello, The "packCString" function (and other similar functions) in the ByteString library break referential transparency, which is one of the big selling points of Haskell (and its libraries).
I think that because of this we should either remove these functions, or at least follow the convention of other libraries and give them "unsafe" names.
We can put them in the IO monad. The same applies to packCStringLen and packMallocCString.

Duncan

Hello,
On 1/28/07, Duncan Coutts wrote:
We can put them in the IO monad. The same applies to packCStringLen and packMallocCString.
Adding packCString (and friends) to the IO monad does not make them any safer (the IO monad hides some sins but not all ;-). To see the problem, imagine that packCString was of type "CString -> IO ByteString" (as Don suggested). Then we still get the same weird behavior if we replace:

    let s = packCString x

with

    s <- packCString x

in the example that I posted.

The problem is that 'x' allows us to mutate the memory area occupied by 's', but Haskell values are supposed to be immutable. So if we want to have these functions, then it seems that the best we can do is to mark them with the "unsafe" label and be very careful how we use them...

Out of curiosity (I have not looked at the implementation of ByteString), is it true that ByteStrings created in this fashion (i.e., with packCString) will not be garbage collected?

-Iavor
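The argument that an IO result type does not remove the aliasing can be checked with a small sketch. It assumes the present-day bytestring package, where Data.ByteString.packCString copies and Data.ByteString.Unsafe.unsafePackCString wraps the buffer without copying; copyVersusAlias is a hypothetical name. Both ByteStrings are read back inside IO after the write, because what pure code observes through the aliased value depends on evaluation order, as the original example shows.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Foreign.C.String (newCString, peekCString)
import Foreign.Marshal.Alloc (free)
import Foreign.Storable (poke)

-- | Pack the same C buffer twice (once copying, once zero-copy), overwrite
-- its first byte, then read both ByteStrings back in IO.
copyVersusAlias :: IO (String, String)
copyVersusAlias = do
  x <- newCString "Hello"
  copied  <- B.packCString x          -- private copy of the bytes
  aliased <- BU.unsafePackCString x   -- shares x's memory, despite living in IO
  poke x 97                           -- overwrite 'H' with 'a'
  fromCopy  <- B.useAsCString copied peekCString
  fromAlias <- B.useAsCString aliased peekCString
  free x                              -- aliased must not be used after this
  pure (fromCopy, fromAlias)
```

The first component keeps the original text while the second reflects the write, so both functions have the same IO type but only the copying one yields an immutable value.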

iavor.diatchki:
Hello,
On 1/28/07, Duncan Coutts wrote:
We can put them in the IO monad. The same applies to packCStringLen and packMallocCString.
Adding packCString (and friends) to the IO monad does not make them any safer (the IO monad hides some sins but not all ;-). To see the problem, imagine that packCString was of type "CString -> IO ByteString" (as Don suggested). Then we still get the same weird behavior if we replace: let s = packCString x with s <- packCString x in the example that I posted.
The problem is that 'x' allows us to mutate the memory area occupied by 's', but Haskell values are supposed to be immutable. So if we want to have these functions, then it seems that the best we can do is to mark them with the "unsafe" label and be very careful how we use them...
Out of curiosity (I have not looked at the implementation of ByteString) is it true that ByteStrings created in this fashion (i.e., with packCString) will not be garbage collected?
    packCString :: CString -> IO ByteString
    packCString cstr = do
        fp <- newForeignPtr_ (castPtr cstr)
        l <- c_strlen cstr
        return $! PS fp 0 (fromIntegral l)

The string is managed on the CString side, so it depends on how that CString is managed (maybe it's on the Haskell heap, maybe it's on the C side, maybe malloc looks after it).

As indicated in my api post, copyCString will be the default after today, which will just produce a normal Haskell value. unsafePackCString can then be left for those who know what they're doing.

Does that sound reasonable?

-- Don

Hello,

I think that your proposal is reasonable. By the way, there is also another option. The main point of packCString is to "convert" a CString to a byte string, presumably so that we can use the byte string operations on it. An alternative would be to provide the byte string operations directly on CStrings (perhaps with overloading, perhaps not). Then we would not need packCString, and when working with ByteStrings we would know that they are safe, but when we work with CStrings, we would have to watch out, as usual.

Just a thought.

-Iavor
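As a rough sketch of what "byte string operations directly on CStrings" might look like, the following defines two such operations in IO using only the standard Foreign modules. The names cLength, cHead and headAndLength are hypothetical and not part of any library; the point is that the IO types keep the mutable buffer visible in the signature.

```haskell
import Data.Word (Word8)
import Foreign.C.String (CString, withCString)
import Foreign.Marshal.Array (lengthArray0)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek)

-- | Length in bytes of a NUL-terminated C string (hypothetical helper).
cLength :: CString -> IO Int
cLength = lengthArray0 0

-- | First byte of a C string, or Nothing if it is empty (hypothetical helper).
cHead :: CString -> IO (Maybe Word8)
cHead cstr = do
  b <- peek (castPtr cstr) :: IO Word8
  pure (if b == 0 then Nothing else Just b)

-- | Marshal a Haskell String to a temporary C string and query it in place.
headAndLength :: String -> IO (Maybe Word8, Int)
headAndLength str = withCString str $ \p ->
  (,) <$> cHead p <*> cLength p
```

For ASCII input such as "Hello", headAndLength should report the byte 72 and a length of 5. Because every operation lives in IO, no pure value ever aliases the buffer.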

iavor.diatchki:
This is already confusing because the "pure" value 's' has magically
Right, in order to support zero-copy strings between C and Haskell, for efficiency, the default packCString just wraps up the Ptr CChar to look like a ByteString. It doesn't copy it. If you then mutate the C string, well, as you see above.

Here you should be using:

    copyCString :: CString -> IO ByteString

The functions should document this behaviour better. Of course, you're playing with poke and C strings so you should be careful anyway. I'll correct the documentation to explain all this.

Thanks for noticing!

-- Don

dons@cse.unsw.edu.au wrote:
The functions should document this behaviour better. Of course, you're playing with poke and C strings so you should be careful anyway. I'll correct the documentation to explain all this.
Forgive a non-expert's comment: This approach ("read the doco and be careful when you use these functions") feels contrary to the Haskell ethos of pure code being reliably pure. What's the argument for not prefixing such functions with "unsafe"?

Tim

timd:
dons@cse.unsw.edu.au wrote:
The functions should document this behaviour better. Of course, you're playing with poke and C strings so you should be careful anyway. I'll correct the documentation to explain all this.
Forgive a non-expert's comment:
This approach ("read the doco and be careful when you use these functions") feels contrary to the Haskell ethos of pure code being reliably pure. What's the argument for not prefixing such functions with "unsafe"?
No no. The types should certainly reflect the safety. And by default people should try the safe versions :) It's a tricky area to get exactly right, since we're straddling the C / Haskell interface here.

* The current api:

Zero-copying, efficient. Unsafe if you mutate the C string or, in the case of packMallocCString, if you run the finalisers twice:

    packCString :: CString -> ByteString
    packCStringLen :: CStringLen -> ByteString
    packMallocCString :: CString -> ByteString
    useAsCStringLen :: ByteString -> (CStringLen -> IO a) -> IO a

Safe. Copies the ByteString:

    copyCString :: CString -> IO ByteString
    copyCStringLen :: CStringLen -> IO ByteString
    useAsCString :: ByteString -> (CString -> IO a) -> IO a

* The proposed api:

The following will be safe, copying across the C/Haskell boundary by default:

    packCString :: CString -> IO ByteString
    packCStringLen :: CStringLen -> IO ByteString
    packMallocCString :: CString -> IO ByteString
    useAsCString :: ByteString -> (CString -> IO a) -> IO a
    useAsCStringLen :: ByteString -> (CStringLen -> IO a) -> IO a

Those with extra efficiency needs, but also extra safety requirements:

    unsafePackCString :: CString -> IO ByteString
    unsafePackCStringLen :: CStringLen -> IO ByteString
    unsafePackMallocCString :: CString -> IO ByteString
    unsafeUseAsCString :: ByteString -> (CString -> IO a) -> IO a
    unsafeUseAsCStringLen :: ByteString -> (CStringLen -> IO a) -> IO a

These will all do zero-copying calls across into C (or Haskell), but you'll need to ensure you don't mutate the C string.

The safety requirements for the unsafe* versions will be clearly stated.

-- Don

Some function in the interface *MUST* be tagged unsafe. The current situation is very anti-Haskell. I don't care how efficient it is. First and foremost it must have pure Haskell semantics, otherwise it doesn't belong as a pure function.

-- Lennart

On Jan 28, 2007, at 23:58 , Tim Docker wrote:
dons@cse.unsw.edu.au wrote:
The functions should document this behaviour better. Of course, you're playing with poke and C strings so you should be careful anyway. I'll correct the documentation to explain all this.
Forgive a non-expert's comment:
This approach ("read the doco and be careful when you use these functions") feels contrary to the Haskell ethos of pure code being reliably pure. What's the argument for not prefixing such functions with "unsafe"?
Tim

iavor.diatchki:
Hello, The "packCString" function (and other similar functions) in the ByteString library break referential transparency, which is one of the big selling points of Haskell (and its libraries).
The Data.ByteString functions relating to CString have now been modified as follows, in the darcs repository. These changes will be propagated into base in due course.

Public CString functions:

Data.ByteString:

    packCString :: CString -> IO ByteString
    packCStringLen :: CStringLen -> IO ByteString
    useAsCString :: ByteString -> (CString -> IO a) -> IO a
    useAsCStringLen :: ByteString -> (CStringLen -> IO a) -> IO a

These are safe, copying functions. Never can modifying the CString affect the Haskell ByteString, or any substrings of it.

Private, unsafe functions, only available by importing Data.ByteString.Base:

Dangerous, efficient api, suitable for constant CStrings only (the CString functions may also require null termination):

    unsafeUseAsCString :: ByteString -> (CString -> IO a) -> IO a
    unsafeUseAsCStringLen :: ByteString -> (CStringLen -> IO a) -> IO a
    unsafePackCString :: CString -> IO ByteString
    unsafePackCStringLen :: CStringLen -> IO ByteString
    unsafePackMallocCString :: CString -> IO ByteString

The documentation has also been extensively revised. In particular, all unsafe functions contain text explaining in what way they are unsafe. For example:

    unsafeUseAsCString :: ByteString -> (CString -> IO a) -> IO a

O(1) construction. Use a ByteString with a function requiring a CString. This function does zero copying, and merely unwraps a ByteString to appear as a CString. It is unsafe in two ways:

* After calling this function the CString shares the underlying byte buffer with the original ByteString. Thus modifying the CString, either in C, or using poke, will cause the contents of the ByteString to change, breaking referential transparency. Other ByteStrings created by sharing (such as those produced via take or drop) will also reflect these changes. Modifying the CString will break referential transparency. To avoid this, use useAsCString, which makes a copy of the original ByteString.

* CStrings are often passed to functions that require them to be null-terminated. If the original ByteString wasn't null terminated, neither will the CString be. It is the programmer's responsibility to guarantee that the ByteString is indeed null terminated. If in doubt, use useAsCString.

The plain old Data.ByteString CString api should now be safe from FFI manipulation.

Note that Iavor's original demo looks like:

    import qualified Data.ByteString as B
    import Data.ByteString (packCString)
    import Foreign.C.String
    import Foreign

    main = do
        x <- newCString "Hello"
        s <- packCString x
        let h1 = B.head s
        print s
        poke x (toEnum 97)
        print s
        let h2 = B.head s
        print h1
        print h2

And now produces:

    $ runhaskell iavor.hs
    "Hello"
    "Hello"
    72
    72

No more ghostly telekinesis from the CString side!

Thanks to everyone for feedback and criticism.

-- Don
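The null-termination caveat in the revised documentation can be illustrated with a small sketch. It assumes the present-day bytestring package, where the unsafe functions live in Data.ByteString.Unsafe rather than Data.ByteString.Base; readBothWays is a hypothetical name. A substring made with take shares its parent's buffer and has no terminator of its own, so it is read either through the copying useAsCString (which appends a NUL) or through the Len variant with an explicit length.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Unsafe as BU
import Foreign.C.String (peekCString, peekCStringLen)

-- | Read a two-byte substring back as a String in two ways that do not
-- rely on the underlying buffer being NUL-terminated at the right place.
readBothWays :: IO (String, String)
readBothWays = do
  let sub = B.take 2 (BC.pack "Hello")   -- shares the buffer of "Hello"
  -- Copying: useAsCString allocates a fresh buffer and appends a NUL.
  viaCopy <- B.useAsCString sub peekCString
  -- Zero-copy: safe to read here only because the length is passed along.
  viaLen <- BU.unsafeUseAsCStringLen sub peekCStringLen
  pure (viaCopy, viaLen)
```

Both components should come back as "He". Handing the zero-copy pointer to a function that scans for a terminator (for instance unsafeUseAsCString with peekCString) would instead read past the end of the substring, which is the second hazard the documentation describes.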

Hello Donald, Tuesday, January 30, 2007, 6:08:50 AM, you wrote:
The Data.ByteString functions relating to CString have now been modified as follows, in the darcs repository. These changes will be propagated into base in due course.
i.e. only in ghc 6.8. I use this chance to remind everyone of our suggestion to remove ByteString from Base in ghc 6.6.1 and include it as a *separate* library. It will make upgrading possible and will conform with the situation in ghc 6.4 and, I hope, ghc 6.8. Now one is forced to use different .cabal files for 6.4 and 6.6 just to include/exclude this lib.

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
The Data.ByteString functions relating to CString have now been modified as follows, in the darcs repository. These changes will be propagated into base in due course.
i.e. only in ghc 6.8. I use this chance to remind everyone of our suggestion to remove ByteString from Base in ghc 6.6.1 and include it as a *separate* library.
I'd like to echo this. The Data.ByteString library is wonderful, but its API is still changing fast. It was definitely a mistake to include it in the 'base' package so early.

Furthermore, users of ghc cannot even override the 'base' version by adding "-package fps" on the command-line. (Well, you can, but only if your importing module does not use any of the other 'base' package libraries. Is that right?)

Regards,
Malcolm

I am not sure this is actually an issue. If you are willing to use 'poke' then _nothing_ preserves referential transparency in Haskell; you can arbitrarily overwrite various parts of the heap. Of course anything that writes to memory directly is unsafe, but 'poke' is the unsafe thing here, not packCString.

John

--
John Meacham - ⑆repetae.net⑆john⑈
participants (10)
- Bulat Ziganshin
- Chris Kuklewicz
- dons@cse.unsw.edu.au
- Duncan Coutts
- Iavor Diatchki
- John Meacham
- Lennart Augustsson
- Malcolm Wallace
- Stefan O'Rear
- Tim Docker