expose strlen from Foreign.C.String

I've wanted the following before:
foreign import ccall unsafe "strlen" cstringLength# :: Addr# -> Int#
cstringLength :: CString -> Int
cstringLength (Ptr s) = I# (cstringLength# s)
A natural place for this seems to be Foreign.C.String. Thoughts?

Seems reasonable to me.
On Thu, 21 Jan 2021 at 03:55, chessai wrote:
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Wed, Jan 20, 2021 at 09:54:30AM -0800, chessai wrote:
foreign import ccall unsafe "strlen" cstringLength# :: Addr# -> Int#
Why a new FFI call, rather than `cstringLength#` from ghc-prim: GHC.CString (as of GHC 9.0.1):
9.0.1-notes.rst: ``ghc-prim`` library
9.0.1-notes.rst: ~~~~~~~~~~~~~~~~~~~~
9.0.1-notes.rst:
9.0.1-notes.rst: - Add a known-key ``cstringLength#`` to ``GHC.CString`` that is eligible
9.0.1-notes.rst: for constant folding by a built-in rule.
ghc-prim/changelog.md: - Add known-key `cstringLength#` to `GHC.CString`. This is just the
ghc-prim/changelog.md: C function `strlen`, but a built-in rewrite rule allows GHC to
ghc-prim/changelog.md: compute the result at compile time when the argument is known.
CString.hs: -- | Compute the length of a NUL-terminated string. This address
CString.hs: -- must refer to immutable memory. GHC includes a built-in rule for
CString.hs: -- constant folding when the argument is a statically-known literal.
CString.hs: -- That is, a core-to-core pass reduces the expression
CString.hs: -- @cstringLength# "hello"#@ to the constant @5#@.
CString.hs: cstringLength# :: Addr# -> Int#
CString.hs: {-# INLINE[0] cstringLength# #-}
CString.hs: cstringLength# = c_strlen
Which is in turn re-exported by GHC.Exts:
GHC/Exts.hs: -- * CString
GHC/Exts.hs: unpackCString#,
GHC/Exts.hs: unpackAppendCString#,
GHC/Exts.hs: unpackFoldrCString#,
GHC/Exts.hs: unpackCStringUtf8#,
GHC/Exts.hs: unpackNBytes#,
GHC/Exts.hs: cstringLength#,
It is perhaps somewhat disappointing that the cstringLength# optimisations for `bytestring` (in master) aren't included in the `bytestring` version in 9.0.1.
-- Viktor.
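For illustration, a minimal sketch of lifting the existing primitive over a string literal, which is the case its built-in rule targets. It assumes GHC 9.0 or later (where `GHC.Exts` re-exports `cstringLength#`); the name `literalLength` is hypothetical.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Int (I#), cstringLength#)

-- Hypothetical example: the length of a primitive string literal.
-- The literal sits in read-only memory for the whole run of the program,
-- so returning a pure Int is fine here, and GHC's built-in rule may
-- fold this expression to the constant 6 at compile time.
literalLength :: Int
literalLength = I# (cstringLength# "foobar"#)

main :: IO ()
main = print literalLength
```

Note that this only covers literals; it says nothing about pointers to memory allocated at runtime.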

I forgot about that addition. In that case we would just need the lifted wrapper.
On Wed, Jan 20, 2021, 17:01 Viktor Dukhovni wrote:

On Jan 21, 2021, at 1:39 AM, chessai wrote:
I forgot about that addition. In that case we would just need the lifted wrapper
No worries, sure, the lifted wrapper makes sense, and Foreign.C.String does look like a reasonable place in which to define, and from which to export it.
-- Viktor.

Both the unboxed variant and the wrapper are only sound on primitive string literals. You cannot use them on anything that was allocated at runtime, only on stuff baked into the rodata section. This is a pretty onerous restriction. What use case did you have in mind?
On Jan 20, 2021, at 11:02 PM, Viktor Dukhovni wrote:

That doesn't sound right. I don't think it allocates any data on the heap
which could cause reallocation and move an unpinned ByteArray#, which is
the only way I can think of that it would be unsafe.
On Thu, Jan 21, 2021, 17:50 Andrew Martin wrote:

This is unsound:
x <- malloc ...
memcpy ... -- copy a NUL-terminated string into x
let len = cstringLength x
free x
Because GHC can float the let binding down to where len is used, after free.
On Jan 21, 2021, at 7:45 PM, Zemyla wrote:
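A runnable version of the malloc/copy/free scenario, sketched under the assumption that `strlen` is bound with an `IO` result (the direction the thread later takes). Here `newCString` does the malloc and copy; `c_strlen` and `lengthViaMalloc` are hypothetical names. Because the call is an ordinary `IO` action, it is ordered before `free` and cannot float past it.

```haskell
module Main (main) where

import Foreign.C.String (CString, newCString)
import Foreign.C.Types (CSize (..))
import Foreign.Marshal.Alloc (free)

-- IO-returning binding: the strlen call is sequenced like any IO action.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical helper mirroring the malloc / copy / free example.
lengthViaMalloc :: String -> IO Int
lengthViaMalloc s = do
  x <- newCString s   -- malloc'd, NUL-terminated copy of s
  len <- c_strlen x   -- runs here, while x is still allocated
  free x              -- safe: the read has already happened
  pure (fromIntegral len)

main :: IO ()
main = lengthViaMalloc "hello, world" >>= print
```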

andrew! this is a really good point. would the with# or touch# combinators
be needed to fix it (to force gc liveness)? OR would we need to have the
foreign c call defined to have an -> IO result, then use unsafePerformIO
to "purefy it correctly"?
i think the best way to explain *why* the proposed definition runs into
trouble is to look at how delicate/complicated prims are annotated in primops:
https://gitlab.haskell.org/ghc/ghc/-/blob/4bb9a349b5d002463b9fc4e9a3b6dbf77e...
otoh, the last time i was playing with an ostensibly pure primop that had
really delicate effect ordering, the prefetch stuff in the NCG,
my conclusion was that it *needed* explicit state tokens to make sure it
didn't get reordered, and for this primop that pure version would need to
be via unsafePerformIO i think
On Fri, Jan 22, 2021 at 8:46 AM Andrew Martin wrote:
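A sketch of the unsafePerformIO route raised above, under the assumption that the underlying call returns in `IO` (`c_strlen`, `pureLength` and `lengthForcedBeforeFree` are hypothetical names). It shows that the pure wrapper alone does not fix ordering: strlen runs whenever the thunk is first forced, so the caller still has to force it, here with `Control.Exception.evaluate`, before `free`. This is one possible workaround, not what the thread settled on, and it does not show the touch#/with# approach.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import Foreign.C.String (CString, newCString)
import Foreign.C.Types (CSize (..))
import Foreign.Marshal.Alloc (free)
import System.IO.Unsafe (unsafePerformIO)

foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical "purified" wrapper. The strlen call happens only when
-- the resulting Int is first demanded, not where pureLength appears.
pureLength :: CString -> Int
pureLength p = unsafePerformIO (fromIntegral <$> c_strlen p)

-- Forcing the result while the buffer is still allocated restores the
-- ordering; a plain `let len = pureLength x` would leave a thunk that
-- might only be forced after free.
lengthForcedBeforeFree :: String -> IO Int
lengthForcedBeforeFree s = do
  x <- newCString s
  len <- evaluate (pureLength x)
  free x
  pure len

main :: IO ()
main = lengthForcedBeforeFree "foobar" >>= print
```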

On Fri, Jan 22, 2021 at 08:45:54AM -0500, Andrew Martin wrote:
x <- malloc ...
memcpy ... copy a nul-terminated string into x
let len = cstringLength x
free x
Isn't this broadly true for general uses of CString? Which is why we have `withCString`:
https://hackage.haskell.org/package/base-4.14.1.0/docs/Foreign-C-String.html...
Is there anything particularly different about the proposed `cstringLength`?
Are you suggesting that it should have an "IO Int" result type to force sequencing? Is this warranted? Shouldn't users of CString (Ptr CChar) already be aware of the liveness issue in general?
-- Viktor.

Are you suggesting that it should have an "IO Int" result type to force sequencing? Is this warranted?
Yes. This is warranted. That's why Foreign.Storable.peek has IO in its
result type. On any CString with a finite lifetime, it is necessary to
sequence any reads and writes, and IO is the way this is done in base. By
contrast, on a CString that is both immutable and has an infinite lifetime,
we do not need to sequence reads. What kinds of CStrings fit the bill? Only
those backed by primitive string literals. So, for example, if you have:
myString :: CString
myString = Ptr "foobar"#
Since myString is backed by something in the rodata section of a binary
(meaning that it will never change and it will never be deallocated),
we do not care if reads get floated around. There are no functions in base
for unsequenced reads, but in primitive, you'll find
Data.Primitive.Ptr.indexOffPtr, which is unsequenced. So something like
this would be ok:
someOctet :: Word8
someOctet = Data.Primitive.Ptr.indexOffPtr (castPtr myString) 3
The cstringLength# in GHC.CString is similar to indexOffPtr. In fact, it
could be implemented using indexOffPtr. The reason that cstringLength#
exists (and in ghc-prim of all places) is so that a built-in rewrite rule
can perform this transformation:
cstringLength# "foobar"#
==>
6#
This will eventually be used to great effect in bytestring. See
https://github.com/haskell/bytestring/pull/191.
To get back to the original question, I think that any user-facing
cstringLength function should probably be:
cstringLength :: CString -> IO Int
We need a separate FFI call that returns its result in IO to accomplish
this. But this can just be done in base rather than ghc-prim. There are no
interesting rewrite rules that exist for such a function.
-Andrew Thaddeus Martin
On Fri, Jan 22, 2021 at 3:31 PM Viktor Dukhovni wrote:
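For readers without the `primitive` package, a base-only approximation of Andrew's literal-backed read might look like the sketch below. It swaps `indexOffPtr` for `peekElemOff` wrapped in `unsafeDupablePerformIO`; that substitution is ours, not something proposed in the thread. As with the original, the unsequenced read is acceptable only because `myString` points at a primitive string literal that never changes and is never freed.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peekElemOff)
import GHC.Ptr (Ptr (..))
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Backed by a primitive string literal: immutable and never deallocated.
myString :: CString
myString = Ptr "foobar"#

-- Unsequenced read of the byte at offset 3 (the 'b' in "foobar").
someOctet :: Word8
someOctet = unsafeDupablePerformIO (peekElemOff (castPtr myString) 3)

main :: IO ()
main = print someOctet
```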

On Fri, Jan 22, 2021 at 04:56:33PM -0500, Andrew Martin wrote:
This will eventually be used to great effect in bytestring. See https://github.com/haskell/bytestring/pull/191.
Yes, you might recall that I'm well aware of that (already merged) PR; indeed, that's how I happened to recall that cstringLength# is present in 9.0.
To get back to the original question, I think that any user-facing cstringLength function should probably be:
cstringLength :: CString -> IO Int
We need a separate FFI call that returns its result in IO to accomplish this. But this can just be done in base rather than ghc-prim. There are no interesting rewrite rules that exist for such a function.
So I guess your suggestion in response to @chessai's original post:
foreign import ccall unsafe "strlen" cstringLength# :: Addr# -> Int#
cstringLength :: CString -> Int
cstringLength (Ptr s) = I# (cstringLength# s)
would be to instead directly implement the lifted FFI variant:
foreign import ccall unsafe "strlen" cstringLength :: CString -> IO Int
which probably would not need a wrapper and can be exported directly.
module Main (main) where
import Control.Monad ((>=>))
import Foreign.C.String (CString, withCString)
foreign import ccall unsafe "strlen" cstringLength :: CString -> IO Int
main :: IO ()
main = withCString "Hello, World!" $ cstringLength >=> print
The cost of this safety net is that it results in more sequencing than is strictly necessary. It is enough for the enclosing IO action to not embed the length in its result in some not yet fully evaluated thunk.
I guess @chessai can let us know whether the more strictly sequenced variant meets his needs.
-- Viktor.

I agree with Andrew; let's just export the lifted FFI call.
This suits my needs, but regardless of my needs, it seems like a perfectly
sensible addition to Foreign.C.String.
Concrete addition:
foreign import unsafe "strlen"
cstringLength :: CString -> IO Int
On Fri, Jan 22, 2021, 17:09 Viktor Dukhovni wrote:

I’m on board with this import, but we’ll need to get the type right if we’re going to bind to libc’s strlen directly:
foreign import unsafe "strlen" cstringLength :: CString -> IO CSize
On Jan 22, 2021, at 6:04 PM, chessai wrote:

On Fri, Jan 22, 2021 at 06:07:22PM -0800, Eric Mertens wrote:
I’m on board with this import, but we’ll need to get the type right if we’re going to bind to libc’s strlen directly
foreign import unsafe "strlen" cstringLength :: CString -> IO CSize
Yes, definitely. The final all-nits-addressed variant would be:
foreign import ccall unsafe "string.h strlen" cstringLength :: CString -> IO CSize
which differs from the example in section 8.4.3 of the Haskell 2010 report
https://www.haskell.org/onlinereport/haskell2010/haskellch8.html#x15-1590008...
foreign import ccall "string.h strlen" cstrlen :: Ptr CChar -> IO CSize
only in the addition of "unsafe" and the name of the resulting function.
-- Viktor.
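A self-contained sketch of how the final binding might be used. The import is Viktor's final variant; `cstringLengthInt` is a hypothetical convenience helper (not part of the proposal) for callers who want an `Int`, converting the `CSize` with `fromIntegral`. `withCString` keeps the buffer alive for the duration of the call.

```haskell
module Main (main) where

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- The binding as finally proposed in the thread.
foreign import ccall unsafe "string.h strlen"
  cstringLength :: CString -> IO CSize

-- Hypothetical helper: same call, result converted from size_t to Int.
cstringLengthInt :: CString -> IO Int
cstringLengthInt p = fromIntegral <$> cstringLength p

main :: IO ()
main = withCString "Hello, World!" cstringLengthInt >>= print
```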
participants (7)
- Andrew Martin
- Carter Schonwald
- chessai
- Eric Mertens
- George Wilson
- Viktor Dukhovni
- Zemyla