Proposal: add ByteString support to unix:System.Posix.IO API

Hello all,

I've written variants of the System.Posix.IO API for strict and lazy ByteStrings which are currently lingering in their own unpublished package[1]. It's silly to have a separate package for just four functions, so I'd like to see it combined into the unix package. I don't have access to create a new ticket with the patch, but it's available at [2].

Discussion time: 2 weeks.

[1] darcs: http://community.haskell.org/~wren/bytestring-unix
    haddock: http://community.haskell.org/~wren/bytestring-unix/dist/doc/html/bytestring-...
[2] Darcs patch: http://community.haskell.org/~wren/bytestring-unix/bytestring-unix-0.1.0.dpa...

--
Live well,
~wren

On Sun, Feb 27, 2011 at 9:18 PM, wren ng thornton wrote:
I've written variants of the System.Posix.IO API for strict and lazy ByteStrings which are currently lingering in their own unpublished package[1]. It's silly to have a separate package for just four functions, so I'd like to see it combined into the unix package. I don't have access to create a new ticket with the patch, but it's available at [2].
I think that the strict bytestring version should just replace the current function in System.Posix.IO, and the lazy version should not go in at all. This is the approach taken by the network package, and it's cleanest.

On Mon, Feb 28, 2011 at 11:20 AM, Bryan O'Sullivan wrote:
On Sun, Feb 27, 2011 at 9:18 PM, wren ng thornton wrote:
I've written variants of the System.Posix.IO API for strict and lazy ByteStrings which are currently lingering in their own unpublished package[1]. It's silly to have a separate package for just four functions, so I'd like to see it combined into the unix package. I don't have access to create a new ticket with the patch, but it's available at [2].
I think that the strict bytestring version should just replace the current function in System.Posix.IO, and the lazy version should not go in at all. This is the approach taken by the network package, and it's cleanest.
'network' has a lazy variant, which calls 'writev'.

Antoine

On Mon, Feb 28, 2011 at 11:28 AM, Bryan O'Sullivan wrote:
On Mon, Feb 28, 2011 at 9:27 AM, Antoine Latter wrote:
'network' has a lazy variant, which calls 'writev'.
That's been replaced in the git tree with 'sendMany', which takes a list of strict bytestrings.
That's likely a lot less spooky than making the caller know if it's safe to force the spine of a lazy bytestring.

Antoine
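To make the distinction above concrete, here is a minimal sketch of how a lazy ByteString relates to the list of strict chunks that a 'sendMany'-style function takes. The helper names lazyToChunks and totalBytes are hypothetical and are not part of the network or unix packages; only Data.ByteString and Data.ByteString.Lazy are assumed.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- | Hypothetical helper: view a lazy ByteString as its list of strict chunks.
-- Walking the whole result list forces the spine of the lazy ByteString,
-- which is exactly the decision a list-of-strict-chunks API leaves to the caller.
lazyToChunks :: BL.ByteString -> [BS.ByteString]
lazyToChunks = BL.toChunks

-- | Total payload size of a chunk list; each BS.length is O(1).
totalBytes :: [BS.ByteString] -> Int
totalBytes = sum . map BS.length
```

With an explicit list, the caller can see that the chunks already exist in memory before any system call is made.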

On Mon, Feb 28, 2011 at 9:20 AM, Bryan O'Sullivan wrote:
On Sun, Feb 27, 2011 at 9:18 PM, wren ng thornton wrote:
I've written variants of the System.Posix.IO API for strict and lazy ByteStrings which are currently lingering in their own unpublished package[1]. It's silly to have a separate package for just four functions, so I'd like to see it combined into the unix package. I don't have access to create a new ticket with the patch, but it's available at [2].
I think that the strict bytestring version should just replace the current function in System.Posix.IO, and the lazy version should not go in at all. This is the approach taken by the network package, and it's cleanest.
Thanks for working on this Wren. I agree with Bryan's point.

Johan

On 2/28/11 12:20 PM, Bryan O'Sullivan wrote:
I think that the strict bytestring version should just replace the current function in System.Posix.IO, and the lazy version should not go in at all. This is the approach taken by the network package, and it's cleanest.
I still see Network.Socket.ByteString.Lazy in the latest version network-2.3.0.2[1]. Am I missing something?

[1] http://hackage.haskell.org/package/network

--
Live well,
~wren

On Tue, Mar 1, 2011 at 1:28 AM, wren ng thornton wrote:
I still see Network.Socket.ByteString.Lazy in the latest version network-2.3.0.2[1]. Am I missing something?
It's gone from the development tree. https://github.com/haskell/network

When I wrote the original bytestring-unix v0.1.0 I tried to stick as close as possible to the System.Posix.IO interface. However, it seems that folks aren't too fond of that interface. So, for your consideration I announce:

unix-bytestring v0.2.0 (note the name change)

Darcs: http://community.haskell.org/~wren/unix-bytestring/
Haddock: http://community.haskell.org/~wren/unix-bytestring/dist/doc/html/unix-bytest...

This currently consists of:

* System.Posix.IO.ByteString
    * fdRead -- like System.Posix.IO.fdRead, the same as before.
    * fdWrite -- like System.Posix.IO.fdWriteBuf, the same as before.
    * fdWrites -- like the previous lazy bytestring implementation. It performs a write(2) call for each chunk, but it supports lazy streaming.
    * fdWritev -- Convert a list of bytestrings into a C array of iovec structs, and then perform a single writev(2) call.
* System.Posix.IO.ByteString.Lazy -- I'm keeping this around for now, for my own convenience.
    * fdRead -- simple wrapper around System.Posix.IO.ByteString.fdRead, the same as before.
    * fdWrites -- like System.Posix.IO.ByteString.fdWrites, and essentially the same as before.
    * fdWritev -- simple wrapper around System.Posix.IO.ByteString.fdWritev.
* System.Posix.Types.Iovec -- The data type representing a C struct iovec, plus some helper functions for converting between bytestrings and iovecs.

I think the writev implementations are correct after some basic testing, though I haven't used hsc2hs nor writev before, so there may be some odd corner cases lurking in there.

I'm not dead-set on the current API, though I've tried to keep it minimal and maximally expressive. Play around with it. Try to break it. Let me know what you'd still like to see changed.

P.S., what the heck is readv(2) supposed to be good for? Does anyone want to see a binding for it?

P.P.S., does anyone want to see pread(2) or pwrite(2) bindings?

--
Live well,
~wren
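As an illustration of the "one write(2) call per chunk" strategy described for fdWrites, here is a minimal sketch built only on fdWriteBuf from the unix package and unsafeUseAsCStringLen from bytestring. The names writeChunk, writeChunks and demoWrite are hypothetical, and this is not the unix-bytestring implementation; in particular the policy of stopping at the first short write is an assumption. A gather write in the style of fdWritev would instead need a C array of iovec structs and a single writev(2) call, which is not shown here.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.Ptr (castPtr)
import System.Posix.IO (closeFd, createPipe, fdWriteBuf)
import System.Posix.Types (ByteCount, Fd)

-- | Write one strict ByteString with a single call to fdWriteBuf.
-- Returns the number of bytes actually written, which may be fewer
-- than the length of the chunk.
writeChunk :: Fd -> BS.ByteString -> IO ByteCount
writeChunk fd bs =
  unsafeUseAsCStringLen bs $ \(ptr, len) ->
    fdWriteBuf fd (castPtr ptr) (fromIntegral len)

-- | Hypothetical fdWrites-style loop: one write per chunk, stopping at
-- the first short write (an assumed policy) and returning the total
-- number of bytes written so far.
writeChunks :: Fd -> [BS.ByteString] -> IO ByteCount
writeChunks fd = go 0
  where
    go acc [] = return acc
    go acc (c : cs) = do
      n <- writeChunk fd c
      let acc' = acc + n
      if fromIntegral n < BS.length c
        then return acc'
        else go acc' cs

-- | Usage sketch: write two small chunks into a fresh pipe and report
-- the total number of bytes written.
demoWrite :: IO ByteCount
demoWrite = do
  (readEnd, writeEnd) <- createPipe
  n <- writeChunks writeEnd [BS.pack [104, 105], BS.pack [10]]
  closeFd writeEnd
  closeFd readEnd
  return n
```

Because the chunk list is consumed one element at a time, the same loop also works when the list comes from a lazily produced source.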

On 5 March 2011 13:30, wren ng thornton wrote:
This currently consists of: * System.Posix.IO.ByteString * fdRead -- like System.Posix.IO.fdRead, the same as before.
What about changing this:

fdRead :: Fd -> ByteCount -> IO (ByteString, ByteCount)

to this:

fdRead :: Fd -> ByteCount -> IO ByteString

I agree that it's nice to be consistent with the String version but since that version might go away and since the length (O(1)) of the ByteString should always equal the ByteCount I think it's better to remove that invariant.
* System.Posix.IO.ByteString.Lazy -- I'm keeping this around for now, for my own convenience. * fdRead -- simple wrapper around System.Posix.IO.ByteString.fdRead, the same as before.
Although calculating the length of a lazy ByteString is O(n/c), since you always return a single chunk I think the same argument applies here.

Nice work!

Bas
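The argument above is that the ByteCount in the result is redundant for strict ByteStrings. A minimal sketch of the two directions, using hypothetical adapter names (dropCount and withCount are not part of any package discussed here):

```haskell
import qualified Data.ByteString as BS
import System.Posix.Types (ByteCount)

-- | Adapt an action with the pair-returning shape to the simpler shape
-- by discarding the count.
dropCount :: IO (BS.ByteString, ByteCount) -> IO BS.ByteString
dropCount = fmap fst

-- | Recover the count from the ByteString itself. BS.length is O(1)
-- for strict ByteStrings, so nothing is lost by not returning it.
withCount :: BS.ByteString -> (BS.ByteString, ByteCount)
withCount bs = (bs, fromIntegral (BS.length bs))
```

Client code that still wants the pair can apply withCount to the result, so the simpler signature does not remove any information.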

On 3/5/11 10:03 AM, Bas van Dijk wrote:
On 5 March 2011 13:30, wren ng thornton wrote:
This currently consists of: * System.Posix.IO.ByteString * fdRead -- like System.Posix.IO.fdRead, the same as before.
What about changing this:
fdRead :: Fd -> ByteCount -> IO (ByteString, ByteCount)
to this:
fdRead :: Fd -> ByteCount -> IO ByteString
I agree that it's nice to be consistent with the String version but since that version might go away and since the length (O(1)) of the ByteString should always equal the ByteCount I think it's better to remove that invariant.
I even have a note in the source file to that effect :)

One thing I considered was to offer both versions, one for compatibility with the old string versions (and for decluttering client code)[1] and then one that just gives the bytestring. Of course, then the issue is what to name them...

So the big question is: how minimal should we be, vs how much in the way of convenience functions should we offer?

Update: In v0.2.1 I added an fdReads function which lets you pass a predicate for determining whether to retry after incomplete reads.

--
Live well,
~wren

On 6 March 2011 13:16, wren ng thornton wrote:
One thing I considered was to offer both versions, one for compatibility with the old string versions (and for decluttering client code)[1] and then one that just gives the bytestring. Of course, then the issue is what to name them...
So the big question is: how minimal should we be, vs how much in the way of convenience functions should we offer?
It would be great if you could analyse the direct reverse dependencies of unix to see how much fdRead is used and how much work it is to adapt packages to an fdRead which only returns the ByteString:

http://bifunctor.homelinux.net/~roel/cgi-bin/hackage-scripts/revdeps/unix-2....

Bas

On 3/6/11 12:22 PM, Bas van Dijk wrote:
On 6 March 2011 13:16, wren ng thornton wrote:
One thing I considered was to offer both versions, one for compatibility with the old string versions (and for decluttering client code)[1] and then one that just gives the bytestring. Of course, then the issue is what to name them...
So the big question is: how minimal should we be, vs how much in the way of convenience functions should we offer?
It would be great if you could analyse the direct reverse dependencies of unix to see how much fdRead is used and how much work it is to adapt packages to an fdRead which only returns the ByteString:
http://bifunctor.homelinux.net/~roel/cgi-bin/hackage-scripts/revdeps/unix-2....
What an excellent idea!

It looks like unix has about 234 direct revdeps. Of which, some hasty grepping detects only 26 files which may call fdRead or fdWrite. Namely:

./HFuse-0.2.3/System/Fuse.hsc
./bindings-bfd-0.2.0/src/Bindings/Bfd/Disasm.hs
./bindings-bfd-0.2.0/src/Bindings/Bfd/Symbol.hsc
./cautious-file-0.1.5/src/System/Posix/ByteLevel.hsc
./epoll-0.2.2/examples/Stdin.hs
./epoll-0.2.2/src/System/Linux/Epoll/Buffer.hs
./funion-0.0.2/Funion.hs
./halfs-0.2/Binary.hs
./halfs-0.2/System/RawDevice/File.hs
./iteratee-0.8.1.2/src/Data/Iteratee/IO/Fd.hs
./iteratee-0.8.1.2/src/Data/Iteratee/IO/Posix.hs
./iteratee-mtl-0.5.0.0/src/Data/Iteratee/IO/Fd.hs
./iteratee-mtl-0.5.0.0/src/Data/Iteratee/IO/Posix.hs
./lhc-0.10/lib/base/src/Control/Concurrent.hs
./lhc-0.10/lib/base/src/GHC/IO/FD.hs
./lhc-0.10/src/Grin/Eval/Primitives.hs
./lhc-0.10/src/Grin/Stage2/Backend/C.hs
./liboleg-2010.1.10.0/System/IterateeM.hs
./liboleg-2010.1.10.0/System/LowLevelIO.hs
./liboleg-2010.1.10.0/System/RandomIO.hs
./liboleg-2010.1.10.0/System/SysOpen.hs
./miniplex-0.3.4/lib/System/Miniplex/Sink.hs
./serialport-0.3.3/System/Hardware/Serialport/Posix.hsc
./vty-4.6.0.4/src/Graphics/Vty/LLInput.hs
./ztail-1.0.1/TailHandle.hs

Of these, many are false positives due to some local function called fdReady (and ignored hereafter), many are false positives due to local definitions of fdRead, myfdRead, etc. (discussed later), and only about half a dozen appear to be direct calls to the unix package version.

Of these half dozen files, most of the use sites completely ignore the ByteCount, two or three use it only to check (==0) which can be efficiently detected by either String or ByteString's null predicate, and only two of them appear to use it in any nontrivial way (e.g., returning it from the current function, or printing it).

Thus, I conclude, nobody wants the ByteCount. The switch from String to ByteString would involve far more work than correcting literally a couple pattern matches per affected project. And less than half a dozen use sites across all of Hackage might consider calling BS.length to get at the information.

Of the local definitions there were two classes I noticed.

* The HFuse project defines their own bindings to the pread(2) and pwrite(2) functions, thus answering my question about whether anyone'd want them. They use them as handling ByteString buffers too, no less!

* The liboleg, iteratee, and iteratee-mtl packages all have copies of their own versions of fdRead (apparently a cut&paste job). In the documentation they level numerous complaints against the unix package's version of fdRead. The two notable ones are (1) that fdRead allocates a new buffer every time it's called, this seems to be addressed by fdReadBuf but the packages haven't been updated to use it; and (2) that fdRead throws errors (at all, let alone on EOF). They'd prefer the type:

myfdRead :: Fd -> Ptr CChar -> ByteCount -> IO (Either Errno ByteCount)

Considering that fdReadBuf already addresses the first complaint, it seems that it might be worthwhile to provide a safe version of fdReadBuf which captures the error in an Either rather than throwing it (should be cheaper than throwing it and having a wrapper catch it and convert it into Errno?). Similarly, it may be worthwhile to have a version of fdRead which doesn't throw an exception on EOF, but just returns the empty ByteString instead.

--
Live well,
~wren
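A minimal sketch of what a non-throwing variant of fdReadBuf could look like, assuming a direct FFI binding to read(2). The names c_read, tryFdReadBuf and readFromBadFd are hypothetical; this is not code from the unix or unix-bytestring packages, and it deliberately leaves out retrying on EINTR.

```haskell
import Data.Word (Word8)
import Foreign.C.Error (Errno, eBADF, getErrno)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.Posix.Types (ByteCount, CSsize (..), Fd (..))

-- Raw binding to read(2). A "safe" call is used because read may block.
foreign import ccall safe "unistd.h read"
  c_read :: CInt -> Ptr Word8 -> CSize -> IO CSsize

-- | Hypothetical non-throwing read: a failure is returned as a value
-- (Left errno) instead of being raised as an IOError. A result of
-- Right 0 signals end of file. EINTR is not retried in this sketch.
tryFdReadBuf :: Fd -> Ptr Word8 -> ByteCount -> IO (Either Errno ByteCount)
tryFdReadBuf (Fd fd) buf nbytes = do
  rc <- c_read fd buf nbytes
  if rc == -1
    then Left <$> getErrno
    else return (Right (fromIntegral rc))

-- | Usage sketch: reading from an invalid descriptor reports EBADF as
-- an ordinary value rather than as an exception.
readFromBadFd :: IO Bool
readFromBadFd =
  allocaBytes 16 $ \buf -> do
    r <- tryFdReadBuf (Fd (-1)) buf 16
    return $ case r of
      Left e -> e == eBADF
      Right _ -> False
```

Returning the Errno directly avoids building an exception only to catch and unpack it again, which is the cost the paragraph above speculates about.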

On 3/7/11 5:52 PM, wren ng thornton wrote:
On 3/6/11 12:22 PM, Bas van Dijk wrote:
It would be great if you could analyse the direct reverse dependencies of unix to see how much fdRead is used and how much work it is to adapt packages to an fdRead which only returns the ByteString:
http://bifunctor.homelinux.net/~roel/cgi-bin/hackage-scripts/revdeps/unix-2....
What an excellent idea!
[...]
Thus, I conclude, nobody wants the ByteCount. The switch from String to ByteString would involve far more work than correcting literally a couple pattern matches per affected project. And less than half a dozen use sites across all of Hackage might consider calling BS.length to get at the information.
Of the local definitions there were two classes I noticed.
* The HFuse project defines their own bindings to the pread(2) and pwrite(2) functions, thus answering my question about whether anyone'd want them. They use them as handling ByteString buffers too, no less!
Adding these into the mix, I submit v0.3.0:

darcs get http://community.haskell.org/~wren/unix-bytestring/

I've greatly fleshed out the functions in System.Posix.IO.ByteString:

http://community.haskell.org/~wren/unix-bytestring/dist/doc/html/unix-bytest...

Still no versions which return (Either Errno _) instead of throwing exceptions, but most of the rest is there. If anyone else has suggestions, feel free to let me know.

There's starting to be enough there that it might be worth throwing it up on Hackage; so much for a simple patch to @unix@ :)

--
Live well,
~wren

On March 5, 2011 07:30:24 wren ng thornton wrote:
P.S., what the heck is readv(2) supposed to be good for? Does anyone want to see a binding for it?
Strikes me as something you might want to use in code that is orientated towards working with chunks of data of a possibly fixed maximum size (i.e., a chunk orientated streaming environment).

The readv call gives you the ability to fill a series of chunks (e.g., from a free chunk list or something) with one call into the kernel. This could be a significant improvement from being forced to fill them individually.

Cheers! -Tyson

On 3/6/11 10:19 PM, Tyson Whitehead wrote:
On March 5, 2011 07:30:24 wren ng thornton wrote:
P.S., what the heck is readv(2) supposed to be good for? Does anyone want to see a binding for it?
Strikes me as something you might want to use in code that is orientated towards working with chunks of data of a possibly fixed maximum size (i.e., a chunk orientated streaming environment).
The readv call gives you the ability to fill a series of chunks (e.g., from a free chunk list or something) with one call into the kernel. This could be a significant improvement from being forced to fill them individually.
Doh. For some reason I overlooked the fact that each iovec carries its length, and was wondering how it "performs the same action, but scatters the input data into the @iovcnt@ buffers". Makes a bit more sense if it's just filling up a segmented buffer.

I might as well give a raw binding for it and think about what a decent non-raw version should look like.

--
Live well,
~wren
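The "filling up a segmented buffer" reading can be modelled in pure code. The sketch below is only a model of the scatter behaviour, not a binding to readv(2): the function name scatter is hypothetical, the list of Ints stands in for the per-iovec lengths, and the input ByteString stands in for the bytes the kernel has available.

```haskell
import qualified Data.ByteString as BS

-- | Pure model of readv-style scattering: fill buffers in order, each
-- up to its own capacity, until the input runs out. Later buffers may
-- be partially filled or empty.
scatter :: [Int] -> BS.ByteString -> [BS.ByteString]
scatter [] _ = []
scatter (cap : caps) input =
  let (filled, rest) = BS.splitAt cap input
   in filled : scatter caps rest
```

For example, scattering six bytes into buffers of capacity 2, 3 and 4 fills the first two buffers completely and leaves a single byte in the third.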
participants (6)
- Antoine Latter
- Bas van Dijk
- Bryan O'Sullivan
- Johan Tibell
- Tyson Whitehead
- wren ng thornton