Re: Proposal: add ByteString support to unix:System.Posix.IO API

On 2/28/11 12:28 PM, Johan Tibell wrote:
On Mon, Feb 28, 2011 at 9:20 AM, Bryan O'Sullivan wrote:
On Sun, Feb 27, 2011 at 9:18 PM, wren ng thornton wrote:
I've written variants of the System.Posix.IO API for strict and lazy ByteStrings which are currently lingering in their own unpublished package[1]. It's silly to have a separate package for just four functions, so I'd like to see it combined into the unix package. I don't have access to create a new ticket with the patch, but it's available at [2].
I think that the strict bytestring version should just replace the current function in System.Posix.IO, and the lazy version should not go in at all. This is the approach taken by the network package, and it's cleanest.
Thanks for working on this Wren.
I agree with Bryan's point.
So then what would become of the string variants? Backwards compatibility and all... not _everyone_ uses ByteStrings yet.
Also, FWIW, the real impetus behind writing these was to deal with lazy ByteStrings, since that's what one of my dependencies uses[1]. It seems like a common enough situation that it should be handled directly. Forcing people to call (BS.concat . BL.toChunks) first muddies up the code and introduces additional copying, whereas making people use (mapM_ fdWrite . BL.toChunks) is still rather messy and it's buggy in the face of partial writes.
For network usage I agree that the strict-only approach is clean and effective. For general purpose writing to Fds, I'm not so sure.
[1] http://hackage.haskell.org/packages/archive/protocol-buffers/1.8.0/doc/html/...
-- Live well, ~wren
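The partial-write hazard described in this message can be illustrated without a real file descriptor. What follows is only a sketch under stated assumptions: `shortWrite` is a hypothetical stand-in for a write primitive that accepts at most `limit` bytes per call, and `naiveWrite` mirrors the (mapM_ fdWrite . BL.toChunks) pattern by discarding the returned byte counts. Neither name comes from the unix package.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- | Hypothetical stand-in for a write call that may be short: it accepts at
-- most 'limit' bytes, logs the bytes it accepted, and returns their count.
-- The log lives in the writer-like monad ((,) [BS.ByteString]) from base.
shortWrite :: Int -> BS.ByteString -> ([BS.ByteString], Int)
shortWrite limit bs = ([accepted], BS.length accepted)
  where
    accepted = BS.take limit bs

-- | Mirrors (mapM_ fdWrite . BL.toChunks): one call per chunk, with the
-- returned byte counts thrown away. The result is what actually got written.
naiveWrite :: Int -> BL.ByteString -> BS.ByteString
naiveWrite limit = BS.concat . fst . mapM_ (shortWrite limit) . BL.toChunks
```

With a limit of 2 and the chunks [1,2,3] and [4], the simulated descriptor receives the bytes 1, 2, 4: the tail of the first chunk is silently lost because nothing inspects the count returned by the short write.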

On Mon, Feb 28, 2011 at 2:57 PM, wren ng thornton wrote:
So then what would become of the string variants? Backwards compatibility and all... not _everyone_ uses ByteStrings yet.
If you need the String API, specify a dependency on a version of the package that still uses String.
Also, FWIW, the real impetus behind writing these was to deal with lazy ByteStrings, since that's what one of my dependencies uses[1].
What you really really want, then, is fdWriteAll, which repeatedly uses writev to write all of a list of strict (or a single lazy) bytestring to an fd, no? That's why I added sendAll to network-bytestring, because it's absolutely the common case that you want to write everything you can to a file descriptor, and not have to worry about short writes.
It seems like a common enough situation that it should be handled directly. Forcing people to call (BS.concat . BL.toChunks) first muddies up the code and introduces additional copying, whereas making people use (mapM_ fdWrite . BL.toChunks) is still rather messy and it's buggy in the face of partial writes.
Well, wait. That paragraph portion seems to confuse things. At the very least, it confuses me, because I surely didn't suggest using concat, as that would be silly.
I want to see four entry points for writing:
    fdWrite :: Strict.ByteString -> IO Int
    fdWriteAll :: Strict.ByteString -> IO ()
    fdWritev :: [Strict.ByteString] -> IO Int -- turn the list into an iovec, then call writev
    fdWritevAll :: [Strict.ByteString] -> IO ()
People would normally use the 'All' variants, but there are times when you really do want to know if you've performed a short write so that you can handle it yourself.
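The relationship between the plain and 'All' variants can be sketched as a retry loop. This is a minimal illustration, not the proposed implementation: `writeAllWith` and `writeChunksAllWith` are hypothetical names, and the write primitive is passed in as a parameter (`writeSome`) so the loop does not depend on any particular fdWrite signature.

```haskell
import qualified Data.ByteString as BS

-- | Keep calling a possibly-short write action until every byte is sent.
-- 'writeSome' is an assumed primitive that returns how many bytes it
-- accepted from the front of its argument. If it keeps returning 0 this
-- loop never terminates; a real version would need a policy for that.
writeAllWith :: Monad m => (BS.ByteString -> m Int) -> BS.ByteString -> m ()
writeAllWith writeSome = go
  where
    go bs
      | BS.null bs = return ()
      | otherwise = do
          n <- writeSome bs
          go (BS.drop n bs)

-- | The same idea over a list of chunks. This issues at least one write per
-- chunk; it does not build an iovec or call writev.
writeChunksAllWith
  :: Monad m => (BS.ByteString -> m Int) -> [BS.ByteString] -> m ()
writeChunksAllWith writeSome = mapM_ (writeAllWith writeSome)
```

Keeping the short-write primitive separate from the loop matches the point made above: callers who need to see short writes use the primitive directly, and everyone else uses the wrapper.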

On 3/1/11 1:35 AM, Bryan O'Sullivan wrote:
I want to see four entry points for writing:
    fdWrite :: Strict.ByteString -> IO Int
    fdWriteAll :: Strict.ByteString -> IO ()
    fdWritev :: [Strict.ByteString] -> IO Int -- turn the list into an iovec, then call writev
    fdWritevAll :: [Strict.ByteString] -> IO ()
People would normally use the 'All' variants, but there are times when you really do want to know if you've performed a short write so that you can handle it yourself.
What's an "iovec"?
-- Live well, ~wren

On Tue, Mar 1, 2011 at 1:29 AM, wren ng thornton wrote:
What's an "iovec"?

Hi,
On 01/03/2011 06:35, Bryan O'Sullivan wrote:
Well, wait. That paragraph portion seems to confuse things. At the very least, it confuses me, because I surely didn't suggest using concat, as that would be silly. I want to see four entry points for writing:
    fdWrite :: Strict.ByteString -> IO Int
    fdWriteAll :: Strict.ByteString -> IO ()
    fdWritev :: [Strict.ByteString] -> IO Int -- turn the list into an iovec, then call writev
    fdWritevAll :: [Strict.ByteString] -> IO ()
People would normally use the 'All' variants, but there are times when you really do want to know if you've performed a short write so that you can handle it yourself.
What's the practical difference between a (lazy) list of strict bytestrings and a lazy bytestring?
Cheers, Ganesh

ganesh:
Hi,
On 01/03/2011 06:35, Bryan O'Sullivan wrote:
Well, wait. That paragraph portion seems to confuse things. At the very least, it confuses me, because I surely didn't suggest using concat, as that would be silly. I want to see four entry points for writing:
    fdWrite :: Strict.ByteString -> IO Int
    fdWriteAll :: Strict.ByteString -> IO ()
    fdWritev :: [Strict.ByteString] -> IO Int -- turn the list into an iovec, then call writev
    fdWritevAll :: [Strict.ByteString] -> IO ()
People would normally use the 'All' variants, but there are times when you really do want to know if you've performed a short write so that you can handle it yourself.
What's the practical difference between a (lazy) list of strict bytestrings and a lazy bytestring?
Nothing. Lazy ByteStrings are slightly more efficient (avoid one indirection from the (:) cell).

On Tuesday 01 March 2011 23:51:05, Ganesh Sittampalam wrote:
What's the practical difference between a (lazy) list of strict bytestrings and a lazy bytestring?
Lazy ByteStrings are head-strict. I don't know whether that's relevant here, though.

On 3/1/11 6:01 PM, Daniel Fischer wrote:
On Tuesday 01 March 2011 23:51:05, Ganesh Sittampalam wrote:
What's the practical difference between a (lazy) list of strict bytestrings and a lazy bytestring?
Lazy ByteStrings are head-strict. I don't know whether that's relevant here, though.
Or rather, element-strict. But I don't think that matters here, since writing requires element strictness anyways.
It may be significant if I write new versions to use writev for lists/lazys instead of calling write repeatedly, since we'd have to force all elements before calling into C. But then writev would require forcing the spine anyways (to get the length in chunks) so it wouldn't affect asymptotic behavior or memory retention.
-- Live well, ~wren
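The two representations discussed in this sub-thread convert into each other with BL.fromChunks and BL.toChunks from the bytestring package. The sketch below uses made-up example data (`chunkList`) to show one observable difference: the conversion keeps the bytes but does not necessarily keep the chunk boundaries of the original list, because a lazy ByteString never stores empty chunks.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- | Example data only: a list of strict chunks, one of them empty.
chunkList :: [BS.ByteString]
chunkList = [BS.pack [1, 2], BS.empty, BS.pack [3]]

-- | The same bytes as a lazy ByteString. Each chunk is stored in a strict
-- constructor field, which is the element-strictness mentioned above.
asLazy :: BL.ByteString
asLazy = BL.fromChunks chunkList

-- | Converting back yields only the non-empty chunks.
roundTripped :: [BS.ByteString]
roundTripped = BL.toChunks asLazy
```

If the way a list is chopped up carries meaning for the caller, that meaning can be lost on the way through a lazy ByteString, so an API taking [Strict.ByteString] and one taking Lazy.ByteString are not fully interchangeable.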

On 3/1/11 10:00 PM, wren ng thornton wrote:
On 3/1/11 6:01 PM, Daniel Fischer wrote:
On Tuesday 01 March 2011 23:51:05, Ganesh Sittampalam wrote:
What's the practical difference between a (lazy) list of strict bytestrings and a lazy bytestring?
Lazy ByteStrings are head-strict. I don't know whether that's relevant here, though.
Or rather, element-strict. But I don't think that matters here, since writing requires element strictness anyways.
It may be significant if I write new versions to use writev for lists/lazys instead of calling write repeatedly, since we'd have to force all elements before calling into C. But then writev would require forcing the spine anyways (to get the length in chunks) so it wouldn't affect asymptotic behavior or memory retention.
Where "it" in the last sentence means "forcing all the elements".
-- Live well, ~wren

On 3/1/11 1:35 AM, Bryan O'Sullivan wrote:
I want to see four entry points for writing:
    fdWrite :: Strict.ByteString -> IO Int
    fdWriteAll :: Strict.ByteString -> IO ()
    fdWritev :: [Strict.ByteString] -> IO Int -- turn the list into an iovec, then call writev
    fdWritevAll :: [Strict.ByteString] -> IO ()
Using writev requires the length of the list in order to get a count of chunks, which forces us to hold the whole list/lazy-bytestring in memory at once and also adds O(n) time for traversing it. Also it'd require converting each of the ByteString structs into iovec structs (whereas using write allows this to be unpacked into the call frames for write). What's the benefit of doing this? Is writev that much more efficient than Haskell code with the same semantics[1]?
People would normally use the 'All' variants, but there are times when you really do want to know if you've performed a short write so that you can handle it yourself.
What are the desired semantics for the All variants? Should it retry, or fail? How many times should it retry? etc.
For manual recovery from partial writes, it would be better to have the basic function be:
    fdWriteFoo
        :: [Strict.ByteString] -- or Lazy.ByteString, whichever
        -> IO
            -- The total count of bytes written
            ( ByteCount
            -- The remaining content, with the first chunk already
            -- accounting for the last partial write (by adjusting
            -- the ByteString's offset).
            , [Strict.ByteString]
            )
So that the head of the lazy bytestring can be garbage collected and so you don't have to traverse it again to figure out where printing left off. Or actually, we'd want:
    fdWriteBar
        :: [Strict.ByteString]
        -> IO
            -- The total count of bytes written
            ( ByteCount
            -- The count of bytes written from the first chunk of
            -- the remaining content.
            , ByteCount
            -- The remaining content, with the first chunk not
            -- accounting for the last partial write (use the second
            -- ByteCount to account for it).
            , [Strict.ByteString]
            )
When using a lazy bytestring for the input then this latter version would be silly. However, with a list of bytestrings, there can be semantics encoded into how the string is chopped up and we shouldn't corrupt that information by adjusting any of the chunks. This is one reason why using lazy bytestrings gives cleaner semantics.
[1]
    fdWrite :: Fd -> BL.ByteString -> IO ByteCount
    fdWrite fd = go 0
        where
        -- We want to do a left fold in order to avoid stack overflows,
        -- but we need to have an early exit for incomplete writes
        -- (which normally requires a right fold). Hence this recursion.
        go acc BLI.Empty = return acc
        go acc (BLI.Chunk c cs) = do
            rc <- PosixBS.fdWrite fd c
            let acc' = acc+rc in acc' `seq` do
                if rc == fromIntegral (BS.length c)
                    then go acc' cs
                    else return acc'
-- Live well, ~wren
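The first of the two shapes proposed in this message (total count plus remaining content, with the first remaining chunk trimmed) can be sketched as follows. This is an illustration under assumptions, not wren's patch: `writeChunksWith` is a hypothetical name, Int stands in for ByteCount, and the write primitive is a parameter (`writeSome`) rather than a real call on an Fd.

```haskell
import qualified Data.ByteString as BS

-- | Write chunks in order with a possibly-short write action and stop at
-- the first short write. Returns the total number of bytes written and the
-- content still to be written, with the first remaining chunk already
-- trimmed by the partial write.
writeChunksWith
  :: Monad m
  => (BS.ByteString -> m Int)   -- assumed primitive: bytes accepted
  -> [BS.ByteString]
  -> m (Int, [BS.ByteString])
writeChunksWith writeSome = go 0
  where
    go acc [] = return (acc, [])
    go acc (c : cs) = do
      n <- writeSome c
      let acc' = acc + n
      -- Force the running total so no chain of thunks builds up.
      acc' `seq` (if n == BS.length c
                    then go acc' cs
                    else return (acc', BS.drop n c : cs))
```

A caller can resume by passing the returned list straight back in, without re-traversing what was already written; the already-written prefix is no longer referenced and can be collected.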

On Tue, Mar 1, 2011 at 9:23 PM, wren ng thornton wrote:
On 3/1/11 1:35 AM, Bryan O'Sullivan wrote:
I want to see four entry points for writing:
    fdWrite :: Strict.ByteString -> IO Int
    fdWriteAll :: Strict.ByteString -> IO ()
    fdWritev :: [Strict.ByteString] -> IO Int -- turn the list into an iovec, then call writev
    fdWritevAll :: [Strict.ByteString] -> IO ()
Using writev requires the length of the list in order to get a count of chunks, which forces us to hold the whole list/lazy-bytestring in memory at once and also adds O(n) time for traversing it. Also it'd require converting each of the ByteString structs into iovec structs (whereas using write allows this to be unpacked into the call frames for write).
What's the benefit of doing this? Is writev that much more efficient than Haskell code with the same semantics[1]?
The benefit of using 'writev' over multiple calls to 'write' is that 'writev' is frequently a single kernel call, avoiding multiple context switches. Whether or not it is worth it to hang on to the entire bytestring to get this advantage probably depends on the circumstance.
Antoine

On 3/2/11 1:58 AM, Antoine Latter wrote:
On Tue, Mar 1, 2011 at 9:23 PM, wren ng thornton wrote:
On 3/1/11 1:35 AM, Bryan O'Sullivan wrote:
I want to see four entry points for writing:
    fdWrite :: Strict.ByteString -> IO Int
    fdWriteAll :: Strict.ByteString -> IO ()
    fdWritev :: [Strict.ByteString] -> IO Int -- turn the list into an iovec, then call writev
    fdWritevAll :: [Strict.ByteString] -> IO ()
Using writev requires the length of the list in order to get a count of chunks, which forces us to hold the whole list/lazy-bytestring in memory at once and also adds O(n) time for traversing it. Also it'd require converting each of the ByteString structs into iovec structs (whereas using write allows this to be unpacked into the call frames for write).
What's the benefit of doing this? Is writev that much more efficient than Haskell code with the same semantics[1]?
The benefit of using 'writev' over multiple calls to 'write' is that 'writev' is frequently a single kernel call, avoiding multiple context switches.
Ah, of course.
Whether or not it is worth it to hang on to the entire bytestring to get this advantage probably depends on the circumstance.
In that case I'd suggest having both writev and iterated write versions of handling lazy/lists-of bytestrings. I think the increased API size would be worth it to cover both situations.
-- Live well, ~wren

On 28/02/2011 22:57, wren ng thornton wrote:
On 2/28/11 12:28 PM, Johan Tibell wrote:
On Mon, Feb 28, 2011 at 9:20 AM, Bryan O'Sullivan wrote:
On Sun, Feb 27, 2011 at 9:18 PM, wren ng thornton wrote:
I've written variants of the System.Posix.IO API for strict and lazy ByteStrings which are currently lingering in their own unpublished package[1]. It's silly to have a separate package for just four functions, so I'd like to see it combined into the unix package. I don't have access to create a new ticket with the patch, but it's available at [2].
I think that the strict bytestring version should just replace the current function in System.Posix.IO, and the lazy version should not go in at all. This is the approach taken by the network package, and it's cleanest.
Thanks for working on this Wren.
I agree with Bryan's point.
So then what would become of the string variants? Backwards compatibility and all... not _everyone_ uses ByteStrings yet.
The unix package is tied to GHC releases, where the usual convention is to DEPRECATE for one major release (~ 12 months), then remove in the next release.
Cheers, Simon
Also, FWIW, the real impetus behind writing these was to deal with lazy ByteStrings, since that's what one of my dependencies uses[1]. It seems like a common enough situation that it should be handled directly. Forcing people to call (BS.concat . BL.toChunks) first muddies up the code and introduces additional copying, whereas making people use (mapM_ fdWrite . BL.toChunks) is still rather messy and it's buggy in the face of partial writes.
For network usage I agree that the strict-only approach is clean and effective. For general purpose writing to Fds, I'm not so sure.
[1] http://hackage.haskell.org/packages/archive/protocol-buffers/1.8.0/doc/html/...
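The deprecate-then-remove convention described in this message is usually expressed with GHC's DEPRECATED pragma. The sketch below is illustrative only: `writeBytes` and `writeString` are hypothetical names, not functions from the unix package, and the "write" merely reports a byte count so the example stays self-contained.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical byte-oriented entry point: reports how many bytes it
-- would write.
writeBytes :: BS.ByteString -> Int
writeBytes = BS.length

{-# DEPRECATED writeString "Use writeBytes with a ByteString; the String version is scheduled for removal" #-}
-- | Hypothetical legacy String entry point, kept for one release cycle.
-- BC.pack truncates each Char to 8 bits, so this is only faithful for
-- characters below 256.
writeString :: String -> Int
writeString = writeBytes . BC.pack
```

Modules that import and use `writeString` get a compile-time warning carrying the message, while existing code keeps building until the function is actually removed.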

On 3/2/11 8:49 AM, Simon Marlow wrote:
On 28/02/2011 22:57, wren ng thornton wrote:
On 2/28/11 12:28 PM, Johan Tibell wrote:
On Mon, Feb 28, 2011 at 9:20 AM, Bryan O'Sullivan wrote:
I think that the strict bytestring version should just replace the current function in System.Posix.IO, and [...]
So then what would become of the string variants? Backwards compatibility and all... not _everyone_ uses ByteStrings yet.
The unix package is tied to GHC releases, where the usual convention is to DEPRECATE for one major release (~ 12 months), then remove in the next release.
I'm fine with that. I don't particularly care what happens to the string functions, I'd just like to see the bytestring versions incorporated.
But that does raise the issue: if we are to (eventually) remove the string versions and put the bytestring versions in situ, then how should the migration proceed? I'd suggest putting the functions in System.Posix.IO.ByteString for the interim; this has the benefit that I could release a compatibility library allowing people to upgrade without changing their GHC, if need be.
But then what happens after the interim? We can't just replace the deprecated string versions with the bytestring versions directly, can we? So would we then deprecate System.Posix.IO.ByteString (another major release...) or just keep it around and have System.Posix.IO re-export it?
This is part of why I wasn't suggesting to remove the string functions, much as we'd like people to migrate.
-- Live well, ~wren

On 3/2/11 21:38, wren ng thornton wrote:
But that does raise the issue: if we are to (eventually) remove the string versions and put the bytestring versions in situ, then how should the migration proceed? I'd suggest putting the functions in System.Posix.IO.ByteString for the interim; this has the benefit that I could release a compatibility library allowing people to upgrade without changing their GHC, if need be.
It occurs to me that, if we're going to remove String versions, it should happen everywhere and in a coordinated way.
-- brandon s. allbery [linux,solaris,freebsd,perl] allbery.b@gmail.com
system administrator [openafs,heimdal,too many hats] kf8nh

On Thu, Mar 3, 2011 at 5:55 PM, Brandon S Allbery KF8NH wrote:
On 3/2/11 21:38 , wren ng thornton wrote:
But that does raise the issue: if we are to (eventually) remove the string versions and put the bytestring versions in situ, then how should the migration proceed? I'd suggest putting the functions in System.Posix.IO.ByteString for the interim; this has the benefit that I could release a compatibility library allowing people to upgrade without changing their GHC, if need be.
It occurs to me that, if we're going to remove String versions, it should happen everywhere and in a coordinated way.
What is the benefit from removing String versions (as opposed to just adding ByteString (and/or Text) ones alongside them)?
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
-- Work is punishment for failing to procrastinate effectively.

2011/3/3 Bryan O'Sullivan
2011/3/3 Gábor Lehel
What is the benefit from removing String versions (as opposed to just adding ByteString (and/or Text) ones alongside them)?
Strings are unicode. A file descriptor handles bytes.
I understand, and agree that ByteStrings are more appropriate. I'm not particularly attached to Strings. It might nonetheless make sense to keep the String versions around for the convenience of people who happen to have gotten a String from somewhere (there are, after all, quite a few such places), and wish to output it to a file descriptor.
-- Work is punishment for failing to procrastinate effectively.
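The mismatch between Unicode Strings and byte-oriented descriptors shows up as soon as a String has to be turned into bytes: the caller must pick an encoding. The sketch below contrasts two conversions available in the bytestring package; the function names `truncatedBytes` and `utf8Bytes` are made up for this illustration.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- | Truncating conversion: each Char keeps only its low 8 bits.
truncatedBytes :: String -> BS.ByteString
truncatedBytes = BC.pack

-- | UTF-8 conversion: code points above 127 become multi-byte sequences.
utf8Bytes :: String -> BS.ByteString
utf8Bytes = BL.toStrict . B.toLazyByteString . B.stringUtf8
```

For pure ASCII input the two agree, but for "\955" (Greek small lambda, U+03BB) the truncating version yields the single byte 0xBB while the UTF-8 version yields 0xCE 0xBB. A String-based write function has to make this choice on the caller's behalf, whereas a ByteString-based one leaves it explicit.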

On 3/3/11 13:18, Gábor Lehel wrote:
keep the String versions around for the convenience of people who happen to have gotten a String from somewhere (there are, after all, quite a few such places), and wish to output it to a file descriptor.
Which was my point. If you're going to deprecate Strings, do so across the board in an organized manner --- not via ad hoc deprecations to individual libraries.
-- brandon s. allbery [linux,solaris,freebsd,perl] allbery.b@gmail.com
system administrator [openafs,heimdal,too many hats] kf8nh

2011/3/3 Brandon S Allbery KF8NH
Which was my point. If you're going to deprecate Strings, do so across the board in an organized manner --- not via ad hoc deprecations to individual libraries.
The only problem with that is that ad hoc deprecations are achievable by individual library maintainers, while a "stop-the-world" approach won't, I think, work.

On 3/3/11 14:19, Bryan O'Sullivan wrote:
2011/3/3 Brandon S Allbery KF8NH
Which was my point. If you're going to deprecate Strings, do so across the board in an organized manner --- not via ad hoc deprecations to individual libraries.
The only problem with that is that ad hoc deprecations are achievable by individual library maintainers, while a "stop-the-world" approach won't, I think, work.
Isn't this what the Haskell Platform process is for? Individual libraries, yes, but as part of an overarching plan so you don't have e.g. network desupporting String while some other library decides to hold onto them as default a bit longer.
-- brandon s. allbery [linux,solaris,freebsd,perl] allbery.b@gmail.com
system administrator [openafs,heimdal,too many hats] kf8nh

On Mar 3, 2011, at 8:27 PM, Brandon S Allbery KF8NH wrote:
Isn't this what the Haskell Platform process is for? Individual libraries, yes, but as part of an overarching plan so you don't have e.g. network desupporting String while some other library decides to hold onto them as default a bit longer.
I think that's a bit stronger onus on the platform process than intended. Certainly the platform discussion should have some strong advisory weight on decisions of maintainers, but package maintainers have the real final say. And in the case of strings/bytestrings, I think the issue is minor anyway, since converting between the two is so simple. What we can and should expect is that individual libraries don't just ax the string functions abruptly, but provide some measured process of deprecation. Which is, I think, precisely what's being posed in this case. Cheers, Sterl.

We don't need to stop-the-world, but we can at least agree a general
policy in advance of starting to make such changes.
In this specific case the unicode/byte mismatch means there's a strong
case for doing it anyway; but one might argue for replacing String with
[Char8] instead of removing it completely. Again, if we had a generally
agreed approach to what "string" types to support, we could justify
individual decisions like this easily by reference to that policy.
________________________________
From: libraries-bounces@haskell.org
[mailto:libraries-bounces@haskell.org] On Behalf Of Bryan O'Sullivan
Sent: 03 March 2011 19:20
To: Brandon S Allbery KF8NH
Cc: libraries@haskell.org
Subject: Re: Proposal: add ByteString support to unix:System.Posix.IO
API
2011/3/3 Brandon S Allbery KF8NH
participants (11)
- Antoine Latter
- Brandon S Allbery KF8NH
- Bryan O'Sullivan
- Daniel Fischer
- Don Stewart
- Ganesh Sittampalam
- Gábor Lehel
- Simon Marlow
- Sittampalam, Ganesh
- Sterling Clover
- wren ng thornton