Proposal: ByteString based datagram communication (Ticket #1238)

I've made a proposal to add ByteString based datagram communication to Network.Socket and Network. Details are at:
http://hackage.haskell.org/trac/ghc/ticket/1238#preview
I rushed to get this done before I go on a trip tomorrow so I haven't completed testing and won't be available to discuss it for the next 9 days. As such, if discussion is needed, an extended deadline would be appreciated.
Testing windows is a bit awkward for me since I don't have a windows machine, so if anyone can test that platform I'd be very appreciative. I'll try to work through the problems I was having with hugs and test when I get back unless someone else wants to test it first.
Thanks.
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

rob:
I've made a proposal to add ByteString based datagram communication to Network.Socket and Network. Details are at:
http://hackage.haskell.org/trac/ghc/ticket/1238#preview
I rushed to get this done before I go on a trip tomorrow so I haven't completed testing and won't be available to discuss it for the next 9 days. As such, if discussion is needed, an extended deadline would be appreciated.
Testing windows is a bit awkward for me since I don't have a windows machine, so if anyone can test that platform I'd be very appreciative. I'll try to work through the problems I was having with hugs and test when I get back unless someone else wants to test it first.
Thanks.
I'd quickly note that you might also want to check the bytestring level stuff in HAppS/network-alt and in HaskellNet.
-- Don

I had a quick look through but, other than some DNS support in HAppS, could only see support for Stream based (TCP) communication. The DNS code is done through Network.Socket rather than through a higher-level interface similar to what Network offers for Stream communication, which is what I've tried to provide in this patch. My hope is that my patch allows things such as DNS to be implemented more easily, and that its use of lazy ByteStrings allows easy integration with the Binary package.
Using ByteStrings for Stream based communication with the existing Network library seems fairly trivial. My concern is that there is no support for Datagrams (especially UDP), and that the functions whose names would imply packet-based communication such as Datagrams to most network people (sendTo and recvFrom) are instead throwaway TCP utilities not intended for "real work".
On Wed, 2007-03-21 at 09:44 +1100, Donald Bruce Stewart wrote:
rob:
I've made a proposal to add ByteString based datagram communication to Network.Socket and Network. Details are at:
http://hackage.haskell.org/trac/ghc/ticket/1238#preview
I rushed to get this done before I go on a trip tomorrow so I haven't completed testing and won't be available to discuss it for the next 9 days. As such, if discussion is needed, an extended deadline would be appreciated.
Testing windows is a bit awkward for me since I don't have a windows machine, so if anyone can test that platform I'd be very appreciative. I'll try to work through the problems I was having with hugs and test when I get back unless someone else wants to test it first.
Thanks.
I'd quickly note that you might also want to check the bytestring level stuff in HAppS/network-alt and in HaskellNet.
-- Don
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/
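
To make the datagram reading of sendTo / recvFrom concrete, here is a minimal loopback sketch using strict ByteStrings. It is not the patch from the ticket: it assumes the present-day network package, where Network.Socket.ByteString exports sendTo and recvFrom that take an explicit Socket. The name echoOnce and the 1024-byte receive limit are hypothetical, chosen only for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import Network.Socket
  ( Family (AF_INET)
  , SockAddr (SockAddrInet)
  , SocketType (Datagram)
  , bind
  , close
  , defaultProtocol
  , getSocketName
  , socket
  , tupleToHostAddress
  )
import Network.Socket.ByteString (recvFrom, sendTo)

-- Hypothetical helper: send one UDP datagram to a loopback socket and
-- read it back. Uses the modern network package, not the patch under review.
echoOnce :: B.ByteString -> IO B.ByteString
echoOnce payload = do
  receiver <- socket AF_INET Datagram defaultProtocol
  -- Port 0 asks the OS to pick any free port on 127.0.0.1.
  bind receiver (SockAddrInet 0 (tupleToHostAddress (127, 0, 0, 1)))
  addr <- getSocketName receiver
  sender <- socket AF_INET Datagram defaultProtocol
  -- No connection is set up: each call names its destination explicitly.
  _ <- sendTo sender payload addr
  -- One recvFrom returns one whole datagram plus the sender's address.
  (msg, _from) <- recvFrom receiver 1024
  close sender
  close receiver
  return msg
```

Each datagram arrives as a single strict chunk, which is the property the later discussion of strict versus lazy ByteStrings turns on.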

Hi all
I'm back and am keen to work on getting this patch accepted. Concerns I've so far seen raised include:
1. bos: It breaks the existing stable API
I need more information on this; I can't see what's broken unless the change of sendTo and recvFrom to datagram functions is what is considered broken. I'd argue the current sendTo and recvFrom functions are what is broken in terms of usefulness and how their functionality fits their names. This patch fixes that.
2. bos: It restricts to AF_INET
As far as my testing has shown it should also work with AF_UNIX. To get it working with AF_UNIX the patch also includes a bugfix in Network.Socket. As such, the patch should hopefully reduce the current AF_INET restriction.
3. dons: Similar work has been done on HAppS and HaskellNet
I'm keen to integrate any ideas from these platforms. I'm discussing it with S. Alexander Jacobson. If anyone can outline some good ideas from these platforms that should be included in the patch I'll gladly look into it. However, it's meant to be a simple change to make sendTo and recvFrom act as expected of network utilities with those names while providing convenient and efficient datagram utilities similar to what exists for stream based connections, rather than add too many features.
4. me: It's not tested as well as I'd like
I'm having trouble testing with hugs. runhugs -98 ./Setup.hs build returns 'ERROR "dist/build/Network/BSD.hs" - Can't find imported module "GHC.IOBase"'. Something screwy with hsc2hs or cpphs?
I don't have windows so still haven't tested that. Can any Windows users help please?
On Wed, 2007-03-21 at 00:17 +0900, Robert Marlow wrote:
I've made a proposal to add ByteString based datagram communication to Network.Socket and Network. Details are at:
http://hackage.haskell.org/trac/ghc/ticket/1238#preview
I rushed to get this done before I go on a trip tomorrow so I haven't completed testing and won't be available to discuss it for the next 9 days. As such, if discussion is needed, an extended deadline would be appreciated.
Testing windows is a bit awkward for me since I don't have a windows machine, so if anyone can test that platform I'd be very appreciative. I'll try to work through the problems I was having with hugs and test when I get back unless someone else wants to test it first.
Thanks.
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

On Thu, Apr 05, 2007 at 06:14:25PM +0900, Robert Marlow wrote:
4. me: It's not tested as well as I'd like I'm having trouble testing with hugs. runhugs -98 ./Setup.hs build returns 'ERROR "dist/build/Network/BSD.hs" - Can't find imported module "GHC.IOBase"'. Something screwy with hsc2hs or cpphs?
I think it's because the GHC version of hsc2hs uses ghc as its C compiler, which sets __GLASGOW_HASKELL__. You should be able to work around this by adding --with-hsc2hs=/usr/local/bin/hsc2hs-hugs or similar to setup configure when building for Hugs.

Gah, I tried that a couple of times and it made no difference. I tried it again after getting your email and it did something different. Your email must have been the magic key it was waiting for. Computers can be mysterious.
Anyway, the difference unfortunately isn't that it now works. Instead it spits the same error it gave me when I tried running hsc2hs-hugs on BSD.hsc directly:
    $ runhugs -98 ./Setup.hs build
    Preprocessing library network-2.0...
    Network/BSD_hsc_make.c:1:49: error: /usr/share/hsc2hs-0.67/template-hsc.h: No such file or directory
    BSD.hsc: In function ‘main’:
    BSD.hsc:578: warning: incompatible implicit declaration of built-in function ‘printf’
    BSD.hsc:571: error: ‘stdout’ undeclared (first use in this function)
    BSD.hsc:571: error: (Each undeclared identifier is reported only once
    BSD.hsc:571: error: for each function it appears in.)
    BSD.hsc:150: error: expected expression before ‘struct’
    BSD.hsc:151: error: expected expression before ‘struct’
    BSD.hsc:154: error: expected expression before ‘struct’
    BSD.hsc:155: error: expected expression before ‘struct’
    BSD.hsc:254: error: expected expression before ‘struct’
    BSD.hsc:255: error: expected expression before ‘struct’
    BSD.hsc:265: error: expected expression before ‘struct’
    BSD.hsc:348: error: expected expression before ‘struct’
    BSD.hsc:349: error: expected expression before ‘struct’
    BSD.hsc:352: error: expected expression before ‘struct’
    BSD.hsc:354: error: expected expression before ‘struct’
    BSD.hsc:453: error: expected expression before ‘struct’
    BSD.hsc:454: error: expected expression before ‘struct’
    BSD.hsc:457: error: expected expression before ‘struct’
    BSD.hsc:458: error: expected expression before ‘struct’
    compiling Network/BSD_hsc_make.c failed
    command was: /usr/bin/gcc -c -I/usr/lib/hugs/include -Iinclude Network/BSD_hsc_make.c -o Network/BSD_hsc_make.o
    ./Setup.hs: got error code while preprocessing: Network.BSD
Any ideas?
On Thu, 2007-04-05 at 10:26 +0100, Ross Paterson wrote:
On Thu, Apr 05, 2007 at 06:14:25PM +0900, Robert Marlow wrote:
4. me: It's not tested as well as I'd like I'm having trouble testing with hugs. runhugs -98 ./Setup.hs build returns 'ERROR "dist/build/Network/BSD.hs" - Can't find imported module "GHC.IOBase"'. Something screwy with hsc2hs or cpphs?
I think it's because the GHC version of hsc2hs uses ghc as its C compiler, which sets __GLASGOW_HASKELL__. You should be able to work around this by adding --with-hsc2hs=/usr/local/bin/hsc2hs-hugs or similar to setup configure when building for Hugs.
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

On Thu, Apr 05, 2007 at 06:57:24PM +0900, Robert Marlow wrote:
Anyway, the difference unfortunately isn't that it now works. Instead it spits the same error it gave me when I tried running hsc2hs-hugs on BSD.hsc directly:
$ runhugs -98 ./Setup.hs build
Preprocessing library network-2.0...
Network/BSD_hsc_make.c:1:49: error: /usr/share/hsc2hs-0.67/template-hsc.h: No such file or directory
How odd. It works for me from a vanilla hugs98-Sep2006 build. Are you using a pre-packaged version?

Ok, I seem to have managed to get hugs to install the network library.
Ross: thanks, it was indeed a pre-packaged version (for debian). Compiling hugs from scratch seemed to help.
I've now tested the patch against hugs and can confirm its compatibility. For windows I've only managed to get as far as confirming compilation.
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

you might want to consider any API consequences of supporting DCCP
http://en.wikipedia.org/wiki/Datagram_Congestion_Control_Protocol
I am not saying full DCCP support is needed off the bat, but it would be nice if at some point in the future the API could be extended gracefully to support it as well as UDP as far as unreliable datagram style communication goes.
John
-- John Meacham - ⑆repetae.net⑆john⑈

I think this is a bit of a sidetrack from the goal of this patch but I agree; it would definitely be nice to be able to easily extend the API in such a manner.
My goal in this particular patch was to give a set of (primarily UDP) datagram functions with an interface analogous to what existed for the stream (TCP) functions. TCP and UDP are the primary network technologies in use today, though DCCP (and ICMP) would be nice too.
My knee-jerk opinion is that the current use of HostName and PortID already assumes IPv4 protocols such as TCP and UDP more than it should. Unix socket functions, for example, make no use of the HostName argument at all, and the UnixSocket constructor just doesn't sound like it fits the name of the type "PortID" to me.
Consequently, I think making the Network module more flexible for extension would involve getting rid of the current addressing scheme and implementing some new Address type which includes not just the address, but the protocol used. So something like
    data Address = TCP HostName PortNumber
                 | UDP HostName PortNumber
                 | DCCP HostName PortNumber
                 | UnixSocket FilePath
                 -- Maybe also inet6:
                 | TCP6 HostName6 PortNumber
                 | UDP6 HostName6 PortNumber
and so forth.
Of course this is beyond the scope of this patch.
On Thu, 2007-04-05 at 02:32 -0700, John Meacham wrote:
you might want to consider any API consequences of supporting DCCP
http://en.wikipedia.org/wiki/Datagram_Congestion_Control_Protocol
I am not saying full DCCP support is needed off the bat, but it would be nice if at some point in the future the API could be extended gracefully to support it as well as UDP as far as unreliable datagram style communication goes.
John
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/
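
The Address type floated above can be written out as a compilable sketch. The following is only an illustration under stated assumptions: HostName and PortNumber are local stand-ins for the library's own types, HostName6 is assumed to be a plain String because the message does not define it, and isDatagram is a hypothetical helper that is not part of any proposal in this thread.

```haskell
import Data.Word (Word16)

-- Stand-ins so the sketch compiles without the network package; in the
-- thread these would be the library's own HostName and PortNumber.
type HostName = String
type HostName6 = String -- assumed; the message does not define it
type PortNumber = Word16

-- The address type suggested in the message: each constructor carries
-- the protocol as well as the endpoint.
data Address
  = TCP HostName PortNumber
  | UDP HostName PortNumber
  | DCCP HostName PortNumber
  | UnixSocket FilePath
  | TCP6 HostName6 PortNumber
  | UDP6 HostName6 PortNumber
  deriving (Eq, Show)

-- Hypothetical helper: does this address imply unreliable,
-- message-oriented communication? UnixSocket is treated as a stream
-- here for simplicity, although Unix sockets can be either kind.
isDatagram :: Address -> Bool
isDatagram (UDP _ _) = True
isDatagram (UDP6 _ _) = True
isDatagram (DCCP _ _) = True
isDatagram _ = False
```

Adding a protocol then means adding a constructor, and every function that pattern matches on Address has to be revisited, which is part of why such a change would break the existing API.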

Robert Marlow wrote:
My knee-jerk opinion is that the current use of HostName and PortID already assume IPv4 protocols such as TCP and UDP more than they could.
They have TCP baked in, but not IPv4. I posted a patch yesterday that generalises the functions in Network to work with IPv6 as well as v4, and it required no API changes.
Consequently, I think making the Network module more flexible for extension would involve getting rid of the current addressing scheme and implementing some new Address type which includes not just the address, but the protocol used.
I don't think there's much point in this. The Network module is not especially well put together, but it's at least stable. It would be more profitable to work on the network-alt package or something else instead of trying to remould the existing API, while breaking it in the process.

Hello Robert, Thursday, April 5, 2007, 1:14:25 PM, you wrote:
1. bos: It breaks the existing stable API I need more information on this; I can't see what's broken unless the change of sendTo and recvFrom to datagram functions is what is considered broken. I'd argue the current sendTo and recvFrom functions are what is broken in terms of usefulness and how their functionality fits their names. This patch fixes that.
but why you provide ByteString-only API?? i think that more common idiom is to provide String functions here and use somewhat like Network.ByteString, Network.ByteString.Lazy modules to provide ByteString/ByteStringLazy equivalents of String function from Network.hs
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi Bulat
On Thu, 2007-04-05 at 15:08 +0400, Bulat Ziganshin wrote:
but why you provide ByteString-only API?? i think that more common idiom is to provide String functions here and use somewhat like Network.ByteString, Network.ByteString.Lazy modules to provide ByteString/ByteStringLazy equivalents of String function from Network.hs
Mostly because I wanted ByteStrings so that's what I implemented :)
Good point though. I've uploaded a replacement patch changing the Network functions to use String and adding Network.ByteString and Network.ByteString.Lazy. Thanks for the suggestion.
Ideally it'd be nice if this all used some sort of String typeclass interface. But this should do for now.
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

On Fri, 2007-04-06 at 00:26 +0900, Robert Marlow wrote:
Hi Bulat
On Thu, 2007-04-05 at 15:08 +0400, Bulat Ziganshin wrote:
but why you provide ByteString-only API?? i think that more common idiom is to provide String functions here and use somewhat like Network.ByteString, Network.ByteString.Lazy modules to provide ByteString/ByteStringLazy equivalents of String function from Network.hs
Mostly because I wanted ByteStrings so that's what I implemented :)
Good point though. I've uploaded a replacement patch changing the Network functions to use String and adding Network.ByteString and Network.ByteString.Lazy. Thanks for the suggestion.
I'm not sure this really makes sense.
In most situations there is an obvious candidate amongst String, strict ByteString and lazy ByteString. In this case, for datagram communication the obvious choice is indeed strict ByteString. Correct me if I'm wrong but datagrams are relatively small contiguous chunks and they arrive in our memory space all in one go. So they are not at all like a continuous stream of data, which is what a lazy ByteString models. So there would never be any advantage to using a lazy ByteString in this case; it would always just have one chunk.
Similarly, for String, one has to go via a strict contiguous chunk representation in the first place so any String interface would be a trivial wrapper on a ByteString representation.
Remember that the types are trivially inter-convertible with a single function call[1]. I'm not sure that we need two whole extra modules to replace a single pack/unpack call in a calling module.
It's exactly this kind of thing that makes me worry about people creating a Stringlike class. By passing the operations in via a class rather than converting representations on the boundary we are in danger of losing all the performance benefits we were after in the first place.
I'm sure it makes more sense to provide a class to give us a string equivalent of fromIntegral. That way operations that want to provide an API that works on any string can choose the best internal representation and just use the conversion on the boundary. That way we only need to inline the conversion into the calling program to make it fast. As with fromIntegral, that conversion can often be optimised or turned into a no-op.
For performance, class dictionary use should be kept as near to the 'surface' as possible. For example, consider this standard List module function:
    elemIndex :: Eq a => a -> [a] -> Maybe Int
    elemIndex x = findIndex (x==)
This is not a naive definition. It is very cunning.
If we wrote a full version of elemIndex in the style of findIndex but using == at the appropriate point, then to optimise uses of elemIndex where we know the particular Eq class instance we'd have to inline the whole of elemIndex. This isn't a tiny amount of code and GHC is normally disinclined to do that. So we'd end up passing an Eq dictionary. Disaster!
Instead, with the above definition we've lifted the use of the class right to the surface. Now elemIndex looks tiny and GHC will inline it in the calling context where we know the Eq instance. So now we just build a little specialised (x==) function and make a call to the findIndex function. So we get minimal code duplication and pretty fast results.
And all this happens without having to bludgeon the compiler with INLINE or SPECIALISE pragmas. In other words it works just fine on ordinary user code.
Ok, enough ranting.
Duncan
[1] Well two to get between strict and lazy bytestrings, but that's kind of deliberate to encourage people to think twice about doing that
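
The two shapes of elemIndex contrasted above can be put side by side. This is a sketch for comparison only: the names elemIndexSurface and elemIndexDeep are hypothetical, the second definition is one plausible reading of the "full version" the message describes, and whether GHC actually inlines either one depends on the optimiser and its flags.

```haskell
import Data.List (findIndex)

-- The definition quoted in the message: the Eq dictionary is used only
-- to build the predicate (x ==); the traversal is left to findIndex.
elemIndexSurface :: Eq a => a -> [a] -> Maybe Int
elemIndexSurface x = findIndex (x ==)

-- A "full version": the class method is called deep inside the loop,
-- so specialising to a known Eq instance means copying the whole loop.
elemIndexDeep :: Eq a => a -> [a] -> Maybe Int
elemIndexDeep x = go 0
  where
    go _ [] = Nothing
    go i (y : ys)
      | x == y = Just i
      | otherwise = go (i + 1) ys
```

Both functions return the same results; the difference is only in how much code has to be duplicated at a call site to avoid passing the dictionary at run time.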

I suspected in my sleepy state last night that overwriting all instances of my original patch with this new one would probably come back to bite me. I probably shouldn't have ignored that gut instinct.
Thanks for the rant, Duncan. That clears up a bunch of things for me regarding how ByteStrings are intended to be used. I had written the original patch with lazy bytestrings because I had integration with Data.Binary in mind and wanted to make it as simple for that as possible. But I do see the merit in your argument for just using strict ByteStrings; it didn't feel quite right making trivial conversions between strict and lazy bytestrings when I was doing it.
So then. If I revert to a patch similar to the one I originally had, only using strict ByteStrings, would it raise any further concerns (besides out-of-scope concerns such as a nice coherent string class interface)?
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

Hello Duncan, Friday, April 6, 2007, 3:52:51 AM, you wrote:
In most situations there is an obvious candidate amongst String, strict ByteString and lazy ByteString. In this case, for datagram communication the obvious choice is indeed strict ByteString.
you see from the POV of *implementor* and for this case you are probably perfectly right. but from POV of user, in most cases, speed doesn't matter and i want to use just the types what are most convenient for me. if i work with strings here, i want to be able to get strings from any sources and send strings to any receivers. learning which libs provide string api, which ByteString one and so on is not interesting, adding conversions between all those types clutters the code
it seems easier for me to just import Network if i want to use standard string type, or Network.ByteString which, i know, provides exactly the same operations, only on ByteStrings instead of memoizing which operations was easier to implement in which type due to some internal reasons
just imagine that we got to exclude concat :: [ByteString] -> ByteString operation because it's natural return type is lazy ByteString
so for me the best variant is to *implement* these operations using strict ByteStrings and provide wrappers which deal with other types
automatic conversion of arguments and results using typeclass is really bad idea, i agree/ may be, it would become better with some kind of defaulting, where, say, lazy UTF8-encoded FPS will be default type for such class
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

I see your point, but if we extend this line of reasoning, every time a new String-based API is created, the author may be looking at having to provide 3-5 separate APIs to handle all the (current) String types available.
I think Duncan's proposal of a solution analogous to fromIntegral is correct. Your arguments are applicable to APIs which use Int rather than Integer as a datatype. We don't provide Data.List.Integer for Integer based indexing because we have a simple conversion mechanism in fromIntegral.
Likewise, I think the solution for simplifying String interfaces is to provide some sort of convertString utility (such as http://hpaste.org/1276 ).
With convertString anybody who wants a String or lazy ByteString interface to Network can simply write their own wrappers using that utility without too much difficulty.
As can be seen in the current patch, even without convertString, the wrappers are fairly trivial. Such trivial wrappers don't really warrant cluttering the API so much.
On Fri, 2007-04-06 at 18:28 +0400, Bulat Ziganshin wrote:
you see from the POV of *implementor* and for this case you are probably perfectly right. but from POV of user, in most cases, speed doesn't matter and i want to use just the types what are most convenient for me. if i work with strings here, i want to be able to get strings from any sources and send strings to any receivers. learning which libs provide string api, which ByteString one and so on is not interesting, adding conversions between all those types clutters the code
it seems easier for me to just import Network if i want to use standard string type, or Network.ByteString which, i know, provides exactly the same operations, only on ByteStrings instead of memoizing which operations was easier to implement in which type due to some internal reasons
just imagine that we got to exclude concat :: [ByteString] -> ByteString operation because it's natural return type is lazy ByteString
so for me the best variant is to *implement* these operations using strict ByteStrings and provide wrappers which deal with other types
automatic conversion of arguments and results using typeclass is really bad idea, i agree/ may be, it would become better with some kind of defaulting, where, say, lazy UTF8-encoded FPS will be default type for such class
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/
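
A conversion class in the spirit of fromIntegral, as discussed above, might look roughly like the sketch below. It is an assumption-laden illustration, not the code behind the hpaste link (whose contents are not reproduced in this thread): the class ConvertString, its instances, and the helper withStrict are all hypothetical names, and the Char8 functions used here treat each Char as a single byte.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical class: convert between string representations once,
-- at the API boundary, instead of duplicating every function per type.
class ConvertString a b where
  convertString :: a -> b

instance ConvertString String S.ByteString where
  convertString = S.pack -- Char8 packing truncates code points above 255

instance ConvertString S.ByteString String where
  convertString = S.unpack

instance ConvertString S.ByteString L.ByteString where
  convertString s = L.fromChunks [s] -- a single chunk, no copying of bytes

instance ConvertString L.ByteString S.ByteString where
  convertString = S.concat . L.toChunks

-- Hypothetical wrapper: offer a String interface to a function that is
-- implemented on strict ByteStrings.
withStrict :: (S.ByteString -> r) -> String -> r
withStrict f = f . convertString
```

With something like this, a caller who prefers String can wrap a strict-ByteString function in one line, for example `withStrict S.length "hello"`, rather than needing a parallel module per string type.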

Disclaimer: I'm not trying to raise convertString as a potential patch right now. I mention it merely to indicate that a convenient interface for Network belongs in a more general string solution and is outside the scope of this patch.
On Sat, 2007-04-07 at 10:35 +0900, Robert Marlow wrote:
Likewise, I think the solution for simplifying String interfaces is to provide some sort of convertString utility (such as http://hpaste.org/1276 ).
With convertString anybody who wants a String or lazy ByteString interface to Network can simply write their own wrappers using that utility without too much difficulty.
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

I've reverted this patch to be similar to the original one posted (ie, with just Network instead of Network.ByteString*) only using strict ByteStrings instead of lazy ones.
Any further concerns about this patch?
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

There doesn't seem to be any further concerns about this patch and it has been tested against ia32 linux ghc and hugs. Can this patch now be accepted or can somebody test it against windows?
On Wed, 2007-03-21 at 00:17 +0900, Robert Marlow wrote:
I've made a proposal to add ByteString based datagram communication to Network.Socket and Network. Details are at:
http://hackage.haskell.org/trac/ghc/ticket/1238#preview
I rushed to get this done before I go on a trip tomorrow so I haven't completed testing and won't be available to discuss it for the next 9 days. As such, if discussion is needed, an extended deadline would be appreciated.
Testing windows is a bit awkward for me since I don't have a windows machine, so if anyone can test that platform I'd be very appreciative. I'll try to work through the problems I was having with hugs and test when I get back unless someone else wants to test it first.
Thanks.
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

Robert Marlow wrote:
There doesn't seem to be any further concerns about this patch and it has been tested against ia32 linux ghc and hugs. Can this patch now be accepted or can somebody test it against windows?
For a single patch, it does rather a lot of different things. It should at least be split into four different patches.
Also, am I not mistaken, or does it not change the existing API?
    -sendTo, -- :: HostName -> PortID -> String -> IO ()
    +sendTo, -- :: HostName -> PortID -> B.ByteString -> IO Socket
    +sendTo_, -- :: HostName -> PortID -> B.ByteString -> IO ()

The old API is considered only useful for "testing" anyway, since it uses hGetContents and consequently can cause open socket leaks. This patch is intended to make those functions more useful and closer to what a network programmer might expect from functions named sendTo / recvFrom.
How do you think the patch should be broken up and why?
On Wed, 2007-05-16 at 23:02 -0700, Bryan O'Sullivan wrote:
Robert Marlow wrote:
There doesn't seem to be any further concerns about this patch and it has been tested against ia32 linux ghc and hugs. Can this patch now be accepted or can somebody test it against windows?
For a single patch, it does rather a lot of different things. It should at least be split into four different patches.
Also, am I not mistaken, or does it not change the existing API?
    -sendTo, -- :: HostName -> PortID -> String -> IO ()
    +sendTo, -- :: HostName -> PortID -> B.ByteString -> IO Socket
    +sendTo_, -- :: HostName -> PortID -> B.ByteString -> IO ()
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

Robert Marlow wrote:
The old API is considered only useful for "testing" anyway, since it uses hGetContents and consequently can cause open socket leaks. This patch is intended to make those functions more useful and closer to what a network programmer might expect from a functions named sendTo / recvFrom.
That's fair enough, but this kind of rationale belongs in the header of the patch that makes that change, so that people won't be scratching their heads, wondering what happened.
How do you think the patch should be broken up and why?
The description of the patch says that it does four different things. That's your cue :-)

On Thu, 2007-05-17 at 07:50 -0700, Bryan O'Sullivan wrote:
That's fair enough, but this kind of rationale belongs in the header of the patch that makes that change, so that people won't be scratching their heads, wondering what happened.
Oh! Thanks for pointing that out. Fixed.
The description of the patch says that it does four different things. That's your cue :-)
Oh I see what you mean. They're all related to the same change. I've changed the patch header to try to explain that too. The overall goal of fixing up sendTo / recvFrom in Network is simpler to achieve with the listed changes. Given those additional changes wouldn't be terribly important if it weren't for the implementation of datagram communication in Network, I don't think it's a good idea to split the patch up.
-- Robert Marlow MITS Co-operative Limited http://www.mits.coop/

participants (7)
- Bryan O'Sullivan
- Bulat Ziganshin
- dons@cse.unsw.edu.au
- Duncan Coutts
- John Meacham
- Robert Marlow
- Ross Paterson