Potential Network SIG

newer
Data.IntMap: intersection function...

older
darcs patch: Fix unportable use of...

Thomas DuBuisson

22 Aug 2009 22 Aug '09

4:49 a.m.

Hello All, If you are CCed it's because you are listed as a maintainer of a network-* package that I consider related to the Haskell network library. I'm hoping to roll much of the functionality of network-{bytestring, multicast, fancy} etc into a single package that the community will agree on (namely, "network"). Johan suggested starting a SIG to hammer out a design for a new Network API seeing as the current API, a straight-forward Berkeley binding, doesn't seem to please anyone in a Haskell context. If you want to partake then this e-mail if your heads up. If there is some formal method of setting up a Haskell SIG then please let me know. My thoughts on some important parts are below - I'm sure not everyone will agree as these thoughts directly contradict some designs found in current libraries. 1) Separate low level functions / bindings from high level / productive code by placing each in different modules. The low level bindings should remain available for those cases we fail to have the needed functionality in our high level packages. That said, I'm hoping to cover more than the 80% of users with any new design. 2) Maintain type safety by using type classes for most things. Unlike Network.Fancy and Network.Socket (which have IPv4 and IPv6 as constructors of the same data type), I think we should allow for the possibility that some users of the library will be limited to just one IP version without resorting to partial functions. I suggest type classes to cover this aspect (class Address, class Port, etc). 3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance concious enough to avoid String. 4) Support more features Features such as Multicast, Header inclusion (IP_HDRINCL), address binding, etc. IOW, most the IP_ and SO_ options of socket (7) and ip (7) man pages. It would be rather nice if we were able to expose these in a friendly way - but with our cross platform concerns that might not be a good idea (e.g. I'm not familiar with windows). 5) Separate IO-less data declarations from IO and any platform dependent code in different packages. I've already got some work in this area via network-data. Not claiming its current design will stand the test of time, its just an example of keeping the data structures separate from any IO operations that will need platform specific work. 6) Integrate with the rest of hackage. This means instance of PrettyClass, Parsec, and Binary. I noticed a number of parsing utilities for IP addresses - lots of duplicated effort here. I'm currently looking at how the network library is being used - particularly when Network.Socket is invoked. So I guess I'll go sort through some of the source code for scurry, happstack, and adhoc-network. Thomas

Show replies by date

Duncan Coutts

22 Aug 22 Aug

2:43 p.m.

On Fri, 2009-08-21 at 21:49 -0700, Thomas DuBuisson wrote:

...

Hello All,

If you are CCed it's because you are listed as a maintainer of a network-* package that I consider related to the Haskell network library. I'm hoping to roll much of the functionality of network-{bytestring, multicast, fancy} etc into a single package that the community will agree on (namely, "network").

That sounds like an excellent idea. Go for it!

...

Johan suggested starting a SIG to hammer out a design for a new Network API seeing as the current API, a straight-forward Berkeley binding, doesn't seem to please anyone in a Haskell context. If you want to partake then this e-mail if your heads up. If there is some formal method of setting up a Haskell SIG then please let me know.

Your email suggesting it is the "formal" method. :-) So yes, get the network people together, talk things over for as long as you need and post your conclusions to this list so we know what is going on and get a chance to comment. If you end up needing some kind of transition or compatibility plan (e.g. if you end up changing APIs etc) then you'll want to talk to the Haskell Platform release team. The aim would be that if we need some transition then we plan it deliberately and get all platform packages to switch in one major release cycle. Duncan

taruti

6:26 p.m.

Excerpts from Thomas DuBuisson's message of Sat Aug 22 07:49:58 +0300 2009:

...

Johan suggested starting a SIG to hammer out a design for a new Network API seeing as the current API, a straight-forward Berkeley binding, doesn't seem to please anyone in a Haskell context. If you want to partake then this e-mail if your heads up. If there is some formal method of setting up a Haskell SIG then please let me know.

Sounds good, some notes below.

...

1) Separate low level functions / bindings from high level / productive code by placing each in different modules.

Good.

...

2) Maintain type safety by using type classes for most things.

Unlike Network.Fancy and Network.Socket (which have IPv4 and IPv6 as constructors of the same data type), I think we should allow for the possibility that some users of the library will be limited to just one IP version without resorting to partial functions. I suggest type classes to cover this aspect (class Address, class Port, etc).

The limitation to one IP-version is a runtime issue not a compile time one. All the platforms may support either IPv4 or IPv6, both or none depending on the machine the binary is run on. I think it should be possible to say "use v4", "use v6" and "use something" on all target platforms. A typeclass might work fine if designed correctly. Also a typeclass makes extensions easier if designed carefully.

...

3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance concious enough to avoid String.

Ambivalent here. Does it make more sense to have a send :: StringLike s => ... or sendS :: String -> ... sendBS :: ByteString -> ... sendLBS :: L.ByteString -> ... Also if we have separate functions we need yet another set of functions when Text is ready.

...

4) Support more features Features such as Multicast, Header inclusion (IP_HDRINCL), address binding, etc. IOW, most the IP_ and SO_ options of socket (7) and ip (7) man pages. It would be rather nice if we were able to expose these in a friendly way - but with our cross platform concerns that might not be a good idea (e.g. I'm not familiar with windows).

Most of these seem to be for the low level binding?

...

5) Separate IO-less data declarations from IO and any platform dependent code in different packages.

Would create lots of nearly identical implementation packages. Most of the implementation differences across platforms are quite small.

...

6) Integrate with the rest of hackage. This means instance of PrettyClass, Parsec, and Binary. I noticed a number of parsing utilities for IP addresses - lots of duplicated effort here.

Sounds good. - Taru Karttunen

Duncan Coutts

10:43 p.m.

On Sat, 2009-08-22 at 21:26 +0300, taruti wrote:

...

...
3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance concious enough to avoid String.

Ambivalent here. Does it make more sense to have a send :: StringLike s => ... or sendS :: String -> ... sendBS :: ByteString -> ... sendLBS :: L.ByteString -> ...

I've never understood why people want these type classes. Just pick the right one. Each of those types has functions for converting to each other. Let the caller do the conversion if any needs doing, it's just a function call.

...

Also if we have separate functions we need yet another set of functions when Text is ready.

Which also comes with functions for converting to ByteString or String. Duncan

Johan Tibell

24 Aug 24 Aug

7:53 a.m.

On Sun, Aug 23, 2009 at 12:43 AM, Duncan Coutts wrote:

...

On Sat, 2009-08-22 at 21:26 +0300, taruti wrote:

...
...
3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance concious enough to avoid String.

Ambivalent here. Does it make more sense to have a send :: StringLike s => ... or sendS :: String -> ... sendBS :: ByteString -> ... sendLBS :: L.ByteString -> ...

I've never understood why people want these type classes. Just pick the right one. Each of those types has functions for converting to each other. Let the caller do the conversion if any needs doing, it's just a function call.

Arguably, ByteString is the right one, at least for the lower level binding. ByteString map well to the types used by the underlying system calls and it is semantically the right type. I can see a reason to provide both the strict and the lazy version as going from lazy to strict might be inefficient as it involves copying. This is why network-bytestring supports both.

...

...
Also if we have separate functions we need yet another set of functions when Text is ready.

Which also comes with functions for converting to ByteString or String.

I'm not even sure what sending 'String's or 'Text's would mean without specifying an encoding. We shouldn't be afraid to use composition to prevent a combinatorial explosion i.e. don't have a sendAsUtf8, sendAsUtf16 but instead do send $ Text.encode Utf8 "myString" Cheers, Johan

Johan Tibell

7:59 a.m.

Hi, Thanks for getting this started Thomas. I've added some of my thought inline: On Sat, Aug 22, 2009 at 6:49 AM, Thomas DuBuisson wrote:

...

1) Separate low level functions / bindings from high level / productive code by placing each in different modules.

I think this is crucial to avoid having a percentage of the users have to write their own bindings because we simplified away their use case. In order for the library to be extensible it should also allow access to the underlying file descriptor. That way someone could e.g. write a library for the 'select' system call and have it work with sockets.

...

3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance concious enough to avoid String.

Agreed. String is also semantically the wrong types. The OS socket API doesn't understand Unicode.

...

6) Integrate with the rest of hackage. This means instance of PrettyClass, Parsec, and Binary. I noticed a number of parsing utilities for IP addresses - lots of duplicated effort here.

I must disagree here. We don't want hard coupling to lots of other libraries. i.e. we don't need convenience functions like: get :: Socket -> Binary a These ends are better reach by the client writing the glue code. It's easy to get a combinatorial explosion in the number of functions if we try to provide an integration point for each library out there.

...

I'm currently looking at how the network library is being used - particularly when Network.Socket is invoked. So I guess I'll go sort through some of the source code for scurry, happstack, and adhoc-network.

I think the best way to proceed is to discuss specific. Lets dig up some examples of things that are suboptimal in the current design and write down some alternative designs on a wiki page. Cheers, Johan

Yitzchak Gale

8:43 a.m.

Johan Tibell wrote:

...

We shouldn't be afraid to use composition to prevent a combinatorial explosion i.e. don't have a sendAsUtf8, sendAsUtf16 but instead do send $ Text.encode Utf8 "myString"

OK, makes sense. But then, that should appear very explicitly in the documentation. It will be the most common usage, and people using it will not be focused on Network, Bytestring, and Text, but on other things. They should not be forced to thrash around and do a research project just to do something simple. Thomas DuBuisson wrote:

...

...
6) Integrate with the rest of hackage. This means instance of PrettyClass, Parsec, and Binary.

...

I must disagree here. We don't want hard coupling to lots of other libraries.

It won't be a problem if the convenience glue is in separate packages that pull in all of the dependencies. But then, these other packages need to be mentioned in the documentation for the main package. Regards, Yitz

Johan Tibell

9:23 a.m.

On Mon, Aug 24, 2009 at 10:43 AM, Yitzchak Gale wrote:

...

Johan Tibell wrote:

...
We shouldn't be afraid to use composition to prevent a combinatorial explosion i.e. don't have a sendAsUtf8, sendAsUtf16 but instead do send $ Text.encode Utf8 "myString"

OK, makes sense. But then, that should appear very explicitly in the documentation. It will be the most common usage, and people using it will not be focused on Network, Bytestring, and Text, but on other things. They should not be forced to thrash around and do a research project just to do something simple.

I agree. The network package could use a lot more documentation in general. For example, I put a echo client/server pair example in my network-bytestring package as the network library lacked even something that basic.

...

Thomas DuBuisson wrote:

...
...
6) Integrate with the rest of hackage. This means instance of PrettyClass, Parsec, and Binary.

...
I must disagree here. We don't want hard coupling to lots of other libraries.

It won't be a problem if the convenience glue is in separate packages that pull in all of the dependencies. But then, these other packages need to be mentioned in the documentation for the main package.

That is an option. Perhaps something we could look into once we dealt with the basic design of the network library itself? -- Johan

Duncan Coutts

9:59 p.m.

On Mon, 2009-08-24 at 09:59 +0200, Johan Tibell wrote:

...

...
6) Integrate with the rest of hackage. This means instance of PrettyClass, Parsec, and Binary. I noticed a number of parsing utilities for IP addresses - lots of duplicated effort here.

I must disagree here. We don't want hard coupling to lots of other libraries. i.e. we don't need convenience functions like:

get :: Socket -> Binary a

These ends are better reach by the client writing the glue code. It's easy to get a combinatorial explosion in the number of functions if we try to provide an integration point for each library out there.

Yes, I think this is a trap that we often fall into, that we think we make libraries better by providing exactly the one function that people need, in some kind of "high level" "all in one" operation. As an example, I'm going to pick on Don because I know he can take it ;-). The download package provides: openURI :: String -> IO (Either String ByteString) which is great, but also these: openAsTags :: String -> IO (Either String [Tag]) openAsXML :: String -> IO (Either String [Content]) openAsFeed :: String -> IO (Either String Feed) which are all instances of (roughly) this pattern: openAsFoo = Foo.parseFoo `fmap` openURI The point is: *** don't fear the composition!! *** Let the user / caller do the composition. The API is simpler and easier to understand if the user does the composition rather than providing every use case. Also the package dependencies are much simpler. Sure, point out in the docs that the user could use it that way with other libs but don't provide every composition in the name of convenience. A Microsoft blogger has a nice short (and rather sarky) article on this topic: "Programming means that sometimes you have to snap two blocks together" http://blogs.msdn.com/oldnewthing/archive/2009/08/04/9856634.aspx Duncan

Johan Tibell

25 Aug 25 Aug

8:49 a.m.

On Mon, Aug 24, 2009 at 11:59 PM, Duncan Coutts wrote:

...

On Mon, 2009-08-24 at 09:59 +0200, Johan Tibell wrote: Yes, I think this is a trap that we often fall into, that we think we make libraries better by providing exactly the one function that people need, in some kind of "high level" "all in one" operation.

As an example, I'm going to pick on Don because I know he can take it ;-).

I think I mentioned this to Don months ago when I decided to against using the library due to all the dependencies. :)

...

Let the user / caller do the composition. The API is simpler and easier to understand if the user does the composition rather than providing every use case. Also the package dependencies are much simpler.

Sure, point out in the docs that the user could use it that way with other libs but don't provide every composition in the name of convenience.

A Microsoft blogger has a nice short (and rather sarky) article on this topic:

"Programming means that sometimes you have to snap two blocks together" http://blogs.msdn.com/oldnewthing/archive/2009/08/04/9856634.aspx

Another example: At work we have a File class that has a whopping 20 methods for opening a file. Most methods are combinations of different ways to do error handling, locking, setting attributes and options. The whole class has more than 100 methods! I've seen arguments on the form Lots of people write programs that contains the function foo = bar . baz. Therefor we should put foo in library X. on the libraries list lately. I think this argument is bogus most of the time. If you want 'foo', stick it at the top of your module. New functions should add functionality that is difficult to express using the current set of functions in the library. -- Johan

Yusaku Hashimoto

24 Aug 24 Aug

4:46 p.m.

Hello, On Sat, Aug 22, 2009 at 1:49 PM, Thomas DuBuisson wrote:

...

2) Maintain type safety by using type classes for most things.

Unlike Network.Fancy and Network.Socket (which have IPv4 and IPv6 as constructors of the same data type), I think we should allow for the possibility that some users of the library will be limited to just one IP version without resorting to partial functions. I suggest type classes to cover this aspect (class Address, class Port, etc).

I was trying to add another type safety for socket operations, and you can get it by ` darcs get http://patch-tag.com/r/phantom-socket/pullrepo phantom-socket`. My approach is having state of socket in its type, and having states of socket before/after actions in its types. This forces programmers not to write wrong operations like listening on socket before binding. E.g., Function for creating socket which is connected to somewhere has the type `HostName -> PortNumber -> IO (Socket Connected)`. Note type Socket has parameter in its type of returning value, this represents the state of connected socket. And a function for listening with specifying backlog's size has the type `BackLog -> SockAct Bound Listening ()`. SockAct is an indexed monad[1] has socket state before/after its action in its *type*. Here, SockAct has three parameters. The first is the before-state of the action, the second is the after-state, and the last is the type of value wrapped in monad. Working example is at here[2]. If you swapped some lines in ixdo-block, you would see compile error, not run-time error. (ixdo is do-notation in indexed monad, it is provided by ixdopp, preprocessor for de-sugaring ixdo-block before actual compiling) But it ensures only compile-time's safety, can't ensure run-time's safety (consider if client closes socket while sending). Currently, if run-time and compile-time's states become not same, program just crushes. So I think my implementation can't be used in real world, but I hope this would give you some ideas for type-safe network libraries. [1]: http://hackage.haskell.org/packages/archive/category-extras/0.44.4/doc/html/... [2]: http://patch-tag.com/r/phantom-socket/snapshot/current/content/pretty/test/T... Cheers, -nwn

taruti

25 Aug 25 Aug

7:54 a.m.

Excerpts from Yusaku Hashimoto's message of Mon Aug 24 19:46:46 +0300 2009:

...

I was trying to add another type safety for socket operations, and you can get it by ` darcs get http://patch-tag.com/r/phantom-socket/pullrepo phantom-socket`.

I think we should not have these operations. A) it makes it hard to pass sockets between threads B) the state distinction is useful only for servers and servers could be handled much better with a record and hiding the operations from users. - Taru Karttunen

Bryan O'Sullivan

24 Aug 24 Aug

6 p.m.

On Fri, Aug 21, 2009 at 9:49 PM, Thomas DuBuisson < thomas.dubuisson@gmail.com> wrote:

...

Johan suggested starting a SIG to hammer out a design for a new Network API seeing as the current API, a straight-forward Berkeley binding, doesn't seem to please anyone in a Haskell context.

Thanks for getting the conversation started! Your agenda is very ambitious, which worries me a little. As a "how to get things moving" tip, I'd very strongly suggest trying to make progress on the lowest level bindings first, as they will be the most concrete, and the least likely to provoke prolonged discussion and disagreement.

...

2) Maintain type safety by using type classes for most things.

Unlike Network.Fancy and Network.Socket (which have IPv4 and IPv6 as constructors of the same data type), I think we should allow for the possibility that some users of the library will be limited to just one IP version without resorting to partial functions. I suggest type classes to cover this aspect (class Address, class Port, etc).

I don't understand what this might mean. Code examples or type signatures are going to be much easier to follow than English descriptions.

...

3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency.

That's already a step up from the lowest-level bindings, which should be using Ptr a.

...

4) Support more features Features such as Multicast, Header inclusion (IP_HDRINCL), address binding, etc. IOW, most the IP_ and SO_ options of socket (7) and ip (7) man pages. It would be rather nice if we were able to expose these in a friendly way - but with our cross platform concerns that might not be a good idea (e.g. I'm not familiar with windows).

Providing Network.Windows and Network.Linux and Network.BSD etc modules would work fine for non-portable platform-specific features (of which there are many). As for providing instances for the likes of Binary, there are good reasons not to do that in the core network package, because it will drag in dependencies on quite a few third-party packages that are either not in core (binary) or have questions over their futures (Parsec).

Thomas DuBuisson

6:34 p.m.

Bryan O'Sullivan wrote:

...

Your agenda is very ambitious, which worries me a little. As a "how to get things moving" tip, I'd very strongly suggest trying to make progress on the lowest level bindings first, as they will be the most concrete, and the least likely to provoke prolonged discussion and disagreement.

Thanks for the tip - I'll probably do that. It's my intention to reuse much of the current Network low level bindings. It still doesn't have all the constants we want... and it has some extra fluff such as inet_ntoa, but its a great place to start.

...

...
2) Maintain type safety by using type classes for most things.

Unlike Network.Fancy and Network.Socket (which have IPv4 and IPv6 as constructors of the same data type), I think we should allow for the possibility that some users of the library will be limited to just one IP version without resorting to partial functions. I suggest type classes to cover this aspect (class Address, class Port, etc).

I don't understand what this might mean. Code examples or type signatures are going to be much easier to follow than English descriptions.

Network.Fancy (and Network.Socket) defines a data structure morally equivalent to: data Address = IPv4 Word32 | IPv6 Word128 | ... I Object! Code that uses addresses _might_ be very specific to IPv4 or IPv6 - such code should be able to restrict itself to the proper category of addresses, headers etc without resorting to partial functions. I propose a solution akin to: class Address a where ... instance Address IPv4 instance Address IPv6 This could allow connect functions, such as those in Network.Fancy, to look like the below signature. connectDgram :: Address a => a -> IO Socket While still giving type safety to those of us who do low level munging with protocol headers. I'm still not clear on how the routines would be built - this idea was born from a desire for type safety when building headers.

...

...
4) Support more features Features such as Multicast, Header inclusion (IP_HDRINCL), address binding, etc. IOW, most the IP_ and SO_ options of socket (7) and ip (7) man pages. It would be rather nice if we were able to expose these in a friendly way - but with our cross platform concerns that might not be a good idea (e.g. I'm not familiar with windows).

Providing Network.Windows and Network.Linux and Network.BSD etc modules would work fine for non-portable platform-specific features (of which there are many).

I was actually thinking of even rarer platforms that might not even have a concept of Sockets (halvm anyone? House, hOp?) - keeping important data declarations separate from IO allows such platforms to provide their own IO but not have to rebuild everything in a manner incompatible with the rest of hackage.

...

As for providing instances for the likes of Binary, there are good reasons not to do that in the core network package, because it will drag in dependencies on quite a few third-party packages that are either not in core (binary) or have questions over their futures (Parsec).

I understand the issue, but it would be a shame if a new network package failed to have functional Address parsing - its just such a basic thing. I suppose this gives even more weight to a dual package solution. Thomas

taruti

25 Aug 25 Aug

7:39 a.m.

Excerpts from Thomas DuBuisson's message of Mon Aug 24 21:34:01 +0300 2009:

...

I Object! Code that uses addresses _might_ be very specific to IPv4 or IPv6 - such code should be able to restrict itself to the proper category of addresses, headers etc without resorting to partial functions. I propose a solution akin to:

class Address a where ... instance Address IPv4 instance Address IPv6

Also we could have data ConnectOptions t where ... instance Address t => Address (ConnectOptions t) where ...

...

I understand the issue, but it would be a shame if a new network package failed to have functional Address parsing - its just such a basic thing. I suppose this gives even more weight to a dual package solution.

IP-address parsing does not need Parsec. It is really quite simple. Which type system extensions can we use? e.g. GADTs are attractive but not supported by Hugs... Here is a little bit of formulation for a possible high-level API. Connections

...

type HostName = String type Port = Word16

...

data IPv4 = IPv4 HostName Port data IPv6 = IPv6 HostName Port data IP = IP HostName Port data Unix = Unix FilePath data AddressOptions t where ...

...

data Stream data Packet

...

class Address t where connectStream :: t -> IO (Socket Stream) connectPacket :: t -> IO (Socket Packet)

...

instance Connect IPv4 where ... instance Connect IPv6 where ... instance Connect IP where ... instance Connect (AddressOptions t) where ... instance Connect Unix where ...

withStream :: Address a => a -> (Socket Stream -> IO a) -> IO a withPacket :: Address a => a -> (Socket Packet -> IO a) -> IO a

...

socketToHandle :: Socket Stream -> IO Handle

Sending and receiving

...

-- | Send a bytestring send :: Socket t -> L.ByteString -> IO () -- | Receive a bytestring with a maximum length recv :: Socket t -> Int -> IO L.ByteString -- | Receive from -- FIXME SomeAddress here is a kludge, what would be better? recvFrom :: Socket t -> Int -> IO (L.ByteString, SomeAddress) close :: Socket t -> IO () sendTo :: Address a => Socket Packet -> a -> L.ByteString -> IO ()

Server API

...

data ServerOptions t where ... -- FIXME server :: ServerOptions t -> (Socket t -> SomeAddress -> IO ()) -> IO Foobar

Miscellaneous highlevel api

...

sleepForever :: IO () getCurrentHost :: IO HostName nameToNumeric :: Address a => a -> IO a numericToName :: Address a => a -> IO a

- Taru Karttunen

Johan Tibell

24 Aug 24 Aug

7:48 p.m.

On Mon, Aug 24, 2009 at 8:00 PM, Bryan O'Sullivan wrote:

...

On Fri, Aug 21, 2009 at 9:49 PM, Thomas DuBuisson wrote:

...
3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency.

That's already a step up from the lowest-level bindings, which should be using Ptr a.

My plan for network-bytestring has always been to offer a Network.Socket.Buffer with function such as 'recvInto' that work on buffers. The higher level bytestring interface can then be implemented in terms of this interface. I've been struggling a bit with how to expose less frequently used functionality in the BSD socket API such as the 'flags' parameter to 'send'. It feels a bit unfortunate to have to pass a "no flags" value in the most frequent case. Optional keyword parameters would be useful here.

...

...
4) Support more features Features such as Multicast, Header inclusion (IP_HDRINCL), address binding, etc. IOW, most the IP_ and SO_ options of socket (7) and ip (7) man pages. It would be rather nice if we were able to expose these in a friendly way - but with our cross platform concerns that might not be a good idea (e.g. I'm not familiar with windows).

Providing Network.Windows and Network.Linux and Network.BSD etc modules would work fine for non-portable platform-specific features (of which there are many).

I think this makes sense. I read some slides from a presentation on the next Java file API. The realization the Java community seemed to have come to is that you need to offer APIs with both a cross platform part and a platform specific part, preferably in such a way that both can coexist somewhat peacefully (i.e. not two completely separate class hierarchies). Cheers, Johan

Bryan O'Sullivan

9:22 p.m.

On Mon, Aug 24, 2009 at 12:48 PM, Johan Tibell wrote:

...

My plan for network-bytestring has always been to offer a Network.Socket.Buffer with function such as 'recvInto' that work on buffers. The higher level bytestring interface can then be implemented in terms of this interface.

How would a Buffer differ from a combination of Ptr a and CSize? (My implied argument is that it shouldn't.) I've been struggling a bit with how to expose less frequently used

...

functionality in the BSD socket API such as the 'flags' parameter to 'send'. It feels a bit unfortunate to have to pass a "no flags" value in the most frequent case. Optional keyword parameters would be useful here.

send vs sendWithFlags would be workable, since we're not going to get keyword or optional arguments in a happy-making form in the language any time soon. Of course, just making people use defaultFlags (as getAddrInfo does) isn't too awful, either.

Johan Tibell

25 Aug 25 Aug

8:10 a.m.

On Mon, Aug 24, 2009 at 11:22 PM, Bryan O'Sullivan wrote:

...

On Mon, Aug 24, 2009 at 12:48 PM, Johan Tibell wrote:

...
My plan for network-bytestring has always been to offer a Network.Socket.Buffer with function such as 'recvInto' that work on buffers. The higher level bytestring interface can then be implemented in terms of this interface.

How would a Buffer differ from a combination of Ptr a and CSize? (My implied argument is that it shouldn't.)

It wouldn't. A raw Ptr a and a size (but possibly not CSize?) was what I had in mind. recvInto :: Socket -> Ptr a -> CSize -> IO () (Or perhaps just call it 'recv' since it would live in a separate module.)

...

...
I've been struggling a bit with how to expose less frequently used functionality in the BSD socket API such as the 'flags' parameter to 'send'. It feels a bit unfortunate to have to pass a "no flags" value in the most frequent case. Optional keyword parameters would be useful here.

send vs sendWithFlags would be workable, since we're not going to get keyword or optional arguments in a happy-making form in the language any time soon. Of course, just making people use defaultFlags (as getAddrInfo does) isn't too awful, either.

sendWithFlags might be the best solution given what we have to work with. It would give us twice as many functions in the send/recv/sendTo/recvFrom part of the API which is unfortunate. If there were more than one optional parameter I would, as you mentioned, go with a record type for the options as it's more extensible (i.e. doesn't break user code as long as the user only uses the accessors) and doesn't increase the number of functions. -- Johan

Sterling Clover

3:49 a.m.

The hardest part of network operations, I've found, is handling the myriad of exceptional conditions safely, in such a way that there's control over the number of sockets used and the timeouts in connections (each step potentially needs its own timeout, in fact, and one has to be careful with the interaction of asynchronous exceptions here, or move to a different sort of model) while also providing strong guarantees that resources won't leak. I.e. depending on the stage of the connection process and type of exception, different resources need to be freed. network-fancy seems to be on the right track here, but I think withSocket should absolutely take an optional timeout parameter (not for the whole action, just for the connection). I'm also not too sure that it won't leak as currently written -- i.e. if there's an error in the connection loop itself, is the socket freed? Of course something depending on the socket could still be returned out of withSocket even after its underlying socket was closed. On the other hand, if users really wanted guarantees, I think the signature of withSocket is expressive enough that iteratees could be used within the continuation it was passed. Lightweight monadic regions would also be possible as an alternate API on top of this, as a separate lib. The server functions for network-fancy also seem pretty good. Although it would be nice to have an MVar or other mechanism to shut down the servers cleanly, rather than relying on asynchronous exceptions, which might cause odd interactions. In particular, listening on sockets can likely block exceptions, which means that if you want to be able to shut down a server, you need to throw it an exception, and then ping it to force the listen call to return. Either we need some nonblocking way to listen, or that functionality needs to be wrapped up in some neat abstraction. Speaking of which, are we sure about what calls should be unsafe and safe? The small price of marking things unsafe can lead to big problems with blocking thread context switches, and thus grinding the rest of an app to a halt. A key criteria for the network library, in my opinion, should be that no network call on a given socket and thread blocks any other work of the program, including another network call on another socket and thread. Additionally, while sockets are too nitty-gritty for a high-level interface (although a socket with phantom types might not be), Handles feel to me to be too tricky to reliably map onto, and I'd much prefer something in between. Perhaps an explicit pair of a lazy bytestring and a function of type (ByteString -> IO ()) (both, of course, properly buffered.)? Of course, this doesn't map onto some of the multicast and datagram type functionality, which would need other abstractions. Cheers, S. On Aug 24, 2009, at 3:48 PM, Johan Tibell wrote:

...

On Mon, Aug 24, 2009 at 8:00 PM, Bryan O'Sullivan wrote:

...
On Fri, Aug 21, 2009 at 9:49 PM, Thomas DuBuisson wrote:

...
3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency.

That's already a step up from the lowest-level bindings, which should be using Ptr a.

My plan for network-bytestring has always been to offer a Network.Socket.Buffer with function such as 'recvInto' that work on buffers. The higher level bytestring interface can then be implemented in terms of this interface.

I've been struggling a bit with how to expose less frequently used functionality in the BSD socket API such as the 'flags' parameter to 'send'. It feels a bit unfortunate to have to pass a "no flags" value in the most frequent case. Optional keyword parameters would be useful here.

...
...
4) Support more features Features such as Multicast, Header inclusion (IP_HDRINCL), address binding, etc. IOW, most the IP_ and SO_ options of socket (7) and ip (7) man pages. It would be rather nice if we were able to expose these in a friendly way - but with our cross platform concerns that might not be a good idea (e.g. I'm not familiar with windows).

Providing Network.Windows and Network.Linux and Network.BSD etc modules would work fine for non-portable platform-specific features (of which there are many).

I think this makes sense. I read some slides from a presentation on the next Java file API. The realization the Java community seemed to have come to is that you need to offer APIs with both a cross platform part and a platform specific part, preferably in such a way that both can coexist somewhat peacefully (i.e. not two completely separate class hierarchies).

Cheers,

Johan _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Simon Marlow

12:07 p.m.

On 25/08/2009 04:49, Sterling Clover wrote:

...

Additionally, while sockets are too nitty-gritty for a high-level interface (although a socket with phantom types might not be), Handles feel to me to be too tricky to reliably map onto, and I'd much prefer something in between. Perhaps an explicit pair of a lazy bytestring and a function of type (ByteString -> IO ()) (both, of course, properly buffered.)?

Could you elaborate a bit more on "Handles feel to me to be too tricky to reliably map onto"? I'd like to understand what problems there are, and see if there's some way to address them. Cheers, Simon

Sterling Clover

26 Aug 26 Aug

3:43 p.m.

Simon Marlow-7 wrote:

...

Could you elaborate a bit more on "Handles feel to me to be too tricky to reliably map onto"? I'd like to understand what problems there are, and see if there's some way to address them.

I'm not a network/sockets expert by any means, but I've had to deal with doing some fiddly stuff in Haskell, and ended up just not trusting the Handle model, even though I know there's been some improvement and quite a few bugs have been fixed, etc. Additionally, these comments are based on what exists now in GHC 6.10. That said, my first gripe is that lots of operations on Handles are really only sane for actual files. Maybe these could be abstracted out, by parameterizing Handles over a phantom type, or by typeclassing? The file size operations are obviously meaningless, as are the get/set posn functions, and seeks, but hIsEOF and hLookAhead also seem problematic, not least because they force a buffer for even ostensibly unbuffered handles. Additionally, at least on windows, sometimes there is an IOError OtherException with error text "failed (No Error)" when an EOF should be provided. [That's really more a behavioural quirk/bug than a failing of Handles as such, I suppose] As far as buffering goes, Handles currently couple buffering modes, which is potentially frustrating if one wants, e.g., no buffering on recv, but block buffering on send. Additionally, for many, though not all applications, it makes sense to separate the ability to close a handle from the ability to read or write to it, particularly for high-level, e.g., server APIs. On the other hand, I'm less concerned with the ability to set options, since it seems to be that socket options can almost invariably be set as part of the initial connection/setup phase, and once you have a Handle or whatever, then the ability to change them isn't really necessary. Cheers, Sterl. -- View this message in context: http://www.nabble.com/Potential-Network-SIG-tp25090622p25151428.html Sent from the Haskell - Libraries mailing list archive at Nabble.com.

Simon Marlow

4:42 p.m.

On 26/08/2009 16:43, Sterling Clover wrote:

...

Simon Marlow-7 wrote:

...
Could you elaborate a bit more on "Handles feel to me to be too tricky to reliably map onto"? I'd like to understand what problems there are, and see if there's some way to address them.

I'm not a network/sockets expert by any means, but I've had to deal with doing some fiddly stuff in Haskell, and ended up just not trusting the Handle model, even though I know there's been some improvement and quite a few bugs have been fixed, etc. Additionally, these comments are based on what exists now in GHC 6.10.

That said, my first gripe is that lots of operations on Handles are really only sane for actual files. Maybe these could be abstracted out, by parameterizing Handles over a phantom type, or by typeclassing? The file size operations are obviously meaningless, as are the get/set posn functions, and seeks, but hIsEOF and hLookAhead also seem problematic, not least because they force a buffer for even ostensibly unbuffered handles.

Ok, so is it fair to summarise this as saying you'd prefer there to be more static type safety to these operations, rather than the current dynamic type checking? I couldn't agree more.

...

Additionally, at least on windows, sometimes there is an IOError OtherException with error text "failed (No Error)" when an EOF should be provided. [That's really more a behavioural quirk/bug than a failing of Handles as such, I suppose]

That sounds like a bug, if you can reproduce it then please report it.

...

As far as buffering goes, Handles currently couple buffering modes, which is potentially frustrating if one wants, e.g., no buffering on recv, but block buffering on send.

Buffering is always invisible on input - if there is any input available, you'll see it immediately. It has performance implications only - but I can't imagine you'd want to deliberately reduce performance by turning off buffering (in fact, I think the new I/O library doesn't even honour NoBuffering on input Handles).

...

Additionally, for many, though not all applications, it makes sense to separate the ability to close a handle from the ability to read or write to it, particularly for high-level, e.g., server APIs.

Yes - more detailed types would be nice.

...

On the other hand, I'm less concerned with the ability to set options, since it seems to be that socket options can almost invariably be set as part of the initial connection/setup phase, and once you have a Handle or whatever, then the ability to change them isn't really necessary.

Still, we could provide a way to recover the Socket from a Handle if necessary (it would fail on a Handle that wasn't a Socket, of course). The other question to consider is I/O multiplexing (select()/epoll()/etc.). Right now if you use Handles with file descriptors you get this for free (not epoll() yet, but hopefully in the future). If you do your own I/O, then you have to implement this too. Ideally the multiplexing of I/O should be available as a separate library that you can hook into from your own I/O code, and it would be used by Handles as well as from other libraries that need low-level I/O. One other reason to want Handles: if you want to do something that involves changing encodings on the fly (e.g. reading an HTTP response) then Handles with hSetEncoding do the job nicely, the buffering and re-encoding is all handled behind the scenes for you. This sort of thing would be tricky with bytestring and text. Cheers, Simon

Ian Lynagh

10 Sep 10 Sep

3:05 p.m.

On Wed, Aug 26, 2009 at 05:42:19PM +0100, Simon Marlow wrote:

...

...
As far as buffering goes, Handles currently couple buffering modes, which is potentially frustrating if one wants, e.g., no buffering on recv, but block buffering on send.

Buffering is always invisible on input - if there is any input available, you'll see it immediately. It has performance implications only - but I can't imagine you'd want to deliberately reduce performance by turning off buffering (in fact, I think the new I/O library doesn't even honour NoBuffering on input Handles).

So is this program supposed to not be valid (or at least, to not behave as I would expect)? import Control.Monad import System.Environment import System.IO import System.Posix.Process main :: IO () main = do [get] <- getArgs hSetBuffering stdin NoBuffering when (read get) $ do x <- hGetChar stdin putStrLn ("Got: " ++ show x) executeFile "cat" True [] Nothing With 6.8.2 I get: $ printf "foo\nbar\n" | ./q True Got: 'f' oo bar $ printf "foo\nbar\n" | ./q False foo bar as expected, but with the HEAD: $ printf "foo\nbar\n" | ./q True Got: 'f' printf "foo\nbar\n" | ./q False foo bar Thanks Ian

Duncan Coutts

11 Sep 11 Sep

11:53 a.m.

On Thu, 2009-09-10 at 16:05 +0100, Ian Lynagh wrote:

...

On Wed, Aug 26, 2009 at 05:42:19PM +0100, Simon Marlow wrote:

...
...
As far as buffering goes, Handles currently couple buffering modes, which is potentially frustrating if one wants, e.g., no buffering on recv, but block buffering on send.

Buffering is always invisible on input - if there is any input available, you'll see it immediately. It has performance implications only - but I can't imagine you'd want to deliberately reduce performance by turning off buffering (in fact, I think the new I/O library doesn't even honour NoBuffering on input Handles).

So is this program supposed to not be valid (or at least, to not behave as I would expect)?

import Control.Monad import System.Environment import System.IO import System.Posix.Process

main :: IO () main = do [get] <- getArgs hSetBuffering stdin NoBuffering when (read get) $ do x <- hGetChar stdin putStrLn ("Got: " ++ show x) executeFile "cat" True [] Nothing

So you're accessing the stdin file descriptor via the stdin Handle and also directly via cat. Note that there's no API in Haskell code to get at the unadorned FD while keeping the Handle open. Duncan

Simon Marlow

1:12 p.m.

On 10/09/2009 16:05, Ian Lynagh wrote:

...

On Wed, Aug 26, 2009 at 05:42:19PM +0100, Simon Marlow wrote:

...
...
As far as buffering goes, Handles currently couple buffering modes, which is potentially frustrating if one wants, e.g., no buffering on recv, but block buffering on send.

Buffering is always invisible on input - if there is any input available, you'll see it immediately. It has performance implications only - but I can't imagine you'd want to deliberately reduce performance by turning off buffering (in fact, I think the new I/O library doesn't even honour NoBuffering on input Handles).

So is this program supposed to not be valid (or at least, to not behave as I would expect)?

import Control.Monad import System.Environment import System.IO import System.Posix.Process

main :: IO () main = do [get]<- getArgs hSetBuffering stdin NoBuffering when (read get) $ do x<- hGetChar stdin putStrLn ("Got: " ++ show x) executeFile "cat" True [] Nothing

With 6.8.2 I get:

$ printf "foo\nbar\n" | ./q True Got: 'f' oo bar

$ printf "foo\nbar\n" | ./q False foo bar

as expected, but with the HEAD:

$ printf "foo\nbar\n" | ./q True Got: 'f'

printf "foo\nbar\n" | ./q False foo bar

I think relying on NoBuffering in this way is a little dodgy. For example, how do you implement hLookAhead, or hIsEOF? Try your example with hIsEOF, and you'll probably get something "unexpected" with 6.8.2. Let's suppose we wanted this to work, i.e. not read more bytes than are necessary to satisfy the current demand. In the case of a multibyte character, we have to read one byte, notice that it is the first byte of a multibyte sequence, and then read the rest of the bytes (into a "buffer" presumably). It would be possible, but complicated, and I'm not convinced we really need to support this kind of usage. Cheers, Simon

Simon Marlow

25 Aug 25 Aug

12:03 p.m.

On 22/08/2009 05:49, Thomas DuBuisson wrote:

...

Hello All,

If you are CCed it's because you are listed as a maintainer of a network-* package that I consider related to the Haskell network library. I'm hoping to roll much of the functionality of network-{bytestring, multicast, fancy} etc into a single package that the community will agree on (namely, "network").

Johan suggested starting a SIG to hammer out a design for a new Network API seeing as the current API, a straight-forward Berkeley binding, doesn't seem to please anyone in a Haskell context. If you want to partake then this e-mail if your heads up. If there is some formal method of setting up a Haskell SIG then please let me know.

My thoughts on some important parts are below - I'm sure not everyone will agree as these thoughts directly contradict some designs found in current libraries.

1) Separate low level functions / bindings from high level / productive code by placing each in different modules.

The low level bindings should remain available for those cases we fail to have the needed functionality in our high level packages. That said, I'm hoping to cover more than the 80% of users with any new design.

2) Maintain type safety by using type classes for most things.

Unlike Network.Fancy and Network.Socket (which have IPv4 and IPv6 as constructors of the same data type), I think we should allow for the possibility that some users of the library will be limited to just one IP version without resorting to partial functions. I suggest type classes to cover this aspect (class Address, class Port, etc).

3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance concious enough to avoid String.

Idealogically speaking, this is not a choice you should make in the network library. The network library should deal with setting up sockets, and delegate the actual I/O to the I/O library. Right now, that means making Handles from Sockets (which is something the current network library provides). And then you use the bytestring library to write bytestrings to the Handle. In the future we'll have a way to write text to a Handle too. Now, I wouldn't be surprised if this doesn't cover all the use cases. Maybe people want to use the low-level send/recv. But I expect that for most applications, going via Handle will be the right thing, and we should look at how to accommodate the other use cases. Cheers, Simon

Bryan O'Sullivan

4:18 p.m.

On Tue, Aug 25, 2009 at 5:03 AM, Simon Marlow wrote:

...

Right now, that means making Handles from Sockets (which is something the current network library provides). And then you use the bytestring library to write bytestrings to the Handle. In the future we'll have a way to write text to a Handle too.

I remember seeing notes about the new I/O code being about 20-25% slower than the old, due to support for character set transcoding. If my recollection is correct, would that hold true for writing ByteStrings, too? (Yes, I'm somewhat performance obsessed.)

Duncan Coutts

9:28 p.m.

On Tue, 2009-08-25 at 09:18 -0700, Bryan O'Sullivan wrote:

...

On Tue, Aug 25, 2009 at 5:03 AM, Simon Marlow wrote:

Right now, that means making Handles from Sockets (which is something the current network library provides). And then you use the bytestring library to write bytestrings to the Handle. In the future we'll have a way to write text to a Handle too.

I remember seeing notes about the new I/O code being about 20-25% slower than the old, due to support for character set transcoding. If my recollection is correct, would that hold true for writing ByteStrings, too? (Yes, I'm somewhat performance obsessed.)

As far as I know the only slow down is when there is actual text decoding going on, ie not for ByteString I/O. Simon has promised to explain all in his upcoming talk at the Haskell Implementers' Workshop. :-) That will also be the obvious time to discuss what a new public I/O interface might look like. There's lots of fun new machinery in GHC's I/O system but we need to talk about public APIs. Duncan

Simon Marlow

26 Aug 26 Aug

7:47 a.m.

On 25/08/2009 22:28, Duncan Coutts wrote:

...

On Tue, 2009-08-25 at 09:18 -0700, Bryan O'Sullivan wrote:

...
On Tue, Aug 25, 2009 at 5:03 AM, Simon Marlow wrote:

Right now, that means making Handles from Sockets (which is something the current network library provides). And then you use the bytestring library to write bytestrings to the Handle. In the future we'll have a way to write text to a Handle too.

I remember seeing notes about the new I/O code being about 20-25% slower than the old, due to support for character set transcoding. If my recollection is correct, would that hold true for writing ByteStrings, too? (Yes, I'm somewhat performance obsessed.)

As far as I know the only slow down is when there is actual text decoding going on, ie not for ByteString I/O.

Right - and when I measured the final version, it was actually faster than the old version, but only on x86-64 (for reasons unknown). This was measuring hPutStr and hGetContents though, which if you are performance-conscious you wouldn't be using anyway. Bytestring I/O to Handles bypasses the encoding/decoding layer, and if the I/O is big enough it also bypasses the intermediate buffer and writes the bytes directly from the Bytestring. All this is unchanged relative to the old I/O library in GHC 6.10. Still, I would expect Handles to have a performance penalty for doing lots of small writes, due to the overhead of taking the lock, various exception handlers and checking whether the Char buffer needs to be flushed. If you need to do lots of small writes then you'd need to go to a lower layer or add another layer of buffering (as GHC does - see compiler/utils/BufWrite.hs in the GHC sources).

...

Simon has promised to explain all in his upcoming talk at the Haskell Implementers' Workshop. :-)

That will also be the obvious time to discuss what a new public I/O interface might look like. There's lots of fun new machinery in GHC's I/O system but we need to talk about public APIs.

Definitely. Cheers, Simon

taruti

25 Aug 25 Aug

5:05 p.m.

Excerpts from Simon Marlow's message of Tue Aug 25 15:03:21 +0300 2009:

...

Now, I wouldn't be surprised if this doesn't cover all the use cases. Maybe people want to use the low-level send/recv. But I expect that for most applications, going via Handle will be the right thing, and we should look at how to accommodate the other use cases.

For example packet (datagram) sockets cannot use Handles at all since they need have explicit boundaries between packets. Thus stream oriented handles don't fit them. Another problematic case is setting/getting socket options - it is not possible to tell in the type system whether a particular Handle represents a socket or not. Handles will be ok for high-level TCP applications if the performance is good enough. - Taru Karttunen

Johan Tibell

8:23 p.m.

On Tue, Aug 25, 2009 at 2:03 PM, Simon Marlow wrote:

...

On 22/08/2009 05:49, Thomas DuBuisson wrote:

...
3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance concious enough to avoid String.

Idealogically speaking, this is not a choice you should make in the network library. The network library should deal with setting up sockets, and delegate the actual I/O to the I/O library.

Right now, that means making Handles from Sockets (which is something the current network library provides). And then you use the bytestring library to write bytestrings to the Handle. In the future we'll have a way to write text to a Handle too.

Now, I wouldn't be surprised if this doesn't cover all the use cases. Maybe people want to use the low-level send/recv. But I expect that for most applications, going via Handle will be the right thing, and we should look at how to accommodate the other use cases.

In my mind an improved I/O library would look something like this:

...

-- At the very bottom is a type class 'RawIO' which represents a -- variety of stream-like types. class RawIO a where readInto :: Ptr Word8 -> Int -> IO () write :: ByteString -> IO ()

read :: Int -> IO ByteString read n = ByteString.createAndTrim n (\p -> readInto p n)

This definition is very minimal and most likely need to be expanded with operations such as 'close' and perhaps also 'seek'. The methods would map to the system calls for e.g. files and sockets. A particular instance could would exceptions for unsupported methods (e.g. a file opened as read-only would throw exceptions if 'write' is called).

...

-- A simple wrapper for file descriptors. data File = File CInt

instance RawIO File where readInto = cRead write = cWrite

-- This is only for stream like sockets. Datagram sockets still need to use the lower level API. instance RawIO Socket where readInto = cRecv write = send

-- Assuming 'Buffer' is a mutable byte buffer type. -- This is useful for e.g. testing. ByteString could also be -- made an instance that throws exceptions for unimplemented -- methods (e.g. write in this case). instance RawIO Buffer where readInto = readFromBufferInto write = writeToBuffer

We can now layer buffering on top.

...

-- Buffers for reading and writing are kept in a data type 'BufferedIO'. -- This data type need not be exposed. data BufferedIO = forall a. RawIO a => BufferedIO Buffer Buffer a

instance RawIO BufferedIO where readInto = readFromBufferInto -- Calls RawIO.readInto if needed write = writeToBuffer -- Calls RawIO.write if needed

-- Allocates buffers and returns a BufferedIO buffered :: RawIO a => a -> a buffered = ...

We might opt for a type class for buffered I/O in case we want to expose any methods in addition to those exported by RawIO. We can now layer text I/O on top of buffered I/O:

...

class TextIO a where read :: Int -> IO Text write :: Text -> IO () readLine :: IO Text

-- To do this efficiently BufferedIO might need to expose its buffer. -- Alternatively TextIO can be layered directly on top of RawIO and -- manage its own buffers. text :: (BufferedIO a, TextIO b) => a -> b text = ...

A few people have express interest in discussing this at ICFP. Perhaps we could draft a proposal and put it on a wiki page for others to review. -- Johan

Simon Marlow

26 Aug 26 Aug

8:13 a.m.

On 25/08/2009 21:23, Johan Tibell wrote:

...

On Tue, Aug 25, 2009 at 2:03 PM, Simon Marlow wrote:

...
On 22/08/2009 05:49, Thomas DuBuisson wrote:

...
3) Use Bytestrings (and have corrosponding .Lazy modules) for efficiency. As in network-bytestring, any new API should be performance concious enough to avoid String.

Idealogically speaking, this is not a choice you should make in the network library. The network library should deal with setting up sockets, and delegate the actual I/O to the I/O library.

Right now, that means making Handles from Sockets (which is something the current network library provides). And then you use the bytestring library to write bytestrings to the Handle. In the future we'll have a way to write text to a Handle too.

Now, I wouldn't be surprised if this doesn't cover all the use cases. Maybe people want to use the low-level send/recv. But I expect that for most applications, going via Handle will be the right thing, and we should look at how to accommodate the other use cases.

In my mind an improved I/O library would look something like this:

...
-- At the very bottom is a type class 'RawIO' which represents a -- variety of stream-like types. class RawIO a where readInto :: Ptr Word8 -> Int -> IO () write :: ByteString -> IO ()

read :: Int -> IO ByteString read n = ByteString.createAndTrim n (\p -> readInto p n)

This is quite similar to the class of the same name in the GHC I/O library: -- | A low-level I/O provider where the data is bytes in memory. class RawIO a where read :: a -> Ptr Word8 -> Int -> IO Int readNonBlocking :: a -> Ptr Word8 -> Int -> IO (Maybe Int) write :: a -> Ptr Word8 -> Int -> IO () writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int I think the Bytestring API should be a layer on top of this.

...

This definition is very minimal and most likely need to be expanded with operations such as 'close' and perhaps also 'seek'.

close/seek etc. are methods of the IODevice class in GHC's IO library. See http://darcs.haskell.org/packages/base/GHC/IO/Device.hs We have implementations of these classes for file descriptors, and it is my intention to have other implementations too: memory-mapped files, Windows HANDLEs, Bytestring (for testing), and Chan Word8 (for testing again: you write to the Handle, and the decoded bytes come out of the Chan). These APIs aren't currently "public", in the sense that they are exported by modules in the GHC.* hierarchy. I hope they'll help as a concrete start to the discussion of where the I/O library should be going, though.

...

We can now layer buffering on top.

...
-- Buffers for reading and writing are kept in a data type 'BufferedIO'. -- This data type need not be exposed. data BufferedIO = forall a. RawIO a => BufferedIO Buffer Buffer a

instance RawIO BufferedIO where readInto = readFromBufferInto -- Calls RawIO.readInto if needed write = writeToBuffer -- Calls RawIO.write if needed

-- Allocates buffers and returns a BufferedIO buffered :: RawIO a => a -> a buffered = ...

This is where things get a bit hairy. The upper layers often want to know about the buffer, for instance when it needs to be flushed, or for performance reasons - e.g. encoding/decoding needs to have direct access to both buffers. So in GHC's I/O library buffering is a new class BufferedIO, consumed by the higher layer, and you can make a BufferedIO instance trivially given a RawIO instance. Not everything has a RawIO instance though: memory-mapped files just appear as buffers. Incedentally, there have been various designs around this theme in the past, e.g. http://www.haskell.org/haskellwiki/Library/Streams (with various problems IMO, but there are some good ideas there). Cheers, Simon

5790

Age (days ago)

5810

Last active (days ago)

List overview

Download

31 comments

10 participants

participants (10)

Bryan O'Sullivan
Duncan Coutts
Ian Lynagh
Johan Tibell
Simon Marlow
Sterling Clover
taruti
Thomas DuBuisson
Yitzchak Gale
Yusaku Hashimoto