
Hello John, Thursday, February 09, 2006, 3:19:30 AM, you wrote:
JM> If we had a good standard poll/select interface in System.IO then we
JM> actually could implement a lot of concurrency as a library with no
JM> (required) run-time overhead. I'd really like to see such a thing get
JM> into the standard. Well, mainly it would just be a really useful thing
JM> to have in general. If others think it is a good idea I can try to come
JM> up with a suitable API and submit it to the repo.
I delayed answering this letter until I had announced my Streams library. Now I can say that such an API already exists: in terms of my library, you just need to write a transformer that intercepts vGetBuf/vPutBuf calls and passes them to the select/poll machinery. So you can write such a transformer right now, and every program that uses Streams will benefit from it. Converting programs that use Handles to use Streams should also be an easy task.
JM> I was actually asking for something much more modest, which was the
JM> routine needed to pass them to the select/poll machinery. but yeah, what
JM> you say is one of my expected uses of such a routine. Once a standard IO
JM> library settles down, then I can start working on the exact API such a
JM> routine would have.

But if everyone waits until the library settles down, it will never happen :) Your work can shape the design of the library, just as my library itself can shape Haskell' :)

At the moment I have developed a library that satisfies the demands of extending the current I/O library with new features, such as Unicode support, high speed, portability to other compilers, binary I/O, I/O for packed strings, and asynchronous I/O using methods other than select(). I have not actually implemented all of these features; rather, I have developed an infrastructure in which they can all be added easily. Unlike with the System.IO library, you don't need to ask someone to implement new features or make corrections in foreign sources - you just develop a module that implements the standard Stream interface, and then it can be used as easily as the transformers from the library itself.

As I understand this idea, a transformer implementing async I/O should intercept vGetBuf/vPutBuf calls for FDs, start the appropriate async operation, and then switch to another Haskell thread. The I/O manager thread should run select() in a loop and, when a request has finished, wake up the appropriate thread. That's all. If you ever need it, this implementation can then be used to extend GHC's System.IO internals with support for new async I/O managers (as I understand it, select() is currently supported by GHC, but poll() and kqueue() are not?). The only difference is that my lib gives you the opportunity to test this implementation without modifying GHC's I/O internals, which is somewhat simpler.
So the interface for async vGetBuf/vPutBuf routines should be the same as for read/write:

    type FD = Int
    vGetBuf_async :: FD -> Ptr a -> Int -> IO Int
    vPutBuf_async :: FD -> Ptr a -> Int -> IO Int

I think the implementations for GHC and JHC will need to be slightly different, though, because of their different ways of implementing multi-threading. But the I/O manager should be the same - it just receives information about I/O operations to run and returns information about completed ones. ... Well, this I/O manager needs to implement just one operation:

    performIO :: Request -> IO ()

    type Request  = (IOType, FD, Ptr a, Int, Notifier)
    data IOType   = Read | Write | ...
    type Notifier = Result -> IO ()
    data Result   = OK Int | Fail ErrorInfo

"performIO" starts a new I/O operation. On completion of this operation, the Notifier is called with information about the result of its execution. So for GHC the following should work:

    vGetBuf_async fd ptr size = do
        done <- newEmptyMVar
        performIO (Read, fd, ptr, size, putMVar done)
        OK len <- takeMVar done   -- error handling omitted
        return len

For JHC, the body of "vGetBuf_async" may be different. If you find this interface reasonable, at least for a first iteration, I will develop the appropriate transformer, so that "only" the implementation of "performIO" remains for you.
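To make the notifier scheme above concrete, here is a compilable toy sketch of it. The names (performIO, Request, Notifier, vGetBuf_async) follow the email, but the I/O manager here merely simulates a completed read instead of running select(), and the Request type is simplified; it illustrates the MVar wiring, not a real implementation.

```haskell
import Control.Concurrent

type FD = Int
data IOType   = Read | Write
data Result   = OK Int | Fail String
type Notifier = Result -> IO ()
data Request  = Request IOType FD Notifier

-- toy I/O manager: services requests one at a time; a real one would
-- run select()/poll() here and notify when the FD operation completes
ioManager :: Chan Request -> IO ()
ioManager chan = do
    Request _ _ notify <- readChan chan
    notify (OK 42)            -- pretend 42 bytes were transferred
    ioManager chan

-- "performIO" just hands the request to the manager thread
performIO :: Chan Request -> Request -> IO ()
performIO = writeChan

-- the blocking wrapper: submit the request, sleep until notified
vGetBuf_async :: Chan Request -> FD -> IO Int
vGetBuf_async chan fd = do
    done <- newEmptyMVar
    performIO chan (Request Read fd (putMVar done))
    result <- takeMVar done   -- this thread blocks; others keep running
    case result of
        OK len -> return len
        Fail e -> ioError (userError e)

main :: IO ()
main = do
    chan <- newChan
    _ <- forkIO (ioManager chan)
    n <- vGetBuf_async chan 0
    print n                   -- 42 (the simulated transfer size)
```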
Of course, the Streams library is not any kind of standard just now, and moreover it is not compatible with JHC. The biggest problem is that I am using type class extensions available in GHC/Hugs that are not in the H98 standard. So I'm interested in pushing Haskell' to accept the most advanced extensions possible in this area and, of course, in actually implementing these extensions in the Haskell compilers. An alternative way to make Streams available to a wider range of Haskell compilers is to strip out the support for streams working in monads other than IO.
JM> Don't take the absence of a feature in jhc to mean I don't like or want
JM> that feature. There are a lot of things I don't have but that I'd
JM> definitely want to see in the language simply because I was only shooting
JM> for H98 to begin with and was more interested in a lot of the back end
JM> stuff. You should figure out the nicest design that uses just the
JM> extensions needed for the design you want. it could help us decide what
JM> goes into haskell-prime to know what is absolutely needed for good
JM> design and what is just nice to have.

This simply means that the Streams library cannot be used with JHC, which is bad news, because it is even richer than GHC's System.IO. JHC had a chance to get a modern I/O library, but it lost that chance :)
If you can make a select/poll transformer, at least for testing purposes, that would be really great.
JM> Yeah, I will look into this. the basic select/poll call will have to be
JM> pretty low level, but hopefully it will allow interesting higher level
JM> constructs based on your streams or an evolution of them.

Please look. At the moment the Streams library lacks only a few important features already implemented in GHC's System.IO: sockets, line buffering and async I/O. Moreover, I don't have experience implementing async I/O, so outside help is really necessary. Addressing these three issues would allow the Streams library to be proposed as a possible System.IO replacement. And as you can see, implementing "performIO" will let us use async I/O for all possible I/O operations, including "get/put_" or vGetContents, for example.

--
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

On 09.02 22:24, Bulat Ziganshin wrote:
As I understand this idea, a transformer implementing async I/O should intercept vGetBuf/vPutBuf calls for FDs, start the appropriate async operation, and then switch to another Haskell thread. The I/O manager thread should run select() in a loop and, when a request has finished, wake up the appropriate thread. That's all. If you ever need it, this implementation can then be used to extend GHC's System.IO internals with support for new async I/O managers (as I understand it, select() is currently supported by GHC, but poll() and kqueue() are not?). The only difference is that my lib gives you the opportunity to test this implementation without modifying GHC's I/O internals, which is somewhat simpler. So the interface for async vGetBuf/vPutBuf routines should be the same as for read/write:
    type FD = Int
    vGetBuf_async :: FD -> Ptr a -> Int -> IO Int
    vPutBuf_async :: FD -> Ptr a -> Int -> IO Int
Please don't fix FD = Int, this is not true on some systems, and when implementing efficient sockets one usually wants to hold more complex state.
JM> Don't take the absence of a feature in jhc to mean I don't like or want
JM> that feature. There are a lot of things I don't have but that I'd
JM> definitely want to see in the language simply because I was only shooting
JM> for H98 to begin with and was more interested in a lot of the back end
JM> stuff. You should figure out the nicest design that uses just the
JM> extensions needed for the design you want. it could help us decide what
JM> goes into haskell-prime to know what is absolutely needed for good
JM> design and what is just nice to have.
This simply means that the Streams library cannot be used with JHC, which is bad news, because it is even richer than GHC's System.IO. JHC had a chance to get a modern I/O library, but it lost that chance :)
I think it is more like "all haskell-prime programs". Seriously, if we design a new IO subsystem it would be quite nice to be able to use it from standard conforming programs. Maybe things can be reformulated in a way that will be compatible with haskell-prime.
Please look. At the moment the Streams library lacks only a few important features already implemented in GHC's System.IO: sockets, line buffering and async I/O. Moreover, I don't have experience implementing async I/O, so outside help is really necessary.
If you want I can look at getting network-alt to implement the interface.

- Einar Karttunen

Hello Einar, Friday, February 10, 2006, 2:09:08 AM, you wrote:
as i understand this idea, transformer implementing async i/o should intercept vGetBuf/vPutBuf calls for the FDs, start the appropriate
    type FD = Int
    vGetBuf_async :: FD -> Ptr a -> Int -> IO Int
    vPutBuf_async :: FD -> Ptr a -> Int -> IO Int
EK> Please don't fix FD = Int, this is not true on some systems,
EK> and when implementing efficient sockets one usually wants
EK> to hold more complex state.

The heart of the library is the class Stream. Both "File" and "Socket" should implement this interface. Just now I use a plain "FD" to represent files, but that is a temporary solution - really a file must also carry additional information: filename, open mode, open/closed state. This "File" will be an abstract datatype, which need not be based on an FD on other operating systems. The same applies to "Socket": it can be any type that carries enough information to work with network I/O.

The implementation of async I/O should take the form of a stream transformer, which intercepts only the vGetBuf/vPutBuf operations and passes the other operations through as-is:

    data AsyncFD = AsyncFD FD ... {-additional fields-}

    instance Stream IO AsyncFD where
        vIsEOF (AsyncFD h ...) = vIsEOF h
        vClose (AsyncFD h ...) = vClose h
        ........
        vGetBuf (AsyncFD h ...) ptr size = vGetBuf_async h ptr size

As far as I can see, select/epoll doesn't need to know anything about a file/socket except its descriptor (FD)? In that case we can make the Async transformer universal, compatible with both files and sockets:

    data Async h = Async h ... {-additional fields-}

    addAsyncIO h = do .....
                      return (Async h ...)

    instance (Stream IO h) => Stream IO (Async h) where
        vIsEOF (Async h ...) = vIsEOF h
        vClose (Async h ...) = vClose h
        ........
        vGetBuf (Async h ...) ptr size = doBlockingOp "read" h $ vGetBuf h ptr size

This transformer can be made universal, supporting select/epoll/... implementations via an additional parameter to "addAsyncIO", or it can be a series of transformers, one for each method of async I/O. If we develop a common API for async I/O as John suggested, then one universal transformer working via this API can be used.
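The pass-everything-through-except-vGetBuf pattern described above can be shown with a compilable toy. The one-parameter Stream class, the FD counter and the names here are invented for the example (the real library's class takes a monad parameter and uses Ptr buffers); only the delegation structure matches the email.

```haskell
import Data.IORef

-- a drastically simplified Stream class: vGetBuf returns a byte count
class Stream h where
    vGetBuf :: h -> Int -> IO Int
    vClose  :: h -> IO ()

-- toy base stream backed by a counter instead of a real file descriptor
newtype FD = FD (IORef Int)

instance Stream FD where
    vGetBuf (FD r) n = modifyIORef r (+ n) >> return n
    vClose  _        = return ()

-- the transformer: wraps any Stream, re-routes vGetBuf, passes the rest
data Async h = Async h

instance Stream h => Stream (Async h) where
    vGetBuf (Async h) n = do
        -- a real implementation would register the FD with select/epoll
        -- here and block only the calling Haskell thread until ready
        vGetBuf h n
    vClose (Async h) = vClose h

main :: IO ()
main = do
    r <- newIORef 0
    let s = Async (FD r)
    n <- vGetBuf s 10
    print n        -- 10: the wrapped stream's vGetBuf did the work
```

The point of the instance head `Stream h => Stream (Async h)` is that one wrapper serves files, sockets, or anything else implementing Stream.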
This simply means that the Streams library cannot be used with JHC, which is bad news, because it is even richer than GHC's System.IO. JHC had a chance to get a modern I/O library, but it lost that chance :)
EK> I think it is more like "all haskell-prime programs". Seriously,
EK> if we design a new IO subsystem it would be quite nice to be
EK> able to use it from standard conforming programs.
EK> Maybe things can be reformulated in a way that will be compatible
EK> with haskell-prime.

Or haskell-prime can be reformulated ;) Once the first iteration of Haskell' is defined, I will check my lib and tell the committee what I would need to omit from my library to be compatible with the standard. Then we can decide :) Just at the current moment, support for complex class hierarchies outside of Hugs/GHC is very poor.
Please look. At the moment the Streams library lacks only a few important features already implemented in GHC's System.IO: sockets, line buffering and async I/O. Moreover, I don't have experience implementing async I/O, so outside help is really necessary.
EK> If you want I can look at getting network-alt to implement the
EK> interface.

Please look. Basically, Sock should be made an instance of Stream, with implementations of vGetBuf/vPutBuf and as many other operations as possible. It should be easy - see the FD/Handle instances of Stream for an example. Your support for select/poll should go into the transformer(s). This will allow async I/O to be used not only for your own Sock type, but also for files, for the sockets from the old library, and so on.

--
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

Bulat Ziganshin wrote:
Hello Einar,
Friday, February 10, 2006, 2:09:08 AM, you wrote:
as i understand this idea, transformer implementing async i/o should intercept vGetBuf/vPutBuf calls for the FDs, start the appropriate
    type FD = Int
    vGetBuf_async :: FD -> Ptr a -> Int -> IO Int
    vPutBuf_async :: FD -> Ptr a -> Int -> IO Int
EK> Please don't fix FD = Int, this is not true on some systems,
EK> and when implementing efficient sockets one usually wants
EK> to hold more complex state.
The heart of the library is the class Stream. Both "File" and "Socket" should implement this interface. Just now I use a plain "FD" to represent files, but that is a temporary solution - really a file must also carry additional information: filename, open mode, open/closed state. This "File" will be an abstract datatype, which need not be based on an FD on other operating systems.
The same applies to "Socket": it can be any type that carries enough information to work with network I/O.
The implementation of async I/O should take the form of a stream transformer, which intercepts only the vGetBuf/vPutBuf operations and passes the other operations through as-is:
I don't think async I/O is a stream transformer; fitting it into the stream hierarchy seems artificial to me. It is just another way of doing I/O directly to/from file descriptors. If your basic operation to read from an FD is

    readFD :: FD -> Int -> Ptr Word8 -> IO Int

then an async I/O layer simply provides you with the exact same interface, but with an implementation that doesn't block other threads. It is part of the file descriptor interface, not a stream transformer.

Also, you probably need

    readNonBlockingFD :: FD -> Int -> Ptr Word8 -> IO Int
    isReadyFD :: FD -> IO Bool

in fact, I think this should be the basic API, since you can implement readFD in terms of it. (readNonBlockingFD always reads at least one byte, blocking until some data is available). This is used to partially fill an input buffer with the available data, for example.

One problem here is that in order to implement readNonBlockingFD on Unix you have to put the FD into O_NONBLOCK mode, which due to misdesign of the Unix API affects other users of the same file descriptor, including other programs. GHC suffers from this problem.

Cheers,
	Simon
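Simon's remark that readFD can be implemented in terms of readNonBlockingFD can be sketched as a loop that keeps asking for the remaining bytes. In this compilable sketch the non-blocking read is simulated by an IORef that hands out at most three bytes per call (an invented stand-in for a real FD read); the looping logic in readFD is the part that corresponds to his point.

```haskell
import Data.IORef
import Data.Word (Word8)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Marshal.Alloc (allocaBytes)

type FD = IORef Int   -- toy stand-in: bytes remaining "in the file"

-- simulated primitive: returns at most 3 bytes per call,
-- and at least 1 byte whenever any data remains
readNonBlockingFD :: FD -> Int -> Ptr Word8 -> IO Int
readNonBlockingFD fd n _ = do
    remaining <- readIORef fd
    let got = minimum [remaining, n, 3]
    writeIORef fd (remaining - got)
    return got

-- the blocking read, layered on top: keep calling the non-blocking
-- primitive until the request is satisfied or a zero-byte result
-- signals end of data
readFD :: FD -> Int -> Ptr Word8 -> IO Int
readFD fd n ptr = go 0
  where
    go done
        | done >= n = return done
        | otherwise = do
            got <- readNonBlockingFD fd (n - done) (ptr `plusPtr` done)
            if got == 0 then return done else go (done + got)

main :: IO ()
main = do
    fd <- newIORef 10
    n  <- allocaBytes 8 (readFD fd 8)
    print n    -- 8: assembled from three partial reads (3+3+2)
```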

Hello Simon, Friday, February 10, 2006, 3:26:30 PM, you wrote:
as i understand this idea, transformer implementing async i/o should intercept vGetBuf/vPutBuf calls for the FDs, start the appropriate
    type FD = Int
    vGetBuf_async :: FD -> Ptr a -> Int -> IO Int
    vPutBuf_async :: FD -> Ptr a -> Int -> IO Int
EK> Please don't fix FD = Int, this is not true on some systems,
EK> and when implementing efficient sockets one usually wants
EK> to hold more complex state.
The heart of the library is the class Stream. Both "File" and "Socket" should implement this interface. Just now I use a plain "FD" to represent files, but that is a temporary solution - really a file must also carry additional information: filename, open mode, open/closed state. This "File" will be an abstract datatype, which need not be based on an FD on other operating systems.
The same applies to "Socket": it can be any type that carries enough information to work with network I/O.
The implementation of async I/O should take the form of a stream transformer, which intercepts only the vGetBuf/vPutBuf operations and passes the other operations through as-is:
SM> I don't think async I/O is a stream transformer, fitting it into the
SM> stream hierarchy seems artificial to me.

Yes, it is possible that I'm trying to implement everything as a transformer regardless of real necessity - I really think that the idea of transformers fits every need in extending functionality.

Here is my list of reasons to implement this as a transformer:

1) There is no common "FD" interface. The module System.FD implements something, but it is really an interface only for file I/O. It's used partially in System.MMFile, which implements memory-mapped files, and I think these fd* operations will be used to partially implement the Socket operations, but some things will be different, including using recv/send instead of read/write to implement the GetBuf/PutBuf operations. So there is no common "instance Stream FD", but different instances for files, memory-mapped files and sockets. As Einar just mentioned, the Socket datatype will include information that is absent from the File datatype. So these three types have in common that they use an FD to implement some of their operations, but some operations will be different and the internal datatype structures will be different. A transformer is an ideal way to reimplement just the vGetBuf/vPutBuf operations while passing through all the rest. Without it, instead of 3 methods of doing I/O (mmap/read/recv) you would need to implement all 5 (mmap/read/recv/readAsync/recvAsync) - and that's without even counting select, epoll and kqueue separately.

2) As you can see in the epoll()-based implementation of async I/O in the alt-network library, Einar attaches additional data (read/write queues) to the FD to support the epoll() interface. This data will be different for select, epoll, kqueue and other methods of async I/O. At the very least, without async I/O no such information should be needed. A transformer is an ideal way to attach additional data to a file/socket without changing the "raw" datatype.
Again, otherwise you would need to attach all this data to the raw file, duplicate this work for the raw socket, and then repeat it for select, epoll and the other async I/O methods.

On the other side, the reasons for your proposal, as I see them:

1) If FD incorporates async I/O support, the System.FD library will become much more useful - anyone using the low-level fd* functions will get async I/O support for free.

But there is another deficiency in the System.FD library - it doesn't include support for files >4Gb or for files with Unicode filenames under Windows. It seems natural to include this support in fd* too. Now let's see: you are proposing to include in the fd* implementation support for files, sockets, various async I/O methods and who knows what else. Don't you think that this library would become a successor of the Handle library, implementing all possible functionality and giving 3rd-party libraries no chance to change anything partially? I propose instead to divide the library into small manageable pieces that can easily be studied/modified/replaced and that bring something really useful only when used together. If that means the low-level fd* interface can't even be used to work with raw files without great restrictions (no Unicode filenames on Windows, no async I/O), then so be it.

SM> It is just another way of doing I/O directly to/from file descriptors.
SM> If your basic operation to read from an FD is
SM>
SM> readFD :: FD -> Int -> Ptr Word8 -> IO Int
SM>
SM> then an async I/O layer simply provides you with the exact same
SM> interface, but with an implementation that doesn't block other threads.
SM> It is part of the file descriptor interface, not a stream transformer.
SM>
SM> Also, you probably need
SM>
SM> readNonBlockingFD :: FD -> Int -> Ptr Word8 -> IO Int
SM> isReadyFD :: FD -> IO Bool
SM>
SM> in fact, I think this should be the basic API, since you can implement
SM> readFD in terms of it.
SM> (readNonBlockingFD always reads at least one
SM> byte, blocking until some data is available). This is used to partially
SM> fill an input buffer with the available data, for example.

This can be in the basic API, but not in the basic implementation :))) Really, I think you are mixing two things - a readNonBlockingFD call that can fill the buffer only partially, and a readAsync call that uses some I/O manager to run other Haskell threads while the data is read.

Well, I agree that there should be two GetBuf variants in the Stream interface - greedy and non-greedy. Say, vGetBuf and vGetBufNonBlocking. Does vPutBuf also need two variants? Then, maybe LineBuffering and BlockBuffering should use vGetBufNonBlocking and vGetBuf, respectively?

But I don't know anything about the implementation. Is the difference between the readNonBlockingFD and readFD calls only in the O_NONBLOCK mode of the file handle, or are different functions used? What about Windows? Sockets? How does this interact with async I/O?

SM> One problem here is that in order to implement readNonBlockingFD on Unix
SM> you have to put the FD into O_NONBLOCK mode, which due to misdesign of
SM> the Unix API affects other users of the same file descriptor, including
SM> other programs. GHC suffers from this problem.

Does this mean it would be better to decide at "open" time whether the file will be used with readNonBlockingFD or with the simple readFD?

--
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

Bulat Ziganshin wrote:
SM> I don't think async I/O is a stream transformer, fitting it into the
SM> stream hierarchy seems artificial to me.
Yes, it is possible that I'm trying to implement everything as a transformer regardless of real necessity - I really think that the idea of transformers fits every need in extending functionality.
Here is my list of reasons to implement this as a transformer:
1) there is no "common FD" interface.
Well, there's the unix package. In theory, System.IO should layer on top of System.Posix or System.Win32, depending on the platform. In practice we extract the important bits of System.Posix and put them in the base package to avoid circular dependencies. The current implementation could use some cleaning up here (eg. FD vs. Fd).
on the other side, reasons for your proposal, as i see:
1) If FD incorporates async I/O support, the System.FD library will become much more useful - anyone using the low-level fd* functions will get async I/O support for free.
But there is another deficiency in the System.FD library - it doesn't include support for files >4Gb
Yes it does!
and files with unicode filenames under Windows.
Under Windows I believe we should be using a Win32-specific substrate on which to build the I/O library.
it seems natural to include this support in fd* too.
Now let's see: you are proposing to include in the fd* implementation support for files, sockets, various async I/O methods and who knows what else. Don't you think that this library would become a successor of the Handle library, implementing all possible functionality and giving 3rd-party libraries no chance to change anything partially?
Not at all - I'm just suggesting that there should be an API to FD-based I/O, and that concurrency-safety can be layered on top of this, providing exactly the same API but with concurrency-safety built in.
I think you are mixing two things - a readNonBlockingFD call that can fill the buffer only partially, and a readAsync call that uses some I/O manager to run other Haskell threads while the data is read.
Why do you want to expose readAsync at all?
Well, I agree that there should be two GetBuf variants in the Stream interface - greedy and non-greedy. Say, vGetBuf and vGetBufNonBlocking. Does vPutBuf also need two variants?
Then, maybe LineBuffering and BlockBuffering should use vGetBufNonBlocking and vGetBuf, respectively?
But I don't know anything about the implementation. Is the difference between the readNonBlockingFD and readFD calls only in the O_NONBLOCK mode of the file handle, or are different functions used? What about Windows? Sockets? How does this interact with async I/O?
Never mind about this - just assume readNonBlockingFD as your lowest-level primitive, and we can provide an implementation of readNonBlockingFD that uses select/poll/whatever underneath. I imagine we'll stop using O_NONBLOCK. The Windows version will look different at this level, because we should be using Win32 native I/O, i.e. HANDLE instead of FD, but it will have a primitive similar to readNonBlockingFD, also concurrency-safe.

Cheers,
	Simon

On Fri, Feb 10, 2006 at 12:26:30PM +0000, Simon Marlow wrote:
in fact, I think this should be the basic API, since you can implement readFD in terms of it. (readNonBlockingFD always reads at least one byte, blocking until some data is available). This is used to partially fill an input buffer with the available data, for example.
This is the behavior of standard file descriptors, not non-blocking ones. We should definitely not guarantee that reads fill an input buffer fully, at least for the lowest-level calls; that is the job of the layers on top.
One problem here is that in order to implement readNonBlockingFD on Unix you have to put the FD into O_NONBLOCK mode, which due to misdesign of the Unix API affects other users of the same file descriptor, including other programs. GHC suffers from this problem.
Non-blocking ones will return immediately if no data is available, rather than making sure they return at least one byte. In any case, the correct solution in the circumstances is to provide a select/poll/epoll/devpoll interface. It is nicer than setting O_NONBLOCK and more efficient. This is largely orthogonal to the Streams design though.

	John

--
John Meacham - ⑆repetae.net⑆john⑈

John Meacham wrote:
On Fri, Feb 10, 2006 at 12:26:30PM +0000, Simon Marlow wrote:
in fact, I think this should be the basic API, since you can implement readFD in terms of it. (readNonBlockingFD always reads at least one byte, blocking until some data is available). This is used to partially fill an input buffer with the available data, for example.
This is the behavior of standard file descriptors, not non-blocking ones. We should definitely not guarantee that reads fill an input buffer fully, at least for the lowest-level calls; that is the job of the layers on top.
You're right - I was slightly confused there. O_NONBLOCK isn't necessary to implement readNonBlockingFD.
One problem here is that in order to implement readNonBlockingFD on Unix you have to put the FD into O_NONBLOCK mode, which due to misdesign of the Unix API affects other users of the same file descriptor, including other programs. GHC suffers from this problem.
Non-blocking ones will return immediately if no data is available, rather than making sure they return at least one byte.
In any case, the correct solution in the circumstances is to provide a select/poll/epoll/devpoll interface. It is nicer than setting O_NONBLOCK and more efficient. This is largely orthogonal to the Streams design though.
I think the reason we set O_NONBLOCK is so that we don't have to test with select() before reading, we can just call read(). If you don't use O_NONBLOCK, you need two system calls to read/write instead of one. This probably isn't a big deal, given that we're buffering anyway.

I agree that a generic select/poll interface would be nice. If it was in terms of Handles though, that's not useful for implementing the I/O library. If it was in terms of FDs, that's not portable - we'd need a separate one for Windows. How would you design it?

Cheers,
	Simon

On Tue, Feb 21, 2006 at 01:15:48PM +0000, Simon Marlow wrote:
I agree that a generic select/poll interface would be nice. If it was in terms of Handles though, that's not useful for implementing the I/O library. If it was in terms of FDs, that's not portable - we'd need a separate one for Windows. How would you design it?
Yeah, this is why I have held off on a specific design until we get a better idea of what the new IO library will look like. I am thinking it will have to involve some abstract event source type, with primitive routines for creating this type from things like handles, FDs, or anything else we might want to wait on. So it is system-extendable in the sense that implementations can just provide new event source creation primitives.

The other advantage of this sort of thing is that you would want things like the X11 library to be able to provide an event source for when an X11 event is ready to be read, so you can seamlessly integrate your X11 loop into your main one. The X11 library would create such an event source from the underlying socket but just return the abstract event source, so the implementation can change (perhaps when using a shared memory based system like D11 for instance) without affecting how the user uses the library in a portable way.

I will try to come up with something concrete for us to look at that we can modify as the rest of the IO library congeals.

	John

--
John Meacham - ⑆repetae.net⑆john⑈
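The "abstract event source with creation primitives" idea can be sketched in a few lines. Everything here is invented for illustration (EventSource, wait, fromAction are not from any proposal in the thread): a source only knows how to block until its event has fired, and each subsystem supplies its own creation primitive, so new kinds of sources need no change to the core type.

```haskell
import Control.Concurrent

-- the abstract type: all a consumer can do is wait on it
newtype EventSource = EventSource { wait :: IO () }

-- one possible creation primitive: an event that fires when the given
-- action completes; a real library would instead wrap select()/kqueue()
-- readiness on an FD, or an X11 connection, behind the same type
fromAction :: IO () -> IO EventSource
fromAction act = do
    fired <- newEmptyMVar
    _ <- forkIO (act >> putMVar fired ())
    return (EventSource (readMVar fired))

main :: IO ()
main = do
    src <- fromAction (threadDelay 1000)  -- stand-in for "FD readable"
    wait src
    putStrLn "event fired"
```

Because consumers see only EventSource, an X11 binding could switch its source from a socket to shared memory without any change visible to users, which is exactly the portability point made above.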

On Tue, 21 Feb 2006, John Meacham wrote:
Yeah, this is why I have held off on a specific design until we get a better idea of what the new IO library will look like. I am thinking it will have to involve some abstract event source type, with primitive routines for creating this type from things like handles, FDs, or anything else we might want to wait on. So it is system-extendable in the sense that implementations can just provide new event source creation primitives.
The other advantage of this sort of thing is that you would want things like the X11 library to be able to provide an event source for when an X11 event is ready to be read so you can seamlessly integrate your X11 loop into your main one.
The X11 library would create such an event source from the underlying socket but just return the abstract event source so the implementation can change (perhaps when using a shared memory based system like D11 for instance) without affecting how the user uses the library in a portable way.
Could an application reasonably choose between several dispatching systems? For example, I'm working on a Macintosh here, where instead of X11 Apple provides its NextStep based GUI with its own apparently fairly well defined event system. I don't know that system very well, but a MacOS Haskell GUI application would probably want to look in that direction for event integration. Meanwhile, I might want to work with kqueue, on the same platform, because it supports filesystem events along with the usual select stuff. Donn Cave, donn@drizzle.com

Hello Donn,

Wednesday, February 22, 2006, 4:23:28 AM, you wrote:

DC> Could an application reasonably choose between several dispatching
DC> systems? For example, I'm working on a Macintosh here, where instead
DC> of X11 Apple provides its NextStep based GUI with its own apparently
DC> fairly well defined event system. I don't know that system very well,
DC> but a MacOS Haskell GUI application would probably want to look in
DC> that direction for event integration. Meanwhile, I might want to
DC> work with kqueue, on the same platform, because it supports filesystem
DC> events along with the usual select stuff.

This depends not on John's design of this low-level lib, but on the design of the higher-level libs that will use it. Just as an example - the Streams lib will allow this manager to be switched even at runtime.

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com

Hello John, Wednesday, February 22, 2006, 3:32:34 AM, you wrote:
I agree that a generic select/poll interface would be nice. If it was in terms of Handles though, that's not useful for implementing the I/O library. If it was in terms of FDs, that's not portable - we'd need a separate one for Windows. How would you design it?
JM> Yeah, this is why I have held off on a specific design until we get a
JM> better idea of what the new IO library will look like. I am thinking it
JM> will have to involve some abstract event source type with primitive
JM> routines for creating this type from things like handles, fds, or
JM> anything else we might want to wait on. so it is system-extendable in
JM> that sense in that implementations can just provide new event source
JM> creation primitives.

I don't think we need some fixed interface. It can simply be parameterized:

    type ReadBuf h  = h -> Ptr () -> Int -> IO Int
    type WriteBuf h = h -> Ptr () -> Int -> IO ()

So Unix implementations will use FD, the Windows implementation will work with Handle, and all will be happy :)

JM> The other advantage of this sort of thing is that you would want things
JM> like the X11 library to be able to provide an event source for when an
JM> X11 event is ready to be read so you can seamlessly integrate your X11
JM> loop into your main one.

You don't need to have the same interface for X11 and file async operations. The library can export "ReadBuf FD", "WriteBuf FD" and "X11Op" implementations, and you use each one in the appropriate place.

JM> The X11 library would create such an event source from the underlying
JM> socket but just return the abstract event source so the implementation
JM> can change (perhaps when using a shared memory based system like D11 for
JM> instance) without affecting how the user uses the library in a portable
JM> way.
JM>
JM> I will try to come up with something concrete for us to look at that we
JM> can modify as the rest of the IO library congeals.

As I already said, this IO library will not emerge by itself :) There is my library, which uses the Stream class and so can accept any form of async library. There is a lib by Marcin Kowalczyk. And there is Einar's Alt-Network lib, which already implements 2 async methods.
so what we need is to convert Einar's work to a single interface and make a Stream interface around it. the latter will be better accomplished by me, but i don't know whether he planned to work on the former. i can also do it, but without any testing, because i still don't have any Unix installed :) the Streams library itself is now unix-compilable, thanks to Peter Simons

--
Best regards,
Bulat
mailto:Bulat.Ziganshin@gmail.com

On Wed, Feb 22, 2006 at 03:28:26PM +0300, bulat.ziganshin@gmail.com wrote:
JM> Yeah, this is why I have held off on a specific design until we get a better idea of what the new IO library will look like. I am thinking it will have to involve some abstract event source type with primitive routines for creating this type from things like handles, FDs, or anything else we might want to wait on. So it is system-extendable in that sense, in that implementations can just provide new event source creation primitives.
i don't think we need a fixed interface. it can just be parameterized:
type ReadBuf  h = h -> Ptr () -> Int -> IO Int
type WriteBuf h = h -> Ptr () -> Int -> IO ()
so Unix implementations will use FD, the Windows implementation will work with Handle, and all will be happy :)
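As an aside, the parameterization quoted above could be instantiated with a plain blocking Unix-FD backend roughly as follows. This is only a sketch: `fdReadBuf`/`fdWriteBuf` are from a later version of the `unix` package than existed at the time of this thread, and `readBufFD`/`writeBufFD` are hypothetical names.

```haskell
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import System.Posix.IO (fdReadBuf, fdWriteBuf)
import System.Posix.Types (Fd)

type ReadBuf  h = h -> Ptr () -> Int -> IO Int
type WriteBuf h = h -> Ptr () -> Int -> IO ()

-- Blocking read for a Unix file descriptor.
readBufFD :: ReadBuf Fd
readBufFD fd buf len =
  fromIntegral `fmap` fdReadBuf fd (castPtr buf) (fromIntegral len)

-- Blocking write; write() may write fewer bytes than requested,
-- so retry on the remainder until everything is out.
writeBufFD :: WriteBuf Fd
writeBufFD fd buf len
  | len <= 0  = return ()
  | otherwise = do
      n <- fdWriteBuf fd (castPtr buf) (fromIntegral len)
      writeBufFD fd (buf `plusPtr` fromIntegral n) (len - fromIntegral n)
```

A Windows implementation would provide the same two functions at type `ReadBuf Handle` / `WriteBuf Handle`, which is the point of parameterizing over `h`.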
I think you misunderstand: the poll interface will need to accept a _set_ of events to wait for. This is independent of the buffer interface and lower level than async IO (for the traditional definition of async IO). Not all event sources will necessarily be FDs on Unix or Handles on Windows - if, say, a Haskell RTS integrates with a system's built-in event loop (such as the OS X example mentioned in another email).
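The abstract event-source design John sketches in words could look something like this in Haskell. All names here are hypothetical; the essential point is that callers pass a heterogeneous *set* of sources and the backend decides how to wait on them.

```haskell
import System.Posix.Types (Fd)

-- An event source is whatever the backend knows how to wait on.
-- The constructors stay hidden behind creation primitives, so
-- backends (and libraries like X11) can add new kinds of source
-- without changing the poll interface.
data EventSource = WaitRead Fd | WaitWrite Fd
  -- a Windows backend might add e.g. WaitHandle, an X11 binding WaitX11 ...

eventSourceFromFdRead :: Fd -> EventSource
eventSourceFromFdRead = WaitRead

eventSourceFromFdWrite :: Fd -> EventSource
eventSourceFromFdWrite = WaitWrite

-- Block until at least one source is ready; return the ready subset.
-- A select()-based backend would translate the list into fd_sets,
-- a poll()-based one into a pollfd array, and so on.
pollSources :: [EventSource] -> IO [EventSource]
pollSources = error "pollSources: backend-specific"
```

This is why a single wait call has to see all the sources at once: you cannot `select()` on the FDs and separately wait on an X11 connection from the same Haskell thread.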
JM> The other advantage of this sort of thing is that you would want things like the X11 library to be able to provide an event source for when an X11 event is ready to be read, so you can seamlessly integrate your X11 loop into your main one.
you don't need the same interface for X11 and file async operations. the library can export "ReadBuf FD", "WriteBuf FD" and "X11Op" implementations, and you will use each one in the appropriate place.
You can't treat them as independent types at the poll site, since you need to wait on a set of events from potentially different types of sources.
JM> The X11 library would create such an event source from the underlying socket but just return the abstract event source so the implementation can change (perhaps when using a shared-memory based system like D11 for instance) without affecting how the user uses the library in a portable way.
JM> I will try to come up with something concrete for us to look at that we can modify as the rest of the IO library congeals.
as i already said, this IO library will not emerge by itself :) there is my library, which uses a Stream class, so it can accept any form of async library. there is a lib by Marcin Kowalczyk. and there is Einar's Alt-Network lib, which already implements 2 async methods. so what we need is to convert Einar's work to a single interface and make a Stream interface around it. the latter will be better accomplished by me, but i don't know whether he planned to work on the former. i can also do it, but without any testing, because i still don't have any Unix installed :) the Streams library itself is now unix-compilable, thanks to Peter Simons
I am not quite sure what you mean by this. The poll/select interface will be lower level than your Streams library and fairly independent. The async methods I have seen have been non-blocking based and tend to be system dependent, which is different from what the poll/select interface is about. The poll/select interface is about providing the minimum functionality to allow _portable_ async applications and libraries to be written.

John

--
John Meacham - ⑆repetae.net⑆john⑈

Hello John,

Wednesday, February 22, 2006, 5:11:04 PM, you wrote:

it seems that we don't understand each other. let's be concrete: my library reads and writes files. it uses read/write/recv/send to do this in a blocking manner. now i want to have other operations that have the SAME INTERFACES but internally use something like poll, in order to allow the Haskell RTS to overlap i/o with "user threads". agree?

these async operations should have the same interface as the blocking ones, but that is impossible for Windows, so i propose to have slightly more general interfaces:

type ReadBuf  h = h -> Ptr () -> Int -> IO Int
type WriteBuf h = h -> Ptr () -> Int -> IO ()

these are the functions which my library will call; everything else, from my viewpoint, is an internal detail of this async lib. i don't know (i really don't know) how to build this list of events and how to manage it. the same goes for the X11 library - the async lib should just provide alternative implementations of some operations and not require the user of the async lib to manage an event list.

it seems like you want to define something more low-level, but as an i/o library author i will be happy just to call some non-blocking equivalents of read/write provided by the async lib. and it seems that i'm not competent enough to discuss details of its internal implementation ;)

on the other side, you don't need to wait until some i/o library is made standard. any such library will need non-blocking implementations of read() and write(), so this is the high-level interface that the async lib should implement. agree?
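The "non-blocking equivalent of read()" Bulat asks for can be sketched in GHC terms: wait for readiness with `threadWaitRead` (which suspends only the calling Haskell thread, while the RTS multiplexes all waiting threads through its select()/poll() loop), then perform the read. `asyncReadBufFD` is a hypothetical name; a full version would also keep the FD in O_NONBLOCK mode and retry on EAGAIN.

```haskell
import Control.Concurrent (threadWaitRead)
import Foreign.Ptr (Ptr, castPtr)
import System.Posix.IO (fdReadBuf)
import System.Posix.Types (Fd)

type ReadBuf h = h -> Ptr () -> Int -> IO Int

-- Same interface as the blocking read, but this version blocks only
-- the current Haskell thread, not the whole runtime.
asyncReadBufFD :: ReadBuf Fd
asyncReadBufFD fd buf len = do
  threadWaitRead fd    -- suspend until the RTS event loop reports readiness
  fromIntegral `fmap` fdReadBuf fd (castPtr buf) (fromIntegral len)
```

An I/O library can then be parameterized over `ReadBuf`/`WriteBuf` and never need to know whether the implementation underneath is blocking, select()-based, or something else.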
JM> Yeah, this is why I have held off on a specific design until we get a better idea of what the new IO library will look like. I am thinking it will have to involve some abstract event source type with primitive routines for creating this type from things like handles, FDs, or anything else we might want to wait on. So it is system-extendable in that sense, in that implementations can just provide new event source creation primitives.
i don't think we need a fixed interface. it can just be parameterized:
type ReadBuf  h = h -> Ptr () -> Int -> IO Int
type WriteBuf h = h -> Ptr () -> Int -> IO ()
so Unix implementations will use FD, the Windows implementation will work with Handle, and all will be happy :)
JM> I think you misunderstand: the poll interface will need to accept a _set_ of events to wait for. This is independent of the buffer interface and lower level than async IO (for the traditional definition of async IO). Not all event sources will necessarily be FDs on Unix or Handles on Windows - if, say, a Haskell RTS integrates with a system's built-in event loop (such as the OS X example mentioned in another email).
JM> The other advantage of this sort of thing is that you would want things like the X11 library to be able to provide an event source for when an X11 event is ready to be read, so you can seamlessly integrate your X11 loop into your main one.
you don't need the same interface for X11 and file async operations. the library can export "ReadBuf FD", "WriteBuf FD" and "X11Op" implementations, and you will use each one in the appropriate place.
JM> You can't treat them as independent types at the poll site, since you need to wait on a set of events from potentially different types of sources.
JM> The X11 library would create such an event source from the underlying socket but just return the abstract event source so the implementation can change (perhaps when using a shared-memory based system like D11 for instance) without affecting how the user uses the library in a portable way.
JM> I will try to come up with something concrete for us to look at that we can modify as the rest of the IO library congeals.
as i already said, this IO library will not emerge by itself :) there is my library, which uses a Stream class, so it can accept any form of async library. there is a lib by Marcin Kowalczyk. and there is Einar's Alt-Network lib, which already implements 2 async methods. so what we need is to convert Einar's work to a single interface and make a Stream interface around it. the latter will be better accomplished by me, but i don't know whether he planned to work on the former. i can also do it, but without any testing, because i still don't have any Unix installed :) the Streams library itself is now unix-compilable, thanks to Peter Simons
JM> I am not quite sure what you mean by this. The poll/select interface will be lower level than your Streams library and fairly independent. The async methods I have seen have been non-blocking based and tend to be system dependent, which is different from what the poll/select interface is about. The poll/select interface is about providing the minimum functionality to allow _portable_ async applications and libraries to be written.
JM> John

--
Best regards,
Bulat
mailto:Bulat.Ziganshin@gmail.com

Simon Marlow writes:
I think the reason we set O_NONBLOCK is so that we don't have to test with select() before reading, we can just call read(). If you don't use O_NONBLOCK, you need two system calls to read/write instead of one. This probably isn't a big deal, given that we're buffering anyway.
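The O_NONBLOCK setting Simon describes corresponds, in the `unix` package, to the `NonBlockingRead` option of `setFdOption`. A minimal sketch (`setNonBlocking` is a hypothetical wrapper name):

```haskell
import System.Posix.IO (setFdOption, FdOption (NonBlockingRead))
import System.Posix.Types (Fd)

-- Put the descriptor into non-blocking mode: read() will then return
-- EAGAIN instead of blocking, so a single read() suffices with no
-- select() probe beforehand.
setNonBlocking :: Fd -> IO ()
setNonBlocking fd = setFdOption fd NonBlockingRead True
```

Note the trade-off mentioned in this thread: the flag belongs to the open file description, so it is visible to any other process sharing the descriptor.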
I've heard that for Linux sockets select/poll/epoll might say that data is available when in fact it is not (it may be triggered by socket activity which doesn't result in new data). Select/poll/epoll are designed to work primarily with non-blocking I/O.

In my implementation of my language, pthreads are optionally used in a way very similar to your paper "Extending the Haskell Foreign Function Interface with Concurrency". This means that I have a choice of using blocking or non-blocking I/O for a given descriptor; both work similarly, but blocking I/O takes up an OS thread. Each file has a blocking flag kept in its data.

Non-blocking I/O is done in the same thread. The timer signal is kept active, so if another process has switched the file to blocking, the thread will be woken up by the timer signal and won't block the whole process. The thread performing the I/O will only waste its timeslices.

Blocking I/O temporarily releases access to the runtime, setting up a worker OS thread for other threads if needed, etc. As an optimization, if there are no other threads to be run by the scheduler (no running threads, none waiting for I/O, none waiting for a timeout, and we are the thread which handles system signals), then the runtime is not physically released (no worker OS threads, no unlinking of the thread structure); only the signal mask is changed, so the visible semantics is maintained. This is common to other such potentially blocking system calls. I don't know if GHC does something similar. (I recently made it work even when a thread that my runtime has not seen before wants to access the runtime. If the optimization of not physically releasing the runtime was in place, the new thread performs the actions on behalf of the previous thread.)

In either case EAGAIN causes the thread to block, asking the scheduler to wake it up when I/O is ready.
This means that even if some other process has switched the file to non-blocking, the process will only do unnecessary context switches. It's important to make this work even when the blocking flag is out of sync. The Unix blocking flag is not even associated with the descriptor but with an open file, i.e. it's shared with descriptors created by dup(), so it might be hard to predict without asking the OS.

If pthreads are available, stdin, stdout and stderr are kept blocking, because they are often shared with other processes, and keeping them blocking works well. Without pthreads they are non-blocking, because I felt it was more important not to waste timeslices of the thread performing I/O than to be nice to other processes. In both cases pipes and sockets are non-blocking, while named files are blocking. The programmer can change the blocking state explicitly, but this is probably useful only when setting up redirections before exec*().

--
__("< Marcin Kowalczyk
\__/  qrczak@knm.org.pl
 ^^   http://qrnik.knm.org.pl/~qrczak/

On Fri, 24 Feb 2006, Marcin 'Qrczak' Kowalczyk wrote:
Simon Marlow writes:

I think the reason we set O_NONBLOCK is so that we don't have to test with select() before reading, we can just call read(). If you don't use O_NONBLOCK, you need two system calls to read/write instead of one. This probably isn't a big deal, given that we're buffering anyway.
I've heard that for Linux sockets select/poll/epoll might say that data is available where it in fact is not (it may be triggered by socket activity which doesn't result in new data).
Only UDP, from anything I'm able to find out about this. Apparently a UDP packet may turn out to be invalid in some respect, to be discovered too late during the recvmsg system call. In a similar situation, the TCP layer would have already accounted for this by the time select sees anything. Likewise of course any local slow devices like a tty, pipe etc.
Select/poll/epoll are designed to work primarily with non-blocking I/O.
That's what the Linux kernel developers say, anyway, since it would be inconvenient for them to fix this, even though it apparently violates the POSIX specification.

Donn Cave, donn@drizzle.com

Marcin 'Qrczak' Kowalczyk wrote:
Simon Marlow writes:

I think the reason we set O_NONBLOCK is so that we don't have to test with select() before reading, we can just call read(). If you don't use O_NONBLOCK, you need two system calls to read/write instead of one. This probably isn't a big deal, given that we're buffering anyway.
I've heard that for Linux sockets select/poll/epoll might say that data is available where it in fact is not (it may be triggered by socket activity which doesn't result in new data). Select/poll/epoll are designed to work primarily with non-blocking I/O.
Ah yes, you're right. It's important for us to guarantee that calling read() can't block, so even if we select() first there's a race condition: someone else can call read() before the current thread does.
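The race-free pattern implied here can be sketched as follows: keep the FD in O_NONBLOCK mode so a read() that lost the race returns EAGAIN immediately instead of blocking the whole runtime, and then simply wait again. `safeRead` is a hypothetical name; a real implementation would inspect the error and retry only on EAGAIN rather than on every IOError.

```haskell
import Control.Concurrent (threadWaitRead)
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import System.IO.Error (catchIOError)
import System.Posix.IO (fdReadBuf)
import System.Posix.Types (ByteCount, Fd)

-- Assumes fd has already been put into non-blocking mode.
safeRead :: Fd -> Ptr Word8 -> ByteCount -> IO ByteCount
safeRead fd buf len = do
  threadWaitRead fd           -- select()/poll() says "ready"...
  fdReadBuf fd buf len
    `catchIOError` \_ ->      -- ...but another reader drained the data
      safeRead fd buf len     --    first, so wait and try again
```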
In my implementation of my language pthreads are optionally used in the way very similar to your paper "Extending the Haskell Foreign Function Interface with Concurrency". This means that I have a choice of using blocking or non-blocking I/O for a given descriptor, both work similarly, but blocking I/O takes up an OS thread. Each file has a blocking flag kept in its data.
That's an interesting idea, and it neatly solves the problem of making stdin/stdout/stderr non-blocking, but at the expense of some heavyweight OS-thread blocking.

Cheers,
Simon

Simon Marlow writes:
I agree that a generic select/poll interface would be nice.
We must be aware that epoll (and I think kqueue too) registers event sources in advance, separately from waiting, which is its primary advantage over poll. The interface should use this model, because it's easy to implement in terms of select/poll without losing efficiency, but the converse would lose the benefit of epoll. (My runtime has a generic interface on the C level only, for hooking in another implementation to be used by the scheduler.)

--
__("< Marcin Kowalczyk
\__/  qrczak@knm.org.pl
 ^^   http://qrnik.knm.org.pl/~qrczak/
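The registration model Marcin describes could be captured by an interface like the following sketch (all names hypothetical). `register` mirrors epoll_ctl(EPOLL_CTL_ADD) and `waitEvents` mirrors epoll_wait(); a select()/poll() backend can implement the same interface by rebuilding its fd set from the registered list on each wait, which is why this direction of emulation costs nothing.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import System.Posix.Types (Fd)

-- Toy stand-in for an epoll instance: just a mutable set of FDs.
newtype EventQueue = EventQueue (IORef [Fd])

newEventQueue :: IO EventQueue
newEventQueue = EventQueue `fmap` newIORef []

-- Register a source once, in advance (cf. epoll_ctl EPOLL_CTL_ADD).
register :: EventQueue -> Fd -> IO ()
register (EventQueue r) fd = modifyIORef r (fd :)

-- Wait repeatedly on the registered set (cf. epoll_wait). A real
-- backend would block in epoll_wait/select here and return only
-- the descriptors that are actually ready.
waitEvents :: EventQueue -> IO [Fd]
waitEvents (EventQueue r) = readIORef r
```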

Hello Bulat, Thursday, February 09, 2006, 10:24:59 PM, you wrote:
if you can make a select/poll transformer, at least for testing purposes, that would be really great.
JM>> Yeah, I will look into this. The basic select/poll call will have to be pretty low level, but hopefully it will allow interesting higher level constructs based on your streams or an evolution of them.

sorry, John, as i now see, Einar already implemented select/epoll machinery in the alt-network lib. moreover, he has now promised to extract this functionality to make a universal async i/o layer library. the only thing i don't know is whether he is ready to develop a universal API for these modules, as you initially proposed. as soon as this universal API is done, i will roll up the Stream transformer that uses it and therefore allows async i/o both with files and sockets on any platform where this API can be implemented

--
Best regards,
Bulat
mailto:bulatz@HotPOP.com
participants (8)

- Bulat Ziganshin
- Bulat Ziganshin
- bulat.ziganshin@gmail.com
- Donn Cave
- Einar Karttunen
- John Meacham
- Marcin 'Qrczak' Kowalczyk
- Simon Marlow