
Hello Duncan,

Saturday, January 28, 2006, 3:08:04 PM, you wrote:
Yes, I want to save exactly this bit of performance - after having optimized all the other costs on the text I/O path.
DC> There is a trade off, using mmap gives you zero-copy access to the page
DC> cache however there is a not-insignificant performance overhead in
DC> setting up and tearing down memory mappings. This is true on unix and
DC> win32. So for small writes (eg 4k blocks) it is likely to be cheaper to
DC> just use read()/write() on page aligned buffers rather than use mmap.
DC> You would need to do benchmarks on each platform to see which method is
DC> quicker. Given the code complexity that other people have mentioned I do
DC> not think it would be worth it.

I use 64 KB buffers and tried memory-mapped files last night. It's not easy to implement this properly and still get good speed. At the very least, Windows flushes buffers that were filled through mmap very lazily: when I wrote a 1 GB file this way, Windows tried to swap out all the programs (and itself), yet still delayed writing out data that had already been unmapped!

DC> Using page aligned and sized buffers can help read()/write() performance
DC> on some OSes like some of the BSDs.

I will try carving an aligned 64 KB buffer out of a 128 KB block and will publish the code here so that anyone can test it on their OS.
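To make the idea concrete, here is a rough sketch of carving an aligned buffer out of a larger allocation (only an illustration - the function name is mine, and real code would also want exception-safe freeing, e.g. via bracket):

  import Foreign.Marshal.Alloc (mallocBytes, free)
  import Foreign.Ptr           (Ptr, alignPtr)

  -- Allocate a 128 KB raw block and carve out a 64 KB-aligned, 64 KB-sized
  -- buffer inside it, so read()/write() always gets an aligned address.
  withAligned64k :: (Ptr a -> IO b) -> IO b
  withAligned64k action = do
      let size  = 64 * 1024               -- usable buffer size
          align = 64 * 1024               -- required alignment
      raw <- mallocBytes (size + align)   -- 128 KB in total
      let buf = alignPtr raw align        -- first aligned address inside the block
      result <- action buf
      free raw
      return result

The aligned pointer can then be passed to hGetBuf/hPutBuf or handed straight to read()/write().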
In other words, I'm interested in having zero-wait operation for both reading and writing.
DC> As I said that is not possible with either read() or mmaped read.
DC> Conversely it works automatically with write() and mmaped writes.
DC> Zero-copy and zero-wait are not the same thing.

I mean that mmap guarantees us zero-copy operation, and I want to use mmap in such a way that zero-wait operation can be ensured as well.

DC> An important factor for optimising IO performance is using sufficiently
DC> large block sizes to avoid making frequent kernel calls. That includes
DC> read()/write() calls and mmap()/unmap() calls.

That's true and easy to implement.

DC> Perhaps it is possible to move the complexity needed for the lazy
DC> hPutStr case into the hPutStr implementation rather than the Handle
DC> implementation. For example perhaps it'd be possible for the Handle to
DC> just have one buffer but to have a method for writing out an external
DC> buffer that is passed to it. Then hPutStr would allocate it's own
DC> buffer, evaluate the string, copying it into the buffer. Then it would
DC> call on the Handle to write out the buffer. The Handle would flush its
DC> existing internal buffer and write out the extra buffer.

1) The "lazy hPutStr" case is not some rare corner case. With current GHC we cannot distinguish strict from lazy strings, so on every hPutStr invocation we must assume that evaluating its argument can trigger side effects. That is the whole problem: we want to optimize hPutStr for fast operation on strict strings, but we need to ensure it still works correctly with slow lazy strings whose evaluation may have arbitrary side effects.

2) The scheme above can be implemented by using hPutBuf to write this additional buffer; it is just somewhat less efficient (though not by much - memcpy runs about 10 times faster than traversing a [Char]). On the other hand, Simon did not take into account that locking itself is rather slow, and using two locks instead of one makes his scheme slower, especially for small strings.

DC> Perhaps a better solution for your single-threaded operation case is to
DC> have a handle type that is a bit specialised and does not have to deal
DC> with the general case. If we're going to get a I/O system that supports
DC> various layers and implementations then perhaps you could have an one
DC> that implements only the minimal possible I/O class. That could not use
DC> any thread locks (ie it'd not work predictably for multiple Haskell
DC> threads)

Moreover, we can implement locking as a special "converter" type that can be applied to any mutable object - a stream, a collection, a counter. That lets us simplify the implementations and add locking only to those Streams where we really need it, like this:

  h <- openFD "test" >>= addUsingOfSelect >>= addBuffering 65536
       >>= addCharEncoding utf8 >>= attachUserData dictionary >>= addLocking

DC> and use mmap on the entire file. So you wouldn't get the normal
DC> feature that a file extends at the end as it's written to, it'd need a
DC> method for determining the size at the beginning or extending it in
DC> large chunks. On the other hand it would not need to manage any buffers
DC> since reads and writes would just be reads to/from memory.

Yes, I have done that. But a simple MapViewOfFile/UnmapViewOfFile scheme does not work well enough, at least for writing: Windows is in no hurry to flush these buffers, even after the view is unmapped, and calling FlushViewOfFile results in a synchronous flush of the buffer to the cache.
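For reference, this is roughly what that experiment looks like from the Haskell side (a sketch only - the FFI imports below are hand-written for 32-bit Windows rather than taken from an existing binding, and all error handling is omitted):

  {-# LANGUAGE ForeignFunctionInterface #-}
  import Foreign.Ptr (Ptr)
  import Data.Word   (Word32)

  type HANDLE = Ptr ()

  foreign import stdcall unsafe "windows.h MapViewOfFile"
      c_MapViewOfFile   :: HANDLE -> Word32 -> Word32 -> Word32 -> Word32 -> IO (Ptr a)
  foreign import stdcall unsafe "windows.h UnmapViewOfFile"
      c_UnmapViewOfFile :: Ptr a -> IO Word32
  foreign import stdcall unsafe "windows.h FlushViewOfFile"
      c_FlushViewOfFile :: Ptr a -> Word32 -> IO Word32

  -- Map one 64 KB view of the file mapping at the given offset (which must
  -- be a multiple of 64 KB), let the caller fill it, then flush and unmap.
  writeChunk :: HANDLE -> Word32 -> (Ptr a -> IO ()) -> IO ()
  writeChunk mapping offset fill = do
      let chunkSize      = 64 * 1024
          fILE_MAP_WRITE = 2                  -- FILE_MAP_WRITE
      view <- c_MapViewOfFile mapping fILE_MAP_WRITE 0 offset chunkSize
      fill view                               -- writing the file is just writing memory
      _ <- c_FlushViewOfFile view chunkSize   -- this is the synchronous flush mentioned above
      _ <- c_UnmapViewOfFile view
      return ()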
So I need to try calling FlushViewOfFile from a separate thread, like GHC does for its own I/O.

DC> So it'd depend on what the API of the low level layers of the new I/O
DC> system are like as to whether such a simple and limited implementation
DC> would be possible.

That's no problem. A memory-mapped file just implements an API that tells the user the address and size of the next buffer to fill or to read. The same interface is also used for plain memory buffers and for interprocess communication via shared memory:

  -- | Receive next buffer which contains data / must be filled with data
  vReceiveBuf :: (Integral size) => h -> ReadWrite -> m (Ptr a, size)

  -- | Release buffer after reading `len` bytes / Send buffer filled with `len` bytes
  vSendBuf :: (Integral size, Integral len) => h -> Ptr a -> size -> len -> m ()

-- 
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com
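P.S. To show how this buffer-level interface is meant to be used, here is a rough sketch of the writing side (the class wrapper, the ReadWrite constructors and the specialisation of `m` to IO are my own assumptions, added only so the example is self-contained):

  import Foreign.Ptr           (Ptr, castPtr)
  import Foreign.Marshal.Array (pokeArray)
  import Data.Word             (Word8)
  import Data.List             (genericLength, genericTake)

  data ReadWrite = Read | Write

  -- The two operations quoted above, restated as a class so this compiles.
  class BlockStream h where
      vReceiveBuf :: Integral size => h -> ReadWrite -> IO (Ptr a, size)
      vSendBuf    :: (Integral size, Integral len) => h -> Ptr a -> size -> len -> IO ()

  -- Fill the stream's next write buffer with the given bytes and release it.
  writeSomeBytes :: BlockStream h => h -> [Word8] -> IO ()
  writeSomeBytes h bytes = do
      (buf, size) <- vReceiveBuf h Write                -- where the next data should go
      let len = min size (genericLength bytes)
      pokeArray (castPtr buf) (genericTake len bytes)   -- fill the buffer in place
      vSendBuf h buf size len                           -- we filled `len` of the `size` bytes

Reading is symmetric: vReceiveBuf with Read hands back a buffer that already contains data, and vSendBuf releases it after `len` bytes have been consumed.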