RE: Raw I/O library proposal, second (more pragmatic) draft

newer
RE: Libraries and hierarchies

older
RE: Libraries and hierarchies

Simon Marlow

1 Aug 2003 1 Aug '03

9:50 a.m.

...

-- More comments, please. Bad names? Important missing functionality? -- Still unimplementable?

Basically I think this is good. I think the naming should be rethought, but that can be done later. I wanted to float a generalisation of this scheme, though. I'm wondering whether it might be a good idea to make InputStream and OutputStream into type classes, the advantage being that this makes streams more extensible - one example is that memory-mapped files fit neatly into this framework. I already have 6 examples of things that can have streams layered on top (or *are* streams), and there are almost certainly more. Here's some signatures for you to peruse: class Stream s where closeStream :: s -> IO () streamSetBuffering :: s -> BufferMode -> IO () streamGetBuffering :: s -> IO BufferMode streamFlush :: s -> IO () isEOS :: s -> IO Bool class InputStream s where streamGet :: s -> IO Word8 streamReadBuffer :: s -> Integer -> Buffer -> IO () streamGetBuffer :: s -> Integer -> IO ImmutableBuffer streamGetContents :: s -> IO [Word8] class OutputStream s where streamPut :: s -> Word8 -> IO () streamPuts :: s -> [Word8] -> IO () streamWriteBuffer :: s -> Integer -> Buffer -> IO () -- Files in the filesystem, with access rights data File data FileInputStream -- instance Stream, InputStream data FileOutputStream -- instance Stream, OutputStream -- Memory-mapped files mapFile :: File -> FileOffset -> Integer -> MapMode -> IO MappedFile data MappedFile data MappedFileInputStream -- instance Stream, InputStream data MappedFileOutputStream -- instance Stream, OutputStream instance MArray MappedFile Word8 IO -- so we can read/write it directly mappedFileInputStream :: MappedFile -> Integer -> Integer -> IO MappedFileInputStream mappedFileOutputStream :: MappedFile -> Integer -> Integer -> IO MappedFileOutputStream -- Pipes data Pipe -- a pipe with a read and a write end instance Stream Pipe instance InputStream Pipe instance OutputStream Pipe createPipe :: IO Pipe closePipe :: Pipe -> IO () -- Streams from arrays: data ArrayInputStream instance InputStream ArrayInputStream data ArrayOutputStream instance OutputStream ArrayOutputStream iarrayInputStream :: (Ix i, IArray a Word8) => a i Word8 -> i -> ArrayInputStream marrayInputStream :: (Ix i, MArray a Word8 IO) => a i Word8 -> i -> ArrayInputStream marrayOutputStream :: (Ix i, MArray a Word8 IO) => a i Word8 -> i -> ArrayOutputStream -- Sockets: data Socket instance Stream Socket instance InputStream Socket instance OutputStream Socket -- URIs: data URIStream getURI :: URI -> IO URIStream instance InputStream URIStream Cheers, Simon

Show replies by date

Ben Rudiak-Gould

1 Aug 1 Aug

5:47 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

On Fri, 1 Aug 2003, Simon Marlow wrote:

...

I wanted to float a generalisation of this scheme, though. I'm wondering whether it might be a good idea to make InputStream and OutputStream into type classes, the advantage being that this makes streams more extensible - one example is that memory-mapped files fit neatly into this framework. I already have 6 examples of things that can have streams layered on top (or *are* streams), and there are almost certainly more.

I think this is unambiguously superior to my design because it's user-extensible. I can easily imagine a user wanting to put a text reader on top of a user-defined instance of InputStream, for example. It also allows particular kinds of streams to expose additional structure, which is good. My only concern is that the additional structure might not be known at type-check time. In particular, the lookupXputStream functions can't return any particular type of stream, as far as I can tell -- certainly not a FileXputStream.

...

Here's some signatures for you to peruse:

class Stream s where closeStream :: s -> IO ()

I guess "open" and "close" do make sense for streams.

...

streamSetBuffering :: s -> BufferMode -> IO ()

This is not a design issue, but not all kinds of buffering make sense for all kinds of streams (line buffering doesn't seem sensible for file streams, and any buffering on a memory array is pointless). The supplied buffering should presumably be only a suggestion.

...

streamGetBuffering :: s -> IO BufferMode streamFlush :: s -> IO ()

Does streamFlush make sense for input streams? In the case of a file stream it could discard buffered data, but for other streams I'm not sure what it would do.

...

isEOS :: s -> IO Bool

This has a clear meaning for input streams (no more data), but for output streams it could mean many different things (connection closed by listener, no more disk space, no more memory buffer space), and, more seriously, these conditions can't in general be detected synchronously unless the stream happens to be unbuffered.

...

class InputStream s where streamGet :: s -> IO Word8 streamReadBuffer :: s -> Integer -> Buffer -> IO ()

I used "read" and "write" exclusively for files and "get" and "put" exclusively for streams to emphasize that these are completely different operations. Writing a file is like writing on a piece of paper; you know where your data is going and how to get it back with a read. But output streams are like pneumatic tubes that whisk your octets away to parts unknown. I would even go so far as to use names like push/pull or send/receive or speak/listen for streams.

...

streamReadBuffer :: s -> Integer -> Buffer -> IO () streamGetBuffer :: s -> Integer -> IO ImmutableBuffer

This brings up (again) an important issue: what's the most practical way of providing a memory buffer for file/stream operations? There doesn't seem to be a clean answer to this in Haskell. It seems like we'll need more variants than just these two. [snip]

...

data MappedFileInputStream -- instance Stream, InputStream data MappedFileOutputStream -- instance Stream, OutputStream

I don't think these are necessary; you can use ArrayXputStream. [snip]

...

-- Pipes data Pipe -- a pipe with a read and a write end instance Stream Pipe instance InputStream Pipe instance OutputStream Pipe createPipe :: IO Pipe closePipe :: Pipe -> IO ()

I strongly believe that createPipe should return an (InputStream,OutputStream) pair, not a single object supporting both interfaces. The streams associated with a pipe represent the ends of the pipe, not the pipe itself. This is true conceptually and also in practice: pipes are only useful if you separate the two ends and give them to two different threads.

...

-- Sockets: data Socket instance Stream Socket instance InputStream Socket instance OutputStream Socket

Same objection here, although the reason is a bit different. Each TCP connection consists of two independent unidirectional channels; they're only created together for reasons of efficiency (and security?). There are a total of four ends, of which you get two and the remote host gets the other two. I admit that in this case a natural analogy with a telephone handset suggests that the two streams should be kept together; but that's what tuples are for. The only object I can think of that could legitimately be an instance of both InputStream and OutputStream is a LIFO buffer, assuming there's any use for such a thing. -- Ben

Simon Marlow

6 Aug 6 Aug

7:51 a.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

Following all the suggestions in this thread, I've written out a new proposal. I hope I've managed to resolve most of the comments people had, or otherwise left comments in place, if that's not the case please complain. To summarise, the primary innovations here are: - Streams are layered on top of various kinds of underlying I/O objects, and have a set of uniform operations provided by the Stream, InputStream and OutputStream classes. (no-one thought that using type classes was a bad idea, so I've left them in). - Streams can be layered on top of arrays. Files can be mapped into memory and exposed as arrays. (I thinnk this part is particularly nice). There's a Haddock-processed version of the file here: http://www.haskell.org/~simonmar/io/System.IO.html Cheers, Simon -- | The naming is all subject to change, we're interested in -- functionality first. module System.IO ( -- * Buffers Buffer, withBuffer, ImmutableBuffer, -- $buffers -- * Files File, FileInputStream, FileOutputStream, FileOffset, openFile, closeFile, fileSize, fileSetSize, fileRead, fileGet, fileWrite, fileInputStream, fileOutputStream, mapFile, -- * MappedFiles MappedFile, -- $mappedfiles -- * Pipes PipeInputStream, PipeOutputStream, createPipe, -- * Arrays as streams ArrayInputStream, ArrayOutputStream, iarrayInputStream, marrayInputStream, marrayOutputStream, -- * Sockets -- | The socket support won't live in System.IO, it will be in -- "Network.Socket" as before. Socket, SocketInputStream, SocketOutputStream, socketGetInputStream, socketGetOutputStream, -- $sockets -- * Streams Stream(..), InputStream(..), OutputStream(..), ) where -- ----------------------------------------------------------------------------- -- Buffers -- | A mutable array of bytes that can be passed to foreign functions. data Buffer instance MArray Buffer Word8 IO withBuffer :: Buffer -> (Ptr Word8 -> IO a) -> IO a -- | An immutable array of bytes data ImmutableBuffer instance IArray ImmutableBuffer Word8 -- $buffers -- The idea is that Buffer should be useful for text encoding\/decoding. -- -- Implementation notes: on GHC, 'Buffer' could be implemented by 'IOUArray', -- using a pinned 'ByteArray#' as the underlying object, so that the buffer -- address can be passed to foreign functions. -- -- (a 'StorableArray' would do for Buffer, but an 'IOUArray' will be more -- efficient). -- ---------------------------------------------------------------------------- -- Files data File -- a mutable array of bytes, with some access permissions data FileInputStream data FileOutputStream instance Stream FileInputStream where {} instance Stream FileOutputStream where {} instance InputStream FileInputStream where {} instance OutputStream FileOutputStream where {} type FileOffset = Integer openFile :: FilePath -> IOMode -> IO File closeFile :: File -> IO () fileSize :: File -> IO Integer fileSetSize :: File -> Integer -> IO () fileRead :: File -> FileOffset -> Integer -> Buffer -> IO () fileGet :: File -> FileOffset -> Integer -> IO ImmutableBuffer fileWrite :: File -> FileOffset -> Integer -> Buffer -> IO () -- TODO: what if a file refers to a FIFO? fileInputStream :: File -> FileOffset -> IO FileInputStream fileOutputStream :: File -> FileOffset -> IO FileOutputStream mapFile :: File -> FileOffset -> Integer -> MapMode -> IO MappedFile -- --------------------------------------------------------------------------- -- Mapped files -- | A portion of a 'File' mapped directly into memory. The data can -- be read and written using the array operations, and streams to the -- data can be created using 'marrayInputStream' and 'marrayOutputStream'. data MappedFile instance MArray MappedFile Word8 IO -- $mappedfiles -- A 'MappedFile' might be implemented as a 'StorableArray', with a -- 'ForeignPtr' inside it. The finalizer can unmap the file. -- ----------------------------------------------------------------------------- -- Pipes data PipeInputStream data PipeOutputStream instance Stream PipeInputStream instance Stream PipeOutputStream instance InputStream PipeInputStream instance OutputStream PipeOutputStream createPipe :: IO (PipeInputStream,PipeOutputStream) -- ----------------------------------------------------------------------------- -- Arrays -- | An input stream created from an array. data ArrayInputStream instance Stream ArrayInputStream instance InputStream ArrayInputStream -- | An output stream created from an array. data ArrayOutputStream instance Stream ArrayOutputStream instance OutputStream ArrayOutputStream -- | Creates an 'ArrayInputStream' from an immutable array iarrayInputStream :: (Ix i, IArray a Word8) => a i Word8 -> i -> ArrayInputStream -- | Creates an 'ArrayInputStream' from a mutable array marrayInputStream :: (Ix i, MArray a Word8 IO) => a i Word8 -> i -> ArrayInputStream -- | Creates an 'ArrayOutputStream' from a mutable array marrayOutputStream :: (Ix i, MArray a Word8 IO) => a i Word8 -> i -> ArrayOutputStream -- ----------------------------------------------------------------------------- -- Sockets data Socket data SocketInputStream data SocketOutputStream instance Stream SocketInputStream instance Stream SocketOutputStream instance InputStream SocketInputStream instance OutputStream SocketOutputStream socketGetInputStream :: Socket -> SocketInputStream socketGetOutputStream :: Socket -> SocketOutputStream -- $sockets -- Input and output streams for a socket can be closed -- independently. -- -- Each socket has only one pair of input\/output -- streams, hence these functions are pure. -- ----------------------------------------------------------------------------- -- Streams class Stream s where closeStream :: s -> IO () -- | Note: objections have been raised about this method, and -- are still to be resolved. It doesn't make as much sense -- for output streams as it does for input streams. isEOS :: s -> IO Bool -- On Buffering: -- Not all streams are buffered; for example, there's no point in buffering an -- array stream, or a mapped file. However, using a separate class -- for buffered streams won't work: you couldn't write a function that -- behaved differently when given an unbuffered stream or a buffered -- stream. -- | Sets the buffering mode on the stream. Returns 'True' if -- the buffereing mode was set, or 'False' if it wasn't. If the -- stream does not support buffereing, it may return 'False' here. setBufferMode :: s -> BufferMode -> IO Bool -- | Returns the current buffering mode for a stream. On a -- stream that does not support buffering, the result will always -- be 'NoBuffering'. getBufferMode :: s -> IO BufferMode -- | Flushes the buffer to the operating system for an output -- buffer, or discards buffered data for an input buffer. flush :: s -> IO () -- | Flushes the buffered data as far as possible, even to the -- physical media if it can. It returns 'True' if the data -- has definitely been flushed as far as it can go: to the -- disk for a disk file, to the screen for a terminal, and so on. sync :: s -> IO Bool class InputStream s where streamGet :: s -> IO Word8 streamReadBuffer :: s -> Integer -> Buffer -> IO () streamGetBuffer :: s -> Integer -> IO ImmutableBuffer streamGetContents :: s -> IO [Word8] -- | Gets any data which can be read without blocking. streamGetAvailable :: s -> IO [Word8] class OutputStream s where streamPut :: s -> Word8 -> IO () streamPuts :: s -> [Word8] -> IO () streamWriteBuffer :: s -> Integer -> Buffer -> IO () streamPutBuffer :: s -> Integer -> ImmutableBuffer -> IO () -- ----------------------------------------------------------------------------- -- Notes {- Parameterise InputStream over the element type too, so we can combine InputStream and TextInputStream? NO: uses multiparam type classes. Naming: maybe put all the stream ops in one module, and use qualfied names eg. Stream.close, Stream.setBuffering etc. From Ashley Yakely: 3. I note all the class members have types of the form "s -> a" each for some "a" not dependent on "s". This means streams might be a candidate for data structures: data InputStream m = { isStream :: Stream m, streamGet :: m Word8, streamReadBuffer :: Integer -> Buffer -> IO () ... } I'm not sure which is preferable however. Data-structure inheritance has to be done by hand (except see point 9., it might only be a close function), and they don't allow default implementations (yet). -}

Ashley Yakeley

6:14 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

In article , "Simon Marlow" wrote:

...

fileRead :: File -> FileOffset -> Integer -> Buffer -> IO () fileGet :: File -> FileOffset -> Integer -> IO ImmutableBuffer

Should fileRead return the number of bytes read, in case of asking for more bytes than are in the file? With fileGet you can simply look at the size of the return buffer.

...

-- | Flushes the buffer to the operating system for an output -- buffer, or discards buffered data for an input buffer. flush :: s -> IO ()

Those are two very different things, aren't they? The first pushes data along the pipe (rather than discarding it). The second discards data, equivalent to calling streamGetAvailable and ignoring the result. My preference would be to move flush to OutputStream and, if necessary, add a streamDiscardAvailable. But I can't offhand think of any use for streamDiscardAvailable and it already can be done with streamGetAvailable.

...

-- | Flushes the buffered data as far as possible, even to the -- physical media if it can. It returns 'True' if the data -- has definitely been flushed as far as it can go: to the -- disk for a disk file, to the screen for a terminal, and so on. sync :: s -> IO Bool

This one seems to be output only. Would it ever be meaningful to call sync on an input-stream?

...

class InputStream s where class OutputStream s where

Presumably you wanted Stream superclasses for these?

...

streamGet :: s -> IO Word8

What does this return after the last byte has been read? What does this return for an empty file?

...

streamReadBuffer :: s -> Integer -> Buffer -> IO () streamGetBuffer :: s -> Integer -> IO ImmutableBuffer

Same issue as fileRead/fileGet.

...

Parameterise InputStream over the element type too, so we can combine InputStream and TextInputStream? NO: uses multiparam type classes.

Unless of course your stream type is a type constructor: class InputStream s where streamGet :: s m e -> m e ... -- Ashley Yakeley, Seattle WA

Tom Pledger

6:42 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

Ashley Yakeley writes: : | > streamGet :: s -> IO Word8 | | What does this return after the last byte has been read? What does | this return for an empty file? How about throwing an IOError instead of returning anything?

Ashley Yakeley

7 Aug 7 Aug

12:12 a.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

In article <16177.33859.531992.475470@tux-17.corp.peace.com>, Tom Pledger wrote:

...

Ashley Yakeley writes: : | > streamGet :: s -> IO Word8 | | What does this return after the last byte has been read? What does | this return for an empty file?

How about throwing an IOError instead of returning anything?

An EOS error? This would be the only function that throws it, wouldn't it? Might it not be better for streamGet to return (Maybe Word8)? "Nothing" would mean EOS, and "Just b" would be the next byte. -- Ashley Yakeley, Seattle WA

Ben Rudiak-Gould

12:49 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

On Wed, 6 Aug 2003, Simon Marlow wrote:

...

-- TODO: what if a file refers to a FIFO? fileInputStream :: File -> FileOffset -> IO FileInputStream fileOutputStream :: File -> FileOffset -> IO FileOutputStream

Part of my original design was that File values always refer to random-access files. Most of the operations on Files make no sense with anything else. openFile might succeed on something like /dev/kmem (although I'm not convinced that this is type-safe :-), but to open a FIFO you had to use openXputStream instead. This avoids the questions raised by using the functions above on non-seekable handles (What would the FileOffset argument mean? and What happens if you put more than one input or output stream on the same FIFO?). -- Ben

John Meacham

8 Aug 8 Aug

7:44 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

we should have a routine which pumps bytes from an inputstream to an outputstream. on many systems this can be greatly optimized by using something like sendfile(2). even without sendfile, this operation can be optimized in many cases like for arraystreams it can be a single memcpy. I feel this is very important, but am unsure where to add it. John -- --------------------------------------------------------------------------- John Meacham - California Institute of Technology, Alum. - john@foo.net ---------------------------------------------------------------------------

Ben Rudiak-Gould

1 Aug 1 Aug

7:38 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

On Fri, 1 Aug 2003, Simon Marlow wrote:

...

class InputStream s where streamGet :: s -> IO Word8 streamReadBuffer :: s -> Integer -> Buffer -> IO () streamGetBuffer :: s -> Integer -> IO ImmutableBuffer streamGetContents :: s -> IO [Word8]

class OutputStream s where streamPut :: s -> Word8 -> IO () streamPuts :: s -> [Word8] -> IO () streamWriteBuffer :: s -> Integer -> Buffer -> IO ()

Now that I think about it, we should parameterize these classes over the input/output data type also. Then we could have filterInputStream :: (Storable a, Storable b, InputStream s a, InputStream t b) => BlockRecoder a b -> s -> t filterOutputStream :: (Storable a, Storable b, OutputStream s a, OutputStream t b) => BlockRecoder b a -> s -> t Text streams would fit into this model. Plus, there are many useful stream filters which don't go octet -> char or char -> octet, such as: packBitsLE, packBitsBE :: BlockRecoder Bool Word8 unpackBitsLE, unpackBitsBE :: BlockRecoder Word8 Bool zlibPack, zlibUnpack :: BlockRecoder Word8 Word8 recodeNewlines :: NewlineFormat -> BlockRecoder Char Char serialize :: (Serializable a) => BlockRecoder a Word8 unserialize :: (Unserializable a) => BlockRecoder Word8 a -- Ben

Ben Rudiak-Gould

9:17 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

I wrote:

...

serialize :: (Serializable a) => BlockRecoder a Word8 unserialize :: (Unserializable a) => BlockRecoder Word8 a

... except that as BlockRecoder is currently defined, these won't work unless the type a is Storable also. Oops. They could be done this way instead: serialize :: (Serializable a, InputStream s Word8, InputStream X a) => s -> X unserialize :: (Unserializable a, OutputStream s Word8, OutputStream X a) => s -> X where X is some concrete type. Hope I got that right. (Also, I got the types of filterXputStream wrong: t should be replaced by another concrete type like X.) -- Ben

Sven Panne

2 Aug 2 Aug

11:04 a.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

Ben Rudiak-Gould wrote:

...

Now that I think about it, we should parameterize these classes over the input/output data type also. Then we could have

filterInputStream :: (Storable a, Storable b, InputStream s a, InputStream t b) => BlockRecoder a b -> s -> t

filterOutputStream :: (Storable a, Storable b, OutputStream s a, OutputStream t b) => BlockRecoder b a -> s -> t [...]

But this is not Haskell98 anymore, which would be a bad thing for general library IMHO. NHC98 doesn't support MPTCs, so I guess Malcolm will be unhappy with this design, too... :-) Cheers, S.

ketil＠ii.uib.no

5 Aug 5 Aug

2:50 a.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

"Simon Marlow" writes:

...

class Stream s where closeStream :: s -> IO () StreamSetBuffering :: s -> BufferMode -> IO () :

Just a minor suggestion, since I think some of the names get a bit long and cumbersome. How about: module Stream where class Stream s where close :: s -> IO () setBuffering :: s -> BufferMode -> IO () : and so on. Importing qualified gives you names like Stream.close, Stream.setBuffering,... which are quite similar to your suggested ones. I suspect a typical use would have lots of stream operations in a few modules, so having short (unqualified) names might be nice. -kzm -- If I haven't seen further, it is by standing in the footprints of giants

Seth Kurtzberg

3:09 a.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

For my purposes (transaction logging for my database server) I need to be able to guarantee that data is written to disk. That is, it isn't enough to disable buffering in the compiler libraries (all libraries, more accurately), I need to also force the O/S to flush the data to disk. This is difficult to do in a portable manner, obviously, but if a practical way can be found it would have many uses in systems using transactional semantics. It would also get rid of an FFI dependency for my code. On Monday, August 4, 2003, at 11:50 PM, Ketil Z. Malde wrote:

...

"Simon Marlow" writes:

...
class Stream s where closeStream :: s -> IO () StreamSetBuffering :: s -> BufferMode -> IO () :

Just a minor suggestion, since I think some of the names get a bit long and cumbersome. How about:

module Stream where

class Stream s where close :: s -> IO () setBuffering :: s -> BufferMode -> IO () :

and so on. Importing qualified gives you names like

Stream.close, Stream.setBuffering,...

which are quite similar to your suggested ones. I suspect a typical use would have lots of stream operations in a few modules, so having short (unqualified) names might be nice.

-kzm -- If I haven't seen further, it is by standing in the footprints of giants _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

----------------------------------------------------------------- Seth Kurtzberg CTO ISEC Research and Network Operations Center 480-314-1540 888-879-5206 seth@isec.us -----------------------------------------------------------------

Ben Rudiak-Gould

3:30 a.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

On Tue, 5 Aug 2003, Seth Kurtzberg wrote:

...

For my purposes (transaction logging for my database server) I need to be able to guarantee that data is written to disk. That is, it isn't enough to disable buffering in the compiler libraries (all libraries, more accurately), I need to also force the O/S to flush the data to disk.

This is difficult to do in a portable manner, obviously, but if a practical way can be found it would have many uses in systems using transactional semantics. It would also get rid of an FFI dependency for my code.

My intended semantics for the osFlush function was always that it would do its best to ensure that the data was "pushed as far as possible" toward its final destination. If you need a guarantee, the function could be made to return a Bool, with True indicating that it was absolutely sure that the data had made it all the way. But I don't think that it could ever return True. It might be running in a VMware sandbox without realizing it, for example. So you'll probably have to run tests on your particular setup to see how well it works. -- Ben

Seth Kurtzberg

3:34 a.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

On Tuesday, August 5, 2003, at 12:30 AM, Ben Rudiak-Gould wrote:

...

On Tue, 5 Aug 2003, Seth Kurtzberg wrote:

...
For my purposes (transaction logging for my database server) I need to be able to guarantee that data is written to disk. That is, it isn't enough to disable buffering in the compiler libraries (all libraries, more accurately), I need to also force the O/S to flush the data to disk.

This is difficult to do in a portable manner, obviously, but if a practical way can be found it would have many uses in systems using transactional semantics. It would also get rid of an FFI dependency for my code.

My intended semantics for the osFlush function was always that it would do its best to ensure that the data was "pushed as far as possible" toward its final destination.

If you need a guarantee, the function could be made to return a Bool, with True indicating that it was absolutely sure that the data had made it all the way. But I don't think that it could ever return True. It might be running in a VMware sandbox without realizing it, for example. So you'll probably have to run tests on your particular setup to see how well it works.

That is certainly true, but to get even that far the semantics have to exist. You've answered my question; osFlush means (assuming that the O/S can provide the functionality) flush to permanent storage.

...

-- Ben

John Meacham

5 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

On Tue, Aug 05, 2003 at 12:34:03AM -0700, Seth Kurtzberg wrote:

...

On Tuesday, August 5, 2003, at 12:30 AM, Ben Rudiak-Gould wrote:

...
On Tue, 5 Aug 2003, Seth Kurtzberg wrote:

...
For my purposes (transaction logging for my database server) I need to be able to guarantee that data is written to disk. That is, it isn't enough to disable buffering in the compiler libraries (all libraries, more accurately), I need to also force the O/S to flush the data to disk.

This is difficult to do in a portable manner, obviously, but if a practical way can be found it would have many uses in systems using transactional semantics. It would also get rid of an FFI dependency for my code.

My intended semantics for the osFlush function was always that it would do its best to ensure that the data was "pushed as far as possible" toward its final destination.

If you need a guarantee, the function could be made to return a Bool, with True indicating that it was absolutely sure that the data had made it all the way. But I don't think that it could ever return True. It might be running in a VMware sandbox without realizing it, for example. So you'll probably have to run tests on your particular setup to see how well it works.

That is certainly true, but to get even that far the semantics have to exist. You've answered my question; osFlush means (assuming that the O/S can provide the functionality) flush to permanent storage.

There are three useful levels of flush that I can think of. flush all userspace buffers to the OS, flush all data to disk. flush all data and metadata to disk. the os interfaces would be fflush (well the internal haskell equivalant), fdatasync, and fsync. I think there is a use for all of them. in particular, being able to flush to the os without doing an fsync is good for network traffic because always fsyncing can mess with the normal TCP packet consolodation logic. (on some OSes). fdatasync vs fsync is useful. fdatasync can be much faster with many types of filesystems and is usually what people want. John -- --------------------------------------------------------------------------- John Meacham - California Institute of Technology, Alum. - john@foo.net ---------------------------------------------------------------------------

Seth Kurtzberg

6:42 p.m.

New subject: Raw I/O library proposal, second (more pragmatic) draft

On Tuesday, August 5, 2003, at 02:00 PM, John Meacham wrote:

...

On Tue, Aug 05, 2003 at 12:34:03AM -0700, Seth Kurtzberg wrote:

...
On Tuesday, August 5, 2003, at 12:30 AM, Ben Rudiak-Gould wrote:

...
On Tue, 5 Aug 2003, Seth Kurtzberg wrote:

...
For my purposes (transaction logging for my database server) I need to be able to guarantee that data is written to disk. That is, it isn't enough to disable buffering in the compiler libraries (all libraries, more accurately), I need to also force the O/S to flush the data to disk.

This is difficult to do in a portable manner, obviously, but if a practical way can be found it would have many uses in systems using transactional semantics. It would also get rid of an FFI dependency for my code.

My intended semantics for the osFlush function was always that it would do its best to ensure that the data was "pushed as far as possible" toward its final destination.

If you need a guarantee, the function could be made to return a Bool, with True indicating that it was absolutely sure that the data had made it all the way. But I don't think that it could ever return True. It might be running in a VMware sandbox without realizing it, for example. So you'll probably have to run tests on your particular setup to see how well it works.

That is certainly true, but to get even that far the semantics have to exist. You've answered my question; osFlush means (assuming that the O/S can provide the functionality) flush to permanent storage.

There are three useful levels of flush that I can think of. flush all userspace buffers to the OS, flush all data to disk. flush all data and metadata to disk. the os interfaces would be fflush (well the internal haskell equivalant), fdatasync, and fsync.

There is another very important case, which is to flush, _selectively_, some but not all data to disk. For example when doing transaction logging you must flush log data to disk but you _don't_ want to flush other data (because it destroys performance).

...

I think there is a use for all of them. in particular, being able to flush to the os without doing an fsync is good for network traffic because always fsyncing can mess with the normal TCP packet consolodation logic. (on some OSes). fdatasync vs fsync is useful. fdatasync can be much faster with many types of filesystems and is usually what people want. John

-- ----------------------------------------------------------------------- ---- John Meacham - California Institute of Technology, Alum. - john@foo.net ----------------------------------------------------------------------- ---- _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

7957

Age (days ago)

7964

Last active (days ago)

List overview

Download

16 comments

8 participants

participants (8)

Ashley Yakeley
Ben Rudiak-Gould
John Meacham
ketil＠ii.uib.no
Seth Kurtzberg
Simon Marlow
Sven Panne
Tom Pledger