Re: [Haskell-cafe] Re: Abstraction leak

drtomc:
On 7/4/07, Donald Bruce Stewart wrote:
Can we do a cheap bytestring binding to libxml, to avoid any initial String processing?
For my part, it's not too big an issue. A version of HaXml or at least Parsec built on top of ByteString would be a good start. I know there was a SoC for the latter, though I have not looked to see where it ended up.
Actually, if you were looking for a good bit of abstraction to build, how's this? It would be *really* nice to do all my IO with mmap so my program isn't hit by the buffer duplication problem[*]. The kind of API I have in mind is something like:
data Mapping -- abstract
mmap :: Handle {- or Fd, perhaps -} -> Offset -> Length -> IO Mapping
read :: Mapping -> Offset -> Length -> IO ByteString
write :: Mapping -> Offset -> ByteString -> IO ()
munmap :: Mapping -> IO () -- maybe just use a finalizer
Oh, we should really restore the mmapFile interface in Data.ByteString. Currently it's commented out to help out Windows people. And the current implementation does indeed use finalisers to handle the unmapping.
This API has the problem that read in particular still has to do copying. If you think about the binary XML stuff I mentioned before, you'll see that it would be really nice if I could mmap in a record and parse it without having to do any copying, or at least to defer any copying with a copy-on-write scheme. Doing a simple implementation of read that just put a ByteString wrapper around the mmapped memory would be nice and efficient, but would suffer from the problem that if something changed that bit of the underlying file, things would break. Maybe it's just not possible to finesse this one.
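To make the copying cost of read concrete, here is a minimal in-memory sketch of the proposed interface. It is only an illustration under loud assumptions: newMapping allocates an ordinary zero-filled buffer as a stand-in for a real mmap call, Offset and Length are plain Int, and the names newMapping, readMapping and writeMapping are hypothetical (chosen to avoid clashing with Prelude's read). Nothing here is the actual Data.ByteString or unix API.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes, fillBytes)
import Foreign.Ptr (castPtr, plusPtr)

-- Abstract in spirit: backing memory plus its size in bytes.
data Mapping = Mapping (ForeignPtr Word8) Int

-- ASSUMPTION: a zero-filled heap buffer stands in for an mmapped region.
newMapping :: Int -> IO Mapping
newMapping len = do
  fp <- mallocForeignPtrBytes len
  withForeignPtr fp $ \p -> fillBytes p 0 len
  return (Mapping fp len)

-- Copies the requested bytes out, so later writes cannot change the result.
readMapping :: Mapping -> Int -> Int -> IO B.ByteString
readMapping (Mapping fp size) off len
  | off < 0 || len < 0 || off + len > size =
      ioError (userError "readMapping: out of range")
  | otherwise =
      withForeignPtr fp $ \p ->
        B.packCStringLen (castPtr (p `plusPtr` off), len)

-- Copies the ByteString's bytes into the region at the given offset.
writeMapping :: Mapping -> Int -> B.ByteString -> IO ()
writeMapping (Mapping fp size) off bs
  | off < 0 || off + B.length bs > size =
      ioError (userError "writeMapping: out of range")
  | otherwise =
      withForeignPtr fp $ \dst ->
        BU.unsafeUseAsCStringLen bs $ \(src, n) ->
          copyBytes (dst `plusPtr` off) (castPtr src) n
```

Every readMapping call pays for a fresh copy, which is exactly the cost the binary XML use case would like to avoid; the benefit is that the returned ByteString stays valid whatever happens to the region afterwards.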
Yep. The current impl is:

    mmapFile :: FilePath -> IO ByteString
    mmapFile f = mmap f >>= \(fp,l) -> return $! PS fp 0 l

    mmap :: FilePath -> IO (ForeignPtr Word8, Int)
    mmap = do
        ...
        p <- mmap l fd
        fp <- newForeignPtr p unmap -- attach unmap finaliser
        return fp

Maybe I should just stick this in the unix package.

-- Don
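The hazard of a zero-copy wrapper around mapped memory can be shown without mmap at all. The sketch below is an assumption-laden illustration, not the bytestring internals: a malloc'd buffer stands in for the mapped region, and aliasingDemo is a hypothetical name. It wraps the same bytes twice, once with a copy (packCStringLen) and once without (unsafePackCStringLen), then mutates the buffer behind both.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (pokeByteOff)

-- Returns (copied view, what the zero-copy view shows after the mutation).
aliasingDemo :: IO (B.ByteString, B.ByteString)
aliasingDemo = do
  -- ASSUMPTION: this buffer plays the role of an mmapped region.
  buf <- mallocBytes 4 :: IO (Ptr Word8)
  mapM_ (\i -> pokeByteOff buf i (fromIntegral (65 + i) :: Word8)) [0 .. 3] -- "ABCD"
  copied  <- B.packCStringLen (castPtr buf, 4)         -- private copy
  wrapped <- BU.unsafePackCStringLen (castPtr buf, 4)  -- shares the buffer
  pokeByteOff buf 0 (90 :: Word8)                      -- someone changes byte 0 to 'Z'
  -- Snapshot the shared view before freeing the buffer it points into.
  observed <- evaluate (B.copy wrapped)
  free buf
  return (copied, observed)
```

The copied value still reads "ABCD", while the supposedly immutable wrapped value now reads "ZBCD". A ByteString built directly over a mapping has the same exposure to any other writer of the file or region.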

On 7/5/07, Donald Bruce Stewart wrote:
Yep. The current impl is:

    mmapFile :: FilePath -> IO ByteString
    mmapFile f = mmap f >>= \(fp,l) -> return $! PS fp 0 l

    mmap :: FilePath -> IO (ForeignPtr Word8, Int)
    mmap = do
        ...
        p <- mmap l fd
        fp <- newForeignPtr p unmap -- attach unmap finaliser
        return fp
Which, if I read it correctly, is not safe in a concurrent/multitasking environment, since it wraps the underlying mmapped region. In many programs, I'm sure this won't be a problem. Unfortunately, the system I'm working on is multi-threaded, and we definitely want to update regions. Perhaps I'll have to bite the bullet and implement the Mapping thing I described. The really unfortunate thing is that I'd really like to be able to do it within the STM monad, with rollback, etc - escaping to the IO monad is annoying.

FWIW, the technique I use to handle this kind of situation may be of general interest. Consider a cache of structures reconstituted from an external file. If a requested item is not in the cache, then we throw an exception, which is caught in a wrapper function in the IO monad. The wrapper reads the requested structure, sticks it in the cache, then reruns the transaction. There are a few details you have to get right, including making sure none of the items you require to complete the operation get evicted by another thread, but it works very nicely.

T.
--
Dr Thomas Conway
drtomc@gmail.com

Silence is the perfectest herald of joy:
I were but little happy, if I could say how much.
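The cache-miss technique described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the system being discussed: the cache is a TVar holding a Data.Map keyed by Int, and CacheMiss, lookupCached and withCache are hypothetical names. Eviction, and the pinning needed to stop another thread evicting items mid-operation, are deliberately left out.

```haskell
import Control.Concurrent.STM
import Control.Exception (Exception, try)
import qualified Data.Map.Strict as Map

type Key = Int
type Cache v = TVar (Map.Map Key v)

-- Hypothetical exception signalling which key the transaction needed.
newtype CacheMiss = CacheMiss Key deriving Show
instance Exception CacheMiss

-- Pure STM lookup: a miss aborts the transaction by throwing,
-- which discards any effects the transaction had made so far.
lookupCached :: Cache v -> Key -> STM v
lookupCached cache k = do
  m <- readTVar cache
  case Map.lookup k m of
    Just v  -> return v
    Nothing -> throwSTM (CacheMiss k)

-- IO wrapper: run the transaction; on a miss, load the item in IO,
-- insert it into the cache, and rerun the whole transaction.
withCache :: Cache v -> (Key -> IO v) -> STM a -> IO a
withCache cache load tx = do
  r <- try (atomically tx)
  case r of
    Right a            -> return a
    Left (CacheMiss k) -> do
      v <- load k
      atomically (modifyTVar' cache (Map.insert k v))
      withCache cache load tx
```

A transaction that needs several uncached items is simply rerun once per miss, so the transaction body itself never has to leave STM to perform IO.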
Participants (2): dons@cse.unsw.edu.au, Thomas Conway