
drtomc:
On 7/4/07, Donald Bruce Stewart wrote:
Can we do a cheap bytestring binding to libxml, to avoid any initial String processing?
For my part, it's not too big an issue. A version of HaXml or at least Parsec built on top of ByteString would be a good start. I know there was a SoC for the latter, though I have not looked to see where it ended up.
Actually, if you were looking for a good bit of abstraction to build, how's this? It would be *really* nice to do all my IO with mmap so my program isn't hit by the buffer duplication problem[*]. The kind of API I have in mind is something like:
data Mapping -- abstract
mmap :: Handle {- or Fd, perhaps -} -> Offset -> Length -> IO Mapping
read :: Mapping -> Offset -> Length -> IO ByteString
write :: Mapping -> Offset -> ByteString -> IO ()
munmap :: Mapping -> IO () -- maybe just use a finalizer
Oh, we should really restore the mmapFile interface in Data.ByteString. Currently it's commented out to help out Windows people. And the current implementation does indeed use finalisers to handle the unmapping.
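To make the finaliser idea concrete, here is a minimal sketch of a ByteString whose backing memory is released by a finaliser once the value is garbage collected. It is not the mmapFile implementation: as a portable stand-in it reads the file into a malloc'd buffer (so it does copy), and free plays the role that the unmapping call would play for a real mapping. The name loadFile is hypothetical.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Internal (fromForeignPtr)
import Data.Word (Word8)
import qualified Foreign.Concurrent as FC
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)
import System.Environment (getArgs)
import System.IO

-- | Hypothetical stand-in for mmapFile: the buffer is owned by a
-- ForeignPtr, and the attached finaliser releases it when the
-- ByteString becomes unreachable. A real version would attach an
-- unmapping action here instead of 'free'.
loadFile :: FilePath -> IO B.ByteString
loadFile path =
  withBinaryFile path ReadMode $ \h -> do
    size <- fromIntegral <$> hFileSize h
    -- Allocate at least one byte so an empty file still gets a valid pointer.
    p  <- mallocBytes (max 1 size) :: IO (Ptr Word8)
    -- Attach the finaliser before reading, so the buffer is not leaked
    -- if the read throws.
    fp <- FC.newForeignPtr p (free p)
    n  <- withForeignPtr fp $ \q -> hGetBuf h q size
    -- Wrap the buffer without a further copy: offset 0, n bytes read.
    return $! fromForeignPtr fp 0 n

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> loadFile path >>= print . B.length
    _      -> hPutStrLn stderr "usage: loadfile FILE"
```

Foreign.Concurrent.newForeignPtr is used rather than the FunPtr-based variant because the finaliser can then be an ordinary IO closure, which is what an unmapping action would need in order to remember the length of the mapping.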
This API has the problem that read in particular still has to do copying. If you think about the binary XML stuff I mentioned before, you'll see that it would be really nice if I could mmap in a record and parse it without having to do any copying, or at least to defer any copying with a copy-on-write scheme. Doing a simple implementation of read that just put a ByteString wrapper around the mmapped memory would be nice and efficient, but would suffer from the problem that if something changed that bit of the underlying file, things would break. Maybe it's just not possible to finesse this one.
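The trade-off described above can be shown in a small sketch without any real mapping. It assumes a temporary buffer from allocaBytes as a stand-in for the mmapped region: packCStringLen copies the bytes, while unsafePackCStringLen only wraps the pointer, so a later write to the buffer shows through the "immutable" ByteString. The name aliasingDemo is made up for illustration.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (pokeArray)
import Foreign.Ptr (castPtr)
import Foreign.Storable (poke)

-- | Returns the copied view and the contents observed through the
-- zero-copy view after the underlying buffer has been modified.
aliasingDemo :: IO (B.ByteString, B.ByteString)
aliasingDemo =
  allocaBytes 4 $ \buf -> do
    -- Fill the buffer with the bytes of "abcd".
    pokeArray buf ([97, 98, 99, 100] :: [Word8])
    let region = (castPtr buf, 4)
    copied <- B.packCStringLen region         -- copies the four bytes
    shared <- BU.unsafePackCStringLen region  -- wraps the pointer, no copy
    -- Simulate something else changing the underlying memory: 'a' becomes 'x'.
    poke buf (120 :: Word8)
    -- Snapshot what the shared view now contains, forced while the
    -- buffer is still alive.
    observed <- evaluate (B.copy shared)
    return (copied, observed)

main :: IO ()
main = do
  (copied, observed) <- aliasingDemo
  print copied    -- prints "abcd"
  print observed  -- prints "xbcd"
```

The copying read stays stable but pays for the duplication; the wrapping read is free but is only as immutable as the memory behind it, which is the problem with putting a plain ByteString wrapper around a mapping that something else may write to.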
Yep. The current impl is:

    mmapFile :: FilePath -> IO ByteString
    mmapFile f = mmap f >>= \(fp,l) -> return $! PS fp 0 l

    mmap :: FilePath -> IO (ForeignPtr Word8, Int)
    mmap = do
        ...
        p  <- mmap l fd
        fp <- newForeignPtr p unmap -- attach unmap finaliser
        return fp

Maybe I should just stick this in the unix package.

-- Don