On Tue, Dec 15, 2009 at 1:39 AM, Simon Marlow <marlowsd@gmail.com> wrote:
I haven't implemented a bytestring-backed Handle, but as you say all the abstractions should be present. It would be a great thing to have on Hackage.
A good starting point would be the mmap-backed Handle code that I wrote for my talk at the Haskell Implementors Workshop last year. I'd intended to polish this up and upload to Hackage, but never got around to it. I've put the code here for now:
http://www.haskell.org/~simonmar/mmap-handle.tar.gz
Ooh, thanks! I'll take a look-see.
Yes, you may remember we talked about this in Edinburgh (the conversation would probably make more sense to you now than it did then :-).
I do indeed remember :-)
One thing I experimented with is making CharBuffers use UTF-16. You'll see some instances of #ifdef CHARBUF_UTF16 in the code. It partially works; I believe the main missing piece is support in the built-in codecs. I don't think it would be too hard to fix them: they just need to be more abstract about offsets in the CharBuffer, since writeCharBuffer/readCharBuffer already handle the UTF-16 encoding/decoding.
So one possibility is to get this working and then avoid the extra copy by just taking out the ByteArray# inside a CharBuffer and turning it into a text buffer. I'm not sure of the details here, but I imagine something along those lines would work. We would then have to allocate a new CharBuffer for the Handle.
Yes, that would amount to double-buffering, and would work nicely, except that the current buffers go through foreign pointers while text uses an unpinned array. I can see why this is (so iconv can actually work), but it does put a fly in the ointment :-)
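To make the point about codecs needing to be "more abstract about offsets" concrete, here is a minimal sketch of UTF-16 encoding and decoding over plain lists of code units. It is not the code from the tarball and not GHC's writeCharBuffer/readCharBuffer; the names encodeUtf16, decodeUtf16 and unitOffset are made up for illustration. The thing to notice is that a character may occupy one or two 16-bit slots, so a buffer offset counts code units rather than characters.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

-- Hypothetical helpers for illustration; not part of GHC's IO library.

isHigh, isLow :: Word16 -> Bool
isHigh u = u >= 0xD800 && u <= 0xDBFF   -- high (leading) surrogate
isLow  u = u >= 0xDC00 && u <= 0xDFFF   -- low (trailing) surrogate

-- | Encode one Char as one or two UTF-16 code units.
-- A Char that is itself a surrogate code point is emitted as a single
-- unit; this sketch does not try to reject it.
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n  = ord c
    m  = n - 0x10000
    hi = fromIntegral (0xD800 + (m `shiftR` 10))
    lo = fromIntegral (0xDC00 + (m .&. 0x3FF))

-- | Decode the character at the head of a code-unit list, returning the
-- character and the number of units consumed. Lone surrogates and empty
-- input give Nothing.
decodeUtf16 :: [Word16] -> Maybe (Char, Int)
decodeUtf16 (hi : lo : _)
  | isHigh hi && isLow lo =
      Just (chr (0x10000 + ((fromIntegral hi - 0xD800) `shiftL` 10)
                         + (fromIntegral lo - 0xDC00)), 2)
decodeUtf16 (u : _)
  | not (isHigh u) && not (isLow u) = Just (chr (fromIntegral u), 1)
decodeUtf16 _ = Nothing

-- | Code-unit offset of the i-th character of a string. This is the
-- bookkeeping a codec can no longer assume to be "i".
unitOffset :: String -> Int -> Int
unitOffset s i = sum (map (length . encodeUtf16) (take i s))
```

With a UTF-32 buffer, unitOffset would simply be the identity on i, which is presumably the assumption baked into the existing codecs.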
Another possibility is (as you suggested) to make Handles independent of the representation of the CharBuffer, making it completely abstract. I haven't put much thought into that, but it might well be a better approach. It would presumably involve a new existential class constraint in the Handle for the CharBuffer operations, and we'd have to be careful about performance: currently I think the CharBuffer operations get inlined nicely.
Aye. I think this would have the same problem with foreign transcoding code that wants a reliable pointer.
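For readers following along, the "existential class constraint" idea might look roughly like the sketch below. Everything here is hypothetical: the class CharBuf, its methods, the Utf32Buf instance and the SomeCharBuf wrapper are invented for illustration and do not correspond to anything in GHC's Handle implementation. The instance keeps its storage behind a ForeignPtr, i.e. pinned memory of the kind foreign transcoders want, and does no bounds checking.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Char (chr, ord)
import Data.Word (Word32)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- | Hypothetical interface: each operation takes an offset and returns
-- the next offset, so the width of a character stays private to the
-- representation.
class CharBuf b where
  writeCharAt :: b -> Int -> Char -> IO Int
  readCharAt  :: b -> Int -> IO (Char, Int)

-- | One possible representation: one 32-bit slot per character, held in
-- pinned memory via a ForeignPtr.
newtype Utf32Buf = Utf32Buf (ForeignPtr Word32)

instance CharBuf Utf32Buf where
  writeCharAt (Utf32Buf fp) i c = withForeignPtr fp $ \p -> do
    pokeElemOff p i (fromIntegral (ord c))
    return (i + 1)
  readCharAt (Utf32Buf fp) i = withForeignPtr fp $ \p -> do
    w <- peekElemOff p i
    return (chr (fromIntegral w), i + 1)

-- | The existential wrapper a Handle-like structure could store.
data SomeCharBuf = forall b. CharBuf b => SomeCharBuf b

-- | Allocate a buffer with room for n characters (no bounds checks later).
newUtf32Buf :: Int -> IO SomeCharBuf
newUtf32Buf n = SomeCharBuf . Utf32Buf <$> mallocForeignPtrArray n

-- | Write a string from offset 0; returns the offset after the last char.
writeString :: SomeCharBuf -> String -> IO Int
writeString (SomeCharBuf b) = go 0
  where
    go i []       = return i
    go i (c : cs) = writeCharAt b i c >>= \i' -> go i' cs

-- | Read characters from offset 0 up to (not including) offset end.
readString :: SomeCharBuf -> Int -> IO String
readString (SomeCharBuf b) end = go 0
  where
    go i
      | i >= end  = return []
      | otherwise = do
          (c, i') <- readCharAt b i
          cs <- go i'
          return (c : cs)
```

Once the buffer is hidden behind SomeCharBuf, every writeCharAt and readCharAt goes through a class dictionary picked up at run time, which is where the inlining worry comes from. A text-backed instance would still need some answer to the pinned-pointer question raised above.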