
On 15/12/09 06:09, Bryan O'Sullivan wrote:
> I just added support to Data.Text for your new Unicode-based Handle implementation, and I'd like to write some tests. The natural way to do this would be to create Handles that will write to, and read from, ByteStrings. Does any such code exist at the moment? I don't see it in base or bytestring, though all the necessary abstractions appear to be present.
I haven't implemented a bytestring-backed Handle, but as you say, all the abstractions should be present. It would be a great thing to have on Hackage. A good starting point would be the mmap-backed Handle code that I wrote for my talk at the Haskell Implementors Workshop last year. I'd intended to polish this up and upload it to Hackage, but never got around to it. I've put the code here for now: http://www.haskell.org/~simonmar/mmap-handle.tar.gz
> Also, the place I hooked into the new I/O machinery was at the next level up from CharBuffer. Because the implementation of CharBuffer isn't abstract, I had no opportunity to put a text array in there, so there's an extra amount of copying that happens when going from byte buffer to char buffer to Text. It's a bit of a shame, but I don't see a way around it at the moment. Would you be interested in trying to remove that extra copy, or is the current interface set in stone?
Yes, you may remember we talked about this in Edinburgh (the conversation would probably make more sense to you now than it did then :-).

One thing I experimented with is making CharBuffers use UTF-16. You'll see some instances of #ifdef CHARBUF_UTF16 in the code. It partially works; I believe the main missing piece is support in the built-in codecs. I don't think it would be too hard to fix them: they just need to be more abstract about offsets in the CharBuffer, since writeCharBuffer/readCharBuffer already handle the UTF-16 encoding/decoding.

So one possibility is to get this working and then avoid the extra copy by just taking out the ByteArray# inside a CharBuffer and turning it into a text buffer. I'm not sure of the details here, but I imagine something along those lines would work. We would then have to allocate a new CharBuffer for the Handle.

Another possibility is (as you suggested) to make Handles independent of the representation of the CharBuffer, making it completely abstract. I haven't put much thought into that; it might well be a better approach. It would presumably involve a new existential class constraint in the Handle for the CharBuffer operations, and we'd have to be careful about performance: currently I think the CharBuffer operations get inlined nicely.

Cheers,
Simon
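
To illustrate the remark about codecs needing to be more abstract about offsets in a UTF-16 CharBuffer: with UTF-16, a character occupies either one or two 16-bit code units, so the index of the n-th character is no longer simply n. The following is a minimal standalone sketch, not the GHC code; the names charWidth, encodeChar and charOffsets are made up for illustration, and it models the buffer as a plain list rather than a real CharBuffer.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Hypothetical helper: number of UTF-16 code units one Char occupies.
-- Code points below 0x10000 take one unit, all others take two.
charWidth :: Char -> Int
charWidth c
  | ord c < 0x10000 = 1
  | otherwise       = 2

-- Hypothetical helper: encode one Char as UTF-16 code units.
-- Code points at or above 0x10000 become a high/low surrogate pair.
-- (Lone surrogate code points in the input are passed through unchanged.)
encodeChar :: Char -> [Word16]
encodeChar c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n  = ord c
    m  = n - 0x10000
    hi = fromIntegral (0xD800 + (m `shiftR` 10))
    lo = fromIntegral (0xDC00 + (m .&. 0x3FF))

-- Offset, in code units, at which each Char starts, followed by the
-- offset just past the last Char. A codec that assumes "one Char per
-- slot" would get these wrong as soon as a surrogate pair appears.
charOffsets :: String -> [Int]
charOffsets = scanl (+) 0 . map charWidth
```

For example, charOffsets ['a', '\x1D11E', 'z'] gives [0,1,3,4]: the middle character takes two code units, so the third character starts at offset 3 rather than 2.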