Re: iteratee enumHandle (and parsing)

Hello, This explanation is entirely correct. I'm not happy with this situation, because I think it's confusing for people just starting with iteratee. However, I do want something that will be unicode (and other possible encodings) aware. Currently, I'm waiting to see how unicode support in the base libraries works out. What seems the best possibility at this time is writing codecs that could be used with mapStream or convStream to handle this appropriately, but I'd prefer to re-use core unicode support if possible. I'm open to suggestions on this issue. At the very least, I should probably include the ASCII encoding in the iteratee package. I'd like to make the common case of reading ASCII text easier than it currently is. Until then, one approach is using mapStream as suggested below. Another is using a wrapped ByteString instead of the List chunk. The ByteString chunk forces the element to be an 8-bit ASCII character instead of using the Char storable instance. If the full polymorphism of the List chunk is required, the conversion can be done with either mapStream or convStream. convStream is probably more efficient for this at the moment. Regarding using Parsec with iteratees: unfortunately I don't really do much in the way of parsing myself so I'm not much help here. I think that Oleg intended for parsers to be written directly using the iteratee primitives, in particular the "heads" function would be useful (at least he found it so). I don't see an efficient way to generate a continuation on a partial parse with Parsec, using iteratees or not. I think you could just load data into a parser with setInput, but if you don't have enough data for a complete parse you'll just need to hold onto all the data until you've loaded enough to run it. If you tell me what functions you find lacking from iteratee, I'll look into implementing them. I'd rather not re-implement all of Parsec, but the text parsers should be simple enough, as well as some of the combinators. Cheers, John Lato
From: Echo Nolan
Hi Paolino.
What's happening is reading [Char] uses the Storable instance for Char which is 32-bit. Thus, you get gibberish. The below does what you want, by reading Word8s and converting them.
import Control.Exception import Data.Char import Data.Iteratee.IO import Data.Iteratee.Base import Data.Word import System.IO
main :: IO () main = do h <- openFile "mamma23" ReadWriteMode hPutStr h "ciao" hSeek h AbsoluteSeek 0 l <- enumHandle h readString >>= run print $ assert (l == "ciao") ()
-- This is declared on its own so I can give a type signature without making -- any of the above lines unmanageably long. readString :: IterateeG [] Word8 IO String readString = joinI $ mapStream (chr . fromIntegral) stream2list
This only works for ASCII, of course. Someone should write some enumerators for the other encodings.
Regards, Echo Nolan
participants (1)
-
John Lato