Re: [Haskell-cafe] Brainstorming on how to parse IMAP

Quoth John Goerzen

Donn Cave wrote:
I mentioned that my parser may return an incomplete status. In principle, something like
parseResponse :: ByteString -> Maybe (IMAPResponse, ByteString)
That means the parse needs to be repeated each time until enough input data has accumulated. I worried a little about how to represent a useful incomplete parse state, but decided that it isn't worth the trouble - the amount of parsing in an ordinary response is too trivial.
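To make the "repeat the parse until enough input has accumulated" idea concrete, here is a minimal sketch. Everything except the `parseResponse` signature is an assumption for illustration: `IMAPResponse` is a placeholder type, the toy parser treats one CRLF-terminated line as a complete response, and `readResponse` and its `more` argument are hypothetical names for the driver loop and the application-supplied input function.

```haskell
import qualified Data.ByteString.Char8 as B

-- Placeholder for a real IMAP response type (assumption).
newtype IMAPResponse = IMAPResponse B.ByteString
  deriving (Eq, Show)

-- Toy parser: one CRLF-terminated line counts as a complete response.
-- Nothing means "incomplete, supply more input and try again".
parseResponse :: B.ByteString -> Maybe (IMAPResponse, B.ByteString)
parseResponse input =
  case B.breakSubstring (B.pack "\r\n") input of
    (_, rest) | B.null rest -> Nothing
    (line, rest)            -> Just (IMAPResponse line, B.drop 2 rest)

-- Re-run the parser from the start of the buffer each time the
-- application hands over another chunk. An empty chunk is treated as
-- end of input, in which case the result is Nothing.
readResponse :: Monad m
             => m B.ByteString   -- application-supplied "get more data"
             -> B.ByteString     -- leftover input from the previous call
             -> m (Maybe (IMAPResponse, B.ByteString))
readResponse more buf =
  case parseResponse buf of
    Just result -> return (Just result)
    Nothing     -> do
      chunk <- more
      if B.null chunk
        then return Nothing
        else readResponse more (B.append buf chunk)
```

The parser itself stays pure; only `readResponse` touches whatever monad the application uses to obtain data, and the leftover bytes are returned so they can seed the next call.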
But the problem here is getting that first ByteString in the first place. Are you suggesting it would be the result of hGetContents or somesuch, reading from a socket? I've tried this sort of approach with my FTP library. It can be done, but it is exceptionally tricky and I wouldn't do it again. You have to make extremely careful use of things like try in Parsec, and even just regular choices (sometimes it wants to read a character that won't exist yet). Buffering plays into it too.
The trick with reading from the network in a back-and-forth protocol is knowing how much to read. You have to be very careful here. If you try to read too much (and are blocking until you read), you will get deadlock because you are reading data that the other end isn't going to send yet. And that, as I see it, is the problem with the above. It seems to be a chicken-and-egg problem in my mind: how do you know how much data to read until you've parsed the last bit of data that tells you how to read the next bit?
| 3) The linkage between Parsec and IO is weak. I cannot write an
| "IMAPResponse" parser. I would have to write a set of parsers to parse
| individual components of the IMAP response as part of the IO monad code
| that reads the IMAP response, since the result of one dictates how much
| network data I attempt to read.
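As an illustration of "the result of one dictates how much network data I attempt to read", consider the IMAP counted literal, where a line ending in `{N}` announces N bytes of raw data to follow. The sketch below is only one possible shape: `literalSize` and `readFullResponse` are hypothetical names, and the two reader arguments (read one line with CRLF stripped, read exactly n bytes) are assumed to be supplied by the application.

```haskell
import Data.Char (isDigit)
import qualified Data.ByteString.Char8 as B

-- If a line (CRLF already stripped) ends in a literal marker such as
-- "{42}", return the announced byte count.
literalSize :: B.ByteString -> Maybe Int
literalSize line
  | B.null line || B.last line /= '}' = Nothing
  | B.null before || B.null digits || not (B.all isDigit digits) = Nothing
  | otherwise = fst <$> B.readInt digits
  where
    -- Split off everything after the last '{' (excluding the final '}').
    (before, digits) = B.spanEnd (/= '{') (B.init line)

-- Read one complete response, alternating between "read a line" and
-- "read exactly n bytes" as each line dictates. The pieces are returned
-- in order, ready to be handed to a pure parser.
readFullResponse :: Monad m
                 => m B.ByteString           -- read one line (assumed)
                 -> (Int -> m B.ByteString)  -- read exactly n bytes (assumed)
                 -> m [B.ByteString]
readFullResponse readLine readBytes = do
  line <- readLine
  case literalSize line of
    Nothing -> return [line]
    Just n  -> do
      literal <- readBytes n
      rest    <- readFullResponse readLine readBytes
      return (line : literal : rest)
```

Only the tiny `literalSize` check has to run between reads; the rest of the response grammar can still be parsed afterwards, once all the pieces are in hand.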
The parser should just parse data, and not read it.
You don't need to worry about whether you can get recv(2) semantics on a socket with bytestrings, and you don't need to saddle users of this parser with whatever choice you might make there. You don't need to supply an SSL input function with the same semantics, or account for the possibility that data might actually not be coming from a socket at all (UNIX pipes are not unheard of.) You don't need to lock users into whatever execution dispatching might be supported by that I/O, potentially ruling out graphics libraries etc. that might not be compatible.
So I would let the application come up with the data. In general,
Well, your response here begs the question of how much you want to automate from the application. Yes, there are multiple ways of communicating with IMAP servers, but you have the same synchronization issues with all of them. Yes, I plan to let the application supply functions to read data. But in the end, that is pointless if those functions can't be written in Haskell!
I don't think there's any way to specify how much to read - I mean, the counted literal certainly provides that information, but that's the exception - so I would assume the application will need some kind of recv(2)-like function that reads data as available.
Exactly. But there is no recv()-like function, except the one that returns IO String. There is no recv()-like function that returns IO ByteString. (I actually notice now a package on Hackage that does this... why it's not in core, I don't know.)
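For comparison, a recv()-like read that returns a ByteString can be sketched on a Handle, assuming a version of the bytestring package that provides `hGetSome` in `Data.ByteString`. The names `recvLike`, `readAllChunks`, and `readFileChunks`, and the 4096-byte limit, are made up for this example.

```haskell
import qualified Data.ByteString as BS
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- recv(2)-like read: blocks only while no data is available, then
-- returns whatever is there, up to the given limit. An empty result
-- means end of input.
recvLike :: Handle -> IO BS.ByteString
recvLike h = BS.hGetSome h 4096

-- Drain a handle chunk by chunk, as a driver loop around a pure parser
-- might do.
readAllChunks :: Handle -> IO [BS.ByteString]
readAllChunks h = do
  chunk <- recvLike h
  if BS.null chunk
    then return []
    else (chunk :) <$> readAllChunks h

-- Usage example on an ordinary file; any Handle would do.
readFileChunks :: FilePath -> IO [BS.ByteString]
readFileChunks path = withBinaryFile path ReadMode readAllChunks
```

Because the read returns as soon as some data is present, it does not wait for bytes the other end has not sent yet, which is the property the discussion above is after.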
By the way, if you haven't already run across this, you may be interested to read about the IMAP "IDLE" command, cf. RFC 2177. I think the value of this feature can be overstated, but it depends on the server, and some IMAP client implementors are very fond of it. At this point, the reason it might be interesting is that it moves away from the call & response pattern.
It's on my todo list.
-- John