
On Thu, Sep 15, 2005 at 11:09:25AM -0500, John Goerzen wrote:
> The recent thread on binary parsing got me to thinking about more general network protocol parsing with parsec. A lot of network protocols these days are text-oriented, so seem a good fit for parsec.
> However, the difficulty I come up against time and again is: parsec normally expects to parse as much as possible at once.
> With networking, you must be careful not to attempt to read more data than the server hands back, or else you'll block.
> I've had some success with hGetContents on a socket and feeding it into extremely carefully-crafted parsers, but that is error-prone and ugly.

I don't see why this would be more error-prone than any other approach. As for ugly, it might be somewhat more pleasant if Parsec could take input from a monadic action, but hGetContents works, and if you want more control (eg, reading from a socket fd directly), you can use unsafeInterleaveIO yourself.

I wrote a parser for s-expressions that must not read beyond the final ')', and while I agree it is tricky, it's all necessary trickiness. Note I use lexeme parsers as in the Parsec documentation, and use an "L" suffix in their names.

-- do not eat trailing whitespace, because we want to process a request from
-- a lazy stream (eg socket) as soon as we see the closing paren.
sexpr :: Parser a -> Parser (Sexpr a)
sexpr p =
    liftM Atom p
    <|> cons p

cons :: Parser a -> Parser (Sexpr a)
cons p = parens tailL
  where
    tailL = do dotL
               sexprL p
        <|> liftM2 Cons (sexprL p) tailL
        <|> return Nil

sexprL :: Parser a -> Parser (Sexpr a)
sexprL p = lexeme (sexpr p)

consL :: Parser a -> Parser (Sexpr a)
consL p = lexeme (cons p)

top p = between whiteSpace eof p

lexeme p = do r <- p
              whiteSpace
              return r

whiteSpace = many space

dotL = lexeme (string ".")

-- NB: eats whitespace after opening paren, but not closing
parens p = between (lexeme (string "(")) (string ")") p

Andrew
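
For anyone who wants to try this out, here is a minimal self-contained sketch of the same parser, written against Text.Parsec. The Sexpr data type, the atom parser, and the firstRequest helper are assumptions added for the demo and do not appear in the message above; whiteSpace uses skipMany rather than many only to get a unit result. The point it tries to show is that the parser stops immediately after the closing paren, so whatever follows on the stream is left unread.

```haskell
import Control.Monad (liftM, liftM2)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed shape of the S-expression type (not shown in the message).
data Sexpr a = Atom a | Cons (Sexpr a) (Sexpr a) | Nil
  deriving (Eq, Show)

whiteSpace :: Parser ()
whiteSpace = skipMany space

-- Run a parser, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = do
  r <- p
  whiteSpace
  return r

-- Eats whitespace after the opening paren, but not after the closing one.
parens :: Parser a -> Parser a
parens = between (lexeme (string "(")) (string ")")

-- No trailing whitespace is consumed here, so a request is complete
-- as soon as its closing paren has been seen.
sexpr :: Parser a -> Parser (Sexpr a)
sexpr p = liftM Atom p <|> cons p

cons :: Parser a -> Parser (Sexpr a)
cons p = parens tailL
  where
    tailL = (lexeme (string ".") >> lexeme (sexpr p))
        <|> liftM2 Cons (lexeme (sexpr p)) tailL
        <|> return Nil

-- Hypothetical atom parser for the demo: a non-empty run of letters.
atom :: Parser String
atom = many1 letter

-- Hypothetical helper: parse one request and return it together with
-- the input the parser did not consume.
firstRequest :: String -> Either ParseError (Sexpr String, String)
firstRequest = parse p "<stream>"
  where
    p = do
      whiteSpace
      s <- sexpr atom
      rest <- getInput
      return (s, rest)

main :: IO ()
main = print (firstRequest "(a (b . c))  (next)")
```

Here the first request parses as a two-element list whose second element is the dotted pair (b . c), and the remaining input returned by getInput is still "  (next)", leading spaces included. Note that this only holds when the top-level form is a list: a bare atom parsed with many1 has to look at the following character to know where it ends.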