
Hi, I've been struggling with this problem for days and I'm dying. Please help.

I want to use Parsec to parse NNTP data coming to me from a handle I get from connectTo.

One unworkable approach I tried is to get a lazy String from the handle with hGetContents. The problem: suppose the first message from the NNTP server is "200 OK\r\n". Parsec parses it beautifully. Now I need to discard the parsed part so that Parsec will parse whatever the server sends next, so I use Parsec's getInput to get the remaining data. But there isn't any, so it blocks. Deadlock: the client is inappropriately waiting for server data while the server is waiting for my first command.

Another approach that doesn't quite work is to create an instance of Parsec's Stream with timeout functionality:

    instance Stream Handle IO Char where
        uncons h = do
            r <- hWaitForInput h ms
            if r then liftM (\c -> Just (c, h)) (hGetChar h)
                 else return Nothing
          where ms = 5000

It's probably obvious to you why it doesn't work, but it wasn't to me at first. The problem: suppose you tell Parsec you're looking for (many digit) followed by (string "\r\n"). "123\r\n" won't match; "123\n" will. My Stream has no backtracking: every uncons does real IO on the shared Handle, so the lookahead character that ends (many digit) is read from the handle and lost, and Parsec resumes from the wrong place. Even if you don't need 'try', it won't work for even basic stuff.

Here's another way: http://www.mail-archive.com/haskell-cafe@haskell.org/msg22385.html The OP had the same problem I did, so he made a variant of hGetContents with timeout support. The problem: he used something from unsafe*. I came to Haskell for rigor and reliability, and it would make me really sad to have to use a function with 'unsafe' in its name that has a lot of wacky caveats about inlining, etc.

In that same thread, Bulat says a timeout-enabled Stream could help. But I can't tell what library that is. 'cabal list stream' shows me three libraries, none of which seems to be the one in question. Is Streams a going concern? Should I be checking that out?

I'm not doing anything with hGetLine because: 1) there's no way to specify a maximum number of characters to read; 2) what is meant by a "line" is not specified; 3) there is no way to tell whether it read a line or just reached the end of the data. Even a variant of hGetLine that worked better would make the parsing more obscure.

Thank you very very much for *any* help.

Are you doing this all in a single thread?
On Tue, Aug 26, 2008 at 4:35 PM, brian wrote:
-- /jve

Perhaps you'll want to continue with the hGetLine setup in one thread (assuming the NNTP data is line delimited), then in another, parse the data, then in a third, respond.

Look up how to use MVars. Letting the threads block on reads/writes is a lot easier (logically) than untangling the mess in a single-threaded system. When you have threading tools like Haskell's, you're much better off splitting the tasks up into blocking calls with MVars to synchronize.

(Perhaps MVars aren't quite the correct solution here, but it seems to me like it would work.)
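A minimal sketch of that split (all names here are illustrative, and a canned list of lines stands in for the socket; a real client would have the reader thread loop on the network Handle):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar

    -- Toy stand-in for the parsing step.
    classify :: String -> String
    classify l@(c:_) | c `elem` "2345" = "reply: " ++ l
    classify l                         = "data:  " ++ l

    main :: IO ()
    main = do
      box <- newEmptyMVar
      -- Reader thread: in a real client this would loop reading from
      -- the network Handle; here it feeds canned NNTP-ish lines.
      _ <- forkIO $ mapM_ (putMVar box) ["200 OK", "215 list follows", "."]
      -- Consumer (here, the main thread): blocks on takeMVar instead
      -- of blocking on the Handle itself.
      ls <- sequence (replicate 3 (takeMVar box))
      mapM_ (putStrLn . classify) ls

The point of the design is that only the reader thread ever touches the Handle, so all the timeout/blocking hair lives in one place.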
On Tue, Aug 26, 2008 at 4:40 PM, brian wrote:
> On Tue, Aug 26, 2008 at 3:38 PM, John Van Enk wrote:
>> Are you doing this all in a single thread?
> Yes.
-- /jve

On Tue, Aug 26, 2008 at 3:43 PM, John Van Enk wrote:
> Perhaps you'll want to continue with the hGetLine setup in one thread (assuming the NNTP data is line delimited), then in another, parse the data, then in a third, respond.
Sorry if my writing was unclear. I think hGetLine is really unsuited to handling data from a network; it's the Haskell equivalent of gets(3), suitable only for quick tests or toy programs. The only way I can think of to make it a little safer is to wrap it in a timeout, and even that would still be really bad.

Hello,

Polyparse has some lazy parsers: http://www.cs.york.ac.uk/fp/polyparse/

Perhaps that would do the trick?

j.

At Tue, 26 Aug 2008 15:35:28 -0500, brian wrote:

"Jeremy Shaw"
Polyparse has some lazy parsers:
but Tomasz Zielonka once posted a 'lazyMany' combinator for Parsec, which I've used successfully. Isn't lazy parsing of many NNTPResponse or similar what you want? (If you can't find the original post, there's a copy at http://malde.org/~ketil/biohaskell/biolib/Bio/Util/Parsex.hs) PS: While I try to avoid 'unsafePerformIO', I tend to make use of 'unsafeInterleaveIO'. Although I'm sure the creative people here can demonstrate cases of nasal demons caused by that function, too. -k -- If I haven't seen further, it is by standing in the footprints of giants
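For reference, a reconstruction of the 'lazyMany' idea (this is from memory, not Tomasz's original code): run a one-element parser repeatedly, handing the unconsumed remainder to a lazily evaluated recursive call. Combined with a lazy input String (e.g. from hGetContents), each element is produced as soon as its characters have arrived, instead of after end-of-input.

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- Parse one element at a time; getInput recovers the unconsumed
    -- remainder, and the recursive call on it is only forced when the
    -- consumer demands the next list element.
    lazyMany :: Parser a -> SourceName -> String -> [a]
    lazyMany p name = go
      where
        go s = case runParser ((,) <$> p <*> getInput) () name s of
                 Left _          -> []        -- parse failure / end of input
                 Right (x, rest) -> x : go rest

    -- One LF- or CRLF-terminated "response line", as a toy element parser.
    line :: Parser String
    line = many1 (noneOf "\r\n") <* optional (char '\r') <* char '\n'

    main :: IO ()
    main = print (take 2 (lazyMany line "" (cycle "200 OK\r\n")))

Note that `take 2 (lazyMany line "" (cycle "200 OK\r\n"))` terminates even though the input is infinite, which is the behaviour you want when the "input" is a socket that never ends.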

I made a small example related to the problem: http://hpaste.org/9957 It's my attempt to run data from the network directly into Parsec without having to fear deadlock due to blocking. The idea is that we feed Parsec from a timeout-enabled Stream based on Handle. As long as Parsec is able to read the data it wants reasonably quickly, everything is OK. If the remote host stops sending data, we don't hang; we just treat it as a parse error. But the code demonstrates a problem. Why is it doing that? How to fix it? Thanks.

On Tue, Aug 26, 2008 at 1:35 PM, brian wrote:
> One unworkable approach I tried is to get a lazy String from the handle with hGetContents. [...] The OP had the same problem I did, so he made a variant of hGetContents with timeout support. The problem: he used something from unsafe*. I came to Haskell for rigor and reliability and it would make me really sad to have to use a function with 'unsafe' in its name that has a lot of wacky caveats about inlining, etc.
unsafeInterleaveIO has no weird inlining caveats (as opposed to unsafePerformIO, which does). In fact, hGetContents is implemented using unsafeInterleaveIO; the source is here: http://haskell.org/ghc/docs/latest/html/libraries/base/src/GHC-IO.html#hGetC...

The "unsafe" bit is that it potentially allows side effects embedded within pure values; if the value is never demanded, the side effect never executes, so the observable behavior of your program can be made to depend on whether or not (or when) that value is evaluated. But when you are reading from a network stream, that's exactly what you want.

-- ryan
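That behaviour is easy to observe directly. A small self-contained demonstration (names here are illustrative): the effect inside unsafeInterleaveIO runs only when, and if, the value is demanded.

    import Data.IORef
    import System.IO.Unsafe (unsafeInterleaveIO)

    -- Returns the counter before and after the lazy value is forced.
    demo :: IO (Int, Int)
    demo = do
      ref <- newIORef 0
      v <- unsafeInterleaveIO $ do
             modifyIORef ref (+ 1)      -- the hidden side effect
             return "lazy value"
      before <- readIORef ref           -- v not demanded yet
      length v `seq` return ()          -- demanding v runs the effect now
      after  <- readIORef ref
      return (before, after)

    main :: IO ()
    main = demo >>= print               -- prints (0,1)

This is exactly the property that makes it right for a network stream (effects happen as data is demanded) and wrong in contexts where you need the effect to happen at a predictable time.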


Donn Cave wrote:
> ... I would implement the network service input data stream myself, with timeouts, encryption, whatever as required, and then apply the parser to available data as a simple, pure function that returns NNTP results and whatever data remains. So the parser would never see any streams or handles or anything, it would just get strings to parse.

A likely problem with that is that your implementation of the "input data stream" will still need to parse some information from it. So you're going to replicate code from the parser.

I think the following is analogous. Imagine you're writing a parser for a simple programming language. A program is a sequence of statements. Fine, you do "readFile" (once) and then apply a pure Parsec parser.

Then you decide to include "import" statements in your language. Suddenly the parser needs to do IO. Assume the import statements need not be the first statements of the program (there may be headers, comments etc. before). Then you really have to interweave the parsing and the IO.

If anyone has a nice solution to this, please tell. - J.W.

On Fri, 2008-08-29 at 20:01 +0200, Johannes Waldmann wrote:
> [...]
> If anyone has a nice solution to this, please tell. - J.W.
Using parsec3 you can just do exactly what you said.
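With parsec 3, ParsecT is a monad transformer, so the underlying monad can be IO and the parser can perform IO mid-parse via lift. A sketch for a hypothetical mini-language (the in-memory `load` function stands in for readFile):

    import Control.Monad.Trans.Class (lift)
    import Text.Parsec

    type P = ParsecT String () IO

    -- One statement: either an import (which triggers IO the moment it
    -- is recognised) or a plain word.
    stmt :: (String -> IO String) -> P String
    stmt load =
          (do _ <- try (string "import ")
              m <- many1 letter <* newline
              lift (load m))               -- IO interleaved with parsing
      <|> (many1 letter <* newline)

    program :: (String -> IO String) -> P [String]
    program load = many (stmt load) <* eof

    main :: IO ()
    main = do
      let load m = return ("<loaded " ++ m ++ ">")
      r <- runParserT (program load) () "prog" "hello\nimport Foo\nworld\n"
      print r

Since imports need not come first, the IO simply happens at whatever point in the input the import statement appears, which is the interleaving Johannes asked for.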


Johannes Waldmann wrote:
> Imagine you're writing a parser for a simple programming language. A program is a sequence of statements. Fine, you do "readFile" (once) and then apply a pure Parsec parser.
> Then you decide to include "import" statements in your language. Suddenly the parser needs to do IO. Assume the import statements need not be the first statements of the program (there may be headers, comments etc. before). Then you really have to interweave the parsing and the IO.
> If anyone has a nice solution to this, please tell. - J.W.
Design your language in a way that the *parse* tree does not depend on import statements? I.e. chasing imports is performed after you've got an abstract syntax tree.

Regards,
apfelmus

apfelmus wrote:
> Design your language in a way that the *parse* tree does not depend on import statements? I.e. chasing imports is performed after you've got an abstract syntax tree.
OK, that would work. This property does not hold for Haskell, because you need the fixities of the operators (so, another language design error :-)

J.W.

On 2008-08-30, Johannes Waldmann wrote:
> apfelmus wrote:
>> Design your language in a way that the *parse* tree does not depend on import statements? I.e. chasing imports is performed after you've got an abstract syntax tree.
> OK, that would work.
> This property does not hold for Haskell, because you need the fixities of the operators (so, another language design error :-)
Yes, but you can partially parse into a list, which later gets completely parsed. It's not like C, with its textual inclusion and constructs changing what counts as a type.

--
Aaron Denney -><-

On Fri, Aug 29, 2008 at 11:15 AM, Donn Cave wrote:
> Quoth brian:
> | I want to use Parsec to parse NNTP data coming to me from a handle I
> | get from connectTo.
>
> I would implement the network service input data stream myself, with timeouts
Could you explain a little about how this would look? If it's reading characters trying to make a String we want to call a 'line', isn't that what the parser is supposed to be doing? If you were parsing /etc/passwd, would you read each line yourself and give each one to Parsec?
> So the parser would never see any streams or handles or anything, it would just get strings to parse.
Well, I think the parser still works with a Stream. For example, Text/Parsec/ByteString.hs makes ByteString an instance of Stream. My next try is to make this thing an instance of Stream:

    data Connection = Connection
        { connectionHandle :: Handle
        , connectionData   :: C.ByteString
        }

In uncons, the easy case is when connectionData is nonnull. If it is null, hWaitForInput on the handle. If we get something, read it and return appropriate stuff. If not, it's a parse error similar to getting unexpected EOF in a file.
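That plan, sketched out (untested against a real server; the chunk size, timeout, and EOF handling are my guesses, and the demo below uses a file in place of a socket):

    {-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
    import Control.Exception (catch, throwIO)
    import qualified Data.ByteString.Char8 as C
    import System.Directory (removeFile)
    import System.IO
    import System.IO.Error (isEOFError)
    import Text.Parsec

    data Connection = Connection
      { connectionHandle :: Handle
      , connectionData   :: C.ByteString
      }

    -- Serve buffered bytes first; on an empty buffer, wait up to 5s
    -- for input, refill, and retry.  A timeout or EOF is reported as
    -- end of stream, i.e. a parse error unless the grammar allows it.
    -- Caveat: 'try' across a refill still loses data, because an old
    -- Connection value lacks bytes read from the Handle after it was
    -- built.
    instance Stream Connection IO Char where
      uncons (Connection h buf) =
        case C.uncons buf of
          Just (c, rest) -> return (Just (c, Connection h rest))
          Nothing -> do
            ready <- hWaitForInput h 5000 `catch` \e ->
                       if isEOFError e then return False else throwIO e
            if ready
              then do chunk <- C.hGetNonBlocking h 4096
                      uncons (Connection h chunk)
              else return Nothing

    -- Demo: parse the reply code from a canned "200 OK\r\n".
    demo :: IO (Either ParseError String)
    demo = do
      (path, wh) <- openTempFile "." "nntp-demo"
      hPutStr wh "200 OK\r\n" >> hClose wh
      h <- openFile path ReadMode
      r <- runParserT (many1 digit) () "demo" (Connection h C.empty)
      hClose h >> removeFile path
      return r

    main :: IO ()
    main = demo >>= print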


There's a whole bunch of other problems with lazy network IO. The big
problem is that you cannot detect when your stream ends since that
will happen inside unsafeInterleaveIO which is invisible from inside
pure code. You also have no guarantee that the lazy code actually
consumes enough of the input. Finalisers don't help, either, since there is
in fact no guarantee they are actually run, never mind on time.
The proposed solution by Oleg & co. is to use enumerations/left folds
[1]. The basic idea is to use a callback which gets handed a chunk of
the input from the network. When the last chunk is handed out, the
connection is closed automatically. Using continuations, you can turn
this into a stream again [2] which is needed for many input processing
tasks, like parsing.
I remember Johan Tibell (CC'd) working on an extended variant of
Parsec that can deal with this chunked processing. The idea is to
teach Parsec about a partial input and have it return a function to
process the rest (a continuation) if it encounters the end of a chunk
(but not the end of a file). Maybe Johan can tell you more about
this, or point you to his implementation.
[1]: http://okmij.org/ftp/papers/LL3-collections-enumerators.txt
[2]: http://okmij.org/ftp/Haskell/fold-stream.lhs
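To make the left-fold shape concrete, here is a minimal pure sketch (names are illustrative, not Oleg's; a plain list stands in for the network source, and Left plays the role of early termination, after which the real enumerator would close the connection):

    -- The consumer is a step function over chunks; Left means "done,
    -- stop feeding me".  The enumerator owns the traversal (and, in
    -- the networked version, the connection), not the consumer.
    type Step chunk acc = acc -> chunk -> Either acc acc

    enumList :: [chunk] -> Step chunk acc -> acc -> acc
    enumList chunks step = go chunks
      where
        go []     acc = acc
        go (c:cs) acc = case step acc c of
                          Left done  -> done        -- consumer finished early
                          Right acc' -> go cs acc'  -- hand over the next chunk

    -- Example consumer: collect response lines until the "." terminator.
    collectUntilDot :: Step String [String]
    collectUntilDot acc "." = Left acc
    collectUntilDot acc l   = Right (acc ++ [l])

    main :: IO ()
    main = print (enumList ["200 OK", "215 list follows", ".", "ignored"]
                           collectUntilDot [])

Because the enumerator drives the loop, resource cleanup has a guaranteed place to happen, which is exactly what lazy IO cannot promise.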
/ Thomas
On Tue, Aug 26, 2008 at 10:35 PM, brian wrote:

On Sat, Aug 30, 2008 at 3:51 AM, Thomas Schilling wrote:
> I remember Johan Tibell (CC'd) working on an extended variant of Parsec that can deal with this chunked processing. The idea is to teach Parsec about a partial input and have it return a function to process the rest (a continuation) if it encounters the end of a chunk (but not the end of a file). Maybe Johan can tell you more about this, or point you to his implementation.
I have written a parser for my web server that uses continuations to resume parsing. It's not really Parsec-like anymore, though; it only parses LL(1) grammars, as that's all I need to parse HTTP. I haven't released a first version of my server yet (indeed, most of the code is on this laptop and not in the Git repo [1]), but if you would like to steal some ideas, feel free.

1. http://www.johantibell.com/cgi-bin/gitweb.cgi?p=hyena.git;a=blob;f=Hyena/Par...

Cheers,
Johan
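This is not Johan's code, but the core of the "return a continuation at end of chunk" idea fits in a few lines (all names hypothetical): instead of failing when a chunk runs out, the parser suspends, and you feed it the next chunk when the network delivers one.

    -- A parse either completes (value plus unconsumed input) or
    -- suspends, asking for the next chunk.
    data Result a = Done a String
                  | Partial (String -> Result a)

    -- Incrementally parse one CRLF-terminated line.
    lineP :: String -> Result String
    lineP = go []
      where
        go acc ('\r':'\n':rest) = Done (reverse acc) rest
        go acc ['\r']           = Partial (\more -> go acc ('\r' : more))
        go acc []               = Partial (go acc)   -- chunk ended: suspend
        go acc (c:rest)         = go (c : acc) rest

    -- Feed the next chunk to a suspended parse.
    feed :: Result a -> String -> Result a
    feed (Partial k) chunk = k chunk
    feed done        _     = done

    main :: IO ()
    main =
      case feed (lineP "200 O") "K\r\n215 more" of
        Done l rest -> putStrLn (l ++ " | rest: " ++ rest)
        Partial _   -> putStrLn "still waiting for input"

Note the ['\r'] case: a chunk boundary can fall between the '\r' and the '\n', so the suspended continuation has to remember the pending carriage return.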
participants (12):
- Aaron Denney
- apfelmus
- brian
- Derek Elkins
- Donn Cave
- Jeremy Shaw
- Johan Tibell
- Johannes Waldmann
- John Van Enk
- Ketil Malde
- Ryan Ingram
- Thomas Schilling