Screen scraping with an interactive process: Buffering problems?

Hi all, I'm writing some code to interact with an ACL2 [0] process. I'd like to be able to write a function test :: String -> IO Bool that will attempt to prove something by forking an ACL2 process and screen scraping its output, to see whether the conjecture was proved. The code below [1] is close: it launches the process, sends the conjecture to ACL2, and reads ACL2's output line-by-line until it finds "Proof succeeded" or "attempt has failed". This works well enough for the simple case, but there is a problem. In readACL2Answer, the code shouldn't have to check for success *and* failure, since the conditions are complementary. But if I change that code to check only for success, then it is possible (and indeed it happens every time) that if I send ACL2 a conjecture it can't prove, readACL2Answer blocks indefinitely -- and the print statements show that hWaitForInput has said there is more data on the Handle but hGetLine blocks trying to read it. The reason for this, as far as I can tell, is that there *is* more data -- a character, for example -- but no newline, so hGetLine blocks waiting for a newline. However, if I'm reading the BufferMode documentation [3] correctly, the fact that I've set my streams to LineBuffering should mitigate that scenario. Concretely, I know that ACL2 runs in a REPL and the last thing it prints is in fact an incomplete line -- "ACL2 !>". I'd rather not have to read every line character by character just to detect the uncommon case. Should I just do it character by character? Am I reading the BufferMode docs wrong, and if so, what is the correct interpretation? Is there better approach for the original problem I'm trying to solve? Thanks for reading this far. =] -- Denis [0] http://www.cs.utexas.edu/users/moore/acl2/ [1] import System.Process import System.IO type Proc = (Handle, Handle, Handle, ProcessHandle) testThm = test "(equal x x)" -- something acl2 can prove testBad = test "(equal x (not x))" -- something it can't -- | Pass a conjecture form to ACL2, attempt to prove it, and return a boolean -- indicating whether ACL2 was able to prove it. test :: String -> IO Bool test thm = do i@(stdin, _, _, pid) <- setUpACL2 success <- testProc i thm hPutStrLn stdin "(good-bye)" putStrLn "saying goodbye" exitCode <- waitForProcess pid case exitCode of ExitSuccess -> return success ExitFailure i -> error ("ACL2 bad exit code: " ++ show i) -- | Pass a conjecture vio the given process' input stream, and return whether -- it is proved. testProc :: Proc -> String -> IO Bool testProc i@(stdin, stdout, _, pid) thm = do hSetBuffering stdout LineBuffering hSetBuffering stdin LineBuffering success <- testInACL2 ("(thm " ++ thm ++ ")") i putStrLn "found success value" return success succeedRx = mkRegex "Proof succeeded[.]" failedRx = mkRegex "attempt has failed[.]" -- create acl2 process and return handles setUpACL2 :: IO Proc setUpACL2 = runInteractiveCommand "acl2" testInACL2 :: String -> Proc -> IO Bool testInACL2 str (pin, pout, perr, pid) = do putStrLn "printing to acl2..." hPutStrLn pin str putStrLn "getting acl2 result..." readACL2Answer pout -- | How long we should wait for ACL2 to start saying something. Five -- seconds, currently. readTimeout = 100 readACL2Answer :: Handle -> IO Bool -- I *should* be required *only* to check for *success* in the matchRegex -- statement below, instead of checking for success and failure. But if I do -- that, and the theorem to be proved fails, this function blocks indefinitely -- (at least I observe a much greater than 5 second wait period), even though -- it should only block for `readTimeout' milliseconds. I'm not sure why this -- is, but at least there is an acceptable workaround. readACL2Answer pout = hWaitForInput pout readTimeout >>= (\isReady -> if isReady then do putStrLn "getting line" line <- hGetLine pout putStrLn $ "read line: " ++ line -- Check for success or failure. case (matchRegex succeedRx line, matchRegex failedRx line) of (Nothing, Nothing) -> readACL2Answer pout (Nothing, Just _) -> return False (Just _, _) -> return True else do putStrLn "not ready" return False) [3] http://haskell.org/ghc/docs/latest/html/libraries/base-3.0.0.0/System-IO.htm...

What about using hGetContents to just read ALL of the input, as a lazy string? Then you look through that string for success or failure. In other words, readACL2Answer pout = do s <- hGetContents pout parse s here

On Nov 6, 2007 10:15 PM, David Benbennick
What about using hGetContents to just read ALL of the input, as a lazy string? Then you look through that string for success or failure. In other words,
readACL2Answer pout = do s <- hGetContents pout parse s here
Ironically, this was my first problem. First of all, I don't think I want the semi-closed state -- I want to be able to read and write freely later on (I may be misunderstanding semi-closed, however). Second, when I used this approach, after the hGetContents call I did the regexp matching, and the program hung during matching. -- Denis

On Nov 7, 2007 5:12 AM, Denis Bueno
Ironically, this was my first problem. First of all, I don't think I want the semi-closed state -- I want to be able to read and write freely later on (I may be misunderstanding semi-closed, however).
I don't think that makes sense. First of all, pout is only the stdout of the program, so you can only read from it, not write to it. And once you do hGetContents, you have read all the data that will ever exist on that handle, so there's nothing to read from it "later on".
Second, when I used this approach, after the hGetContents call I did the regexp matching, and the program hung during matching.
That's not surprising. You first match for "Proof succeeded". So if the proof failed, the matcher will go past the "attempt has failed" message, looking for a later succeeded. Which means it will block waiting for more output from your subprocess, which will never produce more output until you give it another request. Assuming that each call to ACL2 produces exactly one of either "Proof succeeded" or "attempt has failed", you can get a list of results like this (where aclOutput :: String is the result of hGetContents): let results = map (\l -> if l == "Proof succeeded" then True else False) $ filter (\l -> elem l ["Proof succeeded", "attempt has failed"]) $ lines aclOutput Then results :: [Bool], and results !! n is True if the nth call succeeded. Just make sure not to inspect results !! n until after making the nth call to ACL2, or it will block.

On Nov 7, 2007 4:44 PM, David Benbennick
And once you do hGetContents, you have read all the data that will ever exist on that handle, so there's nothing to read from it "later on".
I completely misunderstood how hGetContents works. This now makes sense. I first thought hGetContents would return "everything you could read from the stream right now" as a string. From reading the the haddock documentation for System.IO.hGetContents, I failed to understand that the string it returns will contain characters that haven't even been written to the (interactive) stream by the time of the call to hGetContents. I'm not sure if the hGetContents documentation could easily be made to make that fact more salient, but I'd be willing to submit a patch, if other people have been confused.
Assuming that each call to ACL2 produces exactly one of either "Proof succeeded" or "attempt has failed", you can get a list of results like this (where aclOutput :: String is the result of hGetContents):
let results = map (\l -> if l == "Proof succeeded" then True else False) $ filter (\l -> elem l ["Proof succeeded", "attempt has failed"]) $ lines aclOutput
Then results :: [Bool], and results !! n is True if the nth call succeeded. Just make sure not to inspect results !! n until after making the nth call to ACL2, or it will block.
I used a minor modification of this approach ("Proof succeeded" is not on a line by itself), and it worked famously. Thanks for the help, David! -- Denis
participants (2)
-
David Benbennick
-
Denis Bueno