Re: [Haskell-beginners] too lazy parsing?

On Monday 04 February 2013, 11:50:24, Kees Bleijenberg wrote:
module Main where
import Text.ParserCombinators.Parsec (many,many1,string, Parser, parse) import System.IO (IOMode(..),hClose,openFile,hGetContents,hPutStrLn)
parseFile hOut fn = do
handle <- openFile fn ReadMode cont <- hGetContents handle
hGetContents does read the file only on demand
print cont
Here the contents is demanded, so the file is read.
let res = parse (many (string "blah")) "" cont
The binding of res is lazy, so it's not demanded yet, without the printing, the file would still not be read.
hClose handle
Now the file handle is closed, so nothing more can be read from the file. If nothing has been demanded so far, cont will be an empty string. 1. You should rather use readFile, unless you need to read a lot of files, in which case opening too many at once may exhaust the available file handles; the you need strict IO with exact control over when a file is opened, read, and closed. readFile only semi-closes the file handle, reading from the file still works until the contents goes out of scope or the end of the file is reached. Also readFile leads to simpler code, cont <- readFile fn let res = parse (many (string "blah")) "" cont case res of ... 2. if you absolutely want to use the more cumbersome hOpen - hGetContents - hClose sequence, you need to force the file contents to be read before closing the file. Instead of printing the contents, let res = ... res `seq` hClose handle would work here, generally, to ensure the entire file was read, a common way is length cont `seq` hClose handle But in those cases, it would probably make more sense to use strict IO anyway, rather than lazy IO. 3. The only reason to use hOpen - hGetContents - hClose instead of readFile is more exception-safety, in which case you should use bracket (from Control.Exception), or the wrapper around bracket withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
case res of
(Left err) -> hPutStrLn hOut $ "Error: " ++ (show err) (Right goodRes) -> mapM_ (hPutStrLn hOut) goodRes
main = do
hOut <- openFile "outp.txt" WriteMode mapM (parseFile hOut) ["inp.txt"] hClose hOut
I’am writing a program that parses a lot of files. Above is the simplest program I can think of that demonstrates my problem. The program above parses inp.txt. Inp.txt has only the word blah in it. The output is saved in outp.txt. This file contains the word blah after running the program. if I comment out the line ‘print cont’ nothing is saved in outp.txt. If I comment out ‘print cont’ and replace many with many1 in the following line, it works again?
Hmm, I get (as expected) Error: (line 1, column 1): unexpected end of input expecting "blah" with many1. When you parse an empty string with `many parser`, the parse succeeds and returns an empty list. Successively printing an empty list of strings means printing nothing at all. Using `many1` instead of `many` makes the parse fail, and then you print the error, thus `many1 (string "blah") produces some output from an empty file [and without forcing the file contents in some way before closing, the file is effectively empty].
Can someone explain to me what is going on?
Kees
participants (1)
-
Daniel Fischer