
On Tuesday 07 June 2011, 15:22:38, Manfred Lotz wrote:
On Mon, 6 Jun 2011 21:55:23 +0200, Daniel Fischer wrote:
Problem, part 2, the real problem.
The point is that getXmlContent doesn't really parse the file yet, it returns a thunk saying how to get the result from the file contents. Therefore, it doesn't need to read the entire file, just enough of it to find out whether parseXMLDoc returns a Just contents or a Nothing.
Then you close the handle, explicitly. That means, it's closed immediately, leaving the unread portion of the file unread. If the bla bla is long enough, the location tag is in the unread part, and when finally the search for that tag is forced by printing, the tag is not contained in the input.
If you leave out the call to hClose, leaving the closing to hGetContents when it reaches the end of the file, the file contents are not truncated, and the location is found. But then the file handle might remain semi-closed longer than you wish (you could run out of file handles if you open a lot of files without explicitly closing them before the next [bunch] is opened).
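To make the point about forcing concrete, here is a minimal sketch, not taken from the original program. It assumes a hypothetical helper hasTagStrict and a scratch file name invented for the demo; isInfixOf stands in for the real XML search. The only idea it illustrates is that the result is evaluated while the handle is still open, so the unread part of the file cannot be lost.

```haskell
import Control.Exception (evaluate)
import Data.List (isInfixOf)
import System.Directory (removeFile)
import System.IO

-- | Hypothetical helper: report whether 'tag' occurs anywhere in the file.
-- The Bool is evaluated inside withBinaryFile, i.e. before the handle
-- is closed.
hasTagStrict :: FilePath -> String -> IO Bool
hasTagStrict path tag =
  withBinaryFile path ReadMode $ \h -> do
    s <- hGetContents h            -- lazy: nothing has been read yet
    evaluate (tag `isInfixOf` s)   -- forces the search, reading as far as needed

main :: IO ()
main = do
  let path = "lazy-io-demo.txt"    -- scratch file name, an assumption of this demo
  -- Long padding first, so the tag sits far beyond the first buffer.
  writeFile path (replicate 100000 'x' ++ "<location>here</location>")
  found <- hasTagStrict path "<location>"
  print found
  removeFile path
```

If the evaluate call is replaced by a plain return, the search only runs after the handle has been closed, which is the situation described above.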
I ran out of handles in the first place, which is why I had changed the code, only to get bitten by the IO laziness. Thanks for explaining it to me. Changing the sample code to use either bracket or withBinaryFile (implicitly bracket) does indeed make it work.
When I go back to my original program it seems I cannot easily get rid of the laziness.
I process a bunch of xml files like this:
    import qualified Data.Map as M
    ...

    -- xmlfiles is a list of [FilePath]
    ht <- foldM insertXml M.empty xmlfiles
    mapM_ printEntry (M.toList ht)
and in insertXml I now have
    insertXml m xf = do
      U.withBinaryFile xf ReadMode
        (\handle -> do
            ct <- getXmlContent xf handle
            let k = ctName ct
            let m' = if k /= ""
                       then if M.lookup k m == Nothing
                              then M.insert k ct m
                              else m
                       else m
            return m')
I'm not sure if it is obvious where my mistake here lies. If not I have to try to make this a working minimal example.
I wrote:
If you force the result before closing the handle, enough of the file is read to find the desired elements,
You're not forcing the result. You can choose to force in getXmlContent by seq'ing on name, location and whatever other fields you have; then using a strict return (return $! m') in the withBinaryFile action is sufficient. Or you can do all the forcing in the withBinaryFile action, for example

    do ct@(CTest !nm !loc) <- getXmlContent xf handle
       ...
       return $! m'

with BangPatterns (use seq if you want your code to be portable to implementations other than GHC).
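A rough sketch of the seq variant, under stated assumptions: the record fields of CTest beyond ctName, the helper tagText, and this getXmlContent are stand-ins made up so the example runs without an XML library, and plain withBinaryFile from System.IO replaces U.withBinaryFile. Since seq only evaluates to weak head normal form, the sketch uses length to walk each String field completely; whether that much forcing is needed with the real parser is not something this sketch can tell.

```haskell
import Control.Monad (foldM)
import Data.List (isPrefixOf, tails)
import qualified Data.Map as M
import System.Directory (removeFile)
import System.IO

-- Hypothetical record; only the constructor CTest and ctName appear in the thread.
data CTest = CTest { ctName :: String, ctLocation :: String }
  deriving Show

-- Naive stand-in for the XML search: text between <tag> and the next '<'.
tagText :: String -> String -> String
tagText tag s =
  case [drop (length open) t | t <- tails s, open `isPrefixOf` t] of
    (rest : _) -> takeWhile (/= '<') rest
    []         -> ""
  where
    open = "<" ++ tag ++ ">"

-- Stand-in for getXmlContent: returns unevaluated fields over lazy input.
getXmlContent :: FilePath -> Handle -> IO CTest
getXmlContent _ h = do
  s <- hGetContents h
  return (CTest (tagText "name" s) (tagText "location" s))

insertXml :: M.Map String CTest -> FilePath -> IO (M.Map String CTest)
insertXml m xf =
  withBinaryFile xf ReadMode $ \h -> do
    ct <- getXmlContent xf h
    -- Force both fields completely while the handle is still open.
    length (ctName ct) `seq` length (ctLocation ct) `seq` return ()
    let k  = ctName ct
        m' = if k /= "" && M.notMember k m then M.insert k ct m else m
    return $! m'

main :: IO ()
main = do
  let padding = replicate 200000 ' '
      files   = ["force-demo-a.xml", "force-demo-b.xml"]  -- scratch files, assumed names
      doc n   = "<name>" ++ n ++ "</name>" ++ padding
                  ++ "<location>/data/" ++ n ++ "</location>"
  mapM_ (\(f, n) -> writeFile f (doc n)) (zip files ["a", "b"])
  ht <- foldM insertXml M.empty files
  mapM_ print (M.toList ht)
  mapM_ removeFile files
```

Each handle is closed as soon as its withBinaryFile action returns, so the fold never holds more than one file open at a time.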