
Hi all,
I'm a bit lost. I'm having a problem with xml light but as a Haskell newbie it could be of course that the problem is sitting between my keyboard and chair.

I tried hard to create a minimal example.

Here is a minimal xmltest.hs:

<-----------------------------snip-------------------------------->
module Main where

import System.Environment.UTF8
import qualified System.IO.UTF8 as U
import System.IO
import Text.XML.Light

data CTest = CTest
  { ctName     :: String
  , ctLocation :: String
  } deriving (Show,Read,Eq,Ord)

getXmlContent :: Handle -> IO CTest
getXmlContent inh = do
  xml <- U.hGetContents inh
  let content = parseXMLDoc xml
  case content of
    Just c -> do
      let name = case findChild (unqual "name") c of
                   Nothing -> "<unknown>"
                   Just n' -> strContent n'
      let path = case findChild (unqual "location") c of
                   Nothing    -> "<unknown>"
                   Just path' -> case findAttr (unqual "path") path' of
                                   Nothing -> "<unknown>"
                                   Just p  -> p
      return CTest { ctName=name, ctLocation=path}
    _ -> fail "not expected"

readXmlFile :: FilePath -> IO CTest
readXmlFile f = do
  inh <- U.openBinaryFile f ReadMode
  xml <- getXmlContent inh
  hClose inh
  return xml

doSomething :: Show a => a -> IO ()
doSomething xml = print xml

main :: IO ()
main = do
  args <- getArgs
  xml <- readXmlFile $ head args
  doSomething xml
<-----------------------------snap-------------------------------->

Here is the test.xml I use:

<-----------------------------snip-------------------------------->
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE entry SYSTEM 'test.dtd'>
<entry>
<name>some name</name>
<description>
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
</description>
<location path='/info/somepath'/>
</entry>
<-----------------------------snap-------------------------------->

Here is the output of the program:

CTest {ctName = "some name", ctLocation = "<unknown>"}

which is wrong.

If I delete just one of the 'bla bla' lines then the output of the program is:

CTest {ctName = "some name", ctLocation = "/info/somepath"}

which is correct.

Question: What is my error?

--
Manfred

On Monday 06 June 2011, 20:19:49, Manfred Lotz wrote:
> Hi all,
> I'm a bit lost. I'm having a problem with xml light but as a Haskell newbie it could be of course that the problem is sitting between my keyboard and chair.

Not really. Yes, you're doing something wrong, but it's a bit delicate and nothing obvious. It's the unobvious behaviour of lazy IO.

> I tried hard to create a minimal example.
>
> Here is a minimal xmltest.hs:
>
> <-----------------------------snip-------------------------------->
> module Main where
>
> import System.Environment.UTF8
> import qualified System.IO.UTF8 as U
> import System.IO
> import Text.XML.Light
>
> data CTest = CTest
>   { ctName     :: String
>   , ctLocation :: String
>   } deriving (Show,Read,Eq,Ord)
>
> getXmlContent :: Handle -> IO CTest
> getXmlContent inh = do
>   xml <- U.hGetContents inh

Problem, part 1

>   let content = parseXMLDoc xml
>   case content of
>     Just c -> do
>       let name = case findChild (unqual "name") c of
>                    Nothing -> "<unknown>"
>                    Just n' -> strContent n'
>       let path = case findChild (unqual "location") c of
>                    Nothing    -> "<unknown>"
>                    Just path' -> case findAttr (unqual "path") path' of
>                                    Nothing -> "<unknown>"
>                                    Just p  -> p
>       return CTest { ctName=name, ctLocation=path}
>     _ -> fail "not expected"
>
> readXmlFile :: FilePath -> IO CTest
> readXmlFile f = do
>   inh <- U.openBinaryFile f ReadMode
>   xml <- getXmlContent inh
>   hClose inh
Problem, part 2, the real problem.

The point is that getXmlContent doesn't really parse the file yet; it returns a thunk saying how to get the result from the file contents. Therefore it doesn't need to read the entire file, just enough of it to find out whether parseXMLDoc returns a Just contents or a Nothing.

Then you close the handle explicitly. That means it's closed immediately, leaving the rest of the file unread. If the bla bla is long enough, the location tag is in the unread part, and when the search for that tag is finally forced by printing, the tag is not contained in the input.

If you leave out the call to hClose, leaving the closing to hGetContents when it reaches the end of the file, the file contents are not truncated and the location is found. But then the file handle might remain half-closed longer than you wish (you could run out of file handles if you open a lot of files without explicitly closing them before the next [bunch] is opened).

If you force the result before closing the handle, enough of the file is read to find the desired elements, and you make sure that there's no leaking file handle, unless there's an exception in getXmlContent so that the hClose is never reached. To prevent that, use Control.Exception.bracket, or use withBinaryFile, which does that for you.
>   return xml
>
> doSomething :: Show a => a -> IO ()
> doSomething xml = print xml
>
> main :: IO ()
> main = do
>   args <- getArgs
>   xml <- readXmlFile $ head args
>   doSomething xml
> <-----------------------------snap-------------------------------->
>
> Question: What is my error?
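
The advice in this reply (force the result while the handle is still open, and let withBinaryFile take care of closing it) can be sketched with base alone. This is only a minimal illustration under stated assumptions, not the code from the thread: hasLocation and the scratch file name are hypothetical, and a plain substring search stands in for the xml-light parsing.

```haskell
import Control.Exception (evaluate)
import Data.List (isInfixOf)
import System.IO

-- | Report whether the file mentions a "<location" tag.
-- Hypothetical stand-in for getXmlContent; no real XML parsing is done.
hasLocation :: FilePath -> IO Bool
hasLocation path =
  withBinaryFile path ReadMode $ \h -> do
    contents <- hGetContents h            -- lazy: nothing has been read yet
    -- evaluate forces the Bool now, so the search (and the reading it
    -- needs) happens before withBinaryFile closes the handle.
    evaluate ("<location" `isInfixOf` contents)

main :: IO ()
main = do
  let path = "lazy-io-demo.xml"           -- hypothetical scratch file
  -- Long filler first, so the interesting tag sits far into the file.
  writeFile path (concat (replicate 5000 "bla ") ++ "<location path='/info/somepath'/>")
  found <- hasLocation path
  print found
```

Because the Bool is evaluated inside the withBinaryFile action, the tag is found however long the filler is, and the handle is closed even if the action throws.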

On Mon, 6 Jun 2011 21:55:23 +0200, Daniel Fischer wrote:

> Problem, part 2, the real problem.
>
> The point is that getXmlContent doesn't really parse the file yet, it returns a thunk saying how to get the result from the file contents. Therefore, it doesn't need to read the entire file, just enough of it to find out whether parseXMLDoc returns a Just contents or a Nothing.
>
> Then you close the handle, explicitly. That means, it's closed immediately, leaving the unread portion of the file unread. If the bla bla is long enough, the location tag is in the unread part, and when finally the search for that tag is forced by printing, the tag is not contained in the input.
>
> If you leave out the call to hClose, leaving the closing to hGetContents when it reaches the end of the file, the file contents is not truncated, and the location found. But then the file handle might remain half-closed longer than you wish (you could run out of file handles if you open a lot of files without explicitly closing them before the next [bunch] is opened).
I ran out of handles in the first place; that's why I had changed the code, only to get bitten by the IO laziness.

Thanks for explaining it to me. Changing the sample code to use either bracket or withBinaryFile (implicitly bracket) does indeed make it work.

When I go back to my original program, it seems I cannot easily get rid of the laziness.

I process a bunch of xml files like this:

import qualified Data.Map as M
...

  -- xmlfiles is a list of [FilePath]
  ht <- foldM insertXml M.empty xmlfiles
  mapM_ printEntry (M.toList ht)

and in insertXml I now have

insertXml m xf = do
  U.withBinaryFile xf ReadMode
    (\handle -> do
        ct <- getXmlContent xf handle
        let k = ctName ct
        let m' = if k /= ""
                   then if M.lookup k m == Nothing
                          then M.insert k ct m
                          else m
                   else m
        return m')

I'm not sure if it is obvious where my mistake here lies. If not, I have to try to make this a working minimal example.

--
Thanks again,
Manfred

On Tuesday 07 June 2011, 15:22:38, Manfred Lotz wrote:
> I'm not sure if it is obvious where my mistake here lies. If not I have to try to make this a working minimal example.

I wrote:

> If you force the result before closing the handle, enough of the file is read to find the desired elements,

You're not forcing the result.

You can choose to force in getXmlContent by seq'ing on name, location and whatever other fields you have; then a strict return (return $! m') in the withBinaryFile action is sufficient. Or you can do all the forcing in the withBinaryFile action, for example

    do ct@(CTest !nm !loc) <- getXmlContent xf handle
       ...
       return $! m'

with BangPatterns (use seq if you want your code to be portable to implementations other than GHC).
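
The first option (seq on the fields where they are read, plus a strict return in the handle action) might look roughly like the sketch below. It rests on assumptions that are not from the thread: getContent is a hypothetical stand-in for getXmlContent that treats the first line of a file as the name and the last line as the location, insertFile and the scratch file names are made up, and withFile (the text-mode sibling of withBinaryFile) is used so that line endings round-trip on any platform.

```haskell
import Control.Monad (foldM)
import qualified Data.Map as M
import System.IO

data CTest = CTest
  { ctName     :: String
  , ctLocation :: String
  } deriving (Show, Eq)

-- | Hypothetical stand-in for getXmlContent: the first line of the file is
-- the name, the last line is the location. No XML parsing is done.
getContent :: Handle -> IO CTest
getContent h = do
  s <- hGetContents h                     -- lazy read
  let (name, loc) = case lines s of
        []         -> ("", "<unknown>")
        ls@(n : _) -> (n, last ls)
  -- length walks each string to its end, so both fields are fully read
  -- from the handle before the record is returned.
  length name `seq` length loc `seq` return (CTest name loc)

-- | Add one file's entry unless its name is empty or already present.
insertFile :: M.Map String CTest -> FilePath -> IO (M.Map String CTest)
insertFile m path =
  withFile path ReadMode $ \h -> do
    ct <- getContent h
    let k  = ctName ct
        m' = if null k || M.member k m then m else M.insert k ct m
    return $! m'                          -- strict return: build the map before the handle closes

main :: IO ()
main = do
  -- Hypothetical scratch files: name line, filler, location line.
  let mkFile n = unlines ([n] ++ replicate 2000 "bla bla bla" ++ ["/info/" ++ n])
      files    = [("a.txt", "alpha"), ("b.txt", "beta")]
  mapM_ (\(f, n) -> writeFile f (mkFile n)) files
  ht <- foldM insertFile M.empty (map fst files)
  mapM_ print (M.toList ht)
```

Since every field is forced inside getContent, nothing in the returned map still depends on the handle, so each file can be closed as soon as its entry has been inserted.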

On Tue, 7 Jun 2011 16:09:41 +0200, Daniel Fischer wrote:

> > I'm not sure if it is obvious where my mistake here lies. If not I have to try to make this a working minimal example.
>
> I wrote:
>
> > If you force the result before closing the handle, enough of the file is read to find the desired elements,
>
> You're not forcing the result. You can choose to force in getXmlContent by seq'ing on name, location and whatever else fields you have, then using a strict return (return $! m') in the withBinaryFile action is sufficient, or you can do all the forcing in the withBinaryFile action, for example

Somehow I had not really understood your first answer. Indeed return $! m' does it nicely.

--
Thanks,
Manfred