Can someone help? The program below works fine with small files but when I try to use it on the one I need to (about 3 million lines of data) it produces no output. The hard disk is hammered - I assume this is the run time system paging. My suspicion is that the program is trying to read in the whole file before processing it. Is this correct? If so, how do I make the program lazy so that it processes a line at time?
I was about to apply GHood to your program to see whether such a tool could help to find the problem, so I started to cut down your code. However, in the simplified version of the program, one can see what is going on without any graphical tool.. (nevertheless, observation of the simplified code with GHood confirms your suspicion immediately, and it points out the spine of the list as the problem, too, so the tool is useful!-) (*) In effect, your anonymize comes down to a "mapM" over a list of actions applied to input lines, and all of the resulting IO-actions are placed before the single "hPutStr". So, even if the results of the individual actions in the list may not be needed until later, the whole spine of the list of lines has to be traversed before "hPutStr" can be executed, meaning that all input is read before any output is produced (and thus before any computation results are requested, blowing up memory usage). For the problem at hand, you could simply output each line as it is processed instead of just returning it into a list for later use (see variant _1 below). If you would want to keep both the modular program structure and the explicit line-by-line IO-style, you would need to interleave the input and output commands somehow (perhaps similar to variant _2 below?). Hth, Claus (*) Please note that our web-server is being upgraded today.. (web-pages and GHood download will not be available until tomorrow, hence no URL here :-( PS The simplified code (+ variations) with observations: module Main(main) where import Observe import IO(openFile, hPutStr, IOMode(ReadMode,WriteMode,AppendMode)) filename = "ldif1.txt" fileout = "ldif.out" readAndWriteAttrVals = do h <- openFile fileout WriteMode s <- readFile filename let action l = return (':':l) a <- mapM action (observe "input" (lines s)) hPutStr h (unlines (observe "output" a)) main = runO readAndWriteAttrVals readAndWriteAttrVals_1 = do h <- openFile fileout WriteMode s <- readFile filename let action_and_output l = hPutStr h (':':l) mapM_ (observe "output" action_and_output) (observe "input" (lines s)) main_1 = runO readAndWriteAttrVals_1 readAndWriteAttrVals_2 = do h <- openFile fileout WriteMode s <- readFile filename let { action l = return (':':l) ; as = map action (observe "input" (lines s)) ; os = repeat (hPutStr h) } mapM id (observe "output" (zipWith (>>=) as os)) main_2 = runO readAndWriteAttrVals_2
participants (1)
-
C.Reinke