maybe IO doesn't suck, but my code does...

Hello,
I'm a Haskell beginner, and I'm struggling with the following problem: I've started writing a simple Apache log file analyzer, but I cannot get rid of serious memory usage problems (in fact, at each attempt, I fear I won't be able to unlock my box as my Linux 2.6.9 kernel is on its knees, which reminds me of my early days writing C on MMU-less processors... not because of the language of course :-))
Enclosed is some sample code, which aborts on a large (> 100000 lines) file.
I tried different variations (readFile $ lines, openFile, openFile + IORef, ...) but with no success...
So, if the enclosed version is not too far off, please give me a hint. Alternatively, if I took the wrong direction, please refocus my search .-)
Thanks in advance,
Frédéric
--
Frédéric Gobry
Infoscience DIT-KIS / EPFL
Tel: +41216932288
http://people.epfl.ch/frederic.gobry

On 2004 December 03 Friday 05:33, Frédéric Gobry wrote:
important memory usage problems (in fact, at each attempt, I
Alternatively, if I took the wrong direction, please refocus my search
See http://haskell.org/hawiki/ForcingEagerEvaluation and especially follow the link and look at "Strict datatypes, seq, ($!), DeepSeq and Strategies".
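As a rough illustration of two of the tools named on that page, here is a minimal sketch of a counting loop written three ways: with a lazy accumulator, with seq, and with ($!). The function names countLazy, countSeq and countBang are hypothetical and do not come from the original program.

```haskell
-- Lazy accumulator: without optimization, each step wraps the previous
-- count in another (+ 1) thunk, and the additions only run when the
-- final result is demanded.
countLazy :: [a] -> Int
countLazy = go 0
  where
    go acc []       = acc
    go acc (_ : xs) = go (acc + 1) xs

-- seq evaluates the new accumulator before the recursive call is made.
countSeq :: [a] -> Int
countSeq = go 0
  where
    go acc []       = acc
    go acc (_ : xs) = let acc' = acc + 1
                      in acc' `seq` go acc' xs

-- ($!) is strict application: its argument is evaluated to weak head
-- normal form before the function is applied to it.
countBang :: [a] -> Int
countBang xs0 = go xs0 0
  where
    go []       acc = acc
    go (_ : xs) acc = go xs $! acc + 1
```

All three return the same count; they differ only in when the additions are performed, which is what decides how much memory the loop needs on a long input.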

Hi Frédéric,
I took a look at your program. To be honest, I have to admit that it is not how I think one should program in Haskell. However, I have changed it to the style I would recommend, and now it runs in constant space.
First of all, I thought it would be enough to make the members of your records strict. By the way, this almost never hurts if you have some kind of state transition.
    data Stat = Stat { latest :: !CalendarTime, total :: !Int } deriving (Show)
This was not enough, but it was essential in the final version anyway. So let's think about why this is necessary. Well, in _doParse you write something like:
    Stat (max (time hit) (latest state)) (total state + 1)
                                         ^^^^^^^^^^^^^^^^^
Since Haskell is lazy, it won't evaluate the (+), so it keeps all copies of the Stat until the very end, when you actually print it. The ! annotation in the record definition doesn't allow Haskell to store closures in the members, so it is forced to evaluate them.
Now to the design! Please do not use IORefs if you don't really need them. The streaming that you have done by reading line by line by hand can be performed using lazy IO. The main.hs changes to:
    do args <- getArgs
       contents <- readFile (args !! 0)
       let result = parseLog contents
       print result    -- or: print $ total result
You don't need IO in the parsing module, believe me. The ParseLog.hs becomes:
    module ParseLog (parseLog, total) where
    import Hit
    data Stat = Stat { latest :: !CalendarTime, total :: !Int } deriving (Show)

    _parseLine :: String -> Stat -> Stat
    _parseLine line state =
        let hit = parseHit line
        in Stat (max (time hit) (latest state)) (total state + 1)

    parseLog :: String -> Stat
    parseLog contents =
        let initial = Stat epoch 0
        in foldl (flip _parseLine) initial $ lines contents
The Hit module remains unchanged. Try to use functions like fold(l|r), map ... instead of writing your recursions by hand, and try to make use of laziness where it helps.
Recently there has been some discussion about blockwise IO and similar stuff, but if you don't care too much about speed you can go with the standard library.
Cheers,
Georg
-- ---- Georg Martius, Tel: (+49 34297) 89434 ---- ------- http://www.flexman.homeip.net ---------
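The effect of the ! annotations described in this message can be observed directly. The following is only a sketch under stated assumptions: an Int timestamp stands in for CalendarTime, and the names LazyStat, StrictStat, stepLazy, stepStrict and survives are hypothetical, not part of the original program.

```haskell
import Control.Exception (SomeException, catch, evaluate)

-- Lazy fields: the constructor may store unevaluated closures.
data LazyStat = LazyStat { lazyLatest :: Int, lazyTotal :: Int }

-- Strict fields: evaluating the constructor also evaluates each field.
data StrictStat = StrictStat { strictLatest :: !Int, strictTotal :: !Int }

-- One state transition per log line (assumption: Int timestamps).
stepLazy :: Int -> LazyStat -> LazyStat
stepLazy t s = LazyStat (max t (lazyLatest s)) (lazyTotal s + 1)

stepStrict :: Int -> StrictStat -> StrictStat
stepStrict t s = StrictStat (max t (strictLatest s)) (strictTotal s + 1)

-- Evaluates a value to weak head normal form and reports whether
-- that completed without raising an exception.
survives :: a -> IO Bool
survives x = (evaluate x >> return True) `catch` handler
  where
    handler :: SomeException -> IO Bool
    handler _ = return False
```

In GHCi, `survives (stepLazy 1 (LazyStat 0 (error "boom")))` returns True, because the lazy constructor just stores the pending `+ 1` without running it. The strict counterpart, `survives (stepStrict 1 (StrictStat 0 (error "boom")))`, returns False, because building the strict record evaluates its fields immediately. The same mechanism is what keeps closures out of the record in a long-running state transition.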

how I think one should program in Haskell. However I have changed it to the style I would recommend and now it runs in constant space.
I'm certainly interested in general remarks about style. Consider this my first program in Haskell, coming from an object-oriented / imperative background.
First of all I thought it is enough to strictify the members of your records. Btw. this nearly never hurts if you have some kind of State transition.
data Stat = Stat { latest :: !CalendarTime, total :: !Int } deriving (Show)
I won't forget that :-)
Since Haskell is lazy it won't evaluate the (+), so it keeps all copies of the Stat until the very end when you actually print it. The ! annotation in the record definition doesn't allow Haskell to store closures in the members, so it is forced to evaluate them.
Wouldn't it be conceivable to reduce some expressions when the runtime system notices that the heap starts to grow too much (some sort of expression garbage collection), or would this alter the semantics of the code too deeply?
Please do not use IORefs if you don't really need them. The streaming that you have done by reading line by line by hand can be performed using lazy IO.
You know, I started with a lines $ readFile, the code I sent is the result of many failed attempts to overcome my problems :-)
Recently there has been some discussion about blockwise IO and similar stuff, but if you don't care too much about speed you can go with the standard library.
I'll maybe perform the actual processing I need to get the job done first, and come back to the mailing list once I'm stuck with blockwise IO :-)
Thanks for your help,
Frédéric

At Sat, 4 Dec 2004 22:18:38 +0100, Frédéric Gobry wrote:
Wouldn't it be conceivable to reduce some expressions when the runtime system notices that the heap starts to grow too much (some sort of expression garbage collection), or would this alter the semantics of the code too deeply?
This paper may have some answers, though I don't know if it addresses that specific question:
Optimistic Evaluation: An adaptive evaluation strategy for non-strict programs
available at (among other places): http://www.cl.cam.ac.uk/~rje33/icfp2003.pdf
Jeremy Shaw.

I'll maybe perform the actual processing I need to get the job done first, and come back to the mailing list once I'll be stuck with blockwise IO :-)
Arg, I tested the version with strict Stat and no explicit IO, but it still explodes on my log file (stack space overflow after a lot of swapping) :-(
Frédéric

Hi,
you probably need to compile it with -O. You can use -Wall as well to get hints from GHC that help you avoid silly mistakes.
Georg
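One plausible reason why -O matters here (an assumption, not something the thread confirms): foldl is lazy in its accumulator, so without optimization it can build the whole chain of pending _parseLine applications before any Stat constructor runs, and the strict fields only take effect once that chain is finally forced. Data.List.foldl' forces the accumulator at every step regardless of the optimization level. A minimal sketch follows, in which an Int timestamp stands in for CalendarTime and parseTime is a hypothetical stand-in for parseHit:

```haskell
import Data.List (foldl')

data Stat = Stat { latest :: !Int, total :: !Int } deriving (Show, Eq)

-- Hypothetical stand-in for parseHit: reads a leading integer
-- timestamp from the line, or yields 0 if there is none.
parseTime :: String -> Int
parseTime line = case reads line of
  [(n, _)] -> n
  _        -> 0

-- One state transition per log line.
step :: Stat -> String -> Stat
step s line = Stat (max (parseTime line) (latest s)) (total s + 1)

-- foldl' evaluates each intermediate Stat to weak head normal form;
-- the strict fields then force the timestamp and the counter as well.
parseLog :: String -> Stat
parseLog = foldl' step (Stat 0 0) . lines
```

For example, `parseLog "3 a\n10 b\n7 c\n"` evaluates to `Stat {latest = 10, total = 3}`. Combined with lazy readFile, this shape should let the input be consumed line by line without accumulating pending updates.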
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (4)
- Frédéric Gobry
- Georg Martius
- Jeremy Shaw
- Scott Turner