Haskell-beginners problem with memory consumption

Hi,

I have just started using Haskell, and it has been really fun so far. But I have problems with memory usage when handling large files.

What I do is the following: I load my file in one chunk and do a lot of substitutions on the string. This quickly eats all my memory, and the computer starts to get really slow.

The problem is of course that the string is copied each time I do a substitution, and I wonder if a more experienced Haskeller has a better solution to my problem.

I have considered these solutions myself, but they all seem inelegant:

1. Use some IO extensions to fake a mutable variable. But solving a problem like this with a solution which is meant for IO does not seem right?

2. Splitting the file into smaller parts could be a solution, but then I would have to do a lot of preparsing to make sure I do not split my file in the middle of a pattern. Developing a lex-like system for such an easy task sounds like overkill to me.

3. I could maybe use some form of mutable arrays, but doing string regexps on an array...?

4. ?

Any clues or opinions?

Cheers,
Petter Egesund

On Wednesday, 1 October 2003, 15:18, Petter Egesund wrote:
[...]
The problem is of course that the string is copied each time I do a substitute, and I wonder if a more experienced haskeller has a better solution to my problem.
It doesn't have to be a problem that the string is copied each time. If you have, e.g., functions f1, f2, ..., fn :: String -> String and do something like f1 (f2 (... (fn string)...)), then string and the intermediate data can be removed by the garbage collector as soon as they are no longer needed. Unfortunately, it's not very clear to me from your message what exactly you mean and, unfortunately again, I'm not an expert in Haskell memory management.
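For illustration, here is a minimal sketch of such a chain of String -> String functions. The original code was not posted, so the helper replaceAll, the function pipeline, and the patterns being replaced are all hypothetical stand-ins for whatever substitutions the real program performs.

```haskell
import Data.List (isPrefixOf)

-- | Replace every occurrence of a literal pattern (hypothetical helper,
-- standing in for whatever substitution the real program performs).
replaceAll :: String -> String -> String -> String
replaceAll old new
  | null old  = id            -- an empty pattern would match everywhere
  | otherwise = go
  where
    n = length old
    go [] = []
    go s@(c:cs)
      | old `isPrefixOf` s = new ++ go (drop n s)  -- emit replacement, skip match
      | otherwise          = c : go cs             -- copy one character through

-- | f1 (f2 (f3 string)) written as a composition. Each stage produces its
-- output lazily, so it only demands as much of the previous stage as it needs.
pipeline :: String -> String
pipeline = replaceAll "colour" "color"
         . replaceAll "grey" "gray"
         . replaceAll "\r\n" "\n"
```

With lazy I/O this could be applied as readFile "in.txt" >>= writeFile "out.txt" . pipeline (the file names are placeholders). As long as nothing else holds a reference to the whole input, the parts of each intermediate string that have already been consumed should become garbage as the output is written.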
I have considered these solutions myself, but they all seem inelegant; [...]
Indeed, they all look very inelegant, and I think there is a better solution.
Wolfgang

Petter Egesund wrote:
I load my file in one chunk and do a lot of substitutions on the string. This quickly eats all my memory, and the computer starts to get really slow.
Keep in mind that Strings are lists of characters. I think (somebody correct me if I'm wrong) GHC will store a character inside a cons cell, but that still leaves 8 bytes per character. In the worst case it will store an 8-byte cons cell pointing to a 32-bit Char value, i.e. 12 bytes per character. (Strings as lists of Char are very useful, but not terribly efficient.) Using hGetArray to read into an unboxed array of Word8 (an IOUArray), or something like that, will probably be a lot faster and save a lot of space.
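A rough sketch of the hGetArray idea, assuming the array library that ships with GHC. The names readBytes and bytesToString are made up for this example, and the code treats the file as raw bytes, so it ignores any multi-byte text encoding.

```haskell
import Data.Array.IO (IOUArray, freeze, hGetArray, newArray)
import Data.Array.Unboxed (UArray, elems)
import Data.Char (chr)
import Data.Word (Word8)
import System.IO (IOMode (ReadMode), hFileSize, withBinaryFile)

-- | Read a whole file into an unboxed array: one byte per element
-- instead of one list cell per character. (Hypothetical helper.)
readBytes :: FilePath -> IO (UArray Int Word8)
readBytes path = withBinaryFile path ReadMode $ \h -> do
  size <- fromIntegral <$> hFileSize h
  -- hGetArray fills a mutable IOUArray, so allocate one of the right size.
  buf  <- newArray (0, size - 1) 0 :: IO (IOUArray Int Word8)
  _    <- hGetArray h buf size   -- returns the number of bytes actually read
  freeze buf                     -- copy into an immutable UArray

-- | Convert back to a String, treating each byte as one character.
-- Only meant for inspecting small pieces; it rebuilds the costly list.
bytesToString :: UArray Int Word8 -> String
bytesToString = map (chr . fromIntegral) . elems
```

The substitutions themselves would then have to work on array indices rather than on list patterns, which is the awkward part the original question already points out.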
The problem is of course that the string is copied each time I do a substitute
As W.J. says, you should make sure that you don't keep more references to the original string than you need to, so that old stuff can be garbage collected. If you post (pieces of) the code, people may be able to point out improvements more easily.

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
participants (3): Ketil Malde, Petter Egesund, Wolfgang Jeltsch