
Bulat Ziganshin wrote:
Hello Manlio,
Monday, March 2, 2009, 6:30:51 PM, you wrote:
The process-data-1 program parses the entire dataset using about 1.4 GB of memory (a 3x increase).
This is strange. The memory required should be proportional to the number of ratings. IntMap may be the culprit, or the garbage collector, which does not release memory to the operating system (or, worse, does not deallocate all the temporary memory it used).
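If IntMap is the culprit, one usual suspect is lazy thunk buildup while the map is being built during parsing. A minimal sketch of strict insertion (the per-movie counter is only an illustrative layout, not the actual process-data-1 code; in later containers releases, Data.IntMap.Strict.insertWith replaces insertWith'):

    import qualified Data.IntMap as IM

    -- Count ratings per movie.  insertWith' applies the combining
    -- function strictly, so no chain of unevaluated (+) thunks builds
    -- up behind each key while the files are parsed.
    addRating :: Int -> IM.IntMap Int -> IM.IntMap Int
    addRating movie = IM.insertWith' (+) movie 1

    main :: IO ()
    main = print (foldr addRating IM.empty [1, 2, 1, 3, 1])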
ghc has 2 garbage collectors.
[...]
I already tried to enable compacting collection (the -c RTS option). Here are some numbers (when parsing only 5000 movie files, with process-data-1):

1) With the default collection algorithm:
   real 1m4.599s
   user 0m49.131s
   sys  0m1.492s
   409 MB

2) With the -c option:
   real 1m45.197s
   user 0m59.248s
   sys  0m1.640s
   418 MB

So, nothing changed.
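For reference, runtime flags go between +RTS and -RTS on the command line, and adding -s prints a GC summary showing how much time goes to collection (the file arguments here are only an assumption about how process-data-1 is invoked):

    ./process-data-1 +RTS -c -s -RTS training_set/*.txt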
moreover, you may set up a "growing factor". with a g.f. of 1.5, for example, memory will be collected once the heap becomes 1.5x larger than the real memory usage after the last GC. this effectively guarantees that the memory overhead will never exceed this factor
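this is the -F RTS option (the default factor is 2). for example, to run with a 1.5 growing factor (program arguments are only illustrative, as above):

    ./process-data-1 +RTS -F1.5 -RTS training_set/*.txt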
Thanks. This seems to be effective (but it also reduces performance).

3) With the -F1 option:
   real 9m59.789s
   user 9m41.844s
   sys  0m5.608s
   264 MB

I have to parse the whole data set, to check if memory usage is good.
look at the GHC manual, RTS switches section. and last: GHC never returns memory to the OS; there should be a ticket on this
The same problem occurs with Python.

By the way: I wrote the first version of the program to parse the Netflix training data set in D. I also used ncpu * 1.5 threads, to parse files concurrently. However, execution was *really* slow, due to garbage collection. I also tried disabling garbage collection and manually running a collection cycle from time to time (every 200 files parsed), but performance was the same.

Running the Haskell version with -F1 *seems* (I have to check) to be as slow as the D version. So, it seems that for this class of problems it is better to do manual memory management (and it is not even hard).

When I find some time I will try to write a C++ version (I also have a Python version, but it is very memory inefficient). At least I hope that I can serialize the parsed data to a binary file, and then read it back with optimized memory usage.

Thanks,
Manlio Perillo
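P.S.: a minimal sketch of the serialize/reload round trip I have in mind, using the Data.Binary package and its stock instances for IntMap, lists, and tuples (the Ratings layout, movie id mapped to (user, rating) pairs, is only an assumption):

    import Data.Binary (encodeFile, decodeFile)
    import qualified Data.IntMap as IM

    type Ratings = IM.IntMap [(Int, Int)]

    -- Write the parsed data once, in a compact binary form...
    saveRatings :: FilePath -> Ratings -> IO ()
    saveRatings = encodeFile

    -- ...and read it back later without re-parsing the text files.
    loadRatings :: FilePath -> IO Ratings
    loadRatings = decodeFile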