On Wed, Jun 1, 2011 at 12:55 AM, Aleksandar Dimitrov <aleks.dimitrov@googlemail.com> wrote:
On Tue, May 31, 2011 at 11:30:06PM +0100, John Lato wrote:
> None of these leak space for me (all compiled with ghc-7.0.3 -O2).
> Performance was pretty comparable for every version, although Aleksander's
> original did seem to have a very small edge.

How big were your input corpora?
So it seems that I can't get my memory usage below a factor of around 3x the input file size.
Luckily, the dependence on input size seems to be linear. Here's some profiling:
<<ghc: 30478883712 bytes, 57638 GCs, 41925397/143688744 avg/max bytes residency (189 samples), 322M in use, 0.00 INIT (0.00 elapsed), 23.73 MUT (24.94 elapsed), 26.71 GC (27.10 elapsed) :ghc>>
../src/cafe/tools/iterTable 106M_text.txt +RTS -tstderr 50.44s user 1.50s system 99% cpu 52.064 total
ghc itself reports an average residency of 38MB (I can live with that) and a maximum of 140MB (too much).
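For what it's worth, the 3x figure is just back-of-the-envelope arithmetic on the numbers above, taking the 106M file named on the command line as the input size:

    322 MB in use  / 106 MB input  ~= 3.0x
    140 MB max residency / 106 MB  ~= 1.3x

So the total heap the RTS grabs is about 3x the corpus, while the peak live data is a bit over 1x of it.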
Redirecting the program's output to a file yields a mere 2.2M of data gathered by the program above. Since those 2.2M are all I care about, why
do I need so much more RAM to compute them?
Are my demands unreasonable?
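For concreteness, the kind of program I mean is roughly the sketch below. It is not my actual iteratee-based code, just the general shape: stream the file, count word frequencies into a strict HashMap, and print the table. The use of T.copy on the keys is an assumption on my part about keeping residency down, not something the profile above proves.

    -- Sketch only: count word frequencies from a file into a strict HashMap.
    module Main where

    import           Data.List           (foldl')
    import qualified Data.HashMap.Strict as M
    import qualified Data.Text           as T
    import qualified Data.Text.Lazy      as TL
    import qualified Data.Text.Lazy.IO   as TLIO
    import           System.Environment  (getArgs)

    -- Add one occurrence of a word.  T.copy makes a fresh, compact key so the
    -- long-lived table does not retain the larger input chunk the word was
    -- sliced from (whether that matters here is my assumption).
    bump :: M.HashMap T.Text Int -> T.Text -> M.HashMap T.Text Int
    bump table w = M.insertWith (+) (T.copy w) 1 table

    main :: IO ()
    main = do
      [file] <- getArgs
      corpus <- TLIO.readFile file               -- lazy, chunked read
      let table = foldl' bump M.empty
                         (map TL.toStrict (TL.words corpus))
      mapM_ (\(w, n) -> putStrLn (T.unpack w ++ "\t" ++ show n))
            (M.toList table)

The intent of the sketch is only that the long-lived structure (the table) should hold compact, fully evaluated data, while the input itself is consumed incrementally; the real program does the same thing with iteratees instead of lazy I/O.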