
Am 17.12.2016 um 22:45 schrieb Sergey Vinokurov:
After trying out different heap profiling modes it seems to me that the problem is caused by the lazy empty and final fields (attached as reg.svg). These fields contain thunks which hold references to regexps and this quickly fills out the heap. However, making fields strict improves the memory usage a lot, from 1030 MB to 61 MB:
Many thanks Sergey, that's it! Two exclamation marks added and all is well. ben@sarun[1]: .../play/regexfp > ./re 500 +RTS -s True 96,115,528 bytes allocated in the heap 20,225,160 bytes copied during GC 154,376 bytes maximum residency (2 sample(s)) 37,368 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) ... Total time 0.061s ( 0.059s elapsed) Exactly as claimed in the paper. I wonder why the authors did not mention that these fields should be strict and how to (easily) achieve that (not even in a footnote). Neither did the reviewers catch this omission. Cheers Ben
On Sat, Dec 17, 2016 at 10:51 PM, Ben Franksen
wrote: In the functional pearl "A Play on Regular Expressions" by Sebastian Fischer, Frank Huch, and Thomas Wilke, the authors explain how to implement a Glushkov automaton in a very elegant way. They claim that when they compile their code with ghc-6.10.4 and with -O2 optimization they get:
""" 1 MB total memory in use ... Total time 0.06s (0.21s elapsed) """
for a program that matches stdin against the equivalent (in grep -E syntax) of
^(a?){500}a{500}$
When I try their code on my machine with ghc-7.10.3 (and also -O2) I get:
""" ben@sarun[1]: .../play/regexfp > ghc Reg.hs -O2 -o re [1 of 1] Compiling Main ( Reg.hs, Reg.o ) Linking re ... ben@sarun[1]: .../play/regexfp > ./re 500 +RTS -s True ... 155 MB total memory in use ... Total time 1.031s ( 1.037s elapsed) """ -- "Make it so they have to reboot after every typo." ― Scott Adams