
In the functional pearl "A Play on Regular Expressions" by Sebastian Fischer, Frank Huch, and Thomas Wilke, the authors explain how to implement a Glushkov automaton in a very elegant way. They claim that when they compile their code with ghc-6.10.4 and with -O2 optimization they get: """ 1 MB total memory in use ... Total time 0.06s (0.21s elapsed) """ for a program that matches stdin against the equivalent (in grep -E syntax) of ^(a?){500}a{500}$ When I try their code on my machine with ghc-7.10.3 (and also -O2) I get: """ ben@sarun[1]: .../play/regexfp > ghc Reg.hs -O2 -o re [1 of 1] Compiling Main ( Reg.hs, Reg.o ) Linking re ... ben@sarun[1]: .../play/regexfp > ./re 500 +RTS -s True ... 155 MB total memory in use ... Total time 1.031s ( 1.037s elapsed) """ The details of the RTS report furthermore tell me that GC is responsible for more than 80% of the runtime. Now, I certainly have a different machine and perhaps OS than the authors of the paper, but that can't explain a factor of 16 in runtime and of 150 in memory use. I have double and triple checked my code for possible mistakes. I /do/ get the expected results (same as in the paper). I tried different numbers and found n total memory in use / MB 100 8 200 28 500 169 1000 698 which shows that memory consumtion is roughly quadratic in n. I tried heap profiling but that tells me only that all the allocation is done by the function 'shift', which I already knew (or rather guessed). I also tried several compiler versions: 7.0.4, 7.10.3, 7.4.2, 7.6.3, 8.0.1. They all result in similar performance. The ghc version the authors used is 6.10.4 and I don't have a package for that compatible to my distribution. Any hint as to where these differences might come from are appreciated. I have attached the program for reference. Cheers Ben -- "Make it so they have to reboot after every typo." ― Scott Adams