Lessons from a memory leak in a streaming parser

First, let me thank everyone who responded to my issue; it was very helpful. I would have responded earlier, but I was on a business trip and out of contact for the last week.

On the bright side, I used the time to rework my implementation. Instead of relying on Haskell's lazy evaluation, elegant as it may be, I changed it to use a continuation-passing style: every invocation of the parser returns a few tokens and a parser for the rest of the input (see the first sketch below). Interestingly, this approach maps well to non-lazy languages - basically anything that supports closures - unlike the original method, which relied on lazy evaluation. So in principle I can now convert the Haskell code to Scheme, Ruby, or even JavaScript.

At any rate, once I had done that, things started to fall into place. I still had minor leaks, but the results of profiling were actually proving useful for a change. I needed to turn off lazy evaluation in several key places: almost all record fields are now strict, and a 'seq' was needed in one crucial location (see the second sketch below). The result is a parser that can handle anything you throw at it with constant (reasonably low) memory usage. It is dog-slow (about 8k/sec, or twice that when compiled with -O2), but I was expecting that.

So, lesson learned: lazy evaluation in Haskell is a very nice feature, but it is extremely difficult to debug, profile, and control. Using continuations may be less elegant in places, but it is much more practical.

You can see the final result in version 0.3 of the YamlReference package I just uploaded to the Cabal database. This is intended to be an "executable version" of the YAML specification (the BNF file, which is actually Haskell code, is almost exactly the list of spec BNF productions).
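First sketch: a minimal illustration of the continuation-passing shape described above. All names here (Token, Reply, chunkParser, run) are made up for the example and are not the actual YamlReference API; the only point is that each step returns some tokens plus a closure that parses the rest of the input.

    -- Illustrative names, not the actual YamlReference API.
    data Token = Token String
      deriving Show

    -- Each invocation yields a few tokens plus a parser (a closure)
    -- for the rest of the input; Done signals the input is exhausted.
    data Reply = More [Token] (String -> Reply)
               | Done [Token]

    -- Toy parser: one token per input chunk; an empty chunk ends it.
    chunkParser :: String -> Reply
    chunkParser ""    = Done []
    chunkParser chunk = More [Token chunk] chunkParser

    -- Driver: feed chunks one at a time, collecting tokens as we go.
    run :: [String] -> [Token]
    run = go (More [] chunkParser)
      where
        go (Done ts)   _        = ts
        go (More ts k) []       = ts ++ go (k "") []
        go (More ts k) (c : cs) = ts ++ go (k c) cs

Here run ["foo", "bar"] yields [Token "foo", Token "bar"], and nothing in it depends on lazy evaluation - only on closures - which is why the same shape ports to Scheme, Ruby, or JavaScript.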
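Second sketch: the two strictness fixes, again only as an illustration of the general shape rather than the actual code. Strict record fields (the '!') are forced when the record is built, and 'seq' forces an accumulator before it is carried into the next step, so unevaluated thunks cannot pile up.

    -- Strict fields: the '!' forces each Int when a Position is
    -- built, so no chain of (+1) thunks can accumulate behind it.
    data Position = Position
      { posLine :: !Int
      , posCol  :: !Int
      }

    -- Without the 'seq', the counter would become a nested (+1) thunk
    -- as long as the token list; 'seq' keeps it evaluated each step.
    countTokens :: [Token] -> Int
    countTokens = go 0
      where
        go n []       = n
        go n (_ : ts) = let n' = n + 1
                        in n' `seq` go n' ts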
Thanks again for all the help,

Oren Ben-Kiki