
Excerpts from Don Stewart's message of Sun Jan 29 22:55:08 +0100 2012:
> Summary: -fstrict wouldn't magically make your code good.

No - you're right, and I don't expect that to happen. I agree that it is always the programmer's fault for using the wrong tools, or for not knowing the tools well enough to get the job done.
The PHP code looks like this:

    foreach (glob('*.txt') as $file) {
        foreach (split("\n", file_get_contents($file)) as $line) {
            $parsed_line = json_decode($line);
            // get some unix timestamps, keep some hashes of seen clients
            // (cookie ids) and such
            // check how many minutes before a checkout the customer visited
            // the site - and whether he did so for a couple of days
        }
    }
    // print result

The files are about 300 MB in size. In the Haskell version, however, memory usage grew and grew and grew - I had to kill the process or limit the number of files. The PHP code runs in a couple of seconds (parsing the JSON and loading the files); the Haskell app took much longer. That PHP is fast is no surprise: I expect json_decode and split to be implemented in C.

So yes - I used lazy lists. But 8 GB of RAM should have been enough to keep everything in memory, so maybe the JSON parsing library also kept too many unevaluated thunks around. I could either start writing my own (stricter) JSON parsing library or profile the application many times - but I don't want to.

Ignoring the JSON parsing, I also gave conduits a try - only counting lines. I know it's experimental, but from its description I concluded it would keep me, a stupid Haskell programmer, from using too much memory even with bad Haskell code. Yet splitting the input into lines and merely counting them (while recognizing UTF-8 characters) took a lot longer than doing the full JSON parsing in PHP. The conduit implementation also looked funny: hGetLine is used to feed one line at a time ... (luckily - because the UTF-8 libraries don't provide a nice way to parse incomplete chunks of UTF-8 bytes, such as returning Either IncompleteMissingByte UTF8Chunk). The split("\n", ...) in PHP probably doesn't parse UTF-8 characters at all - and luckily that doesn't seem to matter, because \n is only one byte.

I know that I'm not a Haskell expert. Still, I got the impression that getting decent performance here would be no small challenge and would require much more time than I spent on the PHP version.
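For the line-counting part alone, a constant-memory version seems within reach; this is a minimal sketch of my own (not the code discussed above, and the file name is made up for illustration) using a lazy ByteString, which reads the file in chunks and lets `count` consume each chunk before demanding the next:

```haskell
-- Sketch: count the lines of one large file in roughly constant memory.
-- Data.ByteString.Lazy.Char8 streams the file chunk by chunk, so the
-- whole 300 MB file is never resident at once.
import qualified Data.ByteString.Lazy.Char8 as BL

countLines :: FilePath -> IO Int
countLines path = do
  contents <- BL.readFile path
  return (fromIntegral (BL.count '\n' contents))
```

This sidesteps UTF-8 decoding entirely, which should be safe for the same reason the PHP split is: '\n' is a single byte in UTF-8.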
That's why I'm wondering why there is no -fstrict option for such simple use cases - Haskell has so many optimizations that other languages dream about, and lots of non-lazy languages still get their jobs done, so it always depends on the use case. Wouldn't it be easy to add a compiler flag to GHC that inserts those ! strictness annotations everywhere possible? Then simple use cases like this one would not be a huge challenge.

Maybe you're right: I should just prepare some dummy files and ask the community for help. But optimizing the JSON parser and my code just seemed to be too much effort.

Marc Weber
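To illustrate what such a flag would have to do mechanically, here is a small sketch of mine (not from the discussion) of the manual version: a bang pattern on the accumulator, so each step is forced instead of building a chain of thunks - which is exactly what per-binding ! annotations buy you.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sketch: strict accumulation with a bang pattern.  With a lazy
-- accumulator, summing a long list would pile up one (+) thunk per
-- element before anything is evaluated; `!acc` forces the running
-- total at every step, keeping the loop in constant space.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs
```

A hypothetical -fstrict would, in effect, scatter these bangs over every binding where that is semantics-preserving, rather than leaving it to the programmer.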