
Excerpts from Don Stewart's message of Sun Jan 29 22:55:08 +0100 2012:
> Summary: -fstrict wouldn't magically make your code good.

No - you're right, and I don't expect that to happen. I agree that it is always the programmer's fault for using the wrong tools, or for not knowing the tools well enough to get the job done.
The PHP code looks like this:

    foreach (glob('*.txt') as $file) {
        foreach (split("\n", file_get_contents($file)) as $line) {
            $parsed_line = json_decode($line);
            // get some unix timestamps, keep some hashes of seen clients
            // (cookie ids) and such
            // check how many minutes before a checkout the customer visited
            // the site - and whether he did so for a couple of days
        }
    }
    // print result

The files are about 300 MB in size. In the Haskell version, however, memory usage grew and grew and grew - I had to kill the process or limit the number of files. The PHP code runs in a couple of seconds (parsing the JSON and loading the files); the Haskell app took much longer. That PHP is fast is no surprise: I expect json_decode and split to be implemented in C.

So yes - I used lazy lists. But 8 GB of RAM should have been enough to keep everything in memory, so maybe the JSON parsing library also kept too many unevaluated thunks around. I could either start writing my own (stricter) JSON parsing library or profile the application many times - but I don't want to.

Ignoring the JSON parsing, I also gave conduits a try - only counting lines. I know it's experimental, but from its description I concluded it would keep me, a stupid Haskell programmer, from using too much memory even with bad Haskell code. Yet splitting the input into lines and merely counting them (while recognizing UTF-8 characters) took a lot longer than doing the full JSON parsing in PHP. The conduit implementation also looked funny: hGetLine is used to feed one line at a time ... (luckily - because the UTF-8 libraries don't provide a nice way to parse incomplete chunks of UTF-8 bytes, such as returning Either IncompleteMissingByte UTF8Chunk). The split("\n", ...) in PHP probably doesn't parse UTF-8 characters at all - and luckily that doesn't seem to matter, because \n is only one byte.

I know that I'm not a Haskell expert. Still, I got the impression that getting decent performance here would be no small challenge and would require much more time than I spent on the PHP version.
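For the line-counting part alone, a constant-memory version seems within reach; this is a minimal sketch of my own (not the code discussed above, and the file name is made up for illustration) using a lazy ByteString, which reads the file in chunks and lets `count` consume each chunk before demanding the next:

```haskell
-- Sketch: count the lines of one large file in roughly constant memory.
-- Data.ByteString.Lazy.Char8 streams the file chunk by chunk, so the
-- whole 300 MB file is never resident at once.
import qualified Data.ByteString.Lazy.Char8 as BL

countLines :: FilePath -> IO Int
countLines path = do
  contents <- BL.readFile path
  return (fromIntegral (BL.count '\n' contents))
```

This sidesteps UTF-8 decoding entirely, which should be safe for the same reason the PHP split is: '\n' is a single byte in UTF-8.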
That's why I'm wondering why there is no -fstrict option for such simple use cases - Haskell has so many optimizations that other languages dream about, and lots of non-lazy languages still get their jobs done, so it always depends on the use case. Wouldn't it be easy to add a compiler flag to GHC that inserts those ! strictness annotations everywhere possible? Then simple use cases like this one would not be a huge challenge.

Maybe you're right: I should just prepare some dummy files and ask the community for help. But optimizing the JSON parser and my code just seemed to be too much effort.

Marc Weber
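To illustrate what such a flag would have to do mechanically, here is a small sketch of mine (not from the discussion) of the manual version: a bang pattern on the accumulator, so each step is forced instead of building a chain of thunks - which is exactly what per-binding ! annotations buy you.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sketch: strict accumulation with a bang pattern.  With a lazy
-- accumulator, summing a long list would pile up one (+) thunk per
-- element before anything is evaluated; `!acc` forces the running
-- total at every step, keeping the loop in constant space.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs
```

A hypothetical -fstrict would, in effect, scatter these bangs over every binding where that is semantics-preserving, rather than leaving it to the programmer.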