On Tue, 2014-06-24 at 22:02 +0530, C K Kashyap wrote:
> If I use mapM as suggested by others, I quickly run into -
>
> openFile: resource exhausted (Too many open files)

You should `seq` the calculated length before returning the value,
otherwise the file descriptor needs to be kept open until the processing
result is used (when printing the result list), and you exhaust your
resources.
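
A minimal sketch of the idea, not your actual code: it assumes each file
is read lazily with `readFile` and that the per-file result is just the
length of its contents. The names `fileLength` and `fileLengths` are made
up for illustration. Forcing the length reads the file to the end, at
which point the lazy read closes the handle, so only one file is open at
a time.

```haskell
-- Hypothetical helper: count the characters in one file.
-- Assumption: the file is read lazily via readFile from the Prelude.
fileLength :: FilePath -> IO Int
fileLength path = do
  contents <- readFile path
  let n = length contents
  -- Force the length now. This consumes the whole file, so the handle
  -- is closed before we move on to the next file.
  n `seq` return n

-- mapM over the forced version no longer accumulates open handles.
fileLengths :: [FilePath] -> IO [Int]
fileLengths = mapM fileLength
```

Without the `seq`, `return n` hands back an unevaluated thunk, and each
file stays open until something (here, printing the list) finally
demands its length. `return $! n` would do the same job.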
Anyway, for giggles I added concurrency support using `async` and IO
handling using `pipes`/`pipes-bytestring` to my fork in
https://github.com/NicolasT/haskell-perf-repro/commit/86639daa3b577487e75d385ea825aa58e3c8713b
Using the sample dataset created by your Makefile, this version is
somewhat slower than the previous one (and your Perl script), but in case
the dataset is huge (and depending on the block-device you're using, the
layout of the files on it,... the lot) it might help (or might not).
Nicolas