On Tue, Jun 24, 2014 at 10:35 PM, Nicolas Trangez <nicolas@incubaid.com> wrote:

On Tue, 2014-06-24 at 22:02 +0530, C K Kashyap wrote:
> If I use mapM as suggested by others, I quickly run into -
>
> openFile: resource exhausted (Too many open files)

You should `seq` the calculated length before returning it. Otherwise
the length stays an unevaluated thunk, and the file descriptor has to be
kept open until the result is actually used (when the result list is
printed), so you exhaust your resources.
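
A minimal sketch of the `seq` suggestion, assuming the per-file
processing is just the length of the contents read lazily with
`Data.ByteString.Lazy`. The names `fileLength` and `fileLengths` are
made up for illustration and are not taken from the original program:

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- | Length of one file in bytes (hypothetical helper).
-- BL.readFile is lazy: nothing is read until the contents are demanded.
-- Forcing the length with `seq` reads the file to EOF right here, which
-- lets the lazy reader close the descriptor before the next file is opened.
fileLength :: FilePath -> IO Int64
fileLength path = do
    contents <- BL.readFile path
    let len = BL.length contents
    len `seq` return len

-- | With the length forced per file, mapM no longer piles up open files.
fileLengths :: [FilePath] -> IO [Int64]
fileLengths = mapM fileLength
```

Without the `seq`, each call would return a thunk that still refers to
an unread, open file, which is how `mapM` over a large directory runs
into the "Too many open files" error quoted above.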

Anyway, for giggles I added concurrency support using `async` and IO
handling using `pipes`/`pipes-bytestring` to my fork in
https://github.com/NicolasT/haskell-perf-repro/commit/86639daa3b577487e75d385ea825aa58e3c8713b
Using the sample dataset created by your Makefile, this version is
somewhat slower than the previous one (and your Perl script). If the
dataset is huge it might help (or might not), depending on the block
device you're using, the layout of the files on it, and so on.
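
For readers who just want the general shape of the concurrent variant,
here is a rough sketch. It is not the code from the commit linked above
(that one streams with `pipes`); it only uses `mapConcurrently` from
`async` with strict reads, and `fileLength`, `batches`,
`fileLengthsConcurrently` and the batch size are assumptions made for
the example:

```haskell
import Control.Concurrent.Async (mapConcurrently)
import qualified Data.ByteString as B

-- | Strict read (hypothetical helper): B.readFile loads the whole file
-- and closes the descriptor before it returns.
fileLength :: FilePath -> IO Int
fileLength path = B.length <$> B.readFile path

-- | Split a list into batches of at most n elements (n must be > 0).
batches :: Int -> [a] -> [[a]]
batches _ [] = []
batches n xs = let (h, t) = splitAt n xs in h : batches n t

-- | Read the files concurrently, one batch at a time, so that no more
-- than batchSize descriptors are open at once. Result order matches
-- the order of the input paths.
fileLengthsConcurrently :: Int -> [FilePath] -> IO [Int]
fileLengthsConcurrently batchSize paths =
    concat <$> mapM (mapConcurrently fileLength) (batches batchSize paths)
```

Batching is the simplest way to keep the number of open files bounded.
Whether the concurrency pays off depends on the same factors as above,
and the program has to be built with `-threaded` to use more than one
OS thread.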

Nicolas