
On Tue, 2014-06-24 at 22:02 +0530, C K Kashyap wrote:
> If I use mapM as suggested by others, I quickly run into -
> openFile: resource exhausted (Too many open files)

You should `seq` the calculated length before returning the value. Otherwise the result stays unevaluated, so the file descriptor has to be kept open until the processing result is actually used (when printing the result list), and you exhaust your resources.

Anyway, for giggles I added concurrency support using `async`, and IO handling using `pipes`/`pipes-bytestring`, to my fork in
https://github.com/NicolasT/haskell-perf-repro/commit/86639daa3b577487e75d38...

Using the sample dataset created by your Makefile, this version is somewhat slower than the previous one (and than your Perl script). If the dataset is huge, it might help, or it might not: that depends on the block device you're using, the layout of the files on it, and so on.

Nicolas
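
A minimal sketch of the `seq` suggestion above, not the code from the repository. It assumes the per-file work is just counting bytes with lazy `ByteString` IO; `fileLength` is a hypothetical helper name, and the file paths are taken from the command line purely for illustration.

```haskell
import qualified Data.ByteString.Lazy as BL
import System.Environment (getArgs)
import System.IO (IOMode (ReadMode), withFile)

-- Hypothetical helper: size in bytes of a single file.
fileLength :: FilePath -> IO Int
fileLength path =
  withFile path ReadMode $ \h -> do
    contents <- BL.hGetContents h
    let n = fromIntegral (BL.length contents) :: Int
    -- Force the length while the handle is still open, so the file is
    -- read to the end here and nothing lazy escapes this block.
    n `seq` return n

main :: IO ()
main = do
  paths <- getArgs
  -- Each file is opened, fully consumed and closed before mapM moves
  -- on to the next one, so only one descriptor is open at a time.
  lengths <- mapM fileLength paths
  print lengths
```

Writing `return $! n` is equivalent to the `seq` line. Without the forcing step, the length would only be computed when the list is printed, which is after `mapM` has already walked over every file.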