
Simon Peyton-Jones wrote:
A String is a [Char] and a Char is a heap object. So a file represented as a string takes a massive 20 bytes/char (12 for the cons cell, 8 for the Char cell). Then it's all sucked through several functions.
It's entirely possible, though, that the biggest performance hit is in the I/O itself. We'd be happy if anyone wanted to investigate and improve.
Unless ghc is extremely fast at filling a heap, it's the memory allocation. I get 11.8 seconds for ghc with a standard heap and 7.3 seconds when I give it enough heap to avoid garbage collection. Since that is 200M of heap, I don't think there is much time left to do anything else. The input is 2000000 bytes, so that would be 40M of [Char] data. I guess lines and unlines add another 12*2*2 = 48M, which makes about 100M in total. I guess the other 100M is used for function applications.
Jan
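For reference, the estimate above can be written out as a small Haskell sketch. The cell sizes (12 bytes per cons cell, 8 bytes per Char cell) are the figures quoted in this thread, not measurements. The names are made up for illustration, and reading "12*2*2" as one fresh cons cell per character for each of lines and unlines is an assumption.

```haskell
module Main where

-- Sizes as quoted in the thread; they are not measured here.
consCellBytes, charCellBytes :: Int
consCellBytes = 12
charCellBytes = 8

-- Bytes needed to hold n characters as a [Char]:
-- one cons cell plus one Char cell per character.
stringBytes :: Int -> Int
stringBytes n = n * (consCellBytes + charCellBytes)

-- Assumption: lines and unlines each allocate one new cons cell
-- per character of input (12 bytes * 2 functions * n characters).
linesUnlinesBytes :: Int -> Int
linesUnlinesBytes n = 2 * consCellBytes * n

main :: IO ()
main = do
  let inputBytes = 2000000      -- input size from the post
      heapBytes  = 200000000    -- the 200M heap from the post
      accounted  = stringBytes inputBytes + linesUnlinesBytes inputBytes
  putStrLn ("[Char] data:       " ++ show (stringBytes inputBytes))
  putStrLn ("lines and unlines: " ++ show (linesUnlinesBytes inputBytes))
  putStrLn ("accounted for:     " ++ show accounted)
  putStrLn ("unaccounted:       " ++ show (heapBytes - accounted))
```

This gives 40000000 + 48000000 = 88000000 bytes accounted for, which is the "about 100M" above, and leaves 112000000 bytes of the 200M unexplained by these two terms.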