
Graham Klyne
main = do
    file <- getContents
    putStrLn $ show (length $ lines file) ++ " " ++
               show (length $ words file) ++ " " ++
               show (length file)
Space-leak or what?
I can see that this requires the original file to be kept in memory while it is scanned three times, so enough memory for the entire file will be required. Is that *the* problem to which you allude? I can't see any other problem here.
Yes, that is the main problem. Don't forget that the shootout benchmark runs this example over a very large input (15Mb). Since the character list stored in memory for this file takes approximately 12 bytes per character, it blows up to about 180Mb held temporarily. The shootout performance figures reckon that ghc actually uses 223Mb in total.
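For reference, the 12-byte figure follows from GHC's representation of a String as a lazy list of boxed characters. A back-of-envelope sketch, assuming a 32-bit machine of the era (the names here are illustrative, not from the original mail):

    -- Each list element is a (:) cell of three machine words: an object
    -- header, a pointer to the Char, and a pointer to the rest of the list.
    -- The Char closures themselves are typically shared, so the cons cells
    -- dominate. (On a 64-bit machine these figures would double.)
    bytesPerChar :: Int
    bytesPerChar = 3 * 4                            -- 12 bytes per cons cell

    heapForFile :: Int
    heapForFile = 15 * 1024 * 1024 * bytesPerChar   -- ~180Mb for a 15Mb input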
And why would this put Haskell at a disadvantage?
Large live heap space means a large time spent in GC, trying to find the needle that is actual garbage in the haystack of live pointers. It also increases the likelihood of cache misses and all kinds of other bad memory effects. In other words, wasting space is wasting time. There is a good body of literature on heap profiling in Haskell which demonstrates the benefits of keeping space usage small to improve time performance.

In any case, for the shootout, this is patently a different algorithm to the one every other solution uses. All the other languages do a simple one-pass traversal of the file, in constant memory space (a Haskell equivalent is sketched after this message). Why artificially handicap Haskell by insisting it do the job naively?

Regards,
Malcolm
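For comparison, such a one-pass, constant-space word count can be written in Haskell too. A minimal sketch, assuming a strict accumulator forced at each step by foldl' (the Counts type and step function are this illustration's own, not any actual shootout entry):

    import Data.Char (isSpace)
    import Data.List (foldl')

    -- Strict accumulator: line, word, and character counts, plus a flag
    -- recording whether the previous character was inside a word.
    data Counts = Counts !Int !Int !Int !Bool

    step :: Counts -> Char -> Counts
    step (Counts l w c inWord) ch =
        Counts (if ch == '\n' then l + 1 else l)
               -- a word begins at each space-to-non-space transition
               (if not inWord && not (isSpace ch) then w + 1 else w)
               (c + 1)
               (not (isSpace ch))

    main :: IO ()
    main = do
        file <- getContents
        let Counts l w c _ = foldl' step (Counts 0 0 0 False) file
        putStrLn $ show l ++ " " ++ show w ++ " " ++ show c

Because foldl' reduces the accumulator to weak head normal form at every step and the Counts fields are strict, each cons cell of the input becomes garbage as soon as it is consumed, so residency stays constant regardless of file size. (The line count here tallies newline characters, which agrees with length (lines file) whenever the input ends in a newline.)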