
Malcolm Wallace wrote:
Graham Klyne
writes: I can see that this requires the original file to be kept for 3-time scanning, so enough memory for the entire file will be required. Is that *the* problem to which you allude? I can't see any other problem here.
Yes, it is the main problem. <snip> In any case, for the shootout, this is patently a different algorithm to the one every other solution uses. All the other languages do a simple one-pass traversal of the file, in constant memory space. Why artificially handicap Haskell by insisting it do the job naively?
Just for the heck of it, I'd thought I'd try to write a naive 1-pass version of the program. It turned out to be 4.5 times slower than the original... -- compiled with: ghc -O2 -ddump-simpl -fvia-c -o wc_fold wc_fold.hs import IO main = do file <- getContents putStrLn (show (foldl wc (0,0,0) file)) wc :: (Int,Int,Int) -> Char -> (Int, Int, Int) wc (l,w,c) '\n' = (l+1,w ,c+1) wc (l,w,c) ' ' = (l ,w+1,c+1) wc (l,w,c) x = (l ,w ,c+1) The algorithm isn't correct (it counts spaces instead of words), but anyone have advice for improving its performance? Is the problem caused by the laziness of the Int's inside the tuple? Here is the report from ghc with the '-ddump-simpl' option. Main.wc :: (GHC.Base.Int, GHC.Base.Int, GHC.Base.Int) -> GHC.Base.Char -> (GHC.Base.Int, GHC.Base.Int, GHC.Base.Int) [GlobalId] Arity 2 __P Main.$wwc NoCafRefs Str: DmdType U(LLL)U(L)m Main.wc = __inline_me (\ w :: (GHC.Base.Int, GHC.Base.Int, GHC.Base.Int) w1 :: GHC.Base.Char -> case w of w2 { (ww, ww1, ww2) -> case w1 of w3 { GHC.Base.C# ww3 -> case Main.$wwc ww ww1 ww2 ww3 of ww4 { (# ww5, ww6, ww7 #) -> (ww5, ww6, ww7) } } }) Rec { Main.$wlgo :: GHC.Base.Int -> GHC.Base.Int -> GHC.Base.Int -> [GHC.Base.Char] -> (# GHC.Base.Int, GHC.Base.Int, GHC.Base.Int #) And here is the memory allocation report generated from passing '+RTS -M900m -k250m -sstderr' to the executable... 875,150,472 bytes allocated in the heap 149,008,464 bytes copied during GC 250,845,060 bytes maximum residency (2 sample(s)) 478 collections in generation 0 ( 10.44s) 2 collections in generation 1 ( 0.00s) 813 Mb total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 0.93s ( 1.03s elapsed) GC time 10.44s ( 11.38s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 11.37s ( 12.41s elapsed) %GC time 91.8% (91.7% elapsed) Alloc rate 941,022,012 bytes per MUT second Productivity 8.2% of total user, 7.5% of total elapsed Finally, here's the profiling report (-pT)... Wed Sep 29 15:21 2004 Time and Allocation Profiling Report (Final) wc_fold +RTS -M800m -k250m -pT -RTS total time = 1.22 secs (61 ticks @ 20 ms) total alloc = 173,502,056 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc wc Main 60.7 64.8 main Main 39.3 35.2 individual inherited COST CE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 1 0 0.0 0.0 100.0 100.0 main Main 146 1 39.3 35.2 100.0 100.0 wc Main 147 3048000 60.7 64.8 60.7 64.8 CAF GHC.Handle 103 3 0.0 0.0 0.0 0.0 CAF System.Posix.Internals 81 4 0.0 0.0 0.0 0.0 Any hints appreciated, Greg Buchholz