Seemingly subtle change causes large performance variation

Hello,
I've been playing with the INTEST problem on SPOJ which demonstrates
the ability to write a program which processes large quantities of
input data. http://www.spoj.pl/problems/INTEST/
I came across some curious behavior while cleaning up the program.
The original program, which runs fast (enough), is:
module Main(main) where
import Control.Monad
import Data.Maybe
import qualified Data.ByteString.Char8 as B
divisibleBy :: Int -> Int -> Bool
a `divisibleBy` n = a `rem` n == 0
main :: IO ()
main = do
[n,k] <- (map int . B.split ' ') `fmap` B.getLine :: IO [Int]
let
doLine :: Int -> Int -> IO Int
doLine r _ = B.getLine >>= testDiv r
testDiv r l
| int l `divisibleBy` k = return (r + 1)
| otherwise = return r
foldM doLine 0 [1..n] >>= print
where
int :: B.ByteString -> Int
int = fst . fromJust . B.readInt
But when I make a slight modification, the program chews up a ton more memory
and takes more time:
module Main(main) where
import Control.Monad
import Data.Maybe
import qualified Data.ByteString.Char8 as B
divisibleBy :: Int -> Int -> Bool
a `divisibleBy` n = a `rem` n == 0
main :: IO ()
main = do
[n,k] <- (map int . B.split ' ') `fmap` B.getLine :: IO [Int]
let
doLine :: Int -> Int -> IO Int
doLine r _ = B.getLine >>= return . testDiv r
-- 'return' moved here ^^
testDiv r l
| int l `divisibleBy` k = r + 1
| otherwise = r
foldM doLine 0 [1..n] >>= print
where
int :: B.ByteString -> Int
int = fst . fromJust . B.readInt
This program will generate sample data:
import System.Random
import System.Environment
import Control.Monad
main = do
[n] <- map read `fmap` getArgs :: IO [Int]
k <- randomRIO (1, 100)
putStrLn $ unwords [show n, show k]
replicateM_ n $ randomRIO (1, 10^9) >>= print
Note that the same behavior occurs even if I manually inline the local
function and try: return (if .. then .. else).
Some sample runs:
$ ghc/compiler/ghc-inplace
ghc-6.7.20070601: no input files
$ ghc/compiler/ghc-inplace --make -O2 intest.hs
[1 of 1] Compiling Main ( intest.hs, intest.o )
Linking intest ...
$ ghc/compiler/ghc-inplace --make -O2 intest_slow.hs
[1 of 1] Compiling Main ( intest_slow.hs, intest_slow.o )
Linking intest_slow ...
$ time ./intest +RTS -tstderr -RTS < test1
./intest +RTS -tstderr
8830
<

Matthew Danish
I've been playing with the INTEST problem on SPOJ which demonstrates the ability to write a program which processes large quantities of input data. http://www.spoj.pl/problems/INTEST/
I don't know if anyone replied to this already, so here is my attempt to explain the space leak.
doLine :: Int -> Int -> IO Int doLine r _ = B.getLine >>= testDiv r testDiv r l | int l `divisibleBy` k = return (r + 1) | otherwise = return r
But when I make a slight modification, the program chews up a ton more memory and takes more time:
doLine :: Int -> Int -> IO Int doLine r _ = B.getLine >>= return . testDiv r -- 'return' moved here ^^ testDiv r l | int l `divisibleBy` k = r + 1 | otherwise = r
In the first version, the strictness of the IO monad ensures that 'testDiv' must be evaluated, at least sufficiently to find the 'return' monadic action inside it. In particular, this forces the evaluation of the guard, and therefore the call of `divisibleBy`. Whereas in the latter version, the 'return' is wrapped _outside_ the call to 'testDiv', so the monadic action is found immediately, whilst the value being returned in in the monad is still lazily calculated. Thus, a collection of 'testDiv r' applications builds up until the Int values are actually used, at which point they are reduced. Regards, Malcolm
participants (2)
-
Malcolm Wallace
-
Matthew Danish