
On Thu, Feb 14, 2008 at 04:56:39AM -0800, Grzegorz Chrupala wrote:
I have a very simple program which reads a Data.Map from a file using Data.Binary and Data.ByteString.Lazy, which gives stack overflow with files over a certain size. Any ideas of what might be causing it?
You can try it with the small file (11M) at: http://computing.dcu.ie/~gchrupala/map.bin
import System
import Data.Map
import qualified Data.ByteString.Lazy as BS
import Data.Binary

main = do
  [path] <- getArgs
  m <- fmap decode (BS.readFile path) :: IO (Map (Int, Maybe String) Int)
  putStrLn . show . findMax $ m
Since no-one else has answered, I'll take a stab. Obviously, you have a stack leak due to laziness somewhere -- now, Data.Map is strict in its keys but lazy in its values, and Data.Binary is lazy, so my suspicion would be that you're building up lazy decode thunks for the value elements on the stack as the map is being constructed. If either decode or the Map construction were strict, you'd probably be OK, but the combination is blowing the stack.

(Data.Binary serialises a Map by writing it out as a list of key/value pairs in ascending key order. The Map can be reconstructed from this list in O(N) time.)

In your specific example, you can replace the type for the result of the decoding with IO [((Int, Maybe String), Int)] and evaluate the right-hand sides (with seq if necessary) before constructing the map from the list, which will avoid the stack blowout. In fact, a little experimentation has revealed that this:

do [path] <- getArgs
   m <- liftM decode (BS.readFile path) :: IO [((Int, Maybe String), Int)]
   putStrLn . show . findMax . fromAscList $ m

will work just fine. No extra evaluation needed at all! I'll leave it to the Haskell gurus to explain what's going on...

cheers, Phil

-- 
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
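For the archives, here is a minimal self-contained sketch of the "decode to a list, force the values, then build the Map" idea described above. It assumes the binary and containers packages; instead of reading the 11M file it round-trips a small map built in memory (in the original question the bytes would come from BS.readFile). The foldl'-with-seq rebuild is just one way to force the right-hand sides; fromAscList on the decoded list also works, as noted above.

```haskell
import qualified Data.ByteString.Lazy as BS
import qualified Data.Map as Map
import Data.Binary (encode, decode)
import Data.List (foldl')

main :: IO ()
main = do
  -- Build and serialise a small sample map in memory; in the original
  -- question these bytes would come from BS.readFile instead.
  let original = Map.fromList
        [((i, Just (show i)), i * 2) | i <- [1 .. 1000 :: Int]]
      bytes = encode original

      -- Data.Binary writes a Map as its ascending association list, so the
      -- same bytes can be decoded at the list type instead:
      pairs = decode bytes :: [((Int, Maybe String), Int)]

      -- Rebuild the map, forcing each value with seq as it is inserted,
      -- so no chain of unevaluated decode thunks builds up.
      rebuilt = foldl' (\m (k, v) -> v `seq` Map.insert k v m)
                       Map.empty pairs
  print (Map.findMax rebuilt)
```

Note that foldl' insert costs O(N log N) compared with O(N) for fromAscList, but it makes the forcing of each value explicit.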