Data.Binary.Get for large files

Hello again folks,

Sorry to keep troubling you - I'm very appreciative of the help you've given so far. I've got one more for you that has got me totally stumped. I'm writing a program which deals with largish files; the one I am using as a test case is not stupidly large, at about 200 MB. After three evenings, I have finally gotten rid of all the stack overflows, but I am unfortunately left with something that is rather infeasibly slow. I was hoping someone with keener skills than I could take a look; I've tried to distil it to the simplest case.

This program just reads in a file, interpreting each value as a double, and does a sort of running average on them. The actual function doesn't matter too much; I think it is the reading in that is the problem. Here's the code:

    import Control.Exception
    import qualified Data.ByteString.Lazy as BL
    import Data.Binary.Get
    import System.IO
    import Data.Binary.IEEE754

    myGetter acc = do
      e <- isEmpty
      if e == True
        then return acc
        else do
          t <- getFloat64le
          myGetter $! ((t + acc) / 2)

    myReader file = do
      h <- openBinaryFile file ReadMode
      bs <- BL.hGetContents h
      return $ runGet (myGetter 0) bs

    main = do
      d <- myReader "data.bin"
      evaluate d

This takes about three minutes to run on my (fairly modern) laptop. The equivalent C program takes about 5 seconds. I'm sure I am doing something daft, but I can't for the life of me see what. Any hints about how to get the profiler to show me useful stuff would be much appreciated!

All the best,

Philip

PS: If, instead of computing a single value, I try to build a list of the values, the program ends up using over 2 GB of memory to read a 200 MB file. Any ideas on that one?

I can't find the error in your code (assuming there is an error), so I'm checking the code you didn't write, and the only thing that set off an alarm was this:

    getFloat64le :: Get Double
    getFloat64le = getFloat (ByteCount 8) $ splitBytes . reverse

    splitBytes :: [Word8] -> RawFloat

Every chunk read in the Get monad is being reversed so that you can take one float in little endian (and you are taking in over 26 million floats). I really don't know if this hurts performance, but I assume the C equivalent would be reading an array in reverse order. I am more than willing to believe this is not the cause of such a performance loss, but I can't find another reason.

PS1: "(e == True)" == "e"

PS2: I know it's not important, but I can't help it: that is not an average you're computing...
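A minimal sketch of one way to test the byte-reversal hypothesis (an assumption, not code from the thread): Data.Binary.IEEE754 also exports getFloat64be, which decodes without the reverse, so a big-endian variant of the getter isolates that cost.

    import Data.Binary.Get
    import Data.Binary.IEEE754

    -- hypothetical variant of myGetter: read big-endian, so the
    -- byte-list reversal in getFloat64le never runs; if this is just
    -- as slow, the reverse is not the bottleneck
    myGetterBE :: Double -> Get Double
    myGetterBE acc = do
      e <- isEmpty
      if e
        then return acc
        else do
          t <- getFloat64be
          myGetterBE $! ((t + acc) / 2)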

On Friday 30 April 2010 02:46:08, MAN wrote:
> I can't find the error in your code (assuming there is an error), so I'm checking the code you didn't write, and the only thing that set off an alarm was this:
>
>     getFloat64le :: Get Double
>     getFloat64le = getFloat (ByteCount 8) $ splitBytes . reverse
>
>     splitBytes :: [Word8] -> RawFloat
>
> Every chunk read in the Get monad is being reversed so that you can take one float in little endian (and you are taking in over 26 million floats). I really don't know if this hurts performance, but I assume the C equivalent would be reading an array in reverse order. I am more than willing to believe this is not the cause of such a performance loss, but I can't find another reason.
The reversing doesn't matter much; using getFloat64be takes very nearly the same time.
> PS1: "(e == True)" == "e"
Yes!
> PS2: I know it's not important, but I can't help it: that is not an average you're computing...
Sort of a weighted average of the n doubles, where double #k is weighted 2^(k-1-n).
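Unfolding the recursion for, say, n = 3 makes that weighting explicit (a worked expansion, assuming the getter starts from acc = 0):

    acc1 = (x1 + 0) / 2    = x1/2
    acc2 = (x2 + acc1) / 2 = x2/2 + x1/4
    acc3 = (x3 + acc2) / 2 = x3/2 + x2/4 + x1/8

so here x3 carries weight 2^(-1), x2 carries 2^(-2), and x1 carries 2^(-3), matching 2^(k-1-n).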

On Friday 30 April 2010 00:37:59, Philip Scott wrote:
> Hello again folks,
>
> Sorry to keep troubling you - I'm very appreciative of the help you've given so far. I've got one more for you that has got me totally stumped. I'm writing a program which deals with largish files; the one I am using as a test case is not stupidly large, at about 200 MB. After three evenings, I have finally gotten rid of all the stack overflows, but I am unfortunately left with something that is rather infeasibly slow. I was hoping someone with keener skills than I could take a look; I've tried to distil it to the simplest case.
>
> This program just reads in a file, interpreting each value as a double, and does a sort of running average on them. The actual function doesn't matter too much; I think it is the reading in that is the problem.
Replace getFloat64le with e.g. getWord64le to confirm. The reading of IEEE754 floating point numbers seems rather complicated. Maybe doing it differently could speed it up, maybe not.
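A minimal sketch of that diagnostic (a hypothetical probe, not Philip's actual code): swap the decoder for getWord64le and fold the raw words, leaving everything else unchanged.

    import Data.Binary.Get
    import Data.Word

    -- hypothetical timing probe: consume the same bytes with
    -- getWord64le; if this runs fast, the IEEE754 decoding is the cost
    myGetterW :: Word64 -> Get Word64
    myGetterW acc = do
      e <- isEmpty
      if e
        then return acc
        else do
          t <- getWord64le
          myGetterW $! (acc + t)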
> This takes about three minutes to run on my (fairly modern) laptop. The equivalent C program takes about 5 seconds.
Are you sure that it's really equivalent?
> I'm sure I am doing something daft, but I can't for the life of me see what. Any hints about how to get the profiler to show me useful stuff would be much appreciated!
>
> All the best,
>
> Philip
>
> PS: If, instead of computing a single value, I try to build a list of the values, the program ends up using over 2 GB of memory to read a 200 MB file. Any ideas on that one?
Hm, a 200 MB file => ~25 million Doubles; such a list needs at least 400 MB. Still a long way to 2 GB. I suspect you construct a list of thunks, not Doubles.
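A minimal sketch of one way around that (hypothetical; myListGetter is an illustrative name, not from the thread): force each Double before consing it, so the list cells hold evaluated values rather than thunks.

    import Data.Binary.Get
    import Data.Binary.IEEE754

    -- hypothetical list builder: seq forces each Double to WHNF (which,
    -- for Double, is fully evaluated) as it is read, so the accumulated
    -- list contains plain Doubles, not closures over the ByteString
    myListGetter :: [Double] -> Get [Double]
    myListGetter acc = do
      e <- isEmpty
      if e
        then return (reverse acc)  -- restore file order
        else do
          t <- getFloat64le
          t `seq` myListGetter (t : acc)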

Hi Daniel
> Replace getFloat64le with e.g. getWord64le to confirm. The reading of IEEE754 floating point numbers seems rather complicated. Maybe doing it differently could speed it up, maybe not.
That speeds things up by a factor of about 100 :) I think there must be some efficiency to be extracted from there somewhere, either in the IEEE754 module or in Data.Binary.Get. Is it possible to get the profiler to look deeper than the top-level module? With all the options I could find, it only ever tells me about things in the file I am dealing with.

> Hm, a 200 MB file => ~25 million Doubles; such a list needs at least 400 MB. Still a long way to 2 GB. I suspect you construct a list of thunks, not Doubles.
I think you are almost certainly right. Is there an easy way to see if/how/where this is happening?

Thanks once again,

Philip

Check out the Real World Haskell chapter on profiling; it should have everything you need to track down where the thunks are sneaking in:

http://book.realworldhaskell.org/read/profiling-and-optimization.html

It's particularly great in this case because the problem being diagnosed in that chapter is most likely the same sort of problem you're seeing.
-R. Kyle Murphy
--
Curiosity was framed, Ignorance killed the cat.

On Friday 30 April 2010 23:06:07, Philip Scott wrote:
> Hi Daniel
>
> > Replace getFloat64le with e.g. getWord64le to confirm. The reading of IEEE754 floating point numbers seems rather complicated. Maybe doing it differently could speed it up, maybe not.
>
> That speeds things up by a factor of about 100 :)
Yes, for me too.
> I think there must be some efficiency to be extracted from there somewhere, either in the IEEE754 module
Look at the code. It does a lot of hard work. That is probably necessary to ensure correctness, but it's slow. If you feel like playing with fire,

    {-# LANGUAGE BangPatterns, MagicHash #-}

    import qualified Data.ByteString.Lazy as BL
    import Data.Binary
    import Data.Binary.Get
    import GHC.Prim
    import Data.Word

    getFloat64le :: Get Double
    getFloat64le = fmap unsafeCoerce# getWord64le

    myGetter !acc = do
      e <- isEmpty
      if e
        then return acc
        else do
          !t <- getFloat64le
          myGetter ((t + acc) / 2)

may work on your system (no warranties; you know what 'unsafe' means, don't you?).
> or in Data.Binary.Get.
Considering that getting Word64s is quick enough, you won't get much improvement there.
> Is it possible to get the profiler to look deeper than the top-level module?
Lots of {-# SCC "foo" #-} pragmas. Or create a local copy of the module and import that, then -prof -auto-all should give more info.
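For instance, a hypothetical annotation (getOne is an illustrative name, not from the thread):

    import Data.Binary.Get
    import Data.Binary.IEEE754

    -- hypothetical cost-centre annotation: when compiled with -prof,
    -- time and allocation inside the decoder show up under "getOne"
    -- in the .prof report instead of being lumped into the caller
    getOne :: Get Double
    getOne = {-# SCC "getOne" #-} getFloat64le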
> With all the options I could find, it only ever tells me about things in the file I am dealing with.
>
> > Hm, a 200 MB file => ~25 million Doubles; such a list needs at least 400 MB. Still a long way to 2 GB. I suspect you construct a list of thunks, not Doubles.
> I think you are almost certainly right. Is there an easy way to see if/how/where this is happening?
Read the Core, profile with all the -hx flags and look at the profiles, and show the code to more experienced Haskellers.
> Thanks once again,
>
> Philip
participants (4):

- Daniel Fischer
- Kyle Murphy
- MAN
- Philip Scott