Re: [Haskell-beginners] Data.Binary.Get for large files

30 Apr 2010

      Am Freitag 30 April 2010 23:06:07 schrieb Philip Scott:
...
Hi Daniel
...
Replace getFloat64le with e.g. getWord64le to confirm.
The reading of IEEE754 floating point numbers seems rather
complicated. Maybe doing it differently could speed it up, maybe not.
That speeds things up by a factor of about 100 :)
Yes, I too.
...
I think there must be some efficiency to be extracted from there
somewhere.. Either the IEEE module
Look at the code. It does a lot of hard work. That is probably necessary to 
ensure correctness, but it's sloow.

If you feel like playing with fire,

{-# LANGUAGE BangPatterns, MagicHash #-}
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Get
import GHC.Prim
import Data.Word

getFloat64le :: Get Double
getFloat64le = fmap unsafeCoerce# getWord64le

myGetter !acc = do
     e <- isEmpty
     if e then
             return acc
         else do
             !t <- getFloat64le
             myGetter ((t+acc)/2)

may work on your system (no warranties, you know what 'unsafe' means, don't 
you?).
...
or the Data.Binary.Get.
Considering that it's quick enough getting Word64, you won't get much 
improvement there.
...
Is it possible to get the profiler to look deeper than the top level
module?
Lots of {-# SCC "foo" #-} pragmas. Or create a local copy of the module and 
import that, then -prof -auto-all should give more info.
...
With all the options I could find, it only ever tells me about
things in the file I am dealing with..Hm, 200MB file => ~25 million
Doubles, such a list needs at least 400MB.
...
Still a long way to 2GB. I suspect you construct a list of thunks, not
Doubles.
I think you are almost certainly right. Is there an easy way to see
if/how/where this is happening?
Read the core, profile with all -hx flags and look at the profiles, show 
the code to more experienced Haskellers.
...
Thanks once again,
Philip

Re: [Haskell-beginners] Data.Binary.Get for large files

Daniel Fischer