Re: [Haskell-cafe] newbie question about list performance

29 Oct 2007

John Lato wrote:
> ...
> I'm working with moderate-sized files (tens to hundreds of MBs) that have
> some ASCII header data followed by a bunch of 32-bit ints.
> ...
> but I don't know if [Int32] is actually the best choice.  It seems to me
> that something like a lazy list of strict arrays (analogous to a lazy
> bytestring) would be better.

Depends on your data access pattern. If you access the words strictly 
linearly, from the beginning of the file to the end, and that's all, 
then [Int32] is absolutely fine. A list is the data-structure equivalent 
of a for loop; it's the correct structure if you are dealing with things 
linearly or nearly linearly. If you were using adjacent words together, 
that would be fine too (as in, e.g., zip xs (tail xs)).
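
To make the linear case concrete, here is a minimal sketch of the
adjacent-words pattern. The names pairs and adjacentDiffs are made up for
illustration; drop 1 is used in place of tail only to avoid the
partial-function warning in recent GHCs, and behaves the same here.

```haskell
import Data.Int (Int32)

-- Pair each element with its successor, as in zip xs (tail xs).
-- An empty or one-element list yields no pairs.
pairs :: [a] -> [(a, a)]
pairs xs = zip xs (drop 1 xs)

-- Differences between adjacent words. The list is consumed strictly
-- front to back, so a lazily produced [Int32] never needs to be held
-- in memory all at once (provided nothing else keeps a reference to
-- its head).
adjacentDiffs :: [Int32] -> [Int32]
adjacentDiffs xs = [ b - a | (a, b) <- pairs xs ]
```

Loaded in GHCi, adjacentDiffs [1, 4, 9, 16] gives [3,5,7].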

If your data access pattern is more scattered or random-access in style, 
then [Int32] does not scale well to tens of MBs. If you keep the data 
around, the overhead of [] is inappropriate (around 600-800% memory 
overhead for [Int32]), and its performance guarantees for random access 
are not good either. In this case, as a first approximation, I would be 
inclined to try a library that is simply a thin layer over lazy 
bytestring. For example, the 'index' operation to fetch a single word 
would fetch four bytes and bit-twiddle them into a word. If that doesn't 
give the high speed you're after, then perhaps something *like* lazy 
bytestring, i.e. a ForeignPtr behind the scenes, but directly accessing 
a word at a time.
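
A rough sketch of such an 'index' operation on top of
Data.ByteString.Lazy follows. The name indexInt32 and the headerLen
parameter are hypothetical, and little-endian byte order is an assumption;
the original question does not say which order the file uses.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Bits (shiftL, (.|.))
import Data.Int (Int32, Int64)
import Data.Word (Word32)

-- Hypothetical helper: fetch the i-th 32-bit word of the payload.
-- Assumes the payload starts headerLen bytes into the file and that
-- words are stored little-endian (an assumption, not from the thread).
-- Like L.index, it fails with an error if the offset is out of range.
indexInt32 :: Int64 -> L.ByteString -> Int64 -> Int32
indexInt32 headerLen bs i =
  -- Assemble the four bytes into a Word32, then reinterpret as Int32
  -- (fromIntegral wraps, preserving the two's-complement bit pattern).
  fromIntegral (byte 0 .|. (byte 1 `shiftL` 8)
                       .|. (byte 2 `shiftL` 16)
                       .|. (byte 3 `shiftL` 24))
  where
    off = headerLen + 4 * i
    byte :: Int64 -> Word32
    byte k = fromIntegral (L.index bs (off + k))
```

Note that L.index on a lazy bytestring walks the chunk list, so each lookup
costs time proportional to the number of chunks before the target, not
constant time. That is one reason a dedicated word-at-a-time structure
might be the next step if this turns out too slow.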

Jules

Jules Bean