
John Lato wrote:
I'm working with moderate-sized files (tens to hundreds of MBs) that have some ascii header data followed by a bunch of 32-bit ints.
but I don't know if [Int32] is actually the best choice. It seems to me that something like a lazy list of strict arrays (analogous to a lazy bytestring) would be better.
Depends on your data access pattern. If you access the words strictly linearly, from the beginning of the file to the end, and that's all, then [Int32] is absolutely fine. A list is a data-structure equivalent of a for loop; it's the correct structure if you are dealing with things linearly or nearly-linearly. If you were using adjacent words together, that would be fine too (as in, e.g., zip xs (tail xs)). If your data access pattern is more scattered or random-access in style, then [Int32] does not scale well to 10s of MBs. If you keep the data around, the overhead for [] is inappropriate (around 600-800% memory usage overhead on [Int32]) and its performance guarantees are not good either, for random access. In this case, as a first approximation, I would be inclined to try a library which simple backended onto lazy bytestring. For example the 'index' operation to fetch a single word would fetch four bytes and bit-twiddle them into a word. If that doesn't give the high speed you're after, then perhaps something *like* LBS, i.e. foreignptr behind the scenes, but directly accessing word-at-a-time. Jules