
On Thu, Mar 18, 2004 at 03:43:21PM +0100, Ketil Malde wrote:
> Okay. What's really bothering me is that I can't find any good indication of what to do to get IO faster. Do I need to FFI the whole thing and have a C library give me large chunks? Or can I get by with hGet/PutArray? If so, what sizes should they be? Should I use memory mapped files?
>
> I'm willing to put in some work, accept some kluges, and so on, but I can't really blindly try all possible combinations with my fingers crossed. Some people seem to manage to speed things up, but I can't seem to find anything *specific* anywhere.
>
> E.g. when I posted a snippet to do readFile in somewhat larger chunks a while ago, I was hoping somebody would say, hey, that's just stupid, what you need to do instead is... or point me to TFM, but unfortunately only silence ensued, and left me a sadder but none the wiser man...

If your usage needs are similar to those of darcs, you could use my FastPackedString module (which isn't packaged separately, but wouldn't be too hard to separate out). It supports reasonably fast IO on Word8-based strings, supports reading and writing to gzipped files, reading (unchanging) files with mmap, etc.

It supports breaking PackedStrings up without copying, i.e. tailPS is an efficient operation, as are the splitting and breaking operations. So as long as you stay in the world of PackedString, you're in good shape.

If your data is in some binary format, you ought to be able to take the FastPackedString code and replace Word8 with some other data type, and still take advantage of the work I've done on getting fast IO (and also some other fast data manipulations, such as linesPS).

-- 
David Roundy
http://civet.berkeley.edu/droundy/
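
To illustrate why an operation like tailPS can avoid copying, here is a minimal sketch. It is not the actual FastPackedString code: the `PS` type, its layout (a shared buffer plus an offset and a length), and the helpers `packPS` and `unpackPS` are assumptions made for illustration, using only the standard Foreign modules from base.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical packed string: a shared byte buffer, an offset into it,
-- and the number of bytes visible from that offset.
data PS = PS (ForeignPtr Word8) Int Int

-- Copy a list of bytes into a freshly allocated buffer (done once).
packPS :: [Word8] -> IO PS
packPS ws = do
  let n = length ws
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp $ \p ->
    mapM_ (\(i, w) -> pokeByteOff p i w) (zip [0 ..] ws)
  return (PS fp 0 n)

-- Read the visible bytes back out as a list.
unpackPS :: PS -> IO [Word8]
unpackPS (PS fp off len) =
  withForeignPtr fp $ \p ->
    mapM (\i -> peekByteOff p (off + i)) [0 .. len - 1]

-- O(1): the result shares the buffer; only offset and length change.
tailPS :: PS -> PS
tailPS (PS fp off len)
  | len <= 0  = error "tailPS: empty string"
  | otherwise = PS fp (off + 1) (len - 1)

main :: IO ()
main = do
  s <- packPS [104, 101, 108, 108, 111]  -- the bytes of "hello"
  rest <- unpackPS (tailPS s)
  print rest  -- prints [101,108,108,111]
```

Splitting and breaking can follow the same pattern in such a design: each piece is a new (offset, length) pair over the same buffer, so the bytes are only copied when they are first read in.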