
Arnoldo Muller
I am trying to use Haskell in the analysis of bio data. One of the main reasons I wanted to use Haskell is that lazy I/O allows you to see a large bio-sequence as if it were a string in memory.
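What I have in mind is something like the following rough sketch (with a hypothetical file big.fasta), where a file larger than RAM can be folded over as if it were a single string:

    import qualified Data.ByteString.Lazy.Char8 as L

    -- Count the residues in a Fasta file without holding it all in
    -- memory: L.readFile reads lazily, so chunks are pulled in only
    -- as the length computation demands them.
    main :: IO ()
    main = do
      contents <- L.readFile "big.fasta"   -- hypothetical input
      print . L.length
            . L.concat
            -- drop header and blank lines
            . filter (\l -> not (L.null l || L.head l == '>'))
            . L.lines
            $ contents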
Funny you should mention it. I've written a bioinformatics library¹ that (naturally) supports reading and writing various file formats for sequences and alignments and stuff. Some of these files can be substantial in size (i.e., larger than my laptop's memory), so most IO of potentially large files (Fasta, BLAST XML output, 454 SFF files...) is done lazily, and large Fasta sequences are read as lazy bytestrings. This works nicely for a lot of use cases (well, my use cases, at any rate, which quite often boil down to streaming through the data).

One thing to look out for is O(n) indexed access to lazy bytestrings, so there's a defragment operation that converts a sequence to a single chunk (which gives O(1) access, but of course must fit into memory); a sketch of the idea is below.

I guess the most annoying thing about laziness is that small test cases always work; you need Real Data to stress test your programs for excessive memory use. Lazy IO has always worked well for me, so although I feel I should look more deeply into "real" solutions, like Iteratee, my half-hearted attempts to do so have only resulted in the conclusion that it was more complicated, and thus postponed for some rainy day... lazy IO for lazy programmers, I guess.

-k

¹ Stuff's on Hackage in the bioinformatics section and also on http://blog.malde.org and http://malde.org/~ketil/bioinformatics.

--
If I haven't seen further, it is by standing in the footprints of giants
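The defragment idea mentioned above boils down to something like this sketch (illustrative only; the library's actual API may differ):

    import qualified Data.ByteString as B
    import qualified Data.ByteString.Lazy as L

    -- Copy all the chunks of a lazy bytestring into one contiguous
    -- chunk.  L.index on the result no longer has to walk a chunk
    -- list, so access is O(1) -- but the whole sequence must now fit
    -- in memory at once.
    defragment :: L.ByteString -> L.ByteString
    defragment = L.fromChunks . (:[]) . B.concat . L.toChunks

After defragmenting, a lookup like L.index (defragment s) 123456789 is constant-time, instead of a walk over every chunk before that position.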