
On Fri, May 12, 2006 at 04:07:47PM +1000, Donald Bruce Stewart wrote:
> The theory is that we'd be able to efficiently process large data, where both ByteString and [Char] fails, and open a new range of applications that could be handled successfully with Haskell.
My large data files are already divided into reasonably sized chunks, and I think this approach is quite widespread - at least Google also processes much of its data in chunks.

To process my data with Haskell, I would have to be able to decode it into records with efficiency close to what I achieve in C++ (say, at least 80% as fast). So far I have managed to reach about 33% of it in reasonably pure Haskell, which is not that bad, IMO. However, I feel that I've hit a barrier now and will have to use raw Ptrs or the FFI. Maybe I could try pushing it through the haskell-cafe optimisation process ;-)

Anyway, the point is that large data tends to be divided into smaller chunks not only because it's impossible to load the whole file into memory, but also to allow random access, to help distribute the computation over many machines, and so on. So I am not sure Haskell would gain that much from being able to process terabytes of data in one go.

On the other hand, this is quite cool, and I am probably wrong, being focused on my own needs.

Best regards
Tomasz
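
P.S. To make "decode it into records" a bit more concrete, here is roughly the kind of loop I have in mind, using Data.ByteString. The record layout, the field widths and the file name are invented purely for illustration; only the bytestring API itself is real.

    module Main where

    import qualified Data.ByteString.Char8 as B

    -- A toy line-oriented record: a 10-byte key followed by an integer
    -- field.  (Hypothetical format, just to show the shape of the code.)
    data Record = Record { recKey :: B.ByteString, recValue :: Int }

    -- Decode one line into a record, if it is well formed.
    parseRecord :: B.ByteString -> Maybe Record
    parseRecord line =
        let (k, rest) = B.splitAt 10 line
        in case B.readInt (B.dropWhile (== ' ') rest) of
             Just (n, _) -> Just (Record k n)
             Nothing     -> Nothing

    main :: IO ()
    main = do
        -- Process one pre-split chunk at a time, not the whole data set.
        contents <- B.readFile "chunk-000.dat"
        let records = [ r | Just r <- map parseRecord (B.lines contents) ]
        print (length records)   -- force the decoding work

The question is whether a decoder along these lines can be pushed close enough to the C++ version without dropping down to Ptr/FFI.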