
On 6/20/07, apfelmus wrote:
> Eh, why not a simple mergesort that also deletes duplicates?
I had to sit down and think about this. For the simple case that I showed,
your equivalent code is definitely simpler, and probably more efficient.
The actual case I'm dealing with, where I believe Data.Map (or similar,
including finger trees) has a benefit, is one in which it's not simply a
case of lists of items yielding a list of items.

I'm manipulating an on-disk inverted index, so rather than operating on
simple lists of items, the code is monadic, doing IO to retrieve the items
off disk, and the cost of creating the intermediate lists is unbearable.
The key problem is that you lose the laziness because of the IO monad, so
if you're not careful, you end up trying to store the complete intermediate
lists. If you can assume you can hold enough stuff in memory, then nice
elegant lazy algorithms work beautifully. I'm doing external algorithms
which have to work on GBs of data, which I can't hold in memory. So the
type signature of my merge is approximately:

    type Reader t = IO (Maybe t)
    type Writer t = t -> IO ()

    merge :: [Reader t] -> Writer t -> IO ()

(A minimal sketch of what such a merge looks like is at the end of this
message.)

Actually, the scope of the problem is such that I could *almost* finesse it
by reading the globs of data as ByteStrings, using lazy evaluation
internally, and writing the outputs:

    merge blobLocators writer = do
        lists <- fmap (map decode) $ mapM readOffDisk blobLocators
        mapM_ writer (mergesort lists)

but I need to keep a firm lid on the resource usage, so I can't. <sigh>

cheers,
T.
--
Dr Thomas Conway
drtomc@gmail.com

Silence is the perfectest herald of joy:
I were but little happy, if I could say how much.
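P.S. To make the shape of the thing concrete, here is a minimal sketch of a
duplicate-deleting k-way merge over readers of that type. It is only an
illustration, not the actual index code: `listReader` is a stand-in for the
real on-disk readers, and a serious version would keep the pending heads in
a priority queue rather than scanning a list on every step.

    import Data.IORef (newIORef, readIORef, writeIORef)

    type Reader t = IO (Maybe t)
    type Writer t = t -> IO ()

    -- K-way merge of sorted inputs, dropping duplicates.  Only one
    -- pending element per input is ever held in memory.
    merge :: Ord t => [Reader t] -> Writer t -> IO ()
    merge readers writer = do
        heads <- sequence readers                     -- prime each input
        go [ (x, r) | (Just x, r) <- zip heads readers ]
      where
        go [] = return ()
        go xs = do
            let least = minimum (map fst xs)          -- smallest head
            writer least
            xs' <- mapM (advance least) xs
            go [ p | Just p <- xs' ]                  -- drop exhausted inputs
        -- Advance every input whose head equals the element just written,
        -- skipping any further copies of it (this is the deduplication).
        advance least p@(x, r)
            | x /= least = return (Just p)
            | otherwise  = do
                mx <- r
                case mx of
                    Nothing -> return Nothing
                    Just y  -> advance least (y, r)

    -- An in-memory Reader, just to exercise the sketch.
    listReader :: [t] -> IO (Reader t)
    listReader xs = do
        ref <- newIORef xs
        return $ do
            ys <- readIORef ref
            case ys of
                []     -> return Nothing
                (z:zs) -> writeIORef ref zs >> return (Just z)

    main :: IO ()
    main = do
        rs <- mapM listReader [[1,3,5], [2,3,3,6], [1,4,5,7 :: Int]]
        merge rs print    -- prints 1..7, each exactly once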