
Ben wrote:
a haskell newbie here, searching for comments and wisdom on my code.
i had a project to try to implement "external sort" in haskell as a learning exercise. (external sort is sorting a list that is too large to fit in main memory, by sorting in chunks, spooling to disk, and then merging. more properly there probably should be multiple stages, but for simplicity i'm doing a one-stage external sort.)
i'd appreciate any testing, comments and suggestions from the haskell gods out there.
I'm not a god but I like it very much :) Especially because it's laziness in action.
blocks <- mapM (\x -> do {bs <- hGetContentsWithCursor h x; return $ Bin.decode bs}) idx
(Minuscule cosmetics: blocks <- mapM ((liftM Bin.decode) . hGetContentsWithCursor h) idx )
Merging chunks. K-way merge (and in fact external sort in general) is detailed in Knuth, where he recommends tournament trees. The easiest thing is to probably use one of Okasaki's heaps. I'll use splay heaps, because I don't know any better.
It would be better if I changed Ord for blocks to only check the first element.
kMerge :: (Ord a) => [[a]] -> [a] kMerge [] = [] kMerge l = let h = Splay.fromSeq l in kM (Splay.minElem h) (Splay.deleteMin h) where kM :: (Ord a) => [a] -> Splay.Heap [a] -> [a] kM l h | h == Splay.empty = l | otherwise = let next = Splay.minElem h (f, b) = span (\x -> x <= head next) l in f ++ (kM next (if null b then Splay.deleteMin h else (Splay.insert b $ Splay.deleteMin h)))
kMergeSort :: (Ord a) => [a] -> [a] kMergeSort l = kMerge $ blockify blocksize l
Oh, I would have expected a lazy mergesort here. Internally, this will work like a tournament heap. See also http://article.gmane.org/gmane.comp.lang.haskell.cafe/24180 Regards, apfelmus