Elegant external sorting

Hello haskell-cafe, SPJ asked us a few years ago about examples of simple and elegant haskell programs. i want to propose this external sorting program: it reads stdin in chunks of 5000 lines, writes sorted chunks into files and then merges files together: import Data.List import Control.Monad import Control.Arrow main = do -- (Lazily) create sorted sublists sublists <- fmap chop_and_sort getContents -- Write the sublists into files named 1..n n <- fmap length $ zipWithM writeFile (map show [1..]) sublists -- Read (again lazily) contents of these files sublists <- mapM (fmap lines.readFile.show) [1..n] -- Merge them and write result to stdout putStr (unlines (merge_lists compare sublists)) -- Split input list into 5000-line chunks and sort them chop_and_sort = lines >>> recursive (splitAt 5000) >>> map (sort>>>unlines) -- unfoldr variant that stops on empty list recursive splitFunc = unfoldr (\xs -> guard (xs>[]) >> Just (splitFunc xs)) -- Merging sorted lists (kidnapped from Data.List) merge_lists :: (a -> a -> Ordering) -> [[a]] -> [a] merge_lists cmp [] = [] merge_lists cmp [xs] = xs merge_lists cmp xss = merge_lists cmp (merge_pairs cmp xss) merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]] merge_pairs cmp [] = [] merge_pairs cmp [xs] = [xs] merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a] merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = case x `cmp` y of GT -> y : merge cmp (x:xs) ys _ -> x : merge cmp xs (y:ys) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (1)
-
Bulat Ziganshin