
Felipe Almeida Lessa wrote:
On 6/23/07, Bertram Felgenhauer
wrote: "rbwt" implements the corresponding inverse BWT. It's a fun knot tying exercise.
rbwt xs = let res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res) in tail . map snd . zip xs $ head res
Indeed it's very fun =). Although I must admit that, even after staring at the code for some time, I still don't see how it works. Is it too much to ask for a brief explanation?
I see that zipWith' creates all the lazy list without requiring any knowledge of the second list, which means that the list to be sorted is created nicely. But when sortBy needs to compare, say, the first with the second items, it forces the thunk of the lazy list. But that thunk depends on the sorted list itself!
What am I missing? ;)
Thanks in advance,
I just read and understood it. I'll walk through my version:
import Data.List
f `on` g = \x y -> (g x) `f` (g y)
zipWith' f [] _ = [] zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
bwt = map head . sortBy (compare `on` tail) . init . tails . ('\0':)
rbwt xs = tail $ zipWith' (flip const) xs $ head res where res = sortBy (compare `on` head) (zipWith' (:) xs res)
The key is that sort and sortBy are very very lazy. 'res' is a finite list of infinite strings. Consider sorting such a finite list of infinite lists without the recursive definition:
example = let a = [1,3..] b = [1,2..] c = [2,4..] d = [1,4..] sorted = sort [a,b,c,d] firstFiveOfEach = map (take 5) sorted in mapM_ print firstFiveOfEach
Which in ghci produces:
*Main> example [1,2,3,4,5] [1,3,5,7,9] [1,4,7,10,13] [2,4,6,8,10]
Now for a sidetrack:
example = let a = [1,3..] b = [1,2..] c = [2,4..] d = [1,4..] sorted = sort [a,b,c,d,c] firstFiveOfEach = map (take 5) sorted in mapM_ print firstFiveOfEach
Produces the first three elements before 'c' ....
*Main> example [1,2,3,4,5] [1,3,5,7,9] [1,4,7,10,13]
...and then hangs because "compare c c" never terminates. This lazy sorting is also why "head . sort $ foo" is an O(N) way to get the minimum of a list "foo" in Haskell. So the code for rbwt depends on this lazy sorting behavior. The recursive nature of the definition is the difficult to (directly) invent part. The result of (zipWith' (:) xs ___) is a list the same length as 'xs' and where each item is a list with the head item taken from 'xs'. So it has "primed the pump" of the recursive definition by giving the first Char in each of the strings. Compare with the much simpler 'ones' and 'odds': ones = 1 : ones odds = 1 : map (2+) odds What about the tails of all these lists? They come from res. And res comes from sorting the results of zipWith'. But the sorting is done only by looking at the head element, which is the one that comes from xs. That is the "sortBy (\(a:as) (b:bs) -> a `compare` b)" or "sortBy (compare `on` head)". So the sorting routine does not care about the ___ in (zipWith' (:) xs ___) at all, since it only examines the head which comes from xs.