
Andrew Coppin wrote:
apfelmus wrote:
Note that the one usually adds an "end of string" character $ in the Burrows-Wheeler transform for compression such that sorting rotated strings becomes sorting suffices.
Yeah, I noticed that the output from by program can never actually be reverted to its original form. ;-) Maybe it I alter the code to stick a 0-byte in there or something...
To recover the original string you either need to store the offset to the first character or add a sentinel character to mark the string end. One advantage of the sentinel character approach is that it's sufficient to sort just the suffixes of the text. A disadvantage is that you need an additional character. The code below reserves \0 for this purpose.
Code: >>>
"bwt" implements a variation of the Burrows-Wheeler transform, using \0 as a sentinel character for simplicity. The sentinel has to be smaller than all other characters in the string.
bwt xs = let suffixes = [(a,as) | a:as <- tails ('\0':xs)] in map fst . sortBy (\(_,a) (_,b) -> a `compare` b) $ suffixes
"rbwt" implements the corresponding inverse BWT. It's a fun knot tying exercise.
rbwt xs = let res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res) in tail . map snd . zip xs $ head res
"zipWith'" is a variant of zipWith that asserts that the third argument has the same shape as the second one.
zipWith' f [] _ = [] zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys
<<< End Code <<< Both transforms use linear memory (but quite a lot, admittedly). regards, Bertram