
apfelmus wrote:
Andrew Coppin wrote:
Yeah, I noticed that the output from by program can never actually be reverted to its original form.
Well it can, but that's a different story told in
Richard S. Bird and Shin-Cheng Mu. Inverting the Burrows-Wheeler transform. http://web.comlab.ox.ac.uk/oucl/work/richard.bird/publications.html #DBLP:journals/jfp/BirdM04
Oh, and you had a function inv_bwt, right?
Implementation #1 had an inverse - but it works by finding an unused character and prepending that to the input before doing the forward BWT. ;-) For the other implementations, there is no inverse at all.
I was trying to avoid O(n^2) RAM usage. :-}
Note that for ByteStrings, this takes only O(n) RAM because the substrings are shared. But for lists, this would take O(n^2) RAM since (take n) cannot share hole sublists.
Presumably that's only true for *strict* ByteStrings? (I still don't actually understand how lazy bytestrings are any different to normal lists - or why they produced such a massive performance increase...)
An O(n) choice for lists that doesn't recalculate ++ all the time would be
bwt :: Ord a => [a] -> [a] bwt xs = map last . sortBy (compare `on` (take n)) $ rotations where n = length xs rotations = take n . tails $ xs ++ xs
with the well-known
on :: (a -> a -> c) -> (b -> a) -> (b -> b -> c) on g f x y = g (f x) (f y)
Interesting... But how does it perform? ;-)