[Haskell-cafe] Friday Afternoon Play: The Burrows Wheeler Transformation

17 Nov 2006

      So, it's Friday afternoon (well, here anyway), and
presumably everyone is thinking of leaving for the
weekend. Is anyone interested in playing a little game?  I'm
going to make the first move, and then just wait and see if
anyone makes the next. I'm certainly not promising to make
any further moves myself, but I'm not promising I won't,
either.

Below you'll see a naïve implementation of the Burrows
Wheeler transformation and its inverse.  That's my
move. (And don't complain too much about style and such --
I'm half asleep.)  Further moves are made by transforming
the programme so that it becomes more efficient.  The first
target is not raw speed, but to adjust it until the
computational complexity gets to be what it should be.  I'm
hoping for elegant moves here.  After that, folk might start
replacing the lists with more efficient datastructures and
so on, but still /elegantly/.

The object of the game isn't ultimate speed -- I've no
particular wish to see a Haskell version that's as fast as a
good C version, especially not if it ends up /looking/ like
a C version.  Really the objective is to think about Haskell
programming -- and probably come up with some observations
such as "there really ought to be a class of indexable
structures, so that we don't have to change (!!) to (!) all
over the place". And to have fun, anyhow...

* * *

Demonstration of the principle of the Burrows Wheeler
transform. The implementations are simplistic (and have
worse complexity than is possible).
...
module Burrows_Wheeler_Transform where
import List
This is surely the wrong module to have to find (><) from:
...
import Data.Graph.Inductive.Query.Monad((><))
The forward transformation. Input a list, output the
transformed list and possibly an int (Nothing if there were
no elements in the input -- perhaps the result type should
be Maybe (NonEmptyList t,Int))
...
slow_bwt:: Ord t => [t] -> ([t], Maybe Int)
This version works by computing a list of all the rotations
of the input and sorting it. The transformation is then the
last element of each line of the sorted rotations. Tag the
data with an index so that it's possible to find where the
original form of the unput ends up after sorting. This is
essentially the specification of the transform coded up
directly.
...
slow_bwt l
    = (map fst tagged_bwt, index_of_original)
      where tagged_bwt = map (last>
that looks like worst case n²×log n to me (where n is the
length of the string): if every element is the same, the
n×log n comparisons the sort does will each look at every
element, and that's really more than is necessary.

The inverse transform. We should have that 

@ 
slow_inv_bwt . slow_bwt == id
@

Observe that sorting the input gives the first column of the
sorted rotations (and the input is the last). So after one
sort we have

X    ...   c
Y    ...   b
Z    ...   a
.     .    .
.     .    .
.     .    .

Since they are rotations, rotating each row of this gives
pairs that belong to the original list:

cX   ...
by   ...
aZ   ...
.     .
.     .
.     .

sorting these gives us the first two columns of the sorted
rotations, and again we already know the last column, which
can be rotated into the first and so on.
...
slow_inv_bwt ([],_) = []
slow_inv_bwt (l,Just index_of_original)
    = iterate catsort (replicate len [])
After (length input) iterations we have the original sorted
rotations,
...
!! len
and we have the index of where the untransformed string is,
so we can just pick it back out.
...
!! index_of_original

...
where catsort s = sort $ map (uncurry (:)) $ l `zip`s
            len = length l
This version does rather fewer sorts:
...
mark2_inv_bwt ([],_) = []
mark2_inv_bwt (l,Just n)
    = (transpose $ take (length l) $ iterate (permuteWith perm) l)!!(perm!!n)
      where perm = map snd $ sort $ l `zip` [0..]
Should I simply have left it out and hoped that it would
turn up after a few moves? I'm not going to explain it,
anyway.
...
rotations l = init $ zipWith (++) (tails l) (inits l)
...
permuteWith ns l = map (l!!) ns
-- 
Jón Fairbairn                                 Jon.Fairbairn@cl.cam.ac.uk