None brute-force BWT algorithm

23 Jun 2011

      Hi,

I read a previous thread about BWT implementation in Haskell:
http://www.mail-archive.com/haskell-cafe@haskell.org/msg25609.html
and
http://sambangu.blogspot.com/2007/01/burrows-wheeler-transform-in-haskell

They are all in a `brute-force' way, that is implement based on 
Burrows-Wheeler's definition like below:

BWT: sort the rotations of input S to get a matrix M', return the last 
column L, and the row I, where S appears in M'

-- O( n^2 \lg n)
bwt :: (Ord a)=>[a]->([a], Int)
bwt s = (map last m, i) where
    m = sort [ drop i s ++ take i s | i<-[1..length s]]
    (Just i) = elemIndex s m

And the IBWT: Re-construct M' by iteratively sort on input L, add one column 
at a time, and pick the I-th row in M'

-- O( n^3 \lg n )
ibwt :: (Ord a)=> ([a], Int) -> [a]
ibwt (r, i) = m !! i where
    m = iterate f (replicate n []) !! n
    f = sort . zipWith (:) r
    n = length r

I read Richard Bird's book, `Pearls of functional algorithm design', there 
is another solution. Although it is deduced step by step,
the core idea is based on random access the list by index. The algorithm 
mentioned in the book uses suffixes for 
sorting instead of rotations. The performance are same in terms of big-O. I 
wrote the following program accordingly.

BWT: sort S along with the index to get a new order of IDs, and return a 
permutation of S based on IDs.

-- O( n^2 \lg n) if (!!) takes O(n) time
bwt' :: (Ord a)=> [a] -> ([a], Int)
bwt' s =  (l, i) where
    l = map (\i->s !! ((i-1) `mod` n)) ids
    (Just i) = elemIndex 0 ids
    ids = map snd $ sortBy (compare `on` fst) $ zip rots [0..]
    rots = init $ zipWith (++) (tails s) (inits s) -- only non-empties
    n = length s

IBWT: Sort the input L along with index to get a Transform vector, T [1], 
then permute L iteratively on T start from row I. 

-- O( n^2 ) if (!!) takes O(n) time
ibwt' :: (Ord a) => ([a], Int) -> [a]
ibwt' (r, i) =  fst $ iterate f ([], i) !! n where
    t = map snd $ sortBy (compare `on` fst) $ zip r [0..]
    f (s, j) = let j' = t !! j in (s++[r !! j'], j')
    n = length r

However, As I commented, the (!!) takes time proportion to the length of the 
list, Although it can be turned into real Haskell Array
by listArray (0, n-1) xs.

I wonder if there are any purely functional implementations of BWT/IBWT, 
which don't base on random access idea nor in brute-force way.

[Reference]
[1], Mark Nelson. `Data Compression with the Burrows-Wheeler Transform'. 
http://marknelson.us/1996/09/01/bwt/

--
Larry, LIU Xinyu
https://github.com/liuxinyu95/AlgoXY

larry.liuxinyu

Henning Thielemann

larry.liuxinyu

Henning Thielemann

tags

participants (2)