Hi,
I read a previous thread about BWT implementation in Haskell:
http://www.mail-archive.com/haskell-cafe@haskell.org/msg25609.html
and
http://sambangu.blogspot.com/2007/01/burrows-wheeler-transform-in-haskell
They are all in a `brute-force' way, that is implement based on Burrows-Wheeler's definition like below:
BWT: sort the rotations of input S to get a matrix M', return the last column L, and the row I, where S appears in M'
-- O( n^2 \lg n)
bwt :: (Ord a)=>[a]->([a], Int)
bwt s = (map last m, i) where
m = sort [ drop i s ++ take i s | i<-[1..length s]]
(Just i) = elemIndex s m
And the IBWT: Re-construct M' by iteratively sort on input L, add one column at a time, and pick the I-th row in M'
-- O( n^3 \lg n )
ibwt :: (Ord a)=> ([a], Int) -> [a]
ibwt (r, i) = m !! i where
m = iterate f (replicate n []) !! n
f = sort . zipWith (:) r
n = length r
I read Richard Bird's book, `Pearls of functional algorithm design', there is another solution. Although it is deduced step by step,
the core idea is based on random access the list by index. The algorithm mentioned in the book uses suffixes for
sorting instead of rotations. The performance are same in terms of big-O. I wrote the following program accordingly.
BWT: sort S along with the index to get a new order of IDs, and return a permutation of S based on IDs.
-- O( n^2 \lg n) if (!!) takes O(n) time
bwt' :: (Ord a)=> [a] -> ([a], Int)
bwt' s = (l, i) where
l = map (\i->s !! ((i-1) `mod` n)) ids
(Just i) = elemIndex 0 ids
ids = map snd $ sortBy (compare `on` fst) $ zip rots [0..]
rots = init $ zipWith (++) (tails s) (inits s) -- only non-empties
n = length s
IBWT: Sort the input L along with index to get a Transform vector, T [1], then permute L iteratively on T start from row I.
-- O( n^2 ) if (!!) takes O(n) time
ibwt' :: (Ord a) => ([a], Int) -> [a]
ibwt' (r, i) = fst $ iterate f ([], i) !! n where
t = map snd $ sortBy (compare `on` fst) $ zip r [0..]
f (s, j) = let j' = t !! j in (s++[r !! j'], j')
n = length r
However, As I commented, the (!!) takes time proportion to the length of the list, Although it can be turned into real Haskell Array
by listArray (0, n-1) xs.
I wonder if there are any purely functional implementations of BWT/IBWT, which don't base on random access idea nor in brute-force way.
[Reference]
[1], Mark Nelson. `Data Compression with the Burrows-Wheeler Transform'. http://marknelson.us/1996/09/01/bwt/
--
Larry, LIU Xinyu
https://github.com/liuxinyu95/AlgoXY