None brute-force BWT algorithm

Hi, I read a previous thread about BWT implementation in Haskell: http://www.mail-archive.com/haskell-cafe@haskell.org/msg25609.html and http://sambangu.blogspot.com/2007/01/burrows-wheeler-transform-in-haskell They are all in a `brute-force' way, that is implement based on Burrows-Wheeler's definition like below: BWT: sort the rotations of input S to get a matrix M', return the last column L, and the row I, where S appears in M' -- O( n^2 \lg n) bwt :: (Ord a)=>[a]->([a], Int) bwt s = (map last m, i) where m = sort [ drop i s ++ take i s | i<-[1..length s]] (Just i) = elemIndex s m And the IBWT: Re-construct M' by iteratively sort on input L, add one column at a time, and pick the I-th row in M' -- O( n^3 \lg n ) ibwt :: (Ord a)=> ([a], Int) -> [a] ibwt (r, i) = m !! i where m = iterate f (replicate n []) !! n f = sort . zipWith (:) r n = length r I read Richard Bird's book, `Pearls of functional algorithm design', there is another solution. Although it is deduced step by step, the core idea is based on random access the list by index. The algorithm mentioned in the book uses suffixes for sorting instead of rotations. The performance are same in terms of big-O. I wrote the following program accordingly. BWT: sort S along with the index to get a new order of IDs, and return a permutation of S based on IDs. -- O( n^2 \lg n) if (!!) takes O(n) time bwt' :: (Ord a)=> [a] -> ([a], Int) bwt' s = (l, i) where l = map (\i->s !! ((i-1) `mod` n)) ids (Just i) = elemIndex 0 ids ids = map snd $ sortBy (compare `on` fst) $ zip rots [0..] rots = init $ zipWith (++) (tails s) (inits s) -- only non-empties n = length s IBWT: Sort the input L along with index to get a Transform vector, T [1], then permute L iteratively on T start from row I. -- O( n^2 ) if (!!) takes O(n) time ibwt' :: (Ord a) => ([a], Int) -> [a] ibwt' (r, i) = fst $ iterate f ([], i) !! n where t = map snd $ sortBy (compare `on` fst) $ zip r [0..] f (s, j) = let j' = t !! j in (s++[r !! j'], j') n = length r However, As I commented, the (!!) takes time proportion to the length of the list, Although it can be turned into real Haskell Array by listArray (0, n-1) xs. I wonder if there are any purely functional implementations of BWT/IBWT, which don't base on random access idea nor in brute-force way. [Reference] [1], Mark Nelson. `Data Compression with the Burrows-Wheeler Transform'. http://marknelson.us/1996/09/01/bwt/ -- Larry, LIU Xinyu https://github.com/liuxinyu95/AlgoXY

On Wed, 22 Jun 2011, larry.liuxinyu wrote:
Hi,
I read a previous thread about BWT implementation in Haskell: http://www.mail-archive.com/haskell-cafe@haskell.org/msg25609.html and http://sambangu.blogspot.com/2007/01/burrows-wheeler-transform-in-haskell
They are all in a `brute-force' way, that is implement based on Burrows-Wheeler's definition like below:
I thought that the knot-tying solution by Bertram Felgenhauer in the same thread was both elegant and efficient: http://www.mail-archive.com/haskell-cafe%40haskell.org/msg25692.html

Hi, I think that version is still a brute-force solution. The only difference is that it uses EOF (sentinel) so that it can sort the suffixes instead of rotations. However, the big-O complexity is the same. Let's take the rbwt for example:
rbwt xs = let res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res) in tail . map snd . zip xs $ head res
Here we can see that, although the infinity res is lazy-evaluated, it actually sorts the matrix N times, adding one column per evaluation. each time there are N elements to be sorted and the length of every element grows proportion to N, so the time is O( N * (N^2) * \lg (N^2) ) = O(N^3 \lg N) While my brute-force program provided in previous post is as same as O(N^3 \lg N). However, if the random access time is O(1) (on array) but not O(N) (on list), my 2nd program is much faster: Here is the modified version with Array. (import Data.Array) ibwt'' :: (Ord a) => ([a], Int) -> [a] ibwt'' (r, i) = fst $ iterate f ([], i) !! n where t = listArray (0, n-1) $ map snd $ sort $ zip r [0..] ra = listArray (0, n-1) r f (s, j) = let j' = t ! j in (s++[ra ! j'], j') n = length r This version only sort the input data 1 time, (proportion to O(N * \lg N), after that the transform vector t is generated. Then it iterates N times on t to get the result. so the total time is O(N * \lg N) + O(N) = O(N * \lg N) This should be much better than the brute force one. the idea is that, we can get the result without reconstructing the complete matrix, Only two columns (L & F) are enough. But how to turn the random access idea of transform-vector into purely functional settings? here I mean what if Haskell doesn't provide constant access time array? One idea is to to turn the transform vector T into a function, just like what we've done in KMP implementation in FP way. Does such solution exist? Regards. -- Larry, LIU Xinyu https://github.com/liuxinyu95/AlgoXY On Friday, June 24, 2011 6:50:46 AM UTC+8, Henning Thielemann wrote:
On Wed, 22 Jun 2011, larry.liuxinyu wrote:
Hi,
I read a previous thread about BWT implementation in Haskell: http://www.mail-archive.com/haskell-cafe@haskell.org/msg25609.html and
http://sambangu.blogspot.com/2007/01/burrows-wheeler-transform-in-haskell
They are all in a `brute-force' way, that is implement based on
Burrows-Wheeler's definition like below:
I thought that the knot-tying solution by Bertram Felgenhauer in the same thread was both elegant and efficient: http://www.mail-archive.com/haskell-cafe%40haskell.org/msg25692.html
_______________________________________________ Haskell-Cafe mailing list Haskel...@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Thu, 23 Jun 2011, larry.liuxinyu wrote:
I think that version is still a brute-force solution. The only difference is that it uses EOF (sentinel) so that it can sort the suffixes instead of rotations. However, the big-O complexity is the same.
Let's take the rbwt for example:
rbwt xs = let res = sortBy (\(a:as) (b:bs) -> a `compare` b) (zipWith' (:) xs res) in tail . map snd . zip xs $ head res Here we can see that, although the infinity res is lazy-evaluated, it actually sorts the matrix N times, adding one column per evaluation.
Did you also compare the actual runtimes? As far as I understand, the matrix in Bertram's code is sorted once. Sharing of data avoids recomputation.
participants (2)
-
Henning Thielemann
-
larry.liuxinyu