
David Roundy wrote:
On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote:
OK, so I *started* writing a request for help, but... well I answered my own question! See the bottom...
....
module BWT3 (bwt) where

import Data.List
import qualified Data.ByteString as Raw

rotate :: Int -> Raw.ByteString -> Int -> Raw.ByteString
rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)

bwt xs =
  let
    l  = Raw.length xs
    ys = rotate l xs
  in
    Raw.pack $
    map (Raw.last . rotate l xs) $
    sortBy (\n m -> compare (ys n) (ys m)) [0..(l-1)]
The trouble is that Raw.append is an O(N) operation, making the computation O(N^2) where it ought to be O(N log N).
Well, it actually takes O(N^2 log N) time: N log N comparisons, each taking O(N) time. With O(1) append, each comparison still has a worst-case bound of O(N) (take the degenerate input (repeat 'a')), but it will probably be much faster in practice. The log N factor can be squeezed to a constant by using a simple trie, but to be even faster, you need a clever suffix tree/array. IIRC, O(N) is the best worst-case bound for the Burrows-Wheeler transform, although such algorithms are said to be not so good in practice. Note that one usually adds an "end of string" character $ in the Burrows-Wheeler transform for compression, so that sorting rotated strings becomes sorting suffixes.

Concerning the algorithm at hand, you can clearly avoid calculating Raw.append over and over:

  bwt :: Raw.ByteString -> Raw.ByteString
  bwt xs = Raw.pack . map Raw.last . sort $ rotations
    where
      n         = Raw.length xs
      rotations = take n . map (Raw.take n) . Raw.tails $ xs `Raw.append` xs

assuming that Raw.take n is O(1).
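For comparison, here is a rough sketch of the same "append once" idea grafted onto the original index-based sortBy version instead; the helper names doubled, rotation and lastOf are only for illustration, they don't appear in the thread:

  import Data.List (sortBy)
  import Data.Ord (comparing)
  import qualified Data.ByteString as Raw

  bwt :: Raw.ByteString -> Raw.ByteString
  bwt xs = Raw.pack . map lastOf . sortBy (comparing rotation) $ [0 .. n-1]
    where
      n          = Raw.length xs
      doubled    = xs `Raw.append` xs              -- computed once, up front
      rotation i = Raw.take n (Raw.drop i doubled) -- rotation starting at i, O(1) slice
      lastOf i   = Raw.index doubled (i + n - 1)   -- last byte of rotation i

Each comparison may still inspect up to N bytes, but no new ByteStrings are copied per comparison, because Raw.take and Raw.drop on a strict ByteString are O(1) slices of the doubled string.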
Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! Vast speed increases... Jeepers, I can transform 52 KB so fast I can't even get to Task Manager fast enough to *check* the RAM usage! Blimey...
OK, just tried the 145 KB test file that Mr C++ used. That took 2 seconds + 43 MB RAM. Ouch.
In this case append is an O(1) operation. But you're still getting killed on prefactors, because you're generating a list of size N and then sorting it. Lists are just not nice data structures to sort, nor are they nice to have for large N.
To get better speed and memory use, I think you'd want to avoid the intermediate list in favor of some sort of strict array, but that'd be ugly.
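A rough sketch of what that might look like, assuming the vector and vector-algorithms packages for an unboxed array of rotation start indices and an in-place sort (neither package is mentioned in the thread; this is just one way to spell the idea):

  import Data.Ord (comparing)
  import qualified Data.ByteString as Raw
  import qualified Data.Vector.Unboxed as V
  import qualified Data.Vector.Algorithms.Intro as Intro

  bwt :: Raw.ByteString -> Raw.ByteString
  bwt xs = Raw.pack . map lastOf . V.toList $ sortedStarts
    where
      n            = Raw.length xs
      doubled      = xs `Raw.append` xs
      rotation i   = Raw.take n (Raw.drop i doubled)  -- O(1) slice of the doubled string
      lastOf i     = Raw.index doubled (i + n - 1)    -- last byte of rotation i
      -- unboxed vector of start indices, sorted in place by comparing rotations
      sortedStarts = V.modify (Intro.sortBy (comparing rotation)) (V.enumFromN 0 n)

The intermediate structure is then a strict unboxed vector of Ints rather than a Haskell list of rotations, although the final map back to a list before Raw.pack remains.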
Eh? You can't really avoid the list for sorting, in the sense that before trying an array/in-place sort, you'd better use a trie.

Regards,
apfelmus
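The trie isn't spelled out in the thread; as a rough illustration of the idea for plain Strings (the names and representation here are only one possible choice), sorting via a simple trie could look like:

  import qualified Data.Map as Map
  import Data.Map (Map)

  -- each node counts how many inserted strings end here, plus children keyed by next character
  data Trie = Trie Int (Map Char Trie)

  emptyT :: Trie
  emptyT = Trie 0 Map.empty

  insertT :: String -> Trie -> Trie
  insertT []     (Trie c m) = Trie (c + 1) m
  insertT (x:xs) (Trie c m) =
      Trie c (Map.insertWith (\_new old -> insertT xs old) x (insertT xs emptyT) m)

  -- an in-order traversal yields the inserted strings in sorted order
  toListT :: Trie -> [String]
  toListT (Trie c m) =
      replicate c "" ++ [ x : rest | (x, t) <- Map.toAscList m, rest <- toListT t ]

  sortViaTrie :: [String] -> [String]
  sortViaTrie = toListT . foldr insertT emptyT

Sorting the rotations with sortViaTrie instead of sort examines each character of each rotation only a bounded number of times (plus the Map lookup over the alphabet), so the log N factor from pairwise comparisons disappears; adapting it to ByteStrings would mean branching on Word8 instead of Char.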