
Hi Having written a suffix-array based program using lists, I thought I'd speed it up a bit, and perhaps more importantly, save space, using arrays. Basically, the problem is to sort all suffixes of a string, represented as an array of offsets, alphabetically. (Does anybody have such code? Efficient?) The good news is that space consumption is reduced -- except for a few weird spikes (which makes me suspect there are many of them, living for very short durations), memory use is low. But speed is another issue -- the list based version is a lot faster. Okay, I use UArrays and permute them by (//), and this operation is totally dominant in the profile. Is it correct that GHC is very naive about updating them, and even small updates cost O(n)? Is it better to (//) over few large lists than over many small ones? I'll try a IOUArray next; however, is there anything else I could/should try? I know there has been array sorts implemented previously, are anybody aware of recent results? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

But speed is another issue -- the list based version is a lot faster. Okay, I use UArrays and permute them by (//), and this operation is totally dominant in the profile. Is it correct that GHC is very naive about updating them, and even small updates cost O(n)? Is it better to (//) over few large lists than over many small ones?
Yes, (//) is terrible. It *never* tries to update in-place. Pretty much, haskell arrays (the non state ones) are very good datastructures for algorithms in which you initialize them and then read from them, but never overwrite. If you need to overwrite, you're almost always better off with a more functional datastructure like Data.FiniteMap (or Lists), or using either IOUArray or STUArray. DiffArrays are very flaky and I would recommend staying away from them, since sometimes (someone more experienced with them can comment more on them) they behave very unexpectedly (with respect to creating new versions of themselves). - Hal

Hal Daume III
Yes, (//) is terrible. It *never* tries to update in-place.
Any reason it couldn't be done in-place? (I.e. thaw, update all, and freeze again) Am I missing something -- Could partial results be used, the update list be infinite, or anything like that?
DiffArrays are very flaky and I would recommend staying away from them,
Yep, my program crashed with them :-) (I may inadvertently be doing something nasty or illegal, though. The docs I found weren't all that great) -kzm -- If I haven't seen further, it is by standing in the footprints of giants

ketil@ii.uib.no (Ketil Z. Malde) writes:
Hal Daume III
writes:
Yes, (//) is terrible. It *never* tries to update in-place.
Any reason it couldn't be done in-place? (I.e. thaw, update all, and freeze again) Am I missing something -- Could partial results be used, the update list be infinite, or anything like that?
I'm toying with STArrays in order to fix this. In prinicple it should be simple, but it ended up a tangled mess of incomprehensible warnings and strange errors. Oh well, I've boiled it down to something like: replace :: UArray Int Int -> [(Int,Int)] -> UArray Int Int replace a p = runST (thaw a >>= \u -> update u p >> freeze u) update :: STUArray () Int Int -> [(Int,Int)] -> ST () () update u ps = mapM_ (uncurry (writeArray u)) ps Which gives me Compiling SuffixArray ( SuffixArray.lhs, interpreted ) SuffixArray.lhs:131: Cannot unify the type-signature variable `s' with the type `()' Expected type: ST s a Inferred type: ST () (b Int Int) In the expression: (thaw a) >>= (\ u -> (update u p) >> (freeze u)) In the first argument of `runST', namely `((thaw a) >>= (\ u -> (update u p) >> (freeze u)))' I've tried rearranging the code in various ways, but I can get no further. Sounds to me that setting s=() and a=(b Int Int) would do the trick, but apparently the compiler disagrees. I haven't used the ST monad before, so perhaps I'm missing something obvious? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

ketil@ii.uib.no (Ketil Z. Malde) writes:
ketil@ii.uib.no (Ketil Z. Malde) writes:
Hal Daume III
writes:
Yes, (//) is terrible. It *never* tries to update in-place.
replace :: UArray Int Int -> [(Int,Int)] -> UArray Int Int replace a p = runST (thaw a >>= \u -> update u p >> freeze u)
update :: STUArray () Int Int -> [(Int,Int)] -> ST () () update u ps = mapM_ (uncurry (writeArray u)) ps
I've tried rearranging the code in various ways, but I can get no further.
It's funny, you know, how asking questions on the internet gets you the answer quickly. Just after sending this, I scratched my head, and did: update :: STUArray s Int Int -> [(Int,Int)] -> ST s () and, well, that apparently worked. Like a charm. The 's' parameter is apparently just magic, or in Marcin Kowalczyk's words (old (26 Feb 2001) post to the haskell list): | The type variable 's' is used in a very tricky way, to ensure | safety when | runST :: (forall s. ST s a) -> a | is used to wrap the ST-monadic computation in a purely functional | interface. It does not correspond to the type of data being | manipulated. (I'll be right back with the benchmarks.) -kzm -- If I haven't seen further, it is by standing in the footprints of giants

ketil@ii.uib.no (Ketil Z. Malde) writes:
replace :: UArray Int Int -> [(Int,Int)] -> UArray Int Int replace a p = runST (thaw a >>= \u -> update u p >> freeze u)
update :: STUArray s Int Int -> [(Int,Int)] -> ST s () update u ps = mapM_ (uncurry (writeArray u)) ps
(I'll be right back with the benchmarks.)
I know you're all eagerly waiting for this, so here's a small progress report. Or lack-of-progress report, if you like. Apparently, I get really huge memory consumption when using the above repeatedly. Normally, I can deal with it, but profiling (-h) doesn't show any likely culprit, the curves stay well below 60k for the most part. I know there are different kinds of memory profiles (retainer profile, etc), is that where I have too look? Or is the problem something else? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

You shouldn't try to write these functions. You should do all array modifications within the ST monad, rather than looking for a pure solution. - Hal -- Hal Daume III "Computer science is no more about computers | hdaume@isi.edu than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume On 21 Jan 2003, Ketil Z. Malde wrote:
ketil@ii.uib.no (Ketil Z. Malde) writes:
replace :: UArray Int Int -> [(Int,Int)] -> UArray Int Int replace a p = runST (thaw a >>= \u -> update u p >> freeze u)
update :: STUArray s Int Int -> [(Int,Int)] -> ST s () update u ps = mapM_ (uncurry (writeArray u)) ps
(I'll be right back with the benchmarks.)
I know you're all eagerly waiting for this, so here's a small progress report. Or lack-of-progress report, if you like.
Apparently, I get really huge memory consumption when using the above repeatedly. Normally, I can deal with it, but profiling (-h) doesn't show any likely culprit, the curves stay well below 60k for the most part. I know there are different kinds of memory profiles (retainer profile, etc), is that where I have too look? Or is the problem something else?
-kzm -- If I haven't seen further, it is by standing in the footprints of giants _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hal Daume III
You shouldn't try to write these functions. You should do all array modifications within the ST monad, rather than looking for a pure solution.
All right, but why? It seems an obvious trick, take some pieces that benefit from imperative processing, and wrap them in ST, leaving the rest of the program as it were. Can you (or anybody else) explain the memory behaviour I see (using hundreds of megabytes, but only tens of K visible in the profiling output)? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

All right, but why? It seems an obvious trick, take some pieces that benefit from imperative processing, and wrap them in ST, leaving the rest of the program as it were.
Because the reason for your terrible performance is that arrays are being copied willy nilly :). In order to avoid this and to get in-place update (which is what you wanted originally, iirc), you need to do all your modifications inside the ST or IO monad.
Can you (or anybody else) explain the memory behaviour I see (using hundreds of megabytes, but only tens of K visible in the profiling output)?
I cannot :). If your program is short and I can take a look at it, I might be able to say something though.... - Hal

Any reason it couldn't be done in-place? (I.e. thaw, update all, and freeze again) Am I missing something -- Could partial results be used, the update list be infinite, or anything like that?
I believe that's essentially what normal arrays are doing, but that's not inplace. In the process of thawing, you're copying the array. If you're not copying it, then the results are unsound.

Hal Daume III
Any reason it couldn't be done in-place? (I.e. thaw, update all, and freeze again) Am I missing something -- Could partial results be used, the update list be infinite, or anything like that?
I believe that's essentially what normal arrays are doing,
Makes sense, I believe -- and I don't seem to be getting any better performance by doing it explicitly, so you're probably right.
but that's not inplace.
But it's O(n), not O(n^2). It's just a factor of two compared to entirely in place, not a big deal. In theory.
In the process of thawing, you're copying the array. If you're not copying it, then the results are unsound.
Right. So I noticed (trying to use unsafeThaw; unsafeFreeze is okay, of course). -kzm -- If I haven't seen further, it is by standing in the footprints of giants
participants (2)
-
Hal Daume III
-
ketil@ii.uib.no