ST not strict enough?

Hi All, I'm having some trouble with memory usage in rebuilding a ByteString with some sequences escaped. I thought I'd try vectors. However, it seems that even a relatively simple function, one that places all the bytes of a ByteString in to a vector, uses a great deal of memory. I've pulled this function out into a mini-module for testing: https://github.com/solidsnack/arx/blob/ST/Rebuild.hs#L37 On a one megabyte input, it peaks at ~38M of memory: :; dd if=/dev/zero bs=1M count=1 | ./rebuildprof +RTS -s ... 38,724,208 bytes maximum residency (5 sample(s)) 1,983,720 bytes maximum slop 76 MB total memory in use (0 MB lost due to fragmentation) ... %GC time 61.3% (61.5% elapsed) A heap profile by type, with -hy, shows a linear rise in ST items -- up to ~26M -- and then a linear decrease. It would stand to reason that, with sufficient strictness, the memory allocated to ST would stay constant and small. Should I be annotating my functions with strictness, for the vector reference, for example? Should I be using STUArrays, instead? -- Jason Dusek () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

* Jason Dusek
I'm having some trouble with memory usage in rebuilding a ByteString with some sequences escaped. I thought I'd try vectors. However, it seems that even a relatively simple function, one that places all the bytes of a ByteString in to a vector, uses a great deal of memory.
Should I be annotating my functions with strictness, for the vector reference, for example? Should I be using STUArrays, instead?
Hi Jason, I think what's happening here is ByteString's "strictness" makes things actually lazy on your side. Namely, unpack function produces its result "strictly", whole list at once. As a result, the resulting list cannot be consumed one-by-one, so it takes memory. You see ST thunks because mapM_ f as = sequence_ (map f as) and that map probably gets fused with unpack. I guess the proper solution here is to use lazy bytestring and make sure the chunks are not very big. -- Roman I. Cheplyaka :: http://ro-che.info/

2011/11/15 Roman Cheplyaka
* Jason Dusek
[2011-11-15 20:08:48+0000] I'm having some trouble with memory usage in rebuilding a ByteString with some sequences escaped. I thought I'd try vectors. However, it seems that even a relatively simple function, one that places all the bytes of a ByteString in to a vector, uses a great deal of memory.
I think what's happening here is ByteString's "strictness" makes things actually lazy on your side.
Namely, unpack function produces its result "strictly", whole list at once. As a result, the resulting list cannot be consumed one-by-one, so it takes memory. You see ST thunks because
mapM_ f as = sequence_ (map f as)
and that map probably gets fused with unpack.
I guess the proper solution here is to use lazy bytestring and make sure the chunks are not very big.
Hi Roman, Switching to the lazy ByteStrings API does, indeed, help; total memory usage is around 16M. I will have a look at the rules that are fired to see what I can learn. -- Jason Dusek () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

Hi Jason,
On Tue, Nov 15, 2011 at 12:08 PM, Jason Dusek
Should I be annotating my functions with strictness, for the vector reference, for example? Should I be using STUArrays, instead?
From http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.4.1.0/Control-M... "The >>= and >> operations are strict in the state (though not in values stored in the state)." which implies that modifySTRef counter (+1) is too lazy. -- Johan

2011/11/15 Johan Tibell
On Tue, Nov 15, 2011 at 12:08 PM, Jason Dusek
wrote: Should I be annotating my functions with strictness, for the vector reference, for example? Should I be using STUArrays, instead?
From http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.4.1.0/Control-M...
"The >>= and >> operations are strict in the state (though not in values stored in the state)."
which implies that
modifySTRef counter (+1)
is too lazy.
As a first cut at strictifying the ST operations, I introduced a strict plus and strict vector write operation, strictifying every parameter that admitted it. (+!) a b = ((+) $!! a) $!! b w v n b = (Vector.unsafeWrite v $!! n) $!! b This did not alter memory usage in any noticeable way. (Tried it with strict and lazy ByteStrings and both had the same memory usage as they did without the extra strictness.) It does seem off odd that building a vector byte by byte is so hard to do performantly. Maybe the memory usage ends up being okay when working with larger structures, though. -- Jason Dusek () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments diff --git a/Rebuild.hs b/Rebuild.hs @@ -15,6 +15,7 @@ import Data.STRef import Data.String import Data.Word +import Control.DeepSeq import Data.Vector.Unboxed (Vector) import qualified Data.Vector.Unboxed as Vector (create, length) import qualified Data.Vector.Unboxed.Mutable as Vector hiding (length) @@ -46,8 +47,8 @@ rebuildAsVector bytes = byteVector n <- readSTRef counter return (Vector.unsafeSlice 0 n v) writeOneByte v counter b = do n <- readSTRef counter - Vector.unsafeWrite v n b + w v n b modifySTRef counter (+!1) + (+!) a b = ((+) $!! a) $!! b + w v n b = (Vector.unsafeWrite v $!! n) $!! b

On Wed, Nov 16, 2011 at 11:58 AM, Jason Dusek
diff --git a/Rebuild.hs b/Rebuild.hs @@ -15,6 +15,7 @@ import Data.STRef import Data.String import Data.Word
+import Control.DeepSeq import Data.Vector.Unboxed (Vector) import qualified Data.Vector.Unboxed as Vector (create, length) import qualified Data.Vector.Unboxed.Mutable as Vector hiding (length) @@ -46,8 +47,8 @@ rebuildAsVector bytes = byteVector n <- readSTRef counter return (Vector.unsafeSlice 0 n v) writeOneByte v counter b = do n <- readSTRef counter - Vector.unsafeWrite v n b + w v n b modifySTRef counter (+!1) + (+!) a b = ((+) $!! a) $!! b + w v n b = (Vector.unsafeWrite v $!! n) $!! b
+! doesn't work unless modifySTRef is already strict in the result of the function application. You need to write modifySTRef' that seq:s the result of the function application before calling writeSTRef. -- Johan

On Wed, Nov 16, 2011 at 12:07 PM, Johan Tibell
On Wed, Nov 16, 2011 at 11:58 AM, Jason Dusek
wrote: diff --git a/Rebuild.hs b/Rebuild.hs @@ -15,6 +15,7 @@ import Data.STRef import Data.String import Data.Word
+import Control.DeepSeq import Data.Vector.Unboxed (Vector) import qualified Data.Vector.Unboxed as Vector (create, length) import qualified Data.Vector.Unboxed.Mutable as Vector hiding (length) @@ -46,8 +47,8 @@ rebuildAsVector bytes = byteVector n <- readSTRef counter return (Vector.unsafeSlice 0 n v) writeOneByte v counter b = do n <- readSTRef counter - Vector.unsafeWrite v n b + w v n b modifySTRef counter (+!1) + (+!) a b = ((+) $!! a) $!! b + w v n b = (Vector.unsafeWrite v $!! n) $!! b
+! doesn't work unless modifySTRef is already strict in the result of the function application. You need to write modifySTRef' that seq:s the result of the function application before calling writeSTRef.
Just double checked. modifySTRef is too lazy: -- |Mutate the contents of an 'STRef' modifySTRef :: STRef s a -> (a -> a) -> ST s () modifySTRef ref f = writeSTRef ref . f =<< readSTRef ref We need Data.STRef.Strict

On Wed, Nov 16, 2011 at 12:16:34PM -0800, Johan Tibell wrote:
Just double checked. modifySTRef is too lazy:
-- |Mutate the contents of an 'STRef' modifySTRef :: STRef s a -> (a -> a) -> ST s () modifySTRef ref f = writeSTRef ref . f =<< readSTRef ref
We need Data.STRef.Strict
That would be awesome

On Wed, Nov 16, 2011 at 2:16 PM, Johan Tibell
On Wed, Nov 16, 2011 at 12:07 PM, Johan Tibell
wrote: On Wed, Nov 16, 2011 at 11:58 AM, Jason Dusek
wrote: diff --git a/Rebuild.hs b/Rebuild.hs @@ -15,6 +15,7 @@ import Data.STRef import Data.String import Data.Word
+import Control.DeepSeq import Data.Vector.Unboxed (Vector) import qualified Data.Vector.Unboxed as Vector (create, length) import qualified Data.Vector.Unboxed.Mutable as Vector hiding (length) @@ -46,8 +47,8 @@ rebuildAsVector bytes = byteVector n <- readSTRef counter return (Vector.unsafeSlice 0 n v) writeOneByte v counter b = do n <- readSTRef counter - Vector.unsafeWrite v n b + w v n b modifySTRef counter (+!1) + (+!) a b = ((+) $!! a) $!! b + w v n b = (Vector.unsafeWrite v $!! n) $!! b
+! doesn't work unless modifySTRef is already strict in the result of the function application. You need to write modifySTRef' that seq:s the result of the function application before calling writeSTRef.
Just double checked. modifySTRef is too lazy: -- |Mutate the contents of an 'STRef' modifySTRef :: STRef s a -> (a -> a) -> ST s () modifySTRef ref f = writeSTRef ref . f =<< readSTRef ref We need Data.STRef.Strict
We already have one in base - it re-exports Data.STRef in whole :-) http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-STRef-Strict... Antoine

On Wed, Nov 16, 2011 at 12:33 PM, Antoine Latter
We already have one in base - it re-exports Data.STRef in whole :-)
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-STRef-Strict...
Then it's wrong. :( In what sense is it strict? I think it should be strict in the value stored in the ref. -- Johan

On Wednesday 16 November 2011, 22:45:16, Johan Tibell wrote:
On Wed, Nov 16, 2011 at 12:33 PM, Antoine Latter
wrote: We already have one in base - it re-exports Data.STRef in whole :-)
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-STRef- Strict.html
Then it's wrong. :( In what sense is it strict?
In the sense of Control.Monad.ST.Strict vs. Control.Monad.ST.Lazy
I think it should be strict in the value stored in the ref.
Yes, we probably need that.

I ran into a similar problem with modifySTRef causing allocation and
GC. Creating my own strict version of modifySTRef got rid of all that
and my program ran without any allocation at all.
On Nov 16, 3:16 pm, Johan Tibell
On Wed, Nov 16, 2011 at 12:07 PM, Johan Tibell
wrote: On Wed, Nov 16, 2011 at 11:58 AM, Jason Dusek
wrote: diff --git a/Rebuild.hs b/Rebuild.hs @@ -15,6 +15,7 @@ import Data.STRef import Data.String import Data.Word
+import Control.DeepSeq import Data.Vector.Unboxed (Vector) import qualified Data.Vector.Unboxed as Vector (create, length) import qualified Data.Vector.Unboxed.Mutable as Vector hiding (length) @@ -46,8 +47,8 @@ rebuildAsVector bytes = byteVector n <- readSTRef counter return (Vector.unsafeSlice 0 n v) writeOneByte v counter b = do n <- readSTRef counter - Vector.unsafeWrite v n b + w v n b modifySTRef counter (+!1) + (+!) a b = ((+) $!! a) $!! b + w v n b = (Vector.unsafeWrite v $!! n) $!! b
+! doesn't work unless modifySTRef is already strict in the result of the function application. You need to write modifySTRef' that seq:s the result of the function application before calling writeSTRef.
Just double checked. modifySTRef is too lazy:
-- |Mutate the contents of an 'STRef' modifySTRef :: STRef s a -> (a -> a) -> ST s () modifySTRef ref f = writeSTRef ref . f =<< readSTRef ref
We need Data.STRef.Strict
_______________________________________________ Haskell-Cafe mailing list Haskell-C...@haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe

2011/11/16 Johan Tibell
On Wed, Nov 16, 2011 at 12:07 PM, Johan Tibell
wrote: +! doesn't work unless modifySTRef is already strict in the result of the function application. You need to write modifySTRef' that seq:s the result of the function application before calling writeSTRef.
Just double checked. modifySTRef is too lazy: -- |Mutate the contents of an 'STRef' modifySTRef :: STRef s a -> (a -> a) -> ST s () modifySTRef ref f = writeSTRef ref . f =<< readSTRef ref We need Data.STRef.Strict
Tried a modifySTRef' defined this way: modifySTRef' ref f = do val <- (f $!!) <$> readSTRef ref writeSTRef ref (val `seq` val) ...but there was no change in memory usage. -- Jason Dusek () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

On Wed, Nov 16, 2011 at 2:23 PM, Jason Dusek
Just double checked. modifySTRef is too lazy: -- |Mutate the contents of an 'STRef' modifySTRef :: STRef s a -> (a -> a) -> ST s () modifySTRef ref f = writeSTRef ref . f =<< readSTRef ref We need Data.STRef.Strict
Tried a modifySTRef' defined this way:
modifySTRef' ref f = do val <- (f $!!) <$> readSTRef ref writeSTRef ref (val `seq` val)
...but there was no change in memory usage.
Why not just modifySTRef :: STRef s a -> (a -> a) -> ST s () modifySTRef ref f = do x <- readSTRef ref writeSTRef ref $! f x (Note that I didn't check if modifySTRef was actually a problem in this case). -- Johan

2011/11/16 Johan Tibell
On Wed, Nov 16, 2011 at 2:23 PM, Jason Dusek
wrote: Tried a modifySTRef' defined this way:
modifySTRef' ref f = do val <- (f $!!) <$> readSTRef ref writeSTRef ref (val `seq` val)
...but there was no change in memory usage.
Why not just
modifySTRef :: STRef s a -> (a -> a) -> ST s () modifySTRef ref f = do x <- readSTRef ref writeSTRef ref $! f x
(Note that I didn't check if modifySTRef was actually a problem in this case).
I just didn't want to miss an opportunity to put in extra strictness annotations! School of redundancy school. -- Jason Dusek () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

Instead of rewriting modifySTRef, why not just do :
modifySTRef counter (\x -> let y = x+1 in y `seq` y)
Is there a problem with that?
2011/11/16 Johan Tibell
On Wed, Nov 16, 2011 at 11:58 AM, Jason Dusek
wrote: diff --git a/Rebuild.hs b/Rebuild.hs @@ -15,6 +15,7 @@ import Data.STRef import Data.String import Data.Word
+import Control.DeepSeq import Data.Vector.Unboxed (Vector) import qualified Data.Vector.Unboxed as Vector (create, length) import qualified Data.Vector.Unboxed.Mutable as Vector hiding (length) @@ -46,8 +47,8 @@ rebuildAsVector bytes = byteVector n <- readSTRef counter return (Vector.unsafeSlice 0 n v) writeOneByte v counter b = do n <- readSTRef counter - Vector.unsafeWrite v n b + w v n b modifySTRef counter (+!1) + (+!) a b = ((+) $!! a) $!! b + w v n b = (Vector.unsafeWrite v $!! n) $!! b
+! doesn't work unless modifySTRef is already strict in the result of the function application. You need to write modifySTRef' that seq:s the result of the function application before calling writeSTRef.
-- Johan
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Friday 18 November 2011, 11:18:33, Yves Parès wrote:
Instead of rewriting modifySTRef, why not just do :
modifySTRef counter (\x -> let y = x+1 in y `seq` y)
Is there a problem with that?
Yes, y `seq` y is precisely the same as y. a `seq` b means whenever evaluation of b is demanded, also evaluate a (to WHNF). So y `seq` y ~> whenever evaluation of y is demanded, also evaluate y.

Okay, thanks
I was pretty sure I was wrong (or else somebody would already have come up
with that solution), but I wanted to know why.
2011/11/18 Daniel Fischer
On Friday 18 November 2011, 11:18:33, Yves Parès wrote:
Instead of rewriting modifySTRef, why not just do :
modifySTRef counter (\x -> let y = x+1 in y `seq` y)
Is there a problem with that?
Yes, y `seq` y is precisely the same as y.
a `seq` b means whenever evaluation of b is demanded, also evaluate a (to WHNF). So y `seq` y ~> whenever evaluation of y is demanded, also evaluate y.

(Sorry for the double mail)
...so there is no way to do that inside the function passed to modifySTRef?
In other words, there is no way to ensure *inside* a function that its
result will be evaluated strictly?
2011/11/18 Daniel Fischer
On Friday 18 November 2011, 11:18:33, Yves Parès wrote:
Instead of rewriting modifySTRef, why not just do :
modifySTRef counter (\x -> let y = x+1 in y `seq` y)
Is there a problem with that?
Yes, y `seq` y is precisely the same as y.
a `seq` b means whenever evaluation of b is demanded, also evaluate a (to WHNF). So y `seq` y ~> whenever evaluation of y is demanded, also evaluate y.

On Friday 18 November 2011, 13:05:06, Yves Parès wrote:
...so there is no way to do that inside the function passed to modifySTRef? In other words, there is no way to ensure inside a function that its result will be evaluated strictly?
Well, modifySTRef ref fun = do val <- readSTRef ref writeSTRef ref (fun val) (resp. modifySTRef ref f = writeSTRef ref . f =<< readSTRef ref as it's actually written in Data.STRef). What's written to ref is the thunk (fun val), meaning, "when the value is demanded, evaluate fun applied to the argument val". So, no, the function is not entered before the result is demanded, hence it can't. It can ensure that the result is evaluated to a deeper level than required by the calling context when the function is entered, though.

On 18 November 2011 13:17, Daniel Fischer
On Friday 18 November 2011, 13:05:06, Yves Parès wrote:
...so there is no way to do that inside the function passed to modifySTRef? In other words, there is no way to ensure inside a function that its result will be evaluated strictly?
Well,
modifySTRef ref fun = do val <- readSTRef ref writeSTRef ref (fun val)
But note therefore that you can just: val <- readSTRef ref writeSTRef ref $! fun val rather than using modifySTRef. Duncan

On Fri, Nov 18, 2011 at 4:05 AM, Yves Parès
(Sorry for the double mail) ...so there is no way to do that inside the function passed to modifySTRef? In other words, there is no way to ensure *inside* a function that its result will be evaluated strictly?
modifySTRef looks like this (sugared up a bit): modifySTRef r f = do x <- readSTRef r writeSTRef r (f x) Note that this writes the *thunk* (f x) to the STRef, and there is absolutely nothing you can do about it. "f" isn't demanded by this function, let alone "f x". You can implement your own strict modifySTRef' easily though, as other people in this thread have shown.

On Wed, Nov 16, 2011 at 07:58:51PM +0000, Jason Dusek wrote:
2011/11/15 Johan Tibell
: On Tue, Nov 15, 2011 at 12:08 PM, Jason Dusek
wrote: Should I be annotating my functions with strictness, for the vector reference, for example? Should I be using STUArrays, instead?
From http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.4.1.0/Control-M...
"The >>= and >> operations are strict in the state (though not in values stored in the state)."
which implies that
modifySTRef counter (+1)
is too lazy.
As a first cut at strictifying the ST operations, I introduced a strict plus and strict vector write operation, strictifying every parameter that admitted it.
(+!) a b = ((+) $!! a) $!! b w v n b = (Vector.unsafeWrite v $!! n) $!! b
This did not alter memory usage in any noticeable way. (Tried it with strict and lazy ByteStrings and both had the same memory usage as they did without the extra strictness.)
It does seem off odd that building a vector byte by byte is so hard to do performantly. Maybe the memory usage ends up being okay when working with larger structures, though.
I think your use of (+1) in the STRef was safe here because unsafeWrite was forcing it relatively quickly. The laziness in ST values just bit me a few days ago because I wasn't forcing the values soon enough and was getting stack overflows when evaluation was finally triggered. Have you tried building the vector using things besides write/ST? It might be a bit faster to use something like Vector.unfoldr or Vector.generateM and ByteString.index to build up a pure Vector. After that you could use Vector.unsafeThaw to convert that pure Vector into an MVector.

2011/11/16 Tristan Ravitch
Have you tried building the vector using things besides write/ST? It might be a bit faster to use something like Vector.unfoldr or Vector.generateM and ByteString.index to build up a pure Vector. After that you could use Vector.unsafeThaw to convert that pure Vector into an MVector.
I tried unfoldrN and, indeed, the memory usage has gone down. Residency seems to be <45K, regardless of input size; and the productivity is above 90% even for small (128K) inputs. Thanks for your suggestion. -- Jason Dusek () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments
participants (10)
-
Antoine Latter
-
Daniel Fischer
-
Duncan Coutts
-
Jason Dusek
-
Johan Tibell
-
Roman Cheplyaka
-
Ryan Ingram
-
tomberek
-
Tristan Ravitch
-
Yves Parès