Space usage problems

Ian Lynagh

10 Jan 2006 10 Jan '06

4:44 p.m.

Hi all, I am having space issues with some decompression code; I've attached a much simplified version as Test1.hs. At the bottom (foo/bar) is the equivalent of deflate. This should be a standalone module which doesn't know about the rest. In the middle (readChunks) is the equivalent of gunzip. It repeatedly calls foo until there is no more input left. At the top is a simple main function that calls them. If I do dd if=/dev/zero of=data bs=1000 count=3000" # making data around 3MB ghc --make Test1 -o Test1 -O -Wall ./Test1 then in top I see Test1 increasing memory usage to around 150MB. I think this is because the "let (ys, zs) = foo xs" means zs holds on to xs (it's hard to be sure as compiling for profiling is too happy to change the behaviour). I tried (Test2) changing foo to be a monad transformer over the calling monad, so the caller's remaining input was updated as we went along, but (as well as memory usage not obviously being fixed) this is giving me a stack overflow. Has anyone got any suggestions for making a constant space, constant stack version? Thanks Ian

Attachments:

Test1.hs (text/plain — 855 bytes)
Test2.hs (text/plain — 1.0 KB)

Show replies by date

Chris Kuklewicz

10 Jan 10 Jan

5:28 p.m.

I'll make a guess... Ian Lynagh wrote:

...

Hi all,

In the middle (readChunks) is the equivalent of gunzip. It repeatedly calls foo until there is no more input left.

At the top is a simple main function that calls them.

If I do

dd if=/dev/zero of=data bs=1000 count=3000" # making data around 3MB ghc --make Test1 -o Test1 -O -Wall ./Test1

then in top I see Test1 increasing memory usage to around 150MB. I think this is because the "let (ys, zs) = foo xs" means zs holds on to xs (it's hard to be sure as compiling for profiling is too happy to change the behaviour).

I don't have a comment on your guess.

...

I tried (Test2) changing foo to be a monad transformer over the calling monad, so the caller's remaining input was updated as we went along, but (as well as memory usage not obviously being fixed) this is giving me a stack overflow.

I will ignore Test2

...

Has anyone got any suggestions for making a constant space, constant stack version?

Not yet.

...

Thanks Ian

...

------------------------------------------------------------------------

module Main (main) where

import Control.Monad (liftM) import Control.Monad.State (State, runState, evalState, get, put)

main :: IO () main = do xs <- readFile "data" ys <- readFile "data" print (evalState readChunks xs == ys)

The equality should be constant space.

...

---

type FirstMonad = State String

readChunks :: FirstMonad String readChunks = do xs <- get if null xs then return [] else do let (ys, zs) = foo xs put zs

And zs is the final state of "runState bar" which is suspect is [] And ys is the whole input (which is now all in memory)

...

rest <- readChunks return (ys ++ rest)

---

type SecondMonad = State String

foo :: String -> (String, String) foo = runState bar

bar :: SecondMonad String bar = do inp <- get case inp of [] -> return [] x:xs -> do put xs liftM (x:) bar

The liftM should be equivalent to temp <- bar return ( (x:) temp ) It looks like the first call to foo will have bar consuming the entire input string. So the flow looks like main readChuncks all-input foo all-input bar (iterated over whole input length) foo returns (all-input, []) "rest <- readChunks" (recursive call, sees null xs then "return []") "return (ys ++ rest)" which is return (all-input ++ []) In essence, your bar traverses the whole string until the state is empty. This loads your whole file into memory

Ian Lynagh

5:46 p.m.

On Tue, Jan 10, 2006 at 05:28:03PM +0000, Chris Kuklewicz wrote:

...

I'll make a guess...

Ian Lynagh wrote:

...
Hi all,

foo :: String -> (String, String) foo = runState bar

bar :: SecondMonad String bar = do inp <- get case inp of [] -> return [] x:xs -> do put xs liftM (x:) bar The liftM should be equivalent to temp <- bar return ( (x:) temp )

It looks like the first call to foo will have bar consuming the entire input string.

I'm not entirely sure what you mean here. The result will be the entire input string, but State is a lazy monad, so it won't have to consume it all before it starts returning it. For example, if you replace the definition of foo with foo xs = (evalState bar xs, "") then the program runs in constant space (but this isn't a solution to the real problem, as bar will only consume a prefix of the string there). Thanks Ian

Chris Kuklewicz

5:55 p.m.

I will continue to guess... Ian Lynagh wrote:

...

On Tue, Jan 10, 2006 at 05:28:03PM +0000, Chris Kuklewicz wrote:

...
I'll make a guess...

Ian Lynagh wrote:

...
Hi all,

foo :: String -> (String, String) foo = runState bar

bar :: SecondMonad String bar = do inp <- get case inp of [] -> return [] x:xs -> do put xs liftM (x:) bar

The liftM should be equivalent to temp <- bar return ( (x:) temp )

It looks like the first call to foo will have bar consuming the entire input string.

I'm not entirely sure what you mean here. The result will be the entire input string, but State is a lazy monad, so it won't have to consume it all before it starts returning it.

For example, if you replace the definition of foo with

foo xs = (evalState bar xs, "")

then the program runs in constant space (but this isn't a solution to the real problem, as bar will only consume a prefix of the string there).

Yes, exactly as I would have predicted. Your "let (yx,zs) = foo xs put zs" takes the second of the tuple retuned from "foo = runState bar" and put's it. Then in the recursive readChucks call, it does xs <- get if (null xs) So it has to decide if xs (which is zs which is the snd value from foo which is the state from runState bar which is "" or []) is null or not. This forces it to get the head of the String state that bar returns, or [] since there is no head. But it does not know that it is [] until bar is fully finished, which loads the whole file into memory. When you put (evalState bar xs, "") then zs is [] and put [] leads to get [] and null [] is true so it returns [] to the nested readChunks call. This does not force the file to be read.

Daniel Fischer

9:41 p.m.

Am Dienstag, 10. Januar 2006 17:44 schrieb Ian Lynagh:

...

Hi all,

I am having space issues with some decompression code; I've attached a much simplified version as Test1.hs.

At the bottom (foo/bar) is the equivalent of deflate. This should be a standalone module which doesn't know about the rest.

In the middle (readChunks) is the equivalent of gunzip. It repeatedly calls foo until there is no more input left.

At the top is a simple main function that calls them.

If I do

dd if=/dev/zero of=data bs=1000 count=3000" # making data around 3MB ghc --make Test1 -o Test1 -O -Wall ./Test1

then in top I see Test1 increasing memory usage to around 150MB. I think this is because the "let (ys, zs) = foo xs" means zs holds on to xs (it's hard to be sure as compiling for profiling is too happy to change the behaviour).

I had 72 Mb space usage for a 3Mb file. I believe, it's the 'put zs' that's consuming the memory. I changed it to readChunks = do xs <- get if null xs then return [] else do let (ys, zs) = foo xs rest = evalState readChunks zs return (ys ++ rest) and got much smaller memory usage (10Mb) -- not sure, how sensible that would be for real work and why it reduces memory. If bar can start returning before it's finished, then the same holds for the modified readChunks, but the original would have to wait for the completion of bar (via foo) until zs can be put, so the complete ys would have to be in memory at once. Just checked, modified version also runs in 10Mb for a 12mb data file, so indeed bar starts returning before finishing and it seems the above is right. ./test4 +RTS -sstderr True 1,496,184,404 bytes allocated in the heap 987,852,924 bytes copied during GC 3,226,492 bytes maximum residency (162 sample(s)) 5707 collections in generation 0 ( 12.66s) 162 collections in generation 1 ( 14.82s) 10 Mb total memory in use INIT time 0.00s ( 0.01s elapsed) MUT time 6.88s ( 15.06s elapsed) GC time 27.48s ( 55.93s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 34.36s ( 71.00s elapsed) %GC time 80.0% (78.8% elapsed) Alloc rate 217,468,663 bytes per MUT second Productivity 20.0% of total user, 9.7% of total elapsed

...

I tried (Test2) changing foo to be a monad transformer over the calling monad, so the caller's remaining input was updated as we went along, but (as well as memory usage not obviously being fixed) this is giving me a stack overflow.

Has anyone got any suggestions for making a constant space, constant stack version?

Thanks Ian

Hope that helps, Daniel

Ian Lynagh

10 p.m.

On Tue, Jan 10, 2006 at 04:44:33PM +0000, Ian Lynagh wrote:

...

readChunks :: FirstMonad String readChunks = do xs <- get if null xs then return [] else do let (ys, zs) = foo xs put zs rest <- readChunks return (ys ++ rest)

It looks like changing this let to a case fixes this example, but at the time I'd experimented with that there must have been other issues clouding the effect, such as the following. Foo1 (attached) uses large amounts of memory whereas Foo2 (also attached) runs in a little constant space. The difference is only changing this: else do chunk <- case foo xs of (ys, zs) -> do put zs return ys chunks <- readChunks return (chunk ++ chunks) to this: else case foo xs of (ys, zs) -> do put zs chunks <- readChunks return (ys ++ chunks) but I don't have a good feeling for why this should be the case given I'd expect chunk to be forced, and thus the case evaluated, at the same point in Foo1 as the case is evaluated in Foo2. Is this just a case of GHC's optimiser's behaviour depending on subtle source changes, or am I missing something? Thanks Ian

Simon Marlow

11 Jan 11 Jan

10:36 a.m.

Ian Lynagh wrote:

...

On Tue, Jan 10, 2006 at 04:44:33PM +0000, Ian Lynagh wrote:

...
readChunks :: FirstMonad String readChunks = do xs <- get if null xs then return [] else do let (ys, zs) = foo xs put zs rest <- readChunks return (ys ++ rest)

It looks like changing this let to a case fixes this example, but at the time I'd experimented with that there must have been other issues clouding the effect, such as the following.

Foo1 (attached) uses large amounts of memory whereas Foo2 (also attached) runs in a little constant space. The difference is only changing this:

else do chunk <- case foo xs of (ys, zs) -> do put zs return ys chunks <- readChunks return (chunk ++ chunks)

to this:

else case foo xs of (ys, zs) -> do put zs chunks <- readChunks return (ys ++ chunks)

I had great difficulty understanding this, but I think I do now. It's a bit easier to understand if you inline the monads away. Foo1 translates to this: bar [] = ([],[]) bar (x:xs) = let (zs,ys) = bar xs in (x:zs,ys) readChunks [] = ([],[]) readChunks xs = let (ys,zs) = bar xs (chunks,rest) = readChunks zs in (ys ++ chunks, rest) and Foo2: readChunks [] = ([],[]) readChunks xs = case bar xs of (zs,ys) -> let (chunks,rest) = readChunks ys in (zs ++ chunks, rest) This is pretty much what GHC ends up with when you give -O (actually it turns some of the tuples into unboxed tuples, but that's not important). We can see in Foo1 that chunks is a thunk holding on to zs, which is a thunk that holds on to xs, so you never get to release xs until the whole result list (ys) is traversed. GHC's lazy tuple optimisation doesn't kick in, because neither chunks nor rest are evaluated. However, it's not so clear why Foo2 is better. chunks holds on to ys, the second of the pair returned by bar. In fact, ys will point to a chain of thunks that looks like this: ys = snd (_, snd (_, snd (_, snd (_, snd ...)))) every time GC runs, it can completely eliminate this list via the well-known lazy tuple optimisation. Unfortunately it doesn't *completely* eliminate the list, because of a shortcoming in our implementation, actually reported earlier by Ian Lynagh with a very similar program :-) Fortunately in this example we do seem to be reducing enough of the list to eliminate the space leak, though. My suggestion: don't use the lazy state monad if you can help it. Cheers, Simon

Ian Lynagh

2:14 p.m.

On Wed, Jan 11, 2006 at 10:36:47AM +0000, Simon Marlow wrote:

...

My suggestion: don't use the lazy state monad if you can help it.

But a strict state monad would force everything to be loaded into memory at once, right? What would you suggest I use instead? Or do I just have to tread carefully to keep this optimisation happy until the GCer is improved? Thanks Ian

Simon Marlow

3 p.m.

Ian Lynagh wrote:

...

On Wed, Jan 11, 2006 at 10:36:47AM +0000, Simon Marlow wrote:

...
My suggestion: don't use the lazy state monad if you can help it.

But a strict state monad would force everything to be loaded into memory at once, right?

What would you suggest I use instead?

I'm not sure - can you describe exactly what you want to do from a higher level? It might help to re-think the problem from the top down.

...

Or do I just have to tread carefully to keep this optimisation happy until the GCer is improved?

I can't see us fixing this in the short term, I'm afraid. Cheers, Simon

Ian Lynagh

15 Jan 15 Jan

9:52 p.m.

On Wed, Jan 11, 2006 at 03:00:45PM +0000, Simon Marlow wrote:

...

Ian Lynagh wrote:

...
On Wed, Jan 11, 2006 at 10:36:47AM +0000, Simon Marlow wrote:

...
My suggestion: don't use the lazy state monad if you can help it.

But a strict state monad would force everything to be loaded into memory at once, right?

What would you suggest I use instead?

I'm not sure - can you describe exactly what you want to do from a higher level? It might help to re-think the problem from the top down.

OK, I have one library which provides inflate :: [Word8] -- The input -> ([Word8], -- A prefix of the input inflated (uncompressed) [Word8]) -- The remainder of the input I can't tell how much of the input I'll be inflating in advance, I only know when I reach the end of the compressed part. Inflating has an array and a few other bits of state while it uncompresses the input. (I'm assuming the inflation won't fail for now; later I might want something like inflate :: [Word8] -- The input -> ([Word8], -- A prefix of the input, inflated (uncompressed) [Word8], -- The remainder of the input Bool) -- Inflation failed where you would need to write the inflated data to a file, say, before checking the Bool to see if there was an error (if you want to work in constant space)). I'm happy to have a different type for inflate if necessary (e.g. inflate :: m [Word8] -> ([Word8] -> m ()) -> m [Word8] where inflate uses the Monad of the caller to read and write the remaining input; this leads to something using a monad transformer for inflates other state, along the lines of Test2 in my original message, which lead to a stack overflow) but it shouldn't be wedded to the following library: I then have another library with a function that does: while (some input left) read header call inflate read footer return (concat all the inflate results) Reading headers is a fiddly enough task that passing the input around by hand is undesirable. Thanks Ian

Bulat Ziganshin

17 Jan 17 Jan

9:10 p.m.

New subject: Re[2]: Re: Space usage problems

Hello Ian, Monday, January 16, 2006, 12:52:42 AM, you wrote: IL> OK, I have one library which provides IL> inflate :: [Word8] -- The input IL> -> ([Word8], -- A prefix of the input inflated (uncompressed) IL> [Word8]) -- The remainder of the input you can use strict state monad for this task. "strictness" here designate that it evaluates higher level of state, but it don't fully evaluates the state (which itslef is impossible without using DeepSeq class technique). so, for example, when you perform something like "state = tail state" the lazy state monad may store call to tail function in the state field and not evaluate it before needed, while strict state monad will evaluate this higher-level expression and store exactly the lazy expression that represents remainder of list it's interesting that your work has a lot of common with my Binary/Streams library. i'm used the same monadic types to define char enocding/decoding routines (see DataCharEncoding.hs), i've defined universal "mutable references" interface to transparently work with variables in IO/ST monads, i've extended John's StringReader/StringBuffer types to work in any monad. hust now i'm working on extending my "ByteStream->BitStream" transformer to also support any monads although i'm not sure that Haskell implementation of inflate/deflate algorithms will be a useful (just because it will be 100-1000 times slower than existing C routines), nevertheless i glad to offer my help, especially in optimizing code and making it monad-neutral in particular, looking at your code in MissingH 0.13, i recommend you try to use DiffUArray instead of Array IL> while (some input left) read header call inflate read IL> footer return (concat all the inflate results) IL> Reading headers is a fiddly enough task that passing the input around by IL> hand is undesirable. btw, "reading headers" is an perfect task for my Binary library. the ony difference is what in your library "getBits" is an built-in operation of special InfM monad, while in my library "getBits" operation is applied to the Stream objects, but nevertheless work in any monad (to be exact, it works in the monad, to which this Stream obect belongs): getBits :: (Monad m, Stream m h) => Int -> h -> m Int "getBits bits h" returns 'bits' bits read from stream 'h' most of types supporting Stream interface, works only in IO monad (including Handle and MemoryBuffer), but at least StringReader & StringBuffer types can be specialized to IO and ST monads and, in general, to any monad, which is able to support readRef/writeRef operations if you want, you can try to implement the whole inflate process on top of my Binary/Streams library, using these `getBits` operations. something like these: -- 's' is String containing your input data runST (do h <- openBitAlignedLE =<< newStringReader s ... -- here you can use `getBits bits h` to read `s` as bits sequence return ...) ps: http://freearc.narod.ru/Binary.tar.gz -- Best regards, Bulat mailto:bulatz@HotPOP.com

Ian Lynagh

10:01 p.m.

Hi Bulat, On Wed, Jan 18, 2006 at 12:10:45AM +0300, Bulat Ziganshin wrote:

...

Monday, January 16, 2006, 12:52:42 AM, you wrote:

IL> OK, I have one library which provides

IL> inflate :: [Word8] -- The input IL> -> ([Word8], -- A prefix of the input inflated (uncompressed) IL> [Word8]) -- The remainder of the input

you can use strict state monad for this task.

But then none of the output would be available until all the input was consumed, right?

...

in particular, looking at your code in MissingH 0.13, i recommend you try to use DiffUArray instead of Array

The code in missingh is old, entirely unoptimised and quite possibly slightly buggy, incidentally. Thanks Ian

7117

Age (days ago)

7124

Last active (days ago)

List overview

Download

11 comments

5 participants

participants (5)

Bulat Ziganshin
Chris Kuklewicz
Daniel Fischer
Ian Lynagh
Simon Marlow