RE: storing highly shared data structures

On 22 December 2005 13:43, Christian Maeder wrote:
for storing highly shared data structures we use so called Annotated Terms (shortly ATerms, details below).
http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ATerm
In contrast to the Binary (or GhcBinary) class we compute the sharing, which saves a lot of space for data types that keep redundant information.
With this we can store some of our data structures (of course only non-cyclic and finite ones) in a few KBs that need MBs if stored without sharing (as when using the Binary or the Show/Read classes).
So far so good. The problem remaining is that an object is _traversed_ as if being unshared and thus the _time_ for the ATermTable construction becomes too long for us.
GHC's internal data structures (on the heap) are in many cases shared, by pointer references. I.e. if I add a (single) symbol table to every symbol that I use, then the symbol table will not be copied, but only a reference added to my symbol.
How can I detect this sharing in order to avoid traversing the very same symbol table for every symbol?
I've tried to use a "Map (Ptr ()) ShATerm". So before traversing an object I look up its address and check if is was traversed before (if not I traverse it and store it in my map for future lookups).
1.) I'm not sure if it is safe to use "Ptr ()" (meanwhile garbage collection, heap compaction and what not could happen).
Right - Ptr isn't the right thing here, because GC will move objects around. That's why we have StablePtr and StableName. In fact, what you really want here is the pointer-equality memo table implementation in the Memo module (package util). This is scheduled to be removed in 6.6, but it will be available as a Cabal package. Cheers, Simon

Simon Marlow wrote:
Right - Ptr isn't the right thing here, because GC will move objects around. That's why we have StablePtr and StableName.
may it be that makeStableName is expensive? (or it is my additional Map?) My old version is faster, because the version with makeStableName does very much GC. Christian 1. with makeStableName (and a Map): 2,447,401,824 bytes allocated in the heap 703,294,688 bytes copied during GC 50,780,688 bytes maximum residency (24 sample(s)) 9328 collections in generation 0 (129.88s) 24 collections in generation 1 ( 4.10s) 99 Mb total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 27.28s ( 28.91s elapsed) GC time 133.98s (140.08s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 161.26s (168.99s elapsed) %GC time 83.1% (82.9% elapsed) Alloc rate 89,714,143 bytes per MUT second Productivity 16.9% of total user, 16.1% of total elapsed 2. without makeStableName: 7,560,158,340 bytes allocated in the heap 578,736,496 bytes copied during GC 44,307,024 bytes maximum residency (25 sample(s)) 28832 collections in generation 0 ( 6.83s) 25 collections in generation 1 ( 3.20s) 102 Mb total memory in use INIT time 0.00s ( 0.00s elapsed) MUT time 139.71s (146.74s elapsed) GC time 10.03s ( 10.94s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 149.74s (157.68s elapsed) %GC time 6.7% (6.9% elapsed) Alloc rate 54,113,222 bytes per MUT second Productivity 93.3% of total user, 88.6% of total elapsed

Christian Maeder wrote:
Simon Marlow wrote:
Right - Ptr isn't the right thing here, because GC will move objects around. That's why we have StablePtr and StableName.
may it be that makeStableName is expensive? (or it is my additional Map?)
My old version is faster, because the version with makeStableName does very much GC.
Christian
1. with makeStableName (and a Map):
2,447,401,824 bytes allocated in the heap 703,294,688 bytes copied during GC 50,780,688 bytes maximum residency (24 sample(s))
9328 collections in generation 0 (129.88s) 24 collections in generation 1 ( 4.10s)
99 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed) MUT time 27.28s ( 28.91s elapsed) GC time 133.98s (140.08s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 161.26s (168.99s elapsed)
%GC time 83.1% (82.9% elapsed)
Alloc rate 89,714,143 bytes per MUT second
Productivity 16.9% of total user, 16.1% of total elapsed
Interesting... this could mean that updating the stable name table on each GC is very costly. This deserves more investigation, but I'm afraid it's not high on the priority list for me. I can help if anyone else wants to look into it. Cheers, Simon

Hello Christian, Friday, January 06, 2006, 9:43:39 PM, you wrote: CM> My old version is faster, because the version with makeStableName does CM> very much GC. CM> MUT time 27.28s ( 28.91s elapsed) CM> GC time 133.98s (140.08s elapsed) try to add infamous "+RTS -A10m" switch ;) it's the third time i give this advice during last weeks :) i added it to the ghc wiki faq, opening up new section "Optimization issues" -- Best regards, Bulat mailto:bulatz@HotPOP.com

Bulat Ziganshin wrote:
CM> My old version is faster, because the version with makeStableName does CM> very much GC.
CM> MUT time 27.28s ( 28.91s elapsed) CM> GC time 133.98s (140.08s elapsed)
try to add infamous "+RTS -A10m" switch ;)
You saved my day, thank you Bulat! Without that flag: MUT time 24.30s ( 24.76s elapsed) GC time 131.25s (140.01s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 155.55s (164.77s elapsed) And with it: MUT time 23.86s ( 24.86s elapsed) GC time 11.03s ( 11.68s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 34.89s ( 36.54s elapsed)

Christian Maeder wrote:
Bulat Ziganshin wrote:
CM> My old version is faster, because the version with makeStableName does CM> very much GC.
CM> MUT time 27.28s ( 28.91s elapsed) CM> GC time 133.98s (140.08s elapsed)
try to add infamous "+RTS -A10m" switch ;)
You saved my day, thank you Bulat!
Without that flag:
MUT time 24.30s ( 24.76s elapsed) GC time 131.25s (140.01s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 155.55s (164.77s elapsed)
And with it:
MUT time 23.86s ( 24.86s elapsed) GC time 11.03s ( 11.68s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 34.89s ( 36.54s elapsed)
The real problem seems to be that minor GCs are taking too long. Having said that, you can usually reduce GC overhead with large -A or -H options. Cheers, Simon

On Jan 10, 2006, at 4:26 AM, Simon Marlow wrote:
Christian Maeder wrote:
Bulat Ziganshin wrote:
CM> My old version is faster, because the version with makeStableName does CM> very much GC.
CM> MUT time 27.28s ( 28.91s elapsed) CM> GC time 133.98s (140.08s elapsed)
try to add infamous "+RTS -A10m" switch ;) You saved my day, thank you Bulat! Without that flag: MUT time 24.30s ( 24.76s elapsed) GC time 131.25s (140.01s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 155.55s (164.77s elapsed) And with it: MUT time 23.86s ( 24.86s elapsed) GC time 11.03s ( 11.68s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 34.89s ( 36.54s elapsed)
The real problem seems to be that minor GCs are taking too long. Having said that, you can usually reduce GC overhead with large -A or -H options.
Is the full table of stable names being traversed at every minor GC? If so, it might be worth somehow segregating the nursery stable names. I bet this complicates management a bunch, though, since right now there's only one table to look things up in. -Jan
Cheers, Simon
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Jan-Willem Maessen wrote:
On Jan 10, 2006, at 4:26 AM, Simon Marlow wrote:
Christian Maeder wrote:
Bulat Ziganshin wrote:
CM> My old version is faster, because the version with makeStableName does CM> very much GC.
CM> MUT time 27.28s ( 28.91s elapsed) CM> GC time 133.98s (140.08s elapsed)
try to add infamous "+RTS -A10m" switch ;)
You saved my day, thank you Bulat! Without that flag: MUT time 24.30s ( 24.76s elapsed) GC time 131.25s (140.01s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 155.55s (164.77s elapsed) And with it: MUT time 23.86s ( 24.86s elapsed) GC time 11.03s ( 11.68s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 34.89s ( 36.54s elapsed)
The real problem seems to be that minor GCs are taking too long. Having said that, you can usually reduce GC overhead with large -A or -H options.
Is the full table of stable names being traversed at every minor GC? If so, it might be worth somehow segregating the nursery stable names. I bet this complicates management a bunch, though, since right now there's only one table to look things up in.
It's traversed, but not re-hashed. So it shouldn't be a bottleneck, unless the table is really huge. I agree that separating the gen 0 stable names into a separate table would be the right fix, though. Cheers, Simon

Hello Simon, Tuesday, January 10, 2006, 12:26:30 PM, you wrote:
CM> My old version is faster, because the version with makeStableName does CM> very much GC.
CM> MUT time 27.28s ( 28.91s elapsed) CM> GC time 133.98s (140.08s elapsed)
try to add infamous "+RTS -A10m" switch ;)
SM> The real problem seems to be that minor GCs are taking too long. Having SM> said that, you can usually reduce GC overhead with large -A or -H options. it is the same problem as with scanning large IOArrays on each GC. in this case, i think, table of stable names scanned each time and therefore program works so slow. if ghc will dynamically increase "+RTS -A" area when there are a large IOArrays/stable names table, this would help improve speed significantly. also, i want to remind my old suggestion to temporarily disable all GCs, or better to temporarily change "-A" area at the programs request. That will help programs like Joels one and my own, where large data structures (say, 5 or 20 mb) are created, used and then completely discarded -- Best regards, Bulat mailto:bulatz@HotPOP.com

Bulat Ziganshin wrote:
Hello Simon,
Tuesday, January 10, 2006, 12:26:30 PM, you wrote:
CM> My old version is faster, because the version with makeStableName does CM> very much GC.
CM> MUT time 27.28s ( 28.91s elapsed) CM> GC time 133.98s (140.08s elapsed)
try to add infamous "+RTS -A10m" switch ;)
SM> The real problem seems to be that minor GCs are taking too long. Having SM> said that, you can usually reduce GC overhead with large -A or -H options.
it is the same problem as with scanning large IOArrays on each GC.
Actually I think it's a different problem, with the same workaround.
in this case, i think, table of stable names scanned each time and therefore program works so slow. if ghc will dynamically increase "+RTS -A" area when there are a large IOArrays/stable names table, this would help improve speed significantly.
I'm not keen on this, because its a hack and doesn't address the real cause of the problem. We're working on addressing the array issue, see this ticket I created today: http://cvs.haskell.org/trac/ghc/ticket/650
also, i want to remind my old suggestion to temporarily disable all GCs, or better to temporarily change "-A" area at the programs request. That will help programs like Joels one and my own, where large data structures (say, 5 or 20 mb) are created, used and then completely discarded
You can change the allocation area size from within a program quite easily. Write a little C function to assign to RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and call it from Haskell; the next time GC runs it will allocate the larger nursery. Please try this and let me know if it works. Cheers, Simon

Simon Marlow wrote:
You can change the allocation area size from within a program quite easily. Write a little C function to assign to RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and call it from Haskell; the next time GC runs it will allocate the larger nursery. Please try this and let me know if it works.
Can someone supply a code snippet for me (as I don't speak C and always hoped to avoid C by using haskell)? Christian

On 1/11/06, Christian Maeder
Simon Marlow wrote:
You can change the allocation area size from within a program quite easily. Write a little C function to assign to RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and call it from Haskell; the next time GC runs it will allocate the larger nursery. Please try this and let me know if it works.
Can someone supply a code snippet for me (as I don't speak C and always hoped to avoid C by using haskell)? Hi
Following compiles and runs for me (ghc 6.4.1), but I didn't test any effects. C-bits: #include "Rts.h" #include "RtsFlags.h" void SetMinAllocArea(int a) { RtsFlags.GcFlags.minAllocAreaSize=a; } Haskell bits: import Foreign.C.Types foreign import ccall "SetMinAllocArea" setMinAllocArea :: CInt -> IO () It requires compiling c-bits without --make, something like ghc -c bits.c -o bits.o and including it on final linking or --make step (or ar/ld, similary) ghc bits.o --make main HTH, Esa

Esa Ilari Vuokko wrote:
On 1/11/06, Christian Maeder
wrote: Simon Marlow wrote:
You can change the allocation area size from within a program quite easily. Write a little C function to assign to RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and call it from Haskell; the next time GC runs it will allocate the larger nursery. Please try this and let me know if it works.
Can someone supply a code snippet for me (as I don't speak C and always hoped to avoid C by using haskell)?
Hi
Following compiles and runs for me (ghc 6.4.1), but I didn't test any effects.
C-bits: #include "Rts.h" #include "RtsFlags.h" void SetMinAllocArea(int a) { RtsFlags.GcFlags.minAllocAreaSize=a; }
Haskell bits: import Foreign.C.Types foreign import ccall "SetMinAllocArea" setMinAllocArea :: CInt -> IO ()
It requires compiling c-bits without --make, something like ghc -c bits.c -o bits.o and including it on final linking or --make step (or ar/ld, similary) ghc bits.o --make main
I forgot to mention, minAllocAreaSize is in units of BLOCK_SIZE, so divide bytes by BLOCK_SIZE before setting it. You can get BLOCK_SIZE from Rts.h. Cheers, Simon

Hello Christian, Wednesday, January 11, 2006, 4:02:10 PM, you wrote:
RtsFlags.GcFlags.minAllocAreaSize (#include "RtsFlags.h" first), and call it from Haskell; the next time GC runs it will allocate the larger nursery. Please try this and let me know if it works.
CM> Can someone supply a code snippet for me (as I don't speak C and always CM> hoped to avoid C by using haskell)? are you sure that you need this? if you only need to "burn in" "+RTS -A" option value, it's enough to use documented way: char *ghc_rts_opts = "-A10m"; (see 4.14.5 in GHC user manual) the trick provided by Simon need only to change "-A" during program execution, for example depending on options or around GC-unfriendly part of program -- Best regards, Bulat mailto:bulatz@HotPOP.com

Bulat Ziganshin wrote:
try to add infamous "+RTS -A10m" switch ;)
Maybe -H300m is more famous? MUT time 24.92s ( 29.79s elapsed) GC time 6.32s ( 7.67s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 31.24s ( 37.46s elapsed) Christian
participants (5)
-
Bulat Ziganshin
-
Christian Maeder
-
Esa Ilari Vuokko
-
Jan-Willem Maessen
-
Simon Marlow