
I've been following with interest the recent discussions on reddit about the extremely slow hash tables in Haskell compared to F# and OCaml. As I understood it, this performance problem is caused by the GC not liking mutable arrays: http://hackage.haskell.org/trac/ghc/ticket/650 It appears from the discussion there that this is more than a simple bug; it's a fundamental difficulty (a slow-writes vs. slow-GC trade-off). What I'm wondering, though, is how this can be unique to GHC: all arrays in OCaml (and probably F#) are mutable, and usually unboxed. How is this problem addressed there? Why is this supposed to be specific to boxed arrays only: wouldn't the GC have to scan the whole mutable array whether it's boxed or unboxed?

Hello FFT, Monday, April 6, 2009, 11:07:33 AM, you wrote:
How is this problem addressed there? Why is this supposed to be specific to boxed arrays only: wouldn't the GC have to scan the whole mutable array whether it's boxed or unboxed?
You need to scan only the boxes: if an array contains just plain CPU-level numbers, there is nothing to scan. One way to solve this problem is to keep one `modified` bit per 256 elements rather than per entire array, so the GC will have to scan only the modified chunks.

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com
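A small illustration of Bulat's point (my sketch, not code from the thread): an STUArray of Int stores flat machine integers, so no matter how often it is mutated there are no boxes inside it for the GC to trace.

    import Data.Array.ST (STUArray, newArray, writeArray, runSTUArray)
    import Data.Array.Unboxed (UArray)

    -- Fill an unboxed array with squares. The contents are raw machine
    -- integers, so a collection during the loop has nothing to trace
    -- inside the array itself.
    squares :: Int -> UArray Int Int
    squares n = runSTUArray $ do
      arr <- newArray (0, n) 0
      mapM_ (\i -> writeArray arr i (i * i)) [0 .. n]
      return arr

A boxed STArray holding the same values would instead store one pointer per element, and each of those is a candidate for scanning once the array is marked dirty.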

Hello FFT, Monday, April 6, 2009, 12:32:53 PM, you wrote:
You need to scan only the boxes: if an array contains just plain CPU-level numbers, there is nothing to scan
Are those the only legal contents of STUArray?
Numbers, chars, vanilla pointers. UArray just mimics C arrays, after all.

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

On Mon, Apr 6, 2009 at 1:49 AM, Bulat Ziganshin wrote:
Are those the only legal contents of STUArray?
Numbers, chars, vanilla pointers. UArray just mimics C arrays, after all.
I haven't gotten to learning about them in detail yet, but my hope was that STUArray was like vector<T> in C++, and STArray was like vector<T*>. Both are fairly general.

Hello FFT, Monday, April 6, 2009, 12:56:51 PM, you wrote:
Are those the only legal contents of STUArray?
Numbers, chars, vanilla pointers. UArray just mimics C arrays, after all.
I haven't gotten to learning about them in detail yet, but my hope was that STUArray was like vector<T> in C++, and STArray was like vector<T*>. Both are fairly general.
Well, that's a good comparison to some degree, but the catch is that Haskell doesn't have unboxed types at all. Therefore [ST]UArray is something that you can't implement in pure Haskell; it's more like providing an API to some C arrays. They are implemented via special operations in the GHC runtime that are limited to numbers/chars/pointers, so you can't have a UArray of Complex numbers (although you can use two arrays of Doubles, of course). UArray is a rather old thing; nowadays you may use StorableArray or one of the many newer array/vector libraries [1,2]. Unfortunately, these tend not to support the ST monad, so you will need to write little wrappers using the unsafeIOToST operation.
So if I need an array of complex numbers in Haskell, will I need an extra level of indirection compared to C? And, in addition to that, some serious issues with GC speed if those arrays need to be mutable?
[1] http://haskell.org/haskellwiki/Library/ArrayRef
[2] http://www.haskell.org/haskellwiki/Storable_Vector

Recent cafe threads:
http://www.haskell.org/pipermail/haskell-cafe/2008-July/045229.html
http://www.haskell.org/pipermail/haskell-cafe/2009-March/057240.html

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com
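The little wrappers Bulat mentions might look like the following sketch (names are mine; note that on modern GHCs unsafeIOToST lives in Control.Monad.ST.Unsafe, while the 2009-era GHCs in this thread exported it from Control.Monad.ST):

    import Control.Monad.ST (ST)
    import Control.Monad.ST.Unsafe (unsafeIOToST)
    import Data.Array.Storable (StorableArray, newArray, readArray, writeArray)

    -- ST-flavoured wrappers over StorableArray's IO-only MArray instance.
    -- Only sound if the array never escapes the ST computation.
    newDoublesST :: (Int, Int) -> Double -> ST s (StorableArray Int Double)
    newDoublesST bnds x = unsafeIOToST (newArray bnds x)

    readDoubleST :: StorableArray Int Double -> Int -> ST s Double
    readDoubleST arr i = unsafeIOToST (readArray arr i)

    writeDoubleST :: StorableArray Int Double -> Int -> Double -> ST s ()
    writeDoubleST arr i x = unsafeIOToST (writeArray arr i x)

For the complex-number question, the same idea combined with two parallel arrays of Double (real and imaginary parts), as Bulat suggests, keeps the elements unboxed at the cost of slightly clumsier indexing.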

On Monday 06 April 2009 4:10:43 am Bulat Ziganshin wrote:
One way to solve this problem is to keep one `modified` bit per 256 elements rather than per entire array, so the GC will have to scan only the modified chunks
For reference, I constructed a benchmark that takes advantage of GHC's tagging of whole arrays to approximate this: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3316 Since each array has a dirty bit, making a type that stores an array of arrays that add up to the desired size is similar to having a dirty bit for chunks the size of the sub-array. The test then fills a 10 million element array.

However, something about the benchmark makes it perform poorly for both small chunks and large chunks: +RTS -sstderr reports that lots of copying occurs for small chunk sizes, and I haven't bothered to figure out why this is the case. You can, however, see that marking dirty chunks in this fashion would be profitable. The un-chunked array takes around a minute here, while with chunks of 10,000 elements (which seems to be about the optimal value given the copying trade-off above) it takes about 6 seconds, and that's still with 60+% GC time.

-- Dan
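Dan's paste isn't reproduced here, so the following is my own rough sketch of the scheme he describes, not his actual code: an outer boxed array of small mutable sub-arrays, so that each write marks only one small sub-array dirty for the GC to rescan. The names and the chunk size are illustrative.

    import Data.Array.IO (IOArray, newArray, newListArray, readArray, writeArray)

    -- One large boxed mutable array, split into fixed-size sub-arrays.
    data Chunked a = Chunked !Int !(IOArray Int (IOArray Int a))

    newChunked :: Int -> Int -> a -> IO (Chunked a)
    newChunked size chunk x = do
      let nChunks = (size + chunk - 1) `div` chunk  -- last chunk may overhang
      subs  <- mapM (const (newArray (0, chunk - 1) x)) [1 .. nChunks]
      outer <- newListArray (0, nChunks - 1) subs
      return (Chunked chunk outer)

    writeChunked :: Chunked a -> Int -> a -> IO ()
    writeChunked (Chunked chunk outer) i x = do
      sub <- readArray outer (i `div` chunk)
      writeArray sub (i `mod` chunk) x

    -- Fill 10 million elements, as in the benchmark described above.
    main :: IO ()
    main = do
      arr <- newChunked 10000000 10000 (0 :: Int)
      mapM_ (\i -> writeChunked arr i i) [0 .. 9999999]

A write to a plain 10-million-element boxed array dirties the whole array, forcing the GC to rescan all of it; here only the touched 10,000-element sub-array gets rescanned.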

Hello Dan, Monday, April 6, 2009, 12:35:14 PM, you wrote:
the size of the sub-array. The test then fills a 10 million element array.
However, something about the benchmark makes it perform poorly for both small chunks and large chunks: +RTS -sstderr reports that lots of copying occurs for small chunk sizes, and I haven't bothered to figure out why this is the case. You can, however, see that marking dirty chunks in this fashion would be profitable. The un-chunked array takes around a minute here, while with chunks of 10,000 elements (which seems to be about the optimal value given the copying trade-off above) it takes about 6 seconds, and that's still with 60+% GC time.
I don't think that 60% GC time is bad for *this* benchmark; array filling is a very trivial operation, after all. The important part is the 10x reduction in GC time: apply these numbers to the original benchmark.

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com
participants (3):
- Bulat Ziganshin
- Dan Doel
- FFT