
On 23/04/2010 19:03, Denys Rtveliashvili wrote:
>> Tue Dec 1 16:03:21 GMT 2009  Simon Marlow <marlowsd@gmail.com>
>>   * Make allocatePinned use local storage, and other refactorings
>
> The version I have checked out is 6.12 and that's why I haven't seen this patch. Are there any plans for including this patch in the next GHC release?

It'll be in the next major release (6.14.1).

>> Right, but these are not common cases that need to be optimised. newCAF is only called once per CAF, thereafter it is accessed without locks.
>
> I can't recall off the top of my head, but I think I had a case where newCAF was used very actively in a simple piece of code. The code looked like this:
>
>   sequence_ $ replicate N $ doSmth
>
> The Cmm code showed that it produced calls to newCAF and something related to black holes.

Right, but newCAF should only be called once for any given CAF, thereafter the CAF will have been updated.

> And when I added "return ()" after that line, the black holes and the calls to "newCAF" disappeared. It was on 6.12.1, I believe. I still have no idea why it happened or why these black holes were necessary, but I'll try to reproduce it one more time and show you an example if it is of interest to you.

If you find a case where newCAF is being called repeatedly, that would be interesting, yes.
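
As a starting point for such a test case, here is a minimal sketch of the "evaluated once" behaviour seen from the Haskell side. It does not observe newCAF itself; it only counts how often the body of a top-level CAF runs. The names evalCount, bigSum and useManyTimes are hypothetical, and the NOINLINE pragmas are an assumption made to stop the optimiser from duplicating the bindings.

```haskell
import Control.Monad (replicateM_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Global counter (hypothetical helper) recording how many times the
-- body of the CAF below has actually been executed.
{-# NOINLINE evalCount #-}
evalCount :: IORef Int
evalCount = unsafePerformIO (newIORef 0)

-- A CAF: a top-level binding that takes no arguments. Its body bumps
-- the counter, so every real evaluation leaves a trace.
{-# NOINLINE bigSum #-}
bigSum :: Integer
bigSum = unsafePerformIO $ do
  modifyIORef' evalCount (+ 1)
  return (sum [1 .. 100000])

-- Force the CAF n times, then report how often its body ran.
useManyTimes :: Int -> IO Int
useManyTimes n = do
  replicateM_ n (bigSum `seq` return ())
  readIORef evalCount

main :: IO ()
main = useManyTimes 1000 >>= print
```

If the CAF is updated after its first evaluation, as described above, the reported count should be 1 regardless of n. A program where the corresponding Cmm shows repeated newCAF calls would be the interesting counterexample.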

>> It may be that we could find benchmarks where access to the block allocator is the performance bottleneck; indeed, in the parallel GC we sometimes see contention for it. If that turns out to be a problem then we may need to think about per-CPU free lists in the block allocator, but I think it would entail a fair bit of complexity and, if we're not careful, extra memory overhead, e.g. where one CPU has all the free blocks in its local free list and the others have none. So I'd like to avoid going down that route unless we absolutely have to. The block allocator is nice and simple right now.
>
> I suppose I should check out the HEAD then and give it a try, because earlier I had performance issues in the threaded runtime (~20% overhead and far more noise) in an application which was doing some slicing, reshuffling and composing of text via ByteStrings, with a modest amount of data passed around via "Chan"s.

I'd be interested in seeing a program that has 20% overhead with -threaded. You should watch out for bound threads though: with -threaded the main thread is a bound thread, and communication with the main thread is much slower than between unbound threads. See http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.1/Control-C...
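
A minimal sketch of one way to keep the Chan traffic off the bound main thread, assuming the workload can be wrapped in a single IO action. sumViaChan is a hypothetical stand-in for the real application; isCurrentThreadBound and runInUnboundThread are from Control.Concurrent.

```haskell
import Control.Concurrent (forkIO, isCurrentThreadBound, runInUnboundThread)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)

-- Hypothetical workload: a producer thread sends the numbers 1..n over
-- a Chan and the calling thread reads them back and sums them.
sumViaChan :: Int -> IO Int
sumViaChan n = do
  ch <- newChan
  _ <- forkIO $ forM_ [1 .. n] (writeChan ch)
  sum <$> replicateM n (readChan ch)

main :: IO ()
main = do
  -- With -threaded the main thread is bound; without it this is False.
  bound <- isCurrentThreadBound
  putStrLn ("main thread bound: " ++ show bound)
  -- Run the consumer in an unbound thread so that it does not talk to
  -- the producer from the bound main thread.
  total <- runInUnboundThread (sumViaChan 100000)
  print total
```

Timing the program with and without the runInUnboundThread wrapper, both built with -threaded, should show whether the bound main thread accounts for the overhead.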

> On a slightly different topic: please could you point me to a place where stg_upd_frame_info is generated? I can't find it in *.c, *.cmm or *.hs and I guess it is something very special.

rts/Updates.cmm:

  INFO_TABLE_RET( stg_upd_frame, UPDATE_FRAME, UPD_FRAME_PARAMS)
  {
    ...
  }

Cheers,
Simon