Re: FFI calls: is it possible to allocate a small memory block on a stack?

27 Apr 2010

      On 23/04/2010 19:03, Denys Rtveliashvili wrote:
...
...
Tue Dec  1 16:03:21 GMT 2009  Simon Marlow <marlowsd@gmail.com>
      * Make allocatePinned use local storage, and other refactorings
The version I have checked out is 6.12 and that's why I haven't seen
this patch.
Are there any plans for including this patch in the next GHC release?
It'll be in the next major release (6.14.1).
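For readers following the subject line, here is a minimal sketch of how a small scratch buffer for an FFI call is usually obtained in Haskell. It uses allocaBytes from Foreign.Marshal.Alloc, which in GHC is, as far as I know, backed by a pinned heap allocation rather than the C stack, which is why the allocatePinned patch matters here. The sumScratch helper and the buffer size are hypothetical and only illustrate the pattern.

```haskell
import Control.Monad (forM_)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Hypothetical helper: fill a temporary n-byte buffer with 0 .. n-1
-- and sum the bytes back. The buffer is only valid inside the callback
-- and is released when allocaBytes returns.
sumScratch :: Int -> IO Int
sumScratch n = allocaBytes n go
  where
    go :: Ptr Word8 -> IO Int
    go buf = do
      forM_ [0 .. n - 1] $ \i -> pokeElemOff buf i (fromIntegral i)
      bytes <- mapM (peekElemOff buf) [0 .. n - 1]
      return (sum (map fromIntegral bytes))

main :: IO ()
main = sumScratch 16 >>= print
```

In a real FFI call the pointer would be passed to the foreign function instead of being filled from Haskell.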
...
...
Right, but these are not common cases that need to be optimised.  newCAF
is only called once per CAF, thereafter it is accessed without locks.
Can't recall off the top of my head, but I think I had a case when
newCAF was used very actively in a simple piece of code. The code looked
like this:
sequence_ $ replicate N $ doSmth
The Cmm code showed that it produced calls to newCAF and something
related to black holes.
Right, but newCAF should only be called once for any given CAF, 
thereafter the CAF will have been updated.
...
And when I added "return ()" after that line,
the black holes and the calls to "newCAF" disappeared. It was on
6.12.1, I believe. I still have no idea why it happened and why these
black holes were necessary, but I'll try to reproduce it one more time
and show you an example if it is of any interest to you.
If you find a case where newCAF is being called repeatedly, that would 
be interesting, yes.
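The "once per CAF" behaviour can be observed from Haskell with a small sketch like the one below. It does not inspect newCAF itself; it only counts how often the body of a top-level constant is run. The names evalCount and expensive are hypothetical, and unsafePerformIO with NOINLINE is used purely as instrumentation in a single-threaded program.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical counter recording how often the CAF body is entered.
{-# NOINLINE evalCount #-}
evalCount :: IORef Int
evalCount = unsafePerformIO (newIORef 0)

-- A CAF: a top-level value that takes no arguments.
{-# NOINLINE expensive #-}
expensive :: Integer
expensive = unsafePerformIO $ do
  modifyIORef' evalCount (+ 1)
  return (sum [1 .. 100000])

main :: IO ()
main = do
  -- Demand the CAF many times; only the first demand runs its body.
  mapM_ (\_ -> expensive `seq` return ()) [1 .. 1000 :: Int]
  readIORef evalCount >>= print
```

If the counter ever came out above 1 in a program like this, that would be the kind of repeated evaluation asked about above.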
...
...
It may be that we could find benchmarks where access to the block
allocator is the performance bottleneck, indeed in the parallel GC we
sometimes see contention for it.  If that turns out to be a problem then
we may need to think about per-CPU free lists in the block allocator,
but I think it would entail a fair bit of complexity and if we're not
careful extra memory overhead, e.g. where one CPU has all the free
blocks in its local free list and the others have none.  So I'd like to
avoid going down that route unless we absolutely have to.  The block
allocator is nice and simple right now.
I suppose I should check out the HEAD then and give it a try, because
earlier I had performance issues in the threaded runtime (~20% of
overhead and far more noise) in an application which was doing some
slicing, reshuffling and composing text via ByteStrings with a modest
amount of passing data around via "Chan"s.
I'd be interested in seeing a program that has 20% overhead with 
-threaded.  You should watch out for bound threads though: with 
-threaded the main thread is a bound thread, and communication with the 
main thread is much slower than between unbound threads. See

http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.1/Control-C...
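As a rough sketch of how to keep Chan traffic off the bound main thread, the program below checks whether the main thread is bound and then runs the communicating code via runInUnboundThread from Control.Concurrent. The pingPong workload is hypothetical; it only stands in for an application that passes data between threads.

```haskell
import Control.Concurrent
  (forkIO, isCurrentThreadBound, rtsSupportsBoundThreads, runInUnboundThread)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_)

-- Hypothetical workload: send n numbers to a worker thread, which
-- replies with each number plus one; return the sum of the replies.
pingPong :: Int -> IO Int
pingPong n = do
  req <- newChan
  rep <- newChan
  _ <- forkIO $ forM_ [1 .. n] $ \_ ->
         readChan req >>= \x -> writeChan rep (x + 1)
  replies <- mapM (\i -> writeChan req i >> readChan rep) [1 .. n]
  return (sum replies)

main :: IO ()
main = do
  putStrLn ("RTS supports bound threads: " ++ show rtsSupportsBoundThreads)
  bound <- isCurrentThreadBound
  putStrLn ("main thread is bound: " ++ show bound)
  -- Run the chatty part in an unbound thread so that each reply does
  -- not have to be handed back to the bound main thread.
  total <- runInUnboundThread (pingPong 1000)
  print total
```

Timing pingPong with and without runInUnboundThread under -threaded is one way to check whether bound-thread communication accounts for the overhead.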
...
On a slightly different topic: please could you point me to a place
where stg_upd_frame_info is generated? I can't find it in *.c, *.cmm or
*.hs and guess it is something very special.
rts/Updates.cmm:

INFO_TABLE_RET( stg_upd_frame, UPDATE_FRAME, UPD_FRAME_PARAMS)
{
...
}

Cheers,
	Simon