FFI calls: is it possible to allocate a small memory block on a stack?

Good morning,

Yesterday I did a few tests to measure the performance of FFI calls and found that the calls themselves are very quick (1-2 nanoseconds). However, there is a kind of FFI call where one has to allocate a temporary memory block (for a struct, or a temporary buffer). One example is a call to "gettimeofday" or "clock_gettime". Unfortunately, the usual way of doing it (using the "alloca" function) is quite slow (~40 nanoseconds).

I was wondering, is there any way to allocate a small chunk of data on a thread's stack? That should be cheap, as by and large it is just a shift of a pointer and, perhaps, a few other trivial operations. I understand that it is not safe to allocate large blocks on the stack. But we could possibly have a function similar to "alloca" which would allocate small blocks on the stack and would use "alloca" for big ones.

With kind regards,
Denys Rtveliashvili
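For concreteness, the kind of call in question looks roughly like this (a minimal sketch, not the original test code; the CLOCK_MONOTONIC value of 1 and the 16-byte, two-word struct timespec layout are Linux/x86-64 assumptions):

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign (Ptr, allocaBytes, peekByteOff)
    import Foreign.C.Types (CInt(..), CLong)

    foreign import ccall unsafe "clock_gettime"
      c_clock_gettime :: CInt -> Ptr () -> IO CInt

    -- Every call must allocate a temporary buffer for the result struct,
    -- and that allocation dominates the cost of the FFI call itself.
    getMonotonicTime :: IO (CLong, CLong)
    getMonotonicTime =
      allocaBytes 16 $ \ts -> do
        _ <- c_clock_gettime 1 ts      -- 1 = CLOCK_MONOTONIC (assumed)
        sec  <- peekByteOff ts 0       -- tv_sec
        nsec <- peekByteOff ts 8       -- tv_nsec (assuming 8-byte tv_sec)
        return (sec, nsec)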

On 14/04/10 06:02, Denys Rtveliashvili wrote:
Good morning,
Yesterday I did a few tests to measure the performance of FFI calls and found that the calls themselves are very quick (1-2 nanoseconds). However, there is a kind of FFI call where one has to allocate a temporary memory block (for a struct, or a temporary buffer). One example is a call to "gettimeofday" or "clock_gettime". Unfortunately, the usual way of doing it (using the "alloca" function) is quite slow (~40 nanoseconds).
I was wondering, is there any way to allocate a small chunk of data on a thread's stack? That should be cheap, as by and large it is just a shift of a pointer and, perhaps, a few other trivial operations. I understand that it is not safe to allocate large blocks on the stack. But we could possibly have a function similar to "alloca" which would allocate small blocks on the stack and would use "alloca" for big ones.
While alloca is not as cheap as, say, C's alloca, you should find that it is much quicker than C's malloc. I'm sure there's room for optimisation if it's critical for you. There may well be low-hanging fruit: take a look at the Core for alloca.

The problem with using the stack is that alloca needs to allocate non-movable memory, and in GHC thread stacks are movable.

Cheers,
Simon
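One way to inspect that Core (an aside; Main.hs is a placeholder module name) is GHC's simplifier dump:

    ghc -O2 -ddump-simpl Main.hs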

While alloca is not as cheap as, say, C's alloca, you should find that it is much quicker than C's malloc. I'm sure there's room for optimisation if it's critical for you. There may well be low-hanging fruit: take a look at the Core for alloca.
The problem with using the stack is that alloca needs to allocate non-movable memory, and in GHC thread stacks are movable.
Cheers, Simon
Thank you for the reply.
I think I have had a few wrong assumptions. One of them is that the stack is non-movable. Of course, for this purpose I need a non-movable region, and a pinned array on the heap is probably the only choice.

Also, I was hoping it is possible to use the low-level stack (the one which is used when instructions such as "push" and "pop" are executed), but I guess that is not possible with GHC-generated code.

As for the performance of "alloca", I thought it would be faster than "malloc". However, in a simple test I have just written it is actually slower. The test allocates 16-byte arrays and immediately de-allocates them. This operation is repeated 1000000000 times. On my computer the C program takes 27 seconds to complete while the Haskell version takes about 41.
------------
[C benchmark listing truncated in the archive]
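The Haskell side of such a test might look like this (a minimal sketch matching the description above, not the original source):

    import Foreign.Marshal.Alloc (allocaBytes)
    import Control.Monad (replicateM_)

    -- Allocate a 16-byte temporary block and release it immediately,
    -- one billion times, to measure per-allocation overhead.
    main :: IO ()
    main = replicateM_ 1000000000 (allocaBytes 16 (\_ -> return ()))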

On Apr 15, 2010, at 15:34 , Denys Rtveliashvili wrote:
As for the performance of "alloca", I thought it would be faster than "malloc". However, in a simple test I have just written it is actually slower. The test allocates 16-byte arrays and immediately de-allocates them. This operation is repeated 1000000000 times. On my computer the C program takes 27 seconds to complete while the Haskell version takes about 41.
Your C program is doing nothing but adjusting linked lists after the first time the block is allocated. free() puts the block at the head of a linked list of free blocks, and malloc() will find it there, unlink and return it.

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

While alloca is not as cheap as, say, C's alloca, you should find that it is much quicker than C's malloc. I'm sure there's room for optimisation if it's critical for you. There may well be low-hanging fruit: take a look at the Core for alloca.
Thank you, Simon. Indeed, there is a low-hanging fruit.

"alloca"'s type is "Storable a => (Ptr a -> IO b) -> IO b" and it is not inlined even though the function is small. Calls to functions with such a signature are expensive (I suppose that's because of the look-up into the typeclass dictionary). However, when I added an "INLINE" pragma for the function in Foreign.Marshal.Alloc, the time of execution dropped from 40 to 20 nanoseconds. I guess the same effect will take place if other similar functions are marked with "INLINE".

Is there a reason why we do not want small FFI-related functions with typeclass arguments to be marked with the "INLINE" pragma and gain a performance improvement? The only reason that comes to my mind is the size of the code, but actually the resulting code looks very small and neat.

With kind regards,
Denys Rtveliashvili

On 18/04/2010 10:28, Denys Rtveliashvili wrote:
While alloca is not as cheap as, say, C's alloca, you should find that it is much quicker than C's malloc. I'm sure there's room for optimisation if it's critical for you. There may well be low-hanging fruit: take a look at the Core for alloca.

Thank you, Simon. Indeed, there is a low-hanging fruit.

"alloca"'s type is "Storable a => (Ptr a -> IO b) -> IO b" and it is not inlined even though the function is small. Calls to functions with such a signature are expensive (I suppose that's because of the look-up into the typeclass dictionary). However, when I added an "INLINE" pragma for the function in Foreign.Marshal.Alloc, the time of execution dropped from 40 to 20 nanoseconds. I guess the same effect will take place if other similar functions are marked with "INLINE".
Is there a reason why we do not want small FFI-related functions with typeclass arguments to be marked with the "INLINE" pragma and gain a performance improvement? The only reason that comes to my mind is the size of the code, but actually the resulting code looks very small and neat.
Adding an INLINE pragma is the right thing for alloca and similar functions.

alloca is a small overloaded wrapper around allocaBytesAligned, and without the INLINE pragma the body of allocaBytesAligned gets inlined into alloca itself, making it too big to be inlined at the call site (you can work around it with e.g. -funfolding-use-threshold=100). This is really a case of manual worker/wrapper: we want to tell GHC that alloca is a wrapper, and the way to do that is with INLINE.

Ideally GHC would manage this itself; there's a lot of scope for doing some general code splitting, but I don't think anyone has explored that yet.

Cheers,
Simon
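To illustrate the shape of the wrapper, a sketch paraphrasing the Foreign.Marshal.Alloc definition of that era (details may differ from the actual source):

    import Foreign.Marshal.Alloc (allocaBytesAligned)
    import Foreign.Ptr (Ptr)
    import Foreign.Storable (Storable, sizeOf, alignment)

    {-# INLINE alloca #-}
    -- The INLINE pragma keeps this small body as the unfolding, so call
    -- sites see the wrapper (and can specialise away the Storable
    -- dictionary) instead of the inlined body of allocaBytesAligned.
    alloca :: Storable a => (Ptr a -> IO b) -> IO b
    alloca = doAlloca undefined
      where
        doAlloca :: Storable a' => a' -> (Ptr a' -> IO b') -> IO b'
        doAlloca dummy = allocaBytesAligned (sizeOf dummy) (alignment dummy)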

Thank you, Simon

I have identified a number of problems and have created patches for a couple of them. Ticket #4004 was raised in trac and I hope that someone will take a look and put the patches into the repository if they look good.

Things I did:
* Inlining for a few functions
* changed multiplication and division in include/Cmm.h to bit shifts (a small illustration of the idea follows after this message)

Things that can be done:
* optimizations in the threaded RTS. Locking is used frequently, and every lock operation on a normal mutex in "POSIX threads" costs about 20 nanoseconds on my computer.
* moving some computations from Cmm code to Haskell. This requires passing information on word size and the like to Haskell code, but the benefit is that some computations can be performed statically, as they depend primarily on the data type we allocate space for.
* fix/improvement for the Cmm compiler. There is some code in it already which replaces divisions and multiplications by 2^n with bit shifts, but for some reason it does not work. Also, divisions can be replaced by multiplications with bit shifts in the general case.

---

Also, while looking at this I've got a number of questions. One of them is this:

What is the meaning of "pinned_object_block" in rts/sm/Storage.h and why is it shared between TSOs? It looks like "allocatePinned" has to lock on SM_MUTEX every time it is called (in the threaded RTS) because other threads can be accessing it. More than that, this block of memory is assigned to the nursery of one of the TSOs. Why should it be shared with the rest of the world instead of being local to a TSO?

On a side note, is the London HUG still active? The website seems to be down...

With kind regards,
Denys Rtveliashvili
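The Cmm.h change is ordinary strength reduction. In Haskell terms, an illustration of the idea (not the actual Cmm.h code; the 8-byte word size is an assumption):

    import Data.Bits (shiftL, shiftR)

    -- For a word size of 2^3 = 8 bytes, these shifts compute the same
    -- result as division/multiplication for non-negative x.
    bytesToWords, wordsToBytes :: Int -> Int
    bytesToWords x = x `shiftR` 3   -- same as x `div` 8
    wordsToBytes x = x `shiftL` 3   -- same as x * 8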

On 22/04/10 21:25, Denys Rtveliashvili wrote:
Thank you, Simon
I have identified a number of problems and have created patches for a couple of them. Ticket #4004 was raised in trac and I hope that someone will take a look and put the patches into the repository if they look good.
Things I did:
* Inlining for a few functions
Thanks - I already did this for alloca/malloc, I'll add the others from your patch.
* changed multiplication and division in include/Cmm.h to bit shifts
This really shouldn't be required, I'll look into why the optimisation isn't working.
Things that can be done:
* optimizations in the threaded RTS. Locking is used frequently, and every lock operation on a normal mutex in "POSIX threads" costs about 20 nanoseconds on my computer.
We go to quite a lot of trouble to avoid locking in the common cases and fast paths - most of our data structures are CPU-local. Where in particular have you encountered locking that could be reduced?
* moving some computations from Cmm code to Haskell. This requires passing information on word size and the like to Haskell code, but the benefit is that some computations can be performed statically, as they depend primarily on the data type we allocate space for.
* fix/improvement for the Cmm compiler. There is some code in it already which replaces divisions and multiplications by 2^n with bit shifts, but for some reason it does not work. Also, divisions can be replaced by multiplications with bit shifts in the general case.
---
Also, while looking at this I've got a number of questions. One of them is this:
What is the meaning of "pinned_object_block" in rts/sm/Storage.h and why is it shared between TSOs? It looks like "allocatePinned" has to lock on SM_MUTEX every time it is called (in the threaded RTS) because other threads can be accessing it. More than that, this block of memory is assigned to the nursery of one of the TSOs. Why should it be shared with the rest of the world instead of being local to a TSO?
The pinned_object_block is CPU-local; usually no locking is required. Only when the block is full do we have to get a new block from the block allocator, and that requires a lock, but it's a rare case.

Cheers,
Simon
On a side note, is the London HUG still active? The website seems to be down...
With kind regards, Denys Rtveliashvili

Hi Simon,
Thanks - I already did this for alloca/malloc, I'll add the others from your patch.
Thank you.
We go to quite a lot of trouble to avoid locking in the common cases and fast paths - most of our data structures are CPU-local. Where in particular have you encountered locking that could be reduced?
The pinned_object_block is CPU-local, usually no locking is required. Only when the block is full do we have to get a new block from the block allocator, and that requires a lock, but it's a rare case.
OK, the code I have checked out from the repository contains this in "rts/sm/Storage.h":

    extern bdescr * pinned_object_block;

And in "rts/sm/Storage.c":

    bdescr *pinned_object_block;

My C might be rusty, but I see no way for pinned_object_block to be CPU-local. If it is truly CPU-local, then what makes it so?

As for locking, here is one of the examples:

    StgPtr
    allocatePinned( lnat n )
    {
        StgPtr p;
        bdescr *bd = pinned_object_block;

        // If the request is for a large object, then allocate()
        // will give us a pinned object anyway.
        if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
            p = allocate(n);
            Bdescr(p)->flags |= BF_PINNED;
            return p;
        }

        ACQUIRE_SM_LOCK;   // [RTVD: here we acquire the lock]

        TICK_ALLOC_HEAP_NOCTR(n);
        CCS_ALLOC(CCCS,n);

        // If we don't have a block of pinned objects yet, or the current
        // one isn't large enough to hold the new object, allocate a new one.
        if (bd == NULL || (bd->free + n) > (bd->start + BLOCK_SIZE_W)) {
            pinned_object_block = bd = allocBlock();
            dbl_link_onto(bd, &g0s0->large_objects);
            g0s0->n_large_blocks++;
            bd->gen_no = 0;
            bd->step   = g0s0;
            bd->flags  = BF_PINNED | BF_LARGE;
            bd->free   = bd->start;
            alloc_blocks++;
        }

        p = bd->free;
        bd->free += n;

        RELEASE_SM_LOCK;   // [RTVD: here we release the lock]
        return p;
    }

Of course, TICK_ALLOC_HEAP_NOCTR and CCS_ALLOC may require synchronization if they use shared state (which is, again, probably unnecessary). However, in case no profiling goes on and "pinned_object_block" is TSO-local, isn't it possible to remove locking completely from this code? The only case where locking is necessary is when a fresh block has to be allocated, and that can be done within the "allocBlock" method (or, more precisely, by using "allocBlock_lock").

The ACQUIRE_SM_LOCK/RELEASE_SM_LOCK pair is present in other places too, but I have not analysed yet whether it is really necessary there. For example, things like newCAF and newDynCAF are wrapped in it.

With kind regards,
Denys Rtveliashvili

Denys Rtveliashvili wrote:
OK, the code I have checked out from the repository contains this in "rts/sm/Storage.h": [global pinned_object_block variable]
Odd. This was changed in ghc head by a patch dating Dec 1st 2009:
Tue Dec 1 17:03:21 CET 2009 Simon Marlow

Bertram,

It appears that I am on 6.12. Strange, as I thought I had checked out the HEAD by following the instructions on the wiki:

    darcs get --partial http://darcs.haskell.org/ghc

The wiki does not say explicitly what will be checked out, so I expected it to be HEAD.

With kind regards,
Denys Rtveliashvili
Denys Rtveliashvili wrote:
OK, the code I have checked out from the repository contains this in "rts/sm/Storage.h": [global pinned_object_block variable]
Odd. This was changed in ghc head by a patch dating Dec 1st 2009:
Tue Dec 1 17:03:21 CET 2009 Simon Marlow
* Make allocatePinned use local storage, and other refactorings

  This is a batch of refactoring to remove some of the GC's global state, as we move towards CPU-local GC.
  ...
  - allocatePinned() was still allocating from global storage and taking a lock each time, now it uses local storage. (mallocForeignPtrBytes should be faster with -threaded).
  ...
which turned pinned_object_block into a per-capability variable.
So which version of ghc are you looking at?
regards,
Bertram

On Fri, Apr 23, 2010 at 07:07:29PM +0100, Denys Rtveliashvili wrote:
It appears that I am on 6.12. Strange, as I thought I had checked out the HEAD by following the instructions on the wiki:
darcs get --partial http://darcs.haskell.org/ghc
The wiki does not say explicitly what will be checked out, so I expected it to be HEAD.
That will give you the HEAD. If you check configure.ac, it will include this line:

    AC_INIT([The Glorious Glasgow Haskell Compilation System], [6.13], [glasgow-haskell-bugs@haskell.org], [ghc])

Thanks
Ian

On 23/04/2010 04:39, Denys Rtveliashvili wrote:
OK, the code I have checked out from the repository contains this in "rts/sm/Storage.h":
    extern bdescr * pinned_object_block;
And in "rts/sm/Storage.c":
    bdescr *pinned_object_block;
Ah, I was looking in the HEAD, where I've already fixed this by moving pinned_object_block into the Capability and hence making it CPU-local. The patch that fixed it was:

    Tue Dec 1 16:03:21 GMT 2009 Simon Marlow
As for locking, here is one of the examples:
    StgPtr
    allocatePinned( lnat n )
    {
        StgPtr p;
        bdescr *bd = pinned_object_block;

        // If the request is for a large object, then allocate()
        // will give us a pinned object anyway.
        if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
            p = allocate(n);
            Bdescr(p)->flags |= BF_PINNED;
            return p;
        }

        ACQUIRE_SM_LOCK;   // [RTVD: here we acquire the lock]

        TICK_ALLOC_HEAP_NOCTR(n);
        CCS_ALLOC(CCCS,n);

        // If we don't have a block of pinned objects yet, or the current
        // one isn't large enough to hold the new object, allocate a new one.
        if (bd == NULL || (bd->free + n) > (bd->start + BLOCK_SIZE_W)) {
            pinned_object_block = bd = allocBlock();
            dbl_link_onto(bd, &g0s0->large_objects);
            g0s0->n_large_blocks++;
            bd->gen_no = 0;
            bd->step   = g0s0;
            bd->flags  = BF_PINNED | BF_LARGE;
            bd->free   = bd->start;
            alloc_blocks++;
        }

        p = bd->free;
        bd->free += n;

        RELEASE_SM_LOCK;   // [RTVD: here we release the lock]
        return p;
    }
Yes, this was also fixed by the aforementioned patch. Bear in mind that in the vast majority of programs allocatePinned is not in the inner loop, which is why it hasn't been a priority to optimise it until now.
Of course, TICK_ALLOC_HEAP_NOCTR and CCS_ALLOC may require synchronization if they use shared state (which is, again, probably unnecessary). However, in case no profiling goes on and "pinned_object_block" is TSO-local, isn't it possible to remove locking completely from this code? The only case where locking is necessary is when a fresh block has to be allocated, and that can be done within the "allocBlock" method (or, more precisely, by using "allocBlock_lock").
TSO-local would be bad: TSOs are lightweight threads and in many cases are smaller than a block. Capability-local is what you want.
The ACQUIRE_SM_LOCK/RELEASE_SM_LOCK pair is present in other places too, but I have not analysed yet whether it is really necessary there. For example, things like newCAF and newDynCAF are wrapped in it.
Right, but these are not common cases that need to be optimised. newCAF is only called once per CAF, thereafter it is accessed without locks.

It may be that we could find benchmarks where access to the block allocator is the performance bottleneck; indeed, in the parallel GC we sometimes see contention for it. If that turns out to be a problem then we may need to think about per-CPU free lists in the block allocator, but I think it would entail a fair bit of complexity and, if we're not careful, extra memory overhead, e.g. where one CPU has all the free blocks in its local free list and the others have none. So I'd like to avoid going down that route unless we absolutely have to. The block allocator is nice and simple right now.

Cheers,
Simon

Hi Simon,
OK, the code I have checked out from the repository contains this in "rts/sm/Storage.h":
    extern bdescr * pinned_object_block;
And in "rts/sm/Storage.c":
    bdescr *pinned_object_block;
Ah, I was looking in the HEAD, where I've already fixed this by moving pinned_object_block into the Capability and hence making it CPU-local. The patch that fixed it was
Tue Dec 1 16:03:21 GMT 2009 Simon Marlow
* Make allocatePinned use local storage, and other refactorings
The version I have checked out is 6.12 and that's why I haven't seen this patch. Are there any plans for including this patch in the next GHC release?
Yes, this was also fixed by the aforementioned patch.
Bear in mind that in the vast majority of programs allocatePinned is not in the inner loop, which is why it hasn't been a priority to optimise it until now.
I guess code which makes use of ByteStrings (especially when it splits them into many smaller substrings) calls allocatePinned very frequently, even within inner loops.
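For example, a sketch of the kind of code in question (illustrative only): every construction of a fresh ByteString allocates a pinned buffer via the mallocForeignPtrBytes family, which calls allocatePinned in the RTS.

    import qualified Data.ByteString.Char8 as B

    -- B.copy materialises a fresh pinned buffer on every iteration,
    -- so allocatePinned runs once per list element here.
    main :: IO ()
    main = print . sum . map (B.length . B.copy) $
             replicate 100000 (B.pack "some text to shuffle around")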
TSO-local would be bad: TSOs are lightweight threads and in many cases are smaller than a block. Capability-local is what you want.
Ah... Yes, capabilities are a far better choice.
Right, but these are not common cases that need to be optimised. newCAF is only called once per CAF, thereafter it is accessed without locks.
Can't recall off the top of my head, but I think I had a case where newCAF was used very actively in a simple piece of code. The code looked like this:

    sequence_ $ replicate N $ doSmth

The Cmm code showed that it produced calls to newCAF and something related to black holes. And when I added "return ()" after that line, the black holes and the calls to "newCAF" disappeared. It was on 6.12.1, I believe. I still have no idea why it happened and why those black holes were necessary, but I'll try to reproduce it one more time and show you an example if it is of any interest to you.
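A self-contained version of that experiment might look like this ("N" and "doSmth" stand in for whatever the original code used):

    doSmth :: IO ()
    doSmth = return ()

    main :: IO ()
    main = sequence_ $ replicate 1000000 doSmth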
It may be that we could find benchmarks where access to the block allocator is the performance bottleneck; indeed, in the parallel GC we sometimes see contention for it. If that turns out to be a problem then we may need to think about per-CPU free lists in the block allocator, but I think it would entail a fair bit of complexity and, if we're not careful, extra memory overhead, e.g. where one CPU has all the free blocks in its local free list and the others have none. So I'd like to avoid going down that route unless we absolutely have to. The block allocator is nice and simple right now.
I suppose I should check out the HEAD then and give it a try, because earlier I had performance issues in the threaded runtime (~20% overhead and far more noise) in an application which was doing some slicing, reshuffling and composing of text via ByteStrings, with a modest amount of passing data around via "Chan"s.

On a slightly different topic: could you please point me to the place where stg_upd_frame_info is generated? I can't find it in *.c, *.cmm or *.hs, and I guess it is something very special.

With kind regards,
Denys Rtveliashvili

On 23/04/2010 19:03, Denys Rtveliashvili wrote:
Tue Dec 1 16:03:21 GMT 2009 Simon Marlow
* Make allocatePinned use local storage, and other refactorings

The version I have checked out is 6.12 and that's why I haven't seen this patch. Are there any plans for including this patch in the next GHC release?
It'll be in the next major release (6.14.1).
Right, but these are not common cases that need to be optimised. newCAF is only called once per CAF, thereafter it is accessed without locks.

Can't recall off the top of my head, but I think I had a case where newCAF was used very actively in a simple piece of code. The code looked like this:

    sequence_ $ replicate N $ doSmth
The Cmm code showed that it produced calls to newCAF and something related to black holes.
Right, but newCAF should only be called once for any given CAF, thereafter the CAF will have been updated.
And when I added "return ()" after that line, the black holes and the calls to "newCAF" disappeared. It was on 6.12.1, I believe. I still have no idea why it happened and why those black holes were necessary, but I'll try to reproduce it one more time and show you an example if it is of any interest to you.
If you find a case where newCAF is being called repeatedly, that would be interesting yes.
It may be that we could find benchmarks where access to the block allocator is the performance bottleneck; indeed, in the parallel GC we sometimes see contention for it. If that turns out to be a problem then we may need to think about per-CPU free lists in the block allocator, but I think it would entail a fair bit of complexity and, if we're not careful, extra memory overhead, e.g. where one CPU has all the free blocks in its local free list and the others have none. So I'd like to avoid going down that route unless we absolutely have to. The block allocator is nice and simple right now.
I suppose I should check out the HEAD then and give it a try, because earlier I had performance issues in the threaded runtime (~20% overhead and far more noise) in an application which was doing some slicing, reshuffling and composing of text via ByteStrings, with a modest amount of passing data around via "Chan"s.
I'd be interested in seeing a program that has 20% overhead with -threaded. You should watch out for bound threads, though: with -threaded the main thread is a bound thread, and communication with the main thread is much slower than between unbound threads. See http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.1/Control-C...
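The usual mitigation is to hand the real work off to an unbound thread (a sketch; "realWork" is a placeholder for the actual program):

    import Control.Concurrent (runInUnboundThread)

    -- The bound main thread immediately delegates to an unbound thread,
    -- so subsequent communication is between unbound threads only.
    main :: IO ()
    main = runInUnboundThread realWork
      where
        realWork = putStrLn "actual program goes here"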
On a slightly different topic: could you please point me to the place where stg_upd_frame_info is generated? I can't find it in *.c, *.cmm or *.hs, and I guess it is something very special.
rts/Updates.cmm:

    INFO_TABLE_RET( stg_upd_frame, UPDATE_FRAME, UPD_FRAME_PARAMS) { ... }

Cheers,
Simon

From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On Behalf Of Denys Rtveliashvili
On a side note, is the London HUG still active? The website seems to be down...
Looks like the London HUG domain (londonhug.net) registration has expired. Neil Bartlett was the registrant. Neil: do you plan to renew?

Alistair

On Fri, Apr 23, 2010 at 1:24 PM, Bayley, Alistair wrote:
Looks like the London HUG domain (londonhug.net) registration has expired. Neil Bartlett was the registrant. Neil: do you plan to renew?
The whois database reports:
Domain name: LONDONHUG.NET

This domain name is up for auction for a limited time. To place a bid, visit: http://www.namejet.com
And it appears that someone has already placed a $75 bid.

The domain should be renewed promptly, before the grace period expires. Alternatively, sell the domain name before the grace period expires, and use the proceeds to register a different name for several years.

Regards,
Yitz

From: sefer.org@gmail.com [mailto:sefer.org@gmail.com] On Behalf Of Yitzchak Gale
On Fri, Apr 23, 2010 at 1:24 PM, Bayley, Alistair wrote:
Looks like the London HUG domain (londonhug.net) registration has expired. Neil Bartlett was the registrant. Neil: do you plan to renew?
The whois database reports:
Domain name: LONDONHUG.NET

This domain name is up for auction for a limited time. To place a bid, visit: http://www.namejet.com
And it appears that someone has already placed a $75 bid.
The domain should be renewed promptly, before the grace period expires.
Probably the best option, but can anyone other than Neil do this? If not, does anyone know if Neil is reachable? (Could try his phone number listed in the whois record, but that seems a little invasive.)

Alistair
participants (7)
- Bayley, Alistair
- Bertram Felgenhauer
- Brandon S. Allbery KF8NH
- Denys Rtveliashvili
- Ian Lynagh
- Simon Marlow
- Yitzchak Gale