
Thank you, Simon.

I have identified a number of problems and have created patches for a couple of them. Ticket #4004 has been raised in Trac, and I hope someone will take a look and commit the patches to the repository if they look good.

Things I did:
* Added inlining for a few functions
* Changed multiplication and division in include/Cmm.h to bit shifts

Things that can be done:
* Optimizations in the threaded RTS. Locking is used frequently, and every lock operation on a normal mutex in "POSIX threads" costs about 20 nanoseconds on my computer.
* Moving some computations from Cmm code to Haskell. This requires passing information about word size and similar things to Haskell code, but the benefit is that some computations can be performed statically, as they depend primarily on the data type we allocate space for.
* A fix/improvement for the Cmm compiler. It already contains some code that substitutes bit shifts for divisions and multiplications by 2^n, but for some reason it does not work. Also, in the general case, divisions can be replaced by multiplications combined with bit shifts.

---

Also, while looking at this I came up with a number of questions. One of them is this:

What is the meaning of "pinned_object_block" in rts/sm/Storage.h, and why is it shared between TSOs? It looks like "allocatePinned" has to lock SM_MUTEX every time it is called (in the threaded RTS) because other threads can be accessing it. More than that, this block of memory is assigned to the nursery of one of the TSOs. Why should it be shared with the rest of the world instead of being local to the TSO?

On a side note, is the London HUG still active? The website seems to be down...

With kind regards,
Denys Rtveliashvili
Adding an INLINE pragma is the right thing for alloca and similar functions.
alloca is a small overloaded wrapper around allocaBytesAligned, and without the INLINE pragma the body of allocaBytesAligned gets inlined into alloca itself, making it too big to be inlined at the call site (you can work around it with e.g. -funfolding-use-threshold=100). This is really a case of manual worker/wrapper: we want to tell GHC that alloca is a wrapper, and the way to do that is with INLINE. Ideally GHC would manage this itself. There's a lot of scope for doing some general code splitting, but I don't think anyone has explored that yet.
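To make the wrapper idea concrete, here is a minimal sketch of the pattern. The name allocaSketch is hypothetical and this is not the actual definition in base; it only assumes that allocaBytesAligned is available from Foreign.Marshal.Alloc, as it is in current versions of base.

```haskell
module Main (main) where

import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical stand-in for alloca: a thin overloaded wrapper that only
-- looks up the size and alignment of the element type and then delegates
-- to the worker, allocaBytesAligned.
allocaSketch :: Storable a => (Ptr a -> IO b) -> IO b
allocaSketch = go undefined
  where
    -- The dummy argument is never evaluated; it only fixes the element
    -- type so that sizeOf and alignment pick the right instance.
    go :: Storable a' => a' -> (Ptr a' -> IO b') -> IO b'
    go dummy action =
      allocaBytesAligned (sizeOf dummy) (alignment dummy) action
-- Mark the wrapper as INLINE so that GHC keeps this small definition as
-- its unfolding, instead of one bloated by the inlined worker body.
{-# INLINE allocaSketch #-}

main :: IO ()
main = do
  r <- allocaSketch $ \p -> do
    poke p (41 :: Int)
    x <- peek p
    return (x + 1)
  print r
```

With the pragma in place, a call site that knows the element type (Int here) should be able to resolve sizeOf and alignment to constants after inlining. Whether that actually happens for a given program is best checked by inspecting the Core output.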
Cheers, Simon