While alloca is not as cheap as, say, C's alloca, you should find that 
it is much quicker than C's malloc.  I'm sure there's room for 
optimisation if it's critical for you.  There may well be low-hanging 
fruit: take a look at the Core for alloca.

The problem with using the stack is that alloca needs to allocate 
non-movable memory, and in GHC thread stacks are movable.

Cheers,
	Simon

Thank you for reply.

I think I have had a few wrong assumptions. One of them is that stack is non-movable. Of course, for this purpose I need a non-movable region and a pinned array on a heap is probably the only choice.
Also, I was hoping it is possible to use the low-level stack (the one which is being used when instructions such as "push" and "pop" are executed), but I guess it is not possible in case of GHC-generated code.

As for the performance of "alloca", I though it would be faster than "malloc". However, in a simple test I have just written it is actually slower. The test allocates 16-bytes arrays and immediately de-allocates them. This operation is repeated 1000000000 times. On my computer the C program takes 27 seconds to complete while Haskell version takes about 41.

------------
#include <stdlib.h>

int main (int argc, char **argv) {
    for(long i = 0; i < 1000000000; i ++) {
        free(malloc(16));
    }
}
------------
module Main where

import Control.Monad
import Foreign.Storable
import Foreign.Marshal.Alloc
import Foreign.Ptr

data Data = Data
instance Storable Data where
  sizeOf _ = 16
  alignment _ = 16
  peek _ = return Data
  poke _ _ = return ()

main = sequence_ $ replicate 1000000000 $ alloca $ \ptr ->
  if (nullPtr::Ptr Data) == ptr then fail "Can't be" else return ""
------------

I would gladly take a look at the Core of "alloca". But frankly, I am not sure how to tell ghc to show me that. With the help of -ddump-simpl and -fext-core I can make it show me the Core, but it does not have the body of the "alloca" itself, just a call to it. And when I look at C-- source with the help of -ddump-cmm the source is transformed too much already to tell where "alloca" is.