profiling of functions generated by LLVM

I use LLVM to create sub-routines for efficient inner loops, and I like to profile them. However, it seems that GHC's profiler does not interact well with such generated functions. Say I have a function that creates an efficient subroutine for filling a block of data. It looks like the following:

    generateFill :: IO (Word64 -> Ptr Float -> IO ())
    generateFill =
       generateFunction (LLVM code for traversing block)

    main :: IO ()
    main = do
       let n = 100*10^6
       fill <- generateFill
       allocaArray n $ \ptr ->
          {-# SCC fill #-} fill (fromIntegral n) ptr

I run this with profiling. The resulting .prof report says that 'generateFunction' accounts for all of the computation time and 'fill' merely inherits it:

    fill                Main  1297  1   0.0  0.0  99.9  0.0
     generateFill       Main  1298  0   0.0  0.0  99.9  0.0
      generateFunction  Main  1299  0  99.9  0.0  99.9  0.0

I have a rough idea why GHC treats the generated functions the way it does. However, the result is pretty counter-intuitive, and it becomes difficult to understand in more complex programs.
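As a cross-check that does not depend on how the profiler attributes cost centres, the call itself could be timed directly. The following is only a sketch under assumptions: 'fillStub' is a hypothetical pure-Haskell stand-in for the LLVM-generated routine (it only shares the type of 'fill'), and 'timed' is a helper name invented for this illustration. It uses nothing beyond the base package.

```haskell
import Control.Monad (forM_)
import Data.Word (Word64)
import Foreign.Marshal.Array (allocaArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeElemOff)
import System.CPUTime (getCPUTime)

-- Hypothetical stand-in for the generated routine; it has the same
-- type as 'fill' but is ordinary Haskell, not LLVM-generated code.
fillStub :: Word64 -> Ptr Float -> IO ()
fillStub n ptr =
   forM_ [0 .. fromIntegral n - 1] $ \i -> pokeElemOff ptr i 1

-- Run an action and return its result together with the CPU time
-- it consumed, in picoseconds (the unit used by 'getCPUTime').
timed :: IO a -> IO (a, Integer)
timed act = do
   t0 <- getCPUTime
   x <- act
   t1 <- getCPUTime
   return (x, t1 - t0)

main :: IO ()
main = do
   let n = 10^6 :: Int
   ((), ps) <- timed $ allocaArray n $ \ptr -> fillStub (fromIntegral n) ptr
   -- 1 ms = 10^9 ps
   putStrLn $ "fill: " ++ show (fromIntegral ps / 1e9 :: Double) ++ " ms"
```

In the real program one would wrap the call 'fill (fromIntegral n) ptr' in 'timed' instead of 'fillStub'. This measures only the call, so the number can be compared against what the .prof report assigns to 'generateFunction'.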
participants (1)
-
Henning Thielemann