Questions about "time and space profiling for non-strict langs" paper and cost centre impl. of GHC

Hi all, I'm trying to understand the paper "Time and space profiling for non-strict, higher-order functional languages"[1] and I'm hoping that experienced GHC devs here could help me clarifying some points. Here's my question: In Figure 5, I don't understand why no costs are attributed to `app2` rule(substitution rule). It looks like to me that this is the function application rule so `A` cost should be attributed here, but in the paper `A` cost is attributed in `app1` rule, which if I understand correctly shows how function part of an application is evaluated. I know about the "lexical scoping rule" explained in 2.4 and the cost should be attributed to the cost center in the scope of lambda definition instead of the cost center in "application site", but Figure 5 still doesn't make sense to me. To me it looks like that there should be two costs attributed in application rules. First cost is "evaluation of the function" part(which should be attributed in `app1` rule) and second is "substitution" part, which should be attributed in `app2` rule but according to the paper we only have the former cost, and the latter is free(e.g. no costs are attributed in second rule). It would be appreciated if someone could help me clarifying this. I'm also looking for more recent papers about GHC's profiler implementation. I'm especially interested profiling in multi-threaded RTS. Lastly, it'd be appreciated if someone could explain me two Bool fields in `StgSCC` constructor of STG syntax. Thanks! UPDATE: I was reading the paper one more time before sending my question and I found one more confusing part. In 4.1 it says "Lexical scoping will subsume all the costs of applying `and1` to its call sites..." but in 2.4 it says "Lexical scoping: Attribute the cost to "fun", the cost centre which lexically encloses the abstraction." so this two definitions of same thing is actually different. To me it looks like the definition in 4.1 is wrong,, it should be "definition site", not "call site". What am I missing here? [1]: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=36FE097056F4E03CCDD38B598158292E?doi=10.1.1.43.6277&rep=rep1&type=pdf --- Ömer Sinan Ağacan http://osa1.net

Ömer Sinan Ağacan wrote:
To me it looks like that there should be two costs attributed in application rules. First cost is "evaluation of the function" part(which should be attributed in `app1` rule) and second is "substitution" part, which should be attributed in `app2` rule but according to the paper we only have the former cost, and the latter is free(e.g. no costs are attributed in second rule).
Well, sort of. "app1" and "app2" are just the two stages of applying a function: First the function expression gets evaluated, then the actual function gets called. In this case "A" stands for both costs, even if they don't necessarily happen right after each other. As we are only interested in the cost sum, this is the easiest way of doing it.
It would be appreciated if someone could help me clarifying this. I'm also looking for more recent papers about GHC's profiler implementation. I'm especially interested profiling in multi-threaded RTS.
There have been no new papers that I know of, but we had a talk by Simon Marlow[1] about improvements to cost-centre stacks, as well as a more precise description of the modern semantics by Edward Z. Yang[2]. [1] https://www.youtube.com/watch?v=J0c4L-AURDQ [2] http://blog.ezyang.com/2013/09/cost-semantics-for-stg-in-modern-ghc
Lastly, it'd be appreciated if someone could explain me two Bool fields in `StgSCC` constructor of STG syntax.
Cost centres have two somewhat separate profiling mechanisms: 1. Cost gets attributed 2. Entry counts are counted Sometimes it can be beneficial for the optimiser to separate the two. For example, if we have something like case e of D -> scc<...> let f = e1 in e2 and we want to float the "f" binding outwards we would do: let f = scc<...> e1 in case e of D -> scc<...> e2 However, if we just implemented it like this, we would see the entry-count to the cost-centre increase quite a bit, because now we are also counting every entry to "f". This is why the compiler can mark the duplicated tick as "non-counting".
UPDATE: I was reading the paper one more time before sending my question and I found one more confusing part. In 4.1 it says "Lexical scoping will subsume all the costs of applying `and1` to its call sites..." but in 2.4 it says "Lexical scoping: Attribute the cost to "fun", the cost centre which lexically encloses the abstraction." so this two definitions of same thing is actually different. To me it looks like the definition in 4.1 is wrong,, it should be "definition site", not "call site". What am I missing here?
This is a special case introduced in 2.3.2: Top-level functions get the cost-centre "SUB", which always makes them take the cost-centre of the caller. You are right in that this doesn't quite match the "literal" interpretation of lexical scoping. Greetings, Peter Wortmann

Thanks Peter. Simon Marlow's talk was really interesting. After reading the slides I read related GHC code and realized that "cost-centre stack" and the stack trace printed using GHC.Stack is same thing. `libraries/base/GHC/Stack.hsc` has this definition: currentCallStack :: IO [String] currentCallStack = ccsToStrings =<< getCurrentCCS () So actually string representation of cost-centre stack is returned when current call stack is requested. (off-topic: I'm wondering why an empty tuple is passed to `getCurrentCSS`?) Now my first question is: Assuming stack traces are implemented as explained by Simon Marlow in his talk and slides, can we say that costs are always assigned to top cost-centre in the stack? So in case of an allocation or when "tick counter" triggered, can I say that always the top-most cost-centre in `rCCCs` register will be modified? (then `inherited` costs would be calculated) I'm asking this because I can't read the code generated for cost-centre annotations(code generated for StgSCC code) because for that I need to follow code generation through compilation to Cmm, but I don't know anything about Cmm yet. As far as I can understand, current implementation is different from what's explained in Sansom and Jones, for example * I can't see "SUB" cost-centre in list of "built-in cost-centres" in `rts/Profiling.c`. * We now have "stacks" which are assigned to RHS of bindings. I tried reading output of -ddump-stg but generated STG is confusing. For example, this function: fib :: Integer -> Integer fib 0 = 1 fib 1 = 1 fib n = fib (n - 1) + fib (n - 2) is annotated with `CCS_DONT_CARE`. Why is that? Also, I can see `_push_` and `_tick_`(what's this?) instructions placed in generated STG but no `_call_` instructions. There is also something like `CCCS` in generated STG but I have no ideas about this. So in short I'm trying to understand how cost-centre related annotations are generated and how are they used. Any paper/source code/blog post suggestions would be very appreciated. As far as I can see current implementation is different than ones explained in slides/papers. Thanks, --- Ömer Sinan Ağacan http://osa1.net

Ömer Sinan Ağacan wrote:
(off-topic: I'm wondering why an empty tuple is passed to `getCurrentCSS`?)
See the comment on getCurrentCCS# in compiler/prelude/primops.txt.pp - it's a token to prevent GHC optimisations from floating the primop up (which obviously might change the call stack).
Now my first question is: Assuming stack traces are implemented as explained by Simon Marlow in his talk and slides, can we say that costs are always assigned to top cost-centre in the stack?
Cost is attributed to the current cost-centre *stack*. So "a/f" != "b/f".
As far as I can understand, current implementation is different from what's explained in Sansom and Jones, for example
The papers actually never introduce cost-centre stacks, as far as I know. It's generally better to check Sansom's PhD thesis, the GHC source code or the other sources I mentioned. There's been quite a bit of work on this...
* I can't see "SUB" cost-centre in list of "built-in cost-centres" in `rts/Profiling.c`.
As I understand it, the "SUB" cost centre refers to situations where the cost-centre stack does *not* get updated on function entry. So it never exists physically.
is annotated with `CCS_DONT_CARE`. Why is that?
That is an annotation on lambdas, and refers to what cost-centre stack their allocation cost should be counted on. As top-level functions and static constructors are allocated statically, they don't count for the heap profile, therefore "don't care". See the definition of GenStgRhs in compiler/stgSyn/StgSyn.lhs.
Also, I can see `_push_` and `_tick_`(what's this?) instructions placed in generated STG but no `_call_` instructions.
This is what actually manages cost-centre stack - "_push_" pushes the given cost-centre on top of the cost centre stack, whereas "_tick" just increases entry count. These two aspects have slightly different properties as far as transformations are concerned, and therefore often end up getting separated during optimisations. Not sure what "_call_" is suppose to be. What's the context?
There is also something like `CCCS` in generated STG but I have no ideas about this.
That's simply "current cost-centre stack". I like to think that the hint of silliness was intentional.
So in short I'm trying to understand how cost-centre related annotations are generated and how are they used.
Sure. Better place for quick queries might be #ghc on FreeNode though - my nick is petermw. Greetings, Peter Wortmann

Thanks again for the answer.
Not sure what "_call_" is suppose to be. What's the context?
In Simon Marlow's slides, stack traces are implemented in terms of "call" and "push" operations. I guess `push` in STG syntax is stands for "push" operation explained in the slides but as far as I can see "call" is missing in generated STG.
There is also something like `CCCS` in generated STG but I have no ideas about this.
That's simply "current cost-centre stack". I like to think that the hint of silliness was intentional.
Yes, but I'm wondering what does that syntax in STG mean operationally? For example, here's some part of generated STG: let { sat_s2OT [Occ=Once] :: GHC.Integer.Type.Integer [LclId, Str=DmdType] = CCCS GHC.Integer.Type.S#! [0]; } What does "CCCS" stand for here? I guess in long term it'd be best for me to learn Cmm and see the generated code for this syntax to understand what does it really do.(assuming Cmm has better documentation that STG :) )
Sure. Better place for quick queries might be #ghc on FreeNode though - my nick is petermw.
Great, my nick is osa1 and I'll be pinging you for short questions. Thanks again! --- Ömer Sinan Ağacan http://osa1.net

In Simon Marlow's slides, stack traces are implemented in terms of "call" and "push" operations. I guess `push` in STG syntax is stands for "push" operation explained in the slides but as far as I can see "call" is missing in generated STG.
There seems to be some confusion here - "push" is a language construct, whereas "call" is a function in Simon's language interpreter, just like "eval".
What does "CCCS" stand for here?
Notice that it is an annotation on a constructor. It says what cost-centre stack to put the heap cost under, just like with lambdas. In your example the value is not statically allocated, therefore we put it into the "current cost-centre" instead of "don't care". This might be a pretty obvious piece of information, but that's essentially what Stg is - prepared Core with lots of annotations to facilitate code generation. Try -ddump-prep if you want something more sensible to look at. Greetings, Peter Wortmann

On 18/05/2014 15:20, Ömer Sinan Ağacan wrote:
Thanks again for the answer.
Not sure what "_call_" is suppose to be. What's the context?
In Simon Marlow's slides, stack traces are implemented in terms of "call" and "push" operations. I guess `push` in STG syntax is stands for "push" operation explained in the slides but as far as I can see "call" is missing in generated STG.
"call" is implemented by enterFunCCS() in rts/Profiling.c. I'm afraid there's no good high-level description of how profiling works in GHC currently. My plan was to flesh out the semantics and then write it up properly, but since leaving MSR I've had less time to work on it, so unfortunately we have an implementation but no semantics. Which is a sad state of affairs, I agree. All I can say is that a lot of experimentation went into the current implementation, so I'm fairly confident it behaves sensibly in most situations. Cheers, Simon
There is also something like `CCCS` in generated STG but I have no ideas about this.
That's simply "current cost-centre stack". I like to think that the hint of silliness was intentional.
Yes, but I'm wondering what does that syntax in STG mean operationally? For example, here's some part of generated STG:
let { sat_s2OT [Occ=Once] :: GHC.Integer.Type.Integer [LclId, Str=DmdType] = CCCS GHC.Integer.Type.S#! [0]; }
What does "CCCS" stand for here? I guess in long term it'd be best for me to learn Cmm and see the generated code for this syntax to understand what does it really do.(assuming Cmm has better documentation that STG :) )
Sure. Better place for quick queries might be #ghc on FreeNode though - my nick is petermw.
Great, my nick is osa1 and I'll be pinging you for short questions. Thanks again!
--- Ömer Sinan Ağacan http://osa1.net _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
participants (3)
-
Peter Wortmann
-
Simon Marlow
-
Ömer Sinan Ağacan