Incrementally consuming the eventlog

Johan Tibell

28 Apr 2011

9:31 p.m.

Hi,

We were discussing how to consume the eventlog incrementally on #ghc today. Would it be feasible to offer an API (e.g. GHC.EventLog) that allows programs to register for events that occur in the program? Programs would register listeners like so:

    registerEventListener (\ event -> doSomethingWith event)

The current write-to-file system could be implemented in terms of this API. We could even allow event logging to be turned on/off from within the application, allowing developers to attach to a running application, enable event logging, diagnose the app, and then turn off event logging again.

The RTS would invoke listeners every time a new event is written. This design has many benefits:

- We don't need to introduce the serialization, deserialization, and I/O overhead of first writing the eventlog to file and then parsing it again.
- Programs could monitor themselves and provide debug output (e.g. via some UI component).
- Users could write code that redirects the output elsewhere, e.g. to a socket for remote monitoring.

Would invoking a callback on each event add too big of an overhead? How about invoking the callback once every time the event buffer is full?

Johan
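For illustration, the listener idea can be modelled in plain Haskell. This is only a sketch under stated assumptions: GHC does not provide this interface, and the Event type, ListenerRegistry, setEventLogging and emitEvent (which stands in for the RTS writing an event) are all hypothetical names.

```haskell
import Control.Monad (when)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Hypothetical event type; real eventlog events carry far more detail.
data Event = Event
  { evTime :: Int     -- timestamp in arbitrary ticks
  , evName :: String  -- e.g. "RunThread"
  } deriving (Eq, Show)

-- Registered listeners plus a global on/off switch for event logging.
data ListenerRegistry = ListenerRegistry
  { regListeners :: IORef [Event -> IO ()]
  , regEnabled   :: IORef Bool
  }

-- Logging starts switched off, as if attaching to a running program.
newRegistry :: IO ListenerRegistry
newRegistry = ListenerRegistry <$> newIORef [] <*> newIORef False

registerEventListener :: ListenerRegistry -> (Event -> IO ()) -> IO ()
registerEventListener reg l = modifyIORef' (regListeners reg) (l :)

setEventLogging :: ListenerRegistry -> Bool -> IO ()
setEventLogging reg = writeIORef (regEnabled reg)

-- Stand-in for the RTS: deliver one event to every listener if enabled.
emitEvent :: ListenerRegistry -> Event -> IO ()
emitEvent reg ev = do
  on <- readIORef (regEnabled reg)
  when on (readIORef (regListeners reg) >>= mapM_ ($ ev))

-- Usage: only events emitted while logging is enabled reach the listener.
demo :: IO [String]
demo = do
  reg  <- newRegistry
  seen <- newIORef []
  registerEventListener reg (\ev -> modifyIORef' seen (evName ev :))
  emitEvent reg (Event 1 "Ignored")
  setEventLogging reg True
  emitEvent reg (Event 2 "RunThread")
  emitEvent reg (Event 3 "StopThread")
  setEventLogging reg False
  emitEvent reg (Event 4 "Ignored")
  reverse <$> readIORef seen
```

In this model the on/off switch is just a flag checked at delivery time, which is what would let a developer enable logging, diagnose, and disable it again.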

Donnie Jones

28 Apr

9:53 p.m.

Hello Johan,

I did the initial implementation of GHC.Eventlog. Sadly, I haven't had time to work on it since starting a full-time job after university. That being said, I am still interested in GHC and the improvement of GHC.Eventlog. Hopefully soon, I will have the time to do more development on GHC... hopefully. ;)

Anyway, from your description, I don't understand how a listener would consume the eventlog incrementally?

I do think it would be useful to register listeners for events. I do not think the invocation of a callback would be too much overhead, rather the action the callback performs could be a very significant overhead, such as sending eventlog data over a network connection. But, if you are willing to accept the performance loss from the callback's action to gain the event data then it seems worthwhile to me.

I'm sure Simon M knows better than I do regarding this... I look forward to hearing more.

Thanks.
-- Donnie

_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Don Stewart

10 p.m.

I'm very interested in what the best way to get incremental event data from a running GHC process would be.

Looking at the code, we flush the event buffer fairly regularly, but the event parser is currently strict.

So we'd need a lazy (or incremental) parser, that'll return a list of successful event parses, then block. I suspect this mode would be supported.

*My evil plan is to write a little monitoring web app that just attaches to the event stream and renders it in a useful "heartbeat" format*, but I need incremental parsing.

-- Don
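A rough sketch of such an incremental parser, using the incremental decoder from Data.Binary.Get in the binary package. The 6-byte record layout (16-bit type tag, 32-bit timestamp) is made up for the example and is not the real eventlog format; Event, Parser, newParser and feed are hypothetical names.

```haskell
import Data.Binary.Get (Decoder (..), Get, getWord16be, getWord32be,
                        pushChunk, runGetIncremental)
import qualified Data.ByteString as BS
import Data.Word (Word16, Word32)

-- Toy event: NOT the real eventlog layout, just enough to show the idea.
data Event = Event { evType :: Word16, evTime :: Word32 }
  deriving (Eq, Show)

getEvent :: Get Event
getEvent = Event <$> getWord16be <*> getWord32be

-- Parser state carried between chunks: a decoder waiting for more input.
newtype Parser = Parser (Decoder Event)

newParser :: Parser
newParser = Parser (runGetIncremental getEvent)

-- Feed one chunk. Returns every event the chunk completed, plus the state
-- to use for the next chunk. A trailing partial event is kept in the state.
feed :: Parser -> BS.ByteString -> Either String ([Event], Parser)
feed (Parser dec) chunk = go [] (pushChunk dec chunk)
  where
    go acc (Done rest _ ev)
      | BS.null rest = Right (reverse (ev : acc), newParser)
      | otherwise    = go (ev : acc) (pushChunk (runGetIncremental getEvent) rest)
    go acc d@(Partial _)  = Right (reverse acc, Parser d)
    go _   (Fail _ _ msg) = Left msg

-- Usage: two events arrive split across chunks at an awkward boundary.
demo :: Either String [[Event]]
demo = do
  (es1, p1) <- feed newParser (BS.pack [0, 1, 0, 0, 0, 10, 0, 2])
  (es2, _)  <- feed p1 (BS.pack [0, 0, 0, 20])
  pure [es1, es2]
```

The caller decides when to block: it reads whatever bytes are available, calls feed, handles the returned events, and waits for more input.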

Johan Tibell

1 May

8:44 a.m.

On Fri, Apr 29, 2011 at 12:00 AM, Don Stewart wrote:

> I'm very interested in what the best way to get incremental event data from a running GHC process would be.
>
> Looking at the code, we flush the event buffer fairly regularly, but the event parser is currently strict.
>
> So we'd need a lazy (or incremental) parser, that'll return a list of successful event parses, then block. I suspect this mode would be supported.
>
> *My evil plan is to write a little monitoring web app that just attaches to the event stream and renders it in a useful "heartbeat" format*, but I need incremental parsing.

A less general solution might be to have the program itself start a little web server on some port and use the API I proposed to serve JSON data with the aggregate statistics you care about. Example:

    main = do
        eventData <- newIORef
        server <- serveOn 8080 $ \ _req -> readIORef eventData >>= sendResponse eventData
        registerEventListener $ \ ev -> updateEventData eventData ev
        runNormalProgram

You can wrap the creation of the webserver in a little helper function and make any program "monitorable" simply by doing

    main = withMonitoring runApp

withMonitoring would take care of starting/stopping the webserver and processing events.

Just a thought.

Johan
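One possible shape for such a helper, as a sketch only. There is no web server here: the Monitor record stands in for one, and recordEvent plays the role of the proposed event listener. Unlike the one-liner above, this withMonitoring hands the monitor to the application explicitly, because no RTS hook exists to feed it. Monitor, recordEvent and renderJson are hypothetical names.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import Data.List (intercalate)
import qualified Data.Map.Strict as Map

-- Aggregate statistics: how many times each event name was seen.
type Stats = Map.Map String Int

-- Hypothetical monitoring handle; monRunning stands in for a live server.
data Monitor = Monitor
  { monStats   :: IORef Stats
  , monRunning :: IORef Bool
  }

-- What an event listener would do: fold each event into the statistics.
recordEvent :: Monitor -> String -> IO ()
recordEvent m name = modifyIORef' (monStats m) (Map.insertWith (+) name 1)

-- Render the statistics as a JSON object. Event names are assumed to be
-- plain ASCII identifiers, so Haskell's 'show' quoting matches JSON's.
renderJson :: Stats -> String
renderJson s =
  "{" ++ intercalate "," [show k ++ ":" ++ show v | (k, v) <- Map.toList s] ++ "}"

-- Start monitoring, run the application, and always stop monitoring,
-- even if the application throws.
withMonitoring :: (Monitor -> IO a) -> IO a
withMonitoring = bracket start stop
  where
    start  = Monitor <$> newIORef Map.empty <*> newIORef True
    stop m = writeIORef (monRunning m) False

-- Usage: the "application" reports three events and reads back the JSON.
demo :: IO String
demo = withMonitoring $ \m -> do
  mapM_ (recordEvent m) ["RunThread", "StopThread", "RunThread"]
  renderJson <$> readIORef (monStats m)
```

Using bracket is the design point: the server's lifetime is tied to the application's, so callers cannot forget to shut it down.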

Don Stewart

5:22 p.m.

I've put a library for incremental parsing of the event log here:

http://code.haskell.org/~dons/code/ghc-events-stream/

The goal is to implement something like:

http://www.erlang.org/doc/man/heart.html

Bryan O'Sullivan

2 May

2:59 a.m.

On Thu, Apr 28, 2011 at 3:00 PM, Don Stewart wrote:

> So we'd need a lazy (or incremental) parser, that'll return a list of successful event parses, then block. I suspect this mode would be supported.

A while ago, I hacked something together on top of the current eventlog parser that would consume an event at a time, and record the seek offset of each successful parse. If parsing failed (due to unflushed data), it would try again later. I think I might even claim that this is a somewhat sensible and parsimonious approach, but I'm drinking wine right now, so my judgment might be impaired.
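The offset-and-retry approach might look roughly like this. It is a sketch, not the code described above: the record format (one length byte followed by that many payload bytes) is invented for the example, and parseAt and drain are hypothetical names.

```haskell
import qualified Data.ByteString as BS

-- Try to parse one record starting at the given offset. Returns the
-- payload and the offset just past it, or Nothing if the record has not
-- been completely flushed yet.
parseAt :: BS.ByteString -> Int -> Maybe (BS.ByteString, Int)
parseAt buf off
  | off >= BS.length buf = Nothing   -- nothing new at all
  | end > BS.length buf  = Nothing   -- payload only partly written
  | otherwise            = Just (BS.take len (BS.drop (off + 1) buf), end)
  where
    len = fromIntegral (BS.index buf off)
    end = off + 1 + len

-- Consume every complete record from the saved offset onwards. The
-- returned offset is where the next attempt should resume.
drain :: BS.ByteString -> Int -> ([BS.ByteString], Int)
drain buf off = case parseAt buf off of
  Nothing        -> ([], off)
  Just (r, off') ->
    let (rest, final) = drain buf off'
    in  (r : rest, final)
```

A consumer would keep the returned offset, re-read the file after a delay, and call drain again from that offset. Because a failed parse leaves the offset untouched, a half-written record is simply retried once the remaining bytes have been flushed.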

Don Stewart

3:11 a.m.

I managed to build one on top of attoparsec's lazy parser that "seems to work" -- but I'd like ghc to flush a bit more regularly so I could test it better.

-- Don

Johan Tibell

1 May

8:39 a.m.

On Thu, Apr 28, 2011 at 11:53 PM, Donnie Jones wrote:

> Anyway, from your description, I don't understand how a listener would consume the eventlog incrementally?

I simply meant that I want to be able to register listeners for events instead of having to parse the eventlog file after the fact.

> I do think it would be useful to register listeners for events. I do not think the invocation of a callback would be too much overhead, rather the action the callback performs could be a very significant overhead, such as sending eventlog data over a network connection. But, if you are willing to accept the performance loss from the callback's action to gain the event data then it seems worthwhile to me.

A typical use of the callback would be to update some internal data structure of the program itself, thereby making the program self-monitoring.

I've been toying with introducing log levels to the eventlog command line API so the consumer of the event log can specify the number of events it would like to receive. We could do something similar for the API, e.g.

    registerEventListener (schedEvents .|. ioManagerEvents) (\ e -> ...)

Johan
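A sketch of how class-based filtering could work with a bitmask, so that listeners only see the classes they asked for. Everything here is hypothetical: the EventMask type, the three class constants and their bit values, the Event type, and dispatch (standing in for the RTS) are assumptions made for the example.

```haskell
import Data.Bits ((.&.), (.|.))
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word32)

-- One bit per event class, so classes combine with (.|.).
type EventMask = Word32

schedEvents, ioManagerEvents, gcEvents :: EventMask
schedEvents     = 0x1
ioManagerEvents = 0x2
gcEvents        = 0x4

-- Hypothetical event: the class it belongs to and a name.
data Event = Event { evClass :: EventMask, evName :: String }

-- Each listener is stored together with the classes it subscribed to.
type Listeners = IORef [(EventMask, Event -> IO ())]

registerEventListener :: Listeners -> EventMask -> (Event -> IO ()) -> IO ()
registerEventListener ls mask f = modifyIORef' ls ((mask, f) :)

-- Stand-in for the RTS: call only listeners whose mask covers the event.
dispatch :: Listeners -> Event -> IO ()
dispatch ls ev = do
  regs <- readIORef ls
  mapM_ (\(_, f) -> f ev) [r | r@(mask, _) <- regs, mask .&. evClass ev /= 0]

-- Usage: subscribe to scheduler and I/O manager events, but not GC events.
demo :: IO [String]
demo = do
  ls   <- newIORef []
  seen <- newIORef []
  registerEventListener ls (schedEvents .|. ioManagerEvents)
    (\e -> modifyIORef' seen (evName e :))
  mapM_ (dispatch ls)
    [ Event schedEvents "RunThread"
    , Event gcEvents "GCStart"
    , Event ioManagerEvents "IOWakeup"
    ]
  reverse <$> readIORef seen
```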

Duncan Coutts

8:51 p.m.

On Thu, 2011-04-28 at 23:31 +0200, Johan Tibell wrote:

> The RTS would invoke listeners every time a new event is written. This design has many benefits:
>
> - We don't need to introduce the serialization, deserialization, and I/O overhead of first writing the eventlog to file and then parsing it again.

The events are basically generated in serialised form (via C code that writes them directly into the event buffer). They never exist as Haskell data structures, or even C structures.

> - Programs could monitor themselves and provide debug output (e.g. via some UI component).
> - Users could write code that redirects the output elsewhere e.g. to a socket for remote monitoring.
>
> Would invoking a callback on each event add too big of an overhead?

Yes, by orders of magnitude. In fact it's impossible because the act of invoking the callback would generate more events... :-)

> How about invoking the callback once every time the event buffer is full?

That's much more realistic. Still, do we need the generality of pushing the event buffers through the Haskell code? For some reason it makes me slightly nervous. How about just setting which output FD the event buffers get written to?

Turning all events or various classes of events on/off at runtime should be doable. The design already supports multiple classes, though currently it just has one class (the 'scheduler' class). The current design does not support fine grained filtering at the point of event generation.

Those two features combined (plus control over the frequency of event buffer flushing) would be enough to implement a monitoring socket interface (web http or local unix domain socket).

Making the parser in the ghc-events package incremental would be sensible and quite doable as people have already demonstrated.

Duncan
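To make the per-buffer variant concrete, here is a small model in Haskell of a buffer that holds already-serialised events and hands its contents to a sink only when full. It is a sketch under assumptions, not how the RTS is written (the real buffer lives in C); EventBuffer, postEvent and flushBuffer are hypothetical names.

```haskell
import Control.Monad (unless, when)
import qualified Data.ByteString as BS
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- A fixed-capacity buffer of serialised event bytes and the sink that
-- receives each full buffer (a file descriptor, a socket, a callback).
data EventBuffer = EventBuffer
  { bufCapacity :: Int
  , bufPending  :: IORef BS.ByteString
  , bufSink     :: BS.ByteString -> IO ()
  }

newEventBuffer :: Int -> (BS.ByteString -> IO ()) -> IO EventBuffer
newEventBuffer cap sink = do
  ref <- newIORef BS.empty
  pure (EventBuffer cap ref sink)

-- Hand the pending bytes to the sink and start with an empty buffer.
flushBuffer :: EventBuffer -> IO ()
flushBuffer b = do
  pending <- readIORef (bufPending b)
  unless (BS.null pending) $ do
    bufSink b pending
    writeIORef (bufPending b) BS.empty

-- Append one serialised event, flushing first if it would not fit.
-- The sink runs once per buffer, not once per event.
postEvent :: EventBuffer -> BS.ByteString -> IO ()
postEvent b ev = do
  pending <- readIORef (bufPending b)
  when (BS.length pending + BS.length ev > bufCapacity b) (flushBuffer b)
  modifyIORef' (bufPending b) (`BS.append` ev)

-- Usage: five 2-byte events into a 4-byte buffer, then a final flush.
-- The sink records the size of each chunk it receives.
demo :: IO [Int]
demo = do
  sizes <- newIORef []
  buf   <- newEventBuffer 4 (\chunk -> modifyIORef' sizes (BS.length chunk :))
  mapM_ (postEvent buf) (replicate 5 (BS.pack [0, 1]))
  flushBuffer buf
  reverse <$> readIORef sizes
```

Passing BS.hPut applied to a Handle as the sink gives the "write to a chosen FD" variant, and calling flushBuffer from a timer would give control over flush frequency.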

Don Stewart

10:34 p.m.

I've got a proof of concept event-log monitoring server and incremental parser for event streams:

* http://code.haskell.org/~dons/code/ghc-events-stream/
* http://code.haskell.org/~dons/code/ghc-monitor/

Little screen shot of the snap server running, watching a Haskell process' eventlog fifo:

* http://i.imgur.com/Xfr9I.png

The main issue at the moment is that GHC is irregular in scheduling flushing of the event log stream, so it might be hours or days before you see any activity. This isn't useful for heartbeat style monitoring.

Also, we need to break out a bit of ThreadScope to give access to its analytics (e.g. rendering time series).

-- Don
